Hi, SingaporeJS. My name is Chikin. I'm from the Mortem Consulting Group. Currently, I actually don't use JavaScript very much for my day-to-day work, because they require me to use another language. Despite that, I use it for my own pet projects. So when I heard recently that this thing came out, well, it's not out of the experimental API stage yet, but we can actually now do proper concurrency in Node.js. We can actually spawn threads. So what I've tried to do is rewrite some microservices at the large stock exchange that I currently work at. So let's take a look. What do I mean by concurrency in Node.js? Just a quick quiz: how many of these functions yield control back to the main thread immediately, straight after calling them? How many would say the first one? Straight after you call http.get, the main thread gets back control. Well, if your main thread doesn't get back control straight after an HTTP GET request, then we're in trouble: every time you make a GET request to some other API server, your main thread is going to get blocked. How about writeFile? Same thing. Console.log is a bit controversial. It's synchronous now, but some time ago the implementation wasn't standardized, so you might see asynchronous versions of it. setImmediate, same thing. setTimeout, setImmediate, even process.nextTick. So actually all of these functions except console.log yield control back to the main thread. This is asynchronicity. Let me skip ahead a few slides. So asynchronicity looks like this. Just some background for JavaScript: whatever we write runs, although we say single-threaded, in this thread over here, the JavaScript thread.
Any I/O operation, or certain native modules like bcrypt, runs in its own separate thread pool managed by the libuv async I/O library, which is written in C++. So that's asynchronicity, which means that if you end up doing something like this, it's going to block. This is a simple loop. It just increments a number up to 10 to the power of 8, and then console logs. It's going to block. What happens if you decide to be smart? OK, I don't want it to block, so let me wrap it in a promise, inside an asynchronous function. Then I'm going to call that function, runAsync. In this promise, I resolve with the 10,000th prime, and after that I console log the prime out. Will HelloTalk.js appear before or after the prime? Before? After. It actually appears after the prime, because even though you wrap the thing up in a promise, this is not an I/O operation. Imagine what Node.js is going to do. Which driver is it going to call? Which library is it going to call to actually handle the processing? It's not going to call libuv; this is not an I/O operation. So Node.js, poor thing, has to handle it itself. It's going to block. And as a result, the HelloTalk.js line only comes after your prime. This one is a little trickier. I'm going to call the same function, runAsync, and then I'm going to call it again after that promise resolves, and then console log the result. In this case, when does HelloTalk.js appear? It actually appears after the first promise resolves, and then it console logs the prime after that. So it kind of appears in the middle of the chain. So we've seen asynchronicity. Well, when might this not work? Much of what we do in backend work is actually what we call ETL: you extract some data, you transform it, and then you load it, or save it, somewhere.
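The promise example can be reconstructed roughly like this. nthPrime is my stand-in for the speaker's prime computation, and the final log stands in for whatever the script prints; the point is that an async wrapper around pure CPU work does not make it yield:

```javascript
// Trial-division search for the n-th prime: pure CPU work, no I/O.
function nthPrime(n) {
  const primes = [2];
  for (let c = 3; primes.length < n; c += 2) {
    let isPrime = true;
    for (const p of primes) {
      if (p * p > c) break; // no divisor up to sqrt(c): c is prime
      if (c % p === 0) { isPrime = false; break; }
    }
    if (isPrime) primes.push(c);
  }
  return primes[n - 1];
}

// Wrapping CPU-bound work in an async function does NOT make it yield:
// there is no I/O for libuv to take over, so the body runs synchronously.
async function runAsync() {
  console.log('prime:', nthPrime(10000)); // 104729
}

runAsync();
console.log('HelloTalk.js'); // only printed AFTER the prime, despite the wrapper
```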
So Node.js is really good at the extraction and the loading parts. Your database operations are going to tell whichever database driver you have: go save this, or go retrieve this; I'll be down here servicing other HTTP requests while you do your stuff. And that's fine. But what happens to the T in the middle, the transforming part? If that starts to take too much time in Node.js, it's going to block the thread. Where this becomes a problem is if you have messages coming from a queue. Ordinarily, when the rate of arrival of messages is fairly slow, it's fine. Your Node.js guy on the right picks a message, does stuff with it, saves it; it's fine. But what happens if you have a highly distributed system and you get many, many messages in the queue? Then you start to get this back pressure, and you have to make decisions. For the messages that are now stuck, do I discard them? That may not be very good. But what else can I do with them? How can I handle them? Wouldn't it be good, then, if when you have a lot of producers of messages on the left, you could have more Node.js workers on the right? Actually, we've been able to do this for a while. If you're familiar with PM2, the process manager, you can spawn new Node.js processes. You can do the same using the cluster module in Node.js; it's a Node core library. But the thing about processes is that they're really, really heavy. Here's how it looks. Like I said, if you use the cluster module, you can fork a child process and then you can send data between processes. But if you're on the front end, this is equivalent to opening a new browser window each time you want to serve a different web page. No one does that these days. So this is a very expensive way to parallelize operations. I'm not sure if I have a demonstration here. Can it capture that? Is that big enough?
Just to show you how expensive this can get, let me run this particular script, the clustered count. I forgot to check what my upper limit is first. OK, so it looks like this. Can you guys see it? In this little script here, I'm just going to create two processes, and I'm going to count all the way up. What's my upper limit in this process? With two workers, I'm just aiming to count up to 10 to the power of four, which is just 10,000. Let's see how long that takes. It takes four seconds just to increment that thing, well, OK, 100,000 here. This is the clustered count. That's what happens if you try to communicate between different processes; just to give you an idea of how slow it actually is. If I were to do this instead, a simple count, it's only 100 milliseconds to count all the way up to, I think, 10 to the power of eight, in a single thread, everything local. Just that one process, one thread. So let's not speak of multi-processing from now on, because we're going to see this new thing: multi-threading. This is analogous to web workers on the front end. You can spawn worker threads. They still run a separate instance of Node.js and their own libuv event loop, so there is still a high degree of isolation. Threads still cannot access each other's variables, unlike in other languages, but you can pass messages between them. Unfortunately, the default API is still postMessage, postMessage, postMessage. It's really, really primitive. You post messages from the main thread to the worker thread, and if you want to get anything back, you get the worker thread to post a message back again. The good thing is, it's already much faster now than it was about four months ago. When I first tried this out, this API was about as slow as what you saw just now with the cluster fork example.
But right now, if I run a similar example here, the threaded count, instead of four seconds it should take, yeah, it takes three seconds. Hey, it's still about 25% faster. But really, post messages? Another way to pass data around is when you spawn the worker thread, you can spawn it with some data to begin with. I don't know if you can see here, but in the constructor for the worker, I've actually passed in workerData with the string 'something'. And that string, 'something', goes into the worker thread over here. This is the script for the worker thread. I kind of got lazy and put the entire script in a string and passed the whole string into the worker thread to be evaluated, which is what the eval: true option here says. When the worker thread spawns, it takes this string and evaluates it as JavaScript. And when it evaluates it, it imports workerData, which is this string with the value 'something'. So that's another way to pass information to a worker thread. But this is a bit limiting, and to be frank, it doesn't actually solve the performance problem. Because the whole problem with performance when it comes to posting messages to and fro is that each time you do that, if you're posting, say, a very large JavaScript object, V8 has to serialize the thing. You're effectively calling JSON.stringify on whatever you're posting to the thread, and then passing it through, well, worker threads don't really use a socket, but you're basically passing a serialized version of the thing. And if you pass a message back from the worker thread, it effectively gets serialized again. You don't have to do that yourself; Node.js actually clones it for you, but it still serializes it and passes it across to the thread.
It's actually the same thing with workerData. If you spawn a thread with data, it's going to serialize it and pass it to the worker thread at construction time. Still, you've moved this little performance hit to when the worker thread is created. So if you're smart about it, you can live with it, because your server is going to run for a very long time: you can spawn all your threads at the beginning and just keep reusing them, using a thread pool. I might have neglected to mention it, but threads are still not very lightweight to spawn. You'll see a noticeable performance hit if you keep spawning threads constantly. And as of the middle of last year, there is another way to share data. This comes at a very high cost, but not a performance cost. Actually, out of all the methods of sharing data between threads, this is the fastest. When I talk about cost here, I mean cost in terms of complexity, because what happens now is that the script for the worker thread is up here, highlighted in red, but when I spawn it, I'm going to spawn it with this thing called a SharedArrayBuffer. There are ArrayBuffers and there are SharedArrayBuffers. SharedArrayBuffers are basically regions of memory that you can share between threads, so that multiple threads can see what's in the buffer. They can access and change what's in the buffer. The way you do that is you create the buffer and then you create a typed array over it. Here I'm using Int32Array, 32-bit integers. There's a very good reason for using 32-bit integers. If you're working with strings, you might want to use an Int8Array instead, because that's what strings are normally serialized to if you use UTF-8 encoding.
But Int32Array allows us to use a certain other function, which we'll see in a short while, to control, to orchestrate, the different threads. So what's going on here? Actually, you can already see one of those functions here. I'm passing this array, which is a view over the shared buffer, into the worker, and I'm spawning the worker. The worker spawns and immediately imports workerData, which is the shared array buffer and the data view; I'm just referring to the view as dataView. And this function, Atomics.wait, comes from the Atomics object. It basically tells the thread to wait on this typed array if the value at index zero, which is the second argument here, has the value zero, which is the third argument. Since I created the SharedArrayBuffer and didn't do anything with it, by default all the values are zero. So this thread is going to wait; it's not going to proceed to this console log over here. Meanwhile, in the main thread, I'm calling a setTimeout, and over here, after one second, I set the value at index zero to 1337. After doing that, I call Atomics.notify: I'm going to notify any thread that is waiting on the dataView array at index zero, and I'm going to wake up exactly one thread. If you don't specify the one here, by default it wakes up all the threads that are waiting on it. Once I call this, you will see that the console log goes ahead and you see 1337. So this is how you can pass data to and fro between threads without going through postMessage, and it's really, really fast this way. But like I said, this performance comes at a great cost, because I don't know how many of you have ever stayed late into the night because your program exhibited some bug which sometimes happens and sometimes doesn't. It's very fun. Sorry, not fun at all.
When it happens, you're going to sit in the office until God knows when, especially if the application is really huge. Where is this happening? So what's wrong with this? It's actually perfectly fine if it's just one thread. You have a data view, an Int32Array, and I'm going to increment the value at index zero by one and console log it. With one thread, that's fine. But think about what happens with two threads. This particular part here, incrementing the value by one, is not so straightforward, because it's actually a few operations. First, the thread has to load the value from the array. Then it has to increment the value by one. Then it has to store the value back into memory. That's three operations, and only then does it console log the value. What happens if, in the other thread, these three operations interleave like this? The first thread reads the value, zero. The second thread reads the value, zero. It's zero in both threads. Then the first thread increments the value by one, so now it's one. The second thread increments the value by one, so you don't get two; you get one. And that sometimes happens and sometimes doesn't, because the way in which the threads interleave the operations is not deterministic. So no, it's not fine. We can actually see this in action. Let me find the example here, the threaded count. I don't know if you can see the number here. Notice the number that's returned: I'm actually trying to count up to 10 to the power of five, or 10 to the power of six, whatever the case is. It's not supposed to give you this funny number. If I run it again, it gives me a different number. What's going on here?
I'm not getting back a consistent number, and that's because the script I'm running is doing this. I'm running two workers, and the only part that's really relevant is this string here, because what the worker does is look at this shared array and keep incrementing the value by one. It loads the value, increments it by one, stores it back; loads the value, increments by one, stores it back. In fact, it's probably not even doing all that, because the compiler optimizing this is probably just going to keep incrementing the value without storing it back first, which is why you end up seeing values that are very much lower than what you want. And how do we remedy that? OK, so where did that thing go? Whoops. If you want to remedy it, instead of directly incrementing, use Atomics. I'm going to use Atomics.add: to this particular shared array, at index zero, I'm going to add a value of one. And if I do this now, I get the expected value every single time. But notice this has a performance cost, because now it takes 120-odd milliseconds to finish, whereas previously it took about 70 milliseconds. So hopefully this illustrates that concurrency is not a free performance increase. In fact, it's quite a bad idea if you have to keep operating on the same data from different threads and keep locking the thing using Atomics; that's going to slow your program down a lot. All right. So when it comes to atomic operations, you can read the API on MDN or in the Node.js documentation. Since we don't have very much time left: add and subtract are probably self-explanatory. You add a value to the shared array at a particular index, or you subtract a value from it.
You can load a value from the shared array. You can store a value there. You've also seen wait and notify. These are most likely what most people use on a day-to-day basis when working with multiple threads. Exchange swaps a value in the shared array, say the value at a particular index. I want to change that value. Remember, because of multi-threading, I can't take the value out and then put a new one back, because when I take the value out, somebody else might put another value in right after that, before I put my value in, and that's going to lead to data inconsistencies. Atomics.exchange makes the exchange of the values atomic: either both of those operations happen together, or neither does. I wish I had more good news about this. compareExchange would have been really good if you wanted to write locks for threads, meaning mechanisms such that a particular thread can lock while other threads carry on, and unlock itself when certain conditions happen. You'd want to use this particular method, but I can show you that it actually has a chance of failing; it does not always do what it's supposed to do. What it's supposed to do is check the value at a particular index in the shared array. If that value matches what you think it is, it exchanges the value with something you give it. Otherwise, it leaves it alone. This is what happened when I ran it about 50-odd times. Sometimes you see this. The numbers on the left are the thread IDs. All I'm doing is exchanging the value 0 in the shared array with the thread ID, and then exchanging it back. The thread with ID 2 sees a value of 0, takes it out, and puts the value 2 into the array: compareExchange, if it's 0, put my thread ID in. Then I take the 0 out.
Right now, the value at index 0 is 2. Then thread 3 comes along and sees what's at this index in the array. It sees the value 2. It's not 0, so it doesn't do anything with it; there's no exchange. That's why the log is still the value 2. Thread 2 then goes back and changes it back: my thread ID is inside, so it's the value 2; I'm going to change it to the value 0. But what's this? Thread 3 now retrieves a log value of 4. Didn't thread 2 just set it to 0? That's not doing what it's supposed to do. Something's gone wrong here. At least so far, this is the only atomic operation I've seen fail in this manner. The scary thing is that it doesn't always do that. In fact, it runs properly most of the time, until it doesn't. That's probably not what you want. As far as atomic operations go, that's all I have to say about them. How many threads should you use when you want to do concurrent Node.js? If we had more time, I would have shown you: if you spawn 500 threads, you're going to make your application very much slower. So, generally, keep in mind what your threads are doing. Optimally, one thread runs on each core, so if you have a quad-core machine, spawn 4 threads. More threads is not always better, unless you know your threads are going to be idle most of the time; then maybe you want to spawn another thread to do something else. Passing data to threads: these are what we saw. You can use workerData to pass it at creation time, or postMessage to pass it at runtime. But if you really want it to be fast, and you can design your application such that you avoid concurrent reads and writes, go ahead: use the SharedArrayBuffer. We didn't have time to see TextEncoder and TextDecoder, but if you're passing strings from one thread to another, you might want to consider using TextEncoder and TextDecoder to serialize them into the buffer.
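For reference, the documented contract of compareExchange is easy to check in a single thread: it swaps only when the current value matches the expected one, and it returns the value it actually saw either way. This is a minimal sketch of that contract, not a reproduction of the multi-threaded failure from the talk:

```javascript
const view = new Int32Array(new SharedArrayBuffer(4)); // view[0] starts at 0

// Atomics.compareExchange(typedArray, index, expected, replacement):
// swap only if the current value equals `expected`; return what was there.
const first = Atomics.compareExchange(view, 0, 0, 2); // sees 0, swaps in 2
console.log(first, view[0]); // 0 2

const second = Atomics.compareExchange(view, 0, 0, 3); // sees 2, not 0: no swap
console.log(second, view[0]); // 2 2
```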
So, finally, designing for concurrency is hard. If you're running the same task on the same data, like the counting you just saw, use a single thread for it. Don't waste resources. Don't keep creating threads; you actually have to terminate threads, otherwise they stay around. Use a thread pool. Use postMessage instead of serializing objects yourself too often; postMessage does the serializing for you, and it's very, very fast. Use Atomics to synchronize access to SharedArrayBuffers. As for how a thread pool might be implemented, I don't think we can show the code here, but you can find packages on npm. What a pool does is allow me to do things like this: at the start, I create a pool, initial size 4, and let's say I can set the maximum size to 50, the initial buffer size, and so on. Then what this HTTP server does is, each time a request comes in, if it needs to do a lot of heavy processing in order to service the request, to send a reply, it tells a worker to do it instead. Then it sends the reply and releases the worker back to the pool. So when I use this thread pool, I'm only ever going to have at most 50 threads at any one point in time. And, yeah. Unfortunately, the 30 minutes are up, so I can't show very much else. But if you want to see performance metrics, feel free to look for me later. That's all I have. Thank you.