I am Matteo Collina, @matteocollina on Twitter. So please follow me. I'm trying to hit 10,000 followers by the end of the year, so please help me reach that milestone. I'm here today to talk about Node.js streams, the future of Node streams, how you can help, and what you can provide feedback on. A couple of things first. I work for a company called NearForm. We have a nice booth in the booth crawl. We are also doing a raffle with some smartwatches and AirPods, so come by. We will also donate some money to charity after the booth crawl. So please swing by. We are a professional services company that does all things JavaScript, so if you have some problems with your JavaScript and your team needs some help, please pass by; we'd have a lot of things to talk about, so it's probably worth a bigger conversation. So let's go into streams. How many of you have used Node streams? How many of you have liked Node streams? OK, a few, maybe. And do a few of you understand Node streams? OK, I'm not sure I trust you. So a stream is like an array, but over time. Essentially, instead of having one big chunk of data in memory that we are operating on, you receive data along the way and you process it over time. This is fantastic because it enables us to crunch an insanely high amount of data with a limited amount of memory. That's why we use them for file processing and a lot of other things. So they are a key part of Node. The problem is that Node streams are really complicated, and most people don't really understand them at all; even some Node core contributors struggle a little bit when it comes to reviewing the streams PRs. It's a really complicated code base that has evolved a lot over time. In fact, it emits a number of events.
So Node streams are based on EventEmitter, and they emit a lot of events. There is the 'data' event, which you have probably used. There is the 'readable' event, which you probably haven't used much, because it's not very popular. There is 'close'. There is 'end', which has a different meaning from 'close', by the way. And there is 'finish'. 'close', 'end' and 'finish' kind of sound like they mean the same thing, but no, they are three completely different events with three completely different meanings on streams. Then there is destroy() to tear the whole thing down. And I haven't even mentioned 'error', because you probably know that error handling on Node streams is a really big topic, so there is also that to take into consideration. Just to recap: 'data' and 'readable' are for reading data, and 'close', 'end' and 'finish' are emitted when a stream is ending, to some extent. 'end' is on the readable side, and 'finish' is on the writable side: they are emitted when a readable or writable stream is done successfully, and successfully only; if there is an error or an abrupt termination, they are not emitted. 'close' is emitted when the underlying resource is torn down, and it will be emitted also in the case of errors. Once upon a time, the only way to interact with a stream was by using the 'data' event. You will find a lot of these examples everywhere. I call this the wild horse example, because essentially you are receiving as much data as fast as the source can provide it. Which turns out to be a very bad idea, because it's really hard to pause the source, to tell it "hey, please slow down, this is too much data, I don't know what to do with it right now". Take into account that this has been the source of some of the worst bugs ever. And if you put an async function as a 'data' handler, you are going to create a lot of problems, so don't do that. James has a talk tomorrow about broken promises.
Go to his broken promises talk, it's a really good one. Backpressure with the on('data') model is really hard, because how do you stop? There is a pause() method on streams, but pause() is only advisory. So at some point another API was introduced, the 'readable' event, to provide a pull-based model. Instead of having this fire hose of data, the stream says: I will notify you when there is data available, and then you read that data from me. So instead of a fire hose of information, it's a pull model. This means streams have both a push model and a pull model implemented in the same code base. Doesn't that sound horrible to maintain? But this is actually probably the best way to interact with a stream, because you can actually pull data from it, so you use the internal buffer in the most optimal way. Still, this is not really readable code, to be honest. It is more performant, though, because this way you only read the data that you need, so it's a little bit more gentle on the stream's buffering. Error handling is still super complex on streams, and it's probably the number one source of memory leaks and file descriptor leaks. File descriptor leaks are really "fun" to diagnose. So this is how not to stream a file over HTTP. Don't do this. OK, how many of you have done this? OK, a few people. Don't do this at all. This will create a memory leak, immediately. Why? Can you see it? It's super obvious, right? It's such a simple API, what can go wrong here? Anyway, if you look at this, you see this .pipe() call. If the response errors, or the other side closes the connection, the read stream is not closed, or torn down, or destroyed. So it stays alive, and it stays alive forever, because there is nobody left to consume that stream. And that's a big problem.
So essentially, if your code is using .pipe() without some special error handling magic, you probably have a memory leak somewhere in your application. So don't use .pipe(). What you need to use is pipeline(), which is a utility that we added in Node core in version 10. It's also available as a standalone open source module under the name pump, and it's released inside the readable-stream module as well. pipeline() is actually really cool, because it tears down the streams: if one of them errors, it closes all the others. It basically makes sure there are no memory leaks, no leaks of any sort. So please use pipeline(), don't use .pipe(). You might ask: why can't you change the behavior of .pipe() to do the right thing? Well, backward compatibility. We cannot break everybody; there is a lot of code out there that relies on that behavior, so we can't really change it. That's bad. You can also use a framework to serve files, by the way. That's probably the best option, because there are a bunch of things to be done, so don't serve files manually; a framework will do the right thing for you. Note that all the best features of Node.js are based on streams. For as good as that sounds, it's an aging code base with a lot of history, and all of Node.js depends on it. fs depends on it, HTTP depends on it, HTTP/2 of course, standard input and standard output, and a bunch of other stuff. So it's really important to understand how to evolve this code base in the best way without breaking everybody, because that's the biggest problem with developing Node streams in Node core. Any Node core collaborator in the room can probably tell you: if you touch those files, you can really break everybody.
So don't break everybody. There is one big problem, though: the JavaScript community uses async/await everywhere now. How many of you use async/await? Good, everybody. Is there somebody that does not use async/await at all? We can probably have a beer afterwards. Anyway, everybody has moved to async/await. The biggest problem with streams and async/await is that the two things do not really mix well together. If you have tried, you have probably written some `new Promise` and then done some shenanigans to get the thing to do what you wanted. Somebody is nodding, so I'm hitting a nerve here. But how to do it is not clear, it's not easy, and it's not straightforward. So essentially, as Node core contributors we have not been serving you folks well on this. By the way, I'm part of the team that maintains Node streams, and I'm also a Node core collaborator and TSC member. We cannot really change the things that are already there without breaking everybody, because everybody depends on them. For example, we cannot really change how streams work, because that would break whatever web framework you are using, since those use streams, or it would break some of the file processing that you do. We can do some surgical changes, but we can't really do much. So the question becomes: how can we improve the usability of streams, considering that people love async/await? The number one mistake that people make with streams is passing an async function to on('data'). Don't do that. I did a PR to provide some safety net, and I'm working on a fix, but right now the net result is that if you do that, you will have a memory leak. So the thing I've been working on for the last few years has been to provide async iterator support on streams.
How many of you know about async iterators? Hey! Fantastic. They are one of the best features that have been put into recent versions of Node; they came in with Node 10. And by the way, Node 8 goes out of maintenance on the 31st of December this year, so stop using Node 8. That means that since January 1st, all supported versions of Node.js will have async iterator support, which is fantastic: we can use them everywhere, on whichever supported Node version we are. What does this look like? First up, you can have an async generator. This is a specific way to write a function that can spit out things you can iterate on with for await...of. Now, what you probably don't know is that there is a full protocol underneath powering that. You can write your own async iterators without using the generator function syntax, more or less like you can write your own normal iterators for other objects: if you want to use for...of on any object, you can provide that capability, because there is a protocol, and your object just needs to implement certain methods. So this is pretty interesting: can we use this with streams? Well, when this was being standardized, we worked with TC39 to validate that we could actually use it with streams. And we could. So the two are compatible, right now. You can use this code today: you don't need to turn on any flag or anything, it just works. You create your stream, and then you can for await...of over that stream. Pretty cool, right? You just process those chunks, and that's it. When the for loop ends, it ends. I don't need to use any crazy events, I don't need to do any shenanigans, I don't need to do anything crazy.
I can just do my thing and be done with it. And note that if I break out of my for await loop, it just tears down the stream automatically; again, I don't have to do anything about it. So it covers the basic 80% of use cases, and you can use the other APIs as low-level components. Also, in Node 12 — and now on Node 10 as well, it was recently backported in the latest Node 10 release, so I need to update these slides — we have added Readable.from(), to which you can pass an iterable or an async iterable, and it will automatically convert it into a stream. Which is pretty fantastic: I can just use this to convert a function, or an object, or an array into a stream. It simplifies testing a lot, for example, because you need to test your streams. I hope all of you write unit tests; if you don't, go start writing unit tests. You can then use pipeline() — always use pipeline(). Now, you can also write some crazy inception-style code: start with an async generator function, convert it to a stream, and then iterate over the stream. There are some little side effects, though. One of the key concepts of streams is data buffering: in order to be performant, a stream will buffer data. So your async generator will get called a little bit ahead, to fill the stream's buffer. The behavior is therefore ever so slightly different from what you would get if you just iterated over the async generator directly. Still, it's doable, and you can use these things to level the playing field. It's pretty nice, because then your code can just accept a parameter, and if it's async iterable, async iterate over it. You can pass in an async iterator, or any other object that implements that protocol, and your code will be exactly the same.
So it's pretty powerful. OK, it's demo time. Hopefully the demo gods will be with me, because that's an important piece. I have seven examples; I'm not sure if I can go through all of them, we'll see. This is the first example: old school streams. You implement the _read() method, and in _read() you just iterate over a big array, push all those values in, and then push null. This is how you would have implemented these things a long time ago. You run this, and it just spits out all the values. By the way, this will not run on Node 6 — we have fixed some stuff since then — but of course Node 6 is long gone, so we don't have to worry about it anymore. Now let's look at another version of the same code, using Readable.from(). We still use on('data') to process the data; however, in this case we just pass the array in and we get a stream back. This is really powerful for testing, for example, because this way we can stay compatible without breaking a sweat. We can run this code, and it still works exactly the same. Now, you can also go into full inception mode, to some extent, and pass in a generator function. Note that this one is all synchronous: a generator is synchronous, an async generator is asynchronous — it says so in the name. You can do this, run it, and you still get the exact same output as before, but now you're generating it from a function instead. This is pretty useful, again. We can, of course, also go from a generator to an async iterator using this API: we have our generator here, we pass it to Readable.from(), and then we async iterate over it. Why? Because it's all pretty much generic.
So we can run the code, and it's still spitting out 1,024 numbers — I know, I'm very imaginative. As I said, we can also go fully inception mode and do the async-generator-to-async-iterator thing. This still runs, and it still does the exact same thing. As I said, there is a little bit of difference in how much data is fetched from the async generator function, so there is that to take into consideration. But that's it. Now, those are still streams, so you can also use the full stream APIs. For example, if you want to read a file as text, as strings, you need to set the encoding, so you call setEncoding('utf8'), and it still works: let's run example six, and it spits out the same file. No dark magic here. And note that once you do this, you can do whatever processing you want inside the loop. By the way, always use util.promisify if you want a promisified version of anything; don't do it manually. The chance that you make a mistake doing it manually, or that some colleague makes a mistake six months after you've done it correctly, is very high. So don't: use promisify, it's there for a reason, and it's also really fast. So we generate our sleep function from setTimeout. There are a lot of nasty things you can get wrong by hand: if you just forget a return inside your `new Promise` callback, you will create dangling promises, memory leaks, and very bad things. So avoid it — again, broken promises. And we can do a quick await of the sleep here. You see, now it's taking some time. And note this, it's interesting: do you know why it's a single chunk? Yes, because the file is read as one full chunk from the disk. Interesting, right?
So if you do the same thing here, you will see that now it does what it says on the tin. Anyway, I'm moving forward. The last thing I want to show you is an example on the server. Note that when you're calling an async function on the server from a non-async callback, you always need to attach a catch handler. Make sure of that: if you don't put a catch handler there, you're going to create a memory leak. Either use the module called make-promises-safe — the name is exactly what it implies — or crash on unhandled rejections. This is a big conversation and it's not the focus of this talk, but if you run your code without an unhandled-rejection handler and you don't crash on unhandled rejections, you very likely have some memory leaks in your code. So always put a catch handler when you are moving from a non-async function to a promise-based or async function. What we do here is just count the chunks and count the total length, and then we use the request module to pipe things through. So I'm running this, and you can see it's all working fine, and it's actually really fast. Note that using async iterators is actually really fast: it's as fast as any other way of processing streams in Node. So use them; they don't have a big performance overhead. We have optimized async iterators to the bone — the code is really nasty in there, I wrote some of it. Anyway, let's talk a little bit about what is missing. The first thing that is missing is a way to write a transform stream using async iterators: something that you could use to process the data along the way. There is actually a very nice idea for this, because we can have a "transform by" API.
You just pass in an async generator function that accepts an async iterable as input, so it has this symmetric input and output. Then you can just for await over the input and yield the results. It's a pretty nice pattern for transforming data. Pretty cool, right? I love this. And if you convert it into a transform stream, you can use it with pipeline() and interact with the whole ecosystem of stream modules. Note that this is currently being worked on, so hopefully the PR will land. It got stuck at the beginning of October, and I'm hoping to unstick it this weekend — we have the collaborator summit in two days — so hopefully I can get it landed. Eventually, I would love to avoid creating a transform stream completely: you could just pass an async generator function that implements that pattern for you, so you avoid creating another stream and you can just compose functions as much as you like. That's pretty interesting, but it's not implemented yet — this is fantasy code, I'm living in Wonderland right now. Hopefully it will hit your screens maybe in Node 14, and hopefully we'll backport it. Now, I've talked about readable, I've talked about transform, and now let's talk about writable — I haven't talked about writable at all up to this point. Well, I don't know what to do with writable; that's the honest story. You could model writing as consuming an async iterator; however, what you really want is a write operation that actually returns a freaking promise. The problem is the fact that our current write() does not return a promise, and it's very hard to create that behavior. Why? Well, I would like to be able to write this code. Look at this: we have buildWrite(), and then we await write(chunk). Now, this code is correct.
However, all the magic is in buildWrite(). The problem is the error handling model of promises versus streams. Streams emit an 'error' event as soon as the error happens. Promises are like quantum physics: they only error when you look at them. That's the core of the problem, and this is why using promises with streams is really hard: one thing errors straight away, the other errors only when observed. Those two things are really, really hard to mix and match. You can see that this code does some shenanigans to make it happen: essentially, we are registering an 'error' handler and caching the error, so that when you call this write() function, we return a promise that will reject when you look at it. It's really bad code, and I don't have anything better than this. So if you have any ideas — any promise expert who wants to give it a shot — I'm really open to brainstorming; this is still an open problem. So how does it all work? Well, all of this is implemented in JavaScript, so you can use all of this in your own code, learn the pattern, and apply it to other things. As I said, I'm running a little bit out of time, so I hope you don't kick me out. This is an extract from Node core: essentially, we implemented the Symbol.asyncIterator method so that it returns an iterator; we just call our builder for that, and we get back an async iterator. So in order to make something async iterable, you just need to provide that. Note that if you want to build an async iterator and you don't want to use an async generator function — which is nice and really easy to use — you should use this type of pattern.
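A sketch of the buildWrite() idea with the error caching he describes — this is my reconstruction of the pattern, not the code from his slides, and it glosses over edge cases such as an error arriving while waiting for 'drain':

```javascript
// Bridge the two error models: cache the stream's eagerly-emitted
// 'error' and surface it from the promise only when write() is
// awaited, i.e. "when you look at it".
function buildWrite (stream) {
  let cachedError = null
  stream.on('error', (err) => { cachedError = err })
  return function write (chunk) {
    return new Promise((resolve, reject) => {
      if (cachedError) return reject(cachedError)
      if (stream.write(chunk)) return resolve()
      // backpressure: wait for 'drain' before resolving
      stream.once('drain', () => {
        cachedError ? reject(cachedError) : resolve()
      })
    })
  }
}
```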
But if you want to do some of the hard stuff, like the mapping that we do in Node core, what you need to do is create an object which has the async iterator prototype and implements a couple of methods: next() and return(). next() is called to get the next element; return() is called when you exit early. There are a few others, but they are less important — those are the key two. Internally, in Node core, we wrap the 'readable' event. There are some consequences of that, but one of them is that it's really performant, so that's a good side. It also handles buffering and everything else well. This is a long link, sorry, it's very small: it's the source of the implementation. Let me open it up and do a quick scroll. It's about 200 lines of code — not that much, but with a lot of very ugly edge cases in there, sorry. The upshot is that there is no performance penalty in using async iterators over the other methods. That's the key part. And for a promise-based API, saying there is no performance penalty is pretty big — if anything, I'm underselling it. Typically, promise-based stuff is slower than callback-based stuff; this thing is on par. Just so that you know. One last thing before I finish: there are WHATWG streams. WHATWG streams are what you use with fetch in the browser — not in node-fetch; Node does not have WHATWG streams. This creates a bit of an incompatibility between the two worlds. This is how WHATWG streams work, with this pipeTo() method. It's really complex, again — the key takeaway is that streams-based APIs are really complex. And if you want to implement one of those, you need to implement this thing, which has a controller and a lot of other stuff, which again is really complicated.
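Implementing the protocol by hand, without an async generator function, means providing next() and return(); for example (an illustrative countTo, not the Node core code):

```javascript
// An object is async iterable if its Symbol.asyncIterator method
// returns an iterator with a promise-returning next(), and
// optionally return() for cleanup on early exit.
function countTo (max) {
  let i = 0
  return {
    [Symbol.asyncIterator] () {
      return {
        next () {
          i += 1
          return Promise.resolve(
            i <= max ? { value: i, done: false }
                     : { value: undefined, done: true }
          )
        },
        return () {
          // invoked when the consumer breaks out of the loop
          return Promise.resolve({ value: undefined, done: true })
        }
      }
    }
  }
}

const seen = []
;(async () => {
  for await (const n of countTo(3)) seen.push(n)
  console.log(seen)
})()
```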
So, not really easy, and I'll just skip through this. But we would like to have WHATWG streams in Node core, to make things more consistent — you've probably seen my talk. So it makes some sense to have them in Node core. However, we also want to stay compatible with the Node ecosystem, which only uses Node streams. Now, WHATWG streams are going to be async iterable, and that's cool: they implement the same protocol. So I hope one day to be able to do this type of thing: start from a WHATWG readable, and consume it using an async generator function passed to a Node writable, and it will all just work, with all the interaction managed via the async iterator protocol. That would be pretty fantastic. Do you want to get involved? Please reach out — I will be at the booth, so come talk to me in the next few days if you're interested in any of this. Again, we are NearForm, there's a raffle, and thank you.