My name is Harendra Kumar. I'll be talking about a new Haskell library called Streamly, built for concurrent dataflow programming. If you listened to the previous talk, it was about shell programming — connecting different processes using shell pipelines. Here we go a little more fine grained: not at process granularity, but inside the program, connecting functions using pipelines instead of processes.

So with Streamly you get the ergonomics of the shell, just like what we saw in the presentation before this, and the safety of Haskell — shell programming has no safety at all, it's not typed. The main thing, the USP of Streamly, is the speed of C. We'll see examples that run at the speed of C. And concurrency is built into the streaming paradigm: you can make streams concurrent and evaluate the actions in a stream concurrently, very efficiently.

About me: I did C programming for most of my career. I switched to Haskell in 2015 to solve the engineering scalability problem we were facing where I worked before. It was millions of lines of C code, and we kept trying to refactor it, but that doesn't usually work because it gets in the way of regular engineering — you should be doing things right from the beginning. So I asked: if I had to write this kind of system from scratch, what would I use? I looked at a lot of languages and finally decided Haskell is the one I like. What is good about Haskell is purity, and purity enables equational reasoning: the compiler can reason about the code and the programmer can reason about the code, and you can substitute equivalent code very easily — something you can't do in languages without purity.

I founded a company called Composewell Technologies, which is about making software engineering better, and this is one of the projects we are working on currently. Our focus right now is getting the core, fundamental things right and then building on top of that. Simplicity is something we are obsessed about, and we spend a lot of time trying to make things simple — simple for the user, simple for the programmer using this library. Things have to be as simple as they can be, and it takes a lot of effort to bring about simplicity in something. We are obsessed about very small things: how to do this one thing right, ironing out that small crinkle to make it simpler. As Dijkstra said, simplicity is a great virtue, but it requires hard work to achieve it and education to appreciate it — and to make matters worse, when people see complexity they think it is something great. What we say is that achieving simplicity is what requires the real time and effort.

So what are streams? When the term comes up, a lot of people think of streams as some different paradigm for solving a particular set of problems, but we say a stream is a very general model for solving problems. A stream is a sequence of items of the same type, and combinators to process those sequences are what this library provides. Haskell lists, I would say, are pure streams: they are not effectful. What we call streams are effectful lists — the items in the sequence are generated dynamically by running actions.
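As a minimal sketch of that difference (assuming the streamly 0.7.0-era API used in this talk, with Streamly.Prelude imported as S):

```haskell
import qualified Streamly.Prelude as S

main :: IO ()
main = do
    -- A pure list: every element already exists before we process it.
    print (sum [1 .. 10 :: Int])

    -- An effectful list: each element is produced by running an
    -- IO action on demand, one getLine per element.
    S.mapM_ putStrLn
        $ S.mapM (\i -> do line <- getLine
                           return (show i ++ ": " ++ line))
        $ S.fromList [1 .. 3 :: Int]
```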
You can keep generating items; you can keep growing your list. In pure lists everything is already generated, and you process that finished sequence. That's the only difference between streams and lists.

In imperative terms: if you need a loop to process a sequence, in functional programming you need a stream. Streams are nothing but modular loops, if you compare them to the imperative paradigm. A lot of people ask, do I really need streams? The imperative version of that question is: do I need loops? If you need loops in imperative programming, then you need streams in functional programming. It's only the ergonomics of streams that stops people from using them — they look like such a heavyweight tool for such a small problem, so people roll their own loops manually. What we are trying to do is make streams so simple that people always reach for that tool instead. So I'll say that Streamly is efficient, composable, and concurrent loops. If you need looping, you can use Streamly combinators to do that processing; if you think about your problem imperatively, you can model it in parallel on streams in functional programming.

Who should be using this library? Who is the target audience? All programmers can use this: if you need loops, then you need streams — and can anyone really say they don't need loops? This is a general purpose framework. It provides you declarative concurrency and very high performance, nearing or beating C in some cases. C is as fast as you can go if you optimize it well, but we can match that in most common cases. You can use it in server backends, reactive programming, real-time data analysis, and what people call streaming data applications in general.

So let's take a look at the library. We'll just scratch the surface — it's a huge library, you can do all kinds of things, and you will learn new things every time you look deeper into it. What are the goals we had when designing this system? In terms of ergonomics, we want Python- and shell-like high-level programming: when programmers think of a problem, they should be able to immediately reach for it and prototype something very quickly, just as the previous speaker said about reaching for shell programming when something has to be done quickly. For performance we aim at C; for composability and safety we aim at Haskell's. And we want to keep things simple: we use only basic Haskell, no advanced features like type-level programming — we restrain ourselves to basic things that everyone can grasp easily. We compile all the way back to GHC 7.10.3, which also means we are not using any features newer than that.

Now, composability and performance are two conflicting goals, and it's very hard to achieve both of them at the same time.
That's what we are trying to achieve with this library. Usually when you get composability, you don't get performance; if you get performance, you've lost the composability of your program. This library aims to provide both.

We'll be using Streamly 0.7.0, which I released just yesterday night. In some of the examples we will use internal APIs, but don't take that to mean they are very experimental: they are stable APIs, we just haven't decided their names and module structure yet, so we haven't exposed them. They will be fitted into the module structure later; currently they are internal. All the examples are available in the streamly-examples repository on GitHub.

Let's talk about the fundamental operations you can do on streams. There are three, and you can put almost all the combinators in the library into these three categories: generate, transform, and eliminate. Generate means you produce a stream from a non-stream value, a seed — you generate a sequence. Transform means you turn one sequence into another sequence. Eliminate means you fold the sequence down to a single value.

Correspondingly, there are three modules — an Unfold module, a Prelude module, and a Fold module — and three fundamental types in Streamly. An Unfold generates a stream. The Stream type can be used for generation, folding, and transformation, all three; but the Unfold type exists for high performance in cases where it can fuse very well, and similarly the Fold type. To compose stream generators efficiently you need the Unfold type, and to split a stream into multiple folds you need the Fold type — that's why these types exist, even though streams alone can do all kinds of operations. So we say: generation is Unfold, transformation is Stream, elimination is Fold. In the examples the Unfold module is abbreviated UF, the stream module S, and the Fold module FL; you will see combinators from these modules.

There are two fundamental functions for generation and folding. The unfold function takes an Unfold data type and a seed and makes a stream out of it. The fold function does the opposite: it takes a Fold data type — which represents in what way you want to fold — and a stream, and it creates a single value. So unfold generates a stream and fold eliminates it, and composed together it's just an action: if you look at the types, a -> Stream m b and Stream m b -> m c, the composition is just an action in the monad. As I said, these are loops — you can equate this to imperative programming, where you generate a sequence, process it in a loop, and reduce it to a single value.

On the slides we write Stream as the type, but that's a general conceptual type; we don't expose a type with that name. The concrete types are SerialT, AsyncT, and AheadT — different types of streams. SerialT is the serial stream, and because we have concurrency support, we also have types for concurrent streams: AsyncT is one of the concurrent types, and AheadT is another, with different concurrency behaviors, which we'll see later.
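On the slide, that first example is roughly the following — a sketch assuming the 0.7.0 module layout (the Unfold type, and the unfold function itself, were internal at this point, so the exact import paths are assumptions):

```haskell
import Data.Functor.Identity (Identity, runIdentity)

import qualified Streamly.Data.Fold as FL
import qualified Streamly.Internal.Data.Unfold as UF
import qualified Streamly.Internal.Prelude as IS  -- unfold lived here (assumption)
import qualified Streamly.Prelude as S

-- Generate a stream from a seed with an Unfold, then eliminate it
-- with a Fold: a complete loop, SerialT Identity Int in the middle.
total :: Identity Int
total = S.fold FL.sum (IS.unfold UF.fromList [1 .. 10])

main :: IO ()
main = print (runIdentity total)  -- 55
```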
So how do you use these APIs? This is the simplest possible example. UF.fromList is an unfold — UF is the Unfold module, and fromList is the combinator in it that generates a stream from a seed, the seed here being the list. It generates a stream of the elements 1 to 10. Then we immediately call fold on that, and the fold we use is sum, which adds up all the elements coming in on the stream. So finally you get the sum of the elements. The output of the unfold is SerialT Identity Int — a stream in the Identity monad — and when you fold it you get an Int value in the Identity monad; take the value out of Identity and that's your sum.

I showed this using unfold and fold, but we have combinators which work directly on streams instead. We can define fromList = unfold UF.fromList and sum = fold FL.sum — exactly the same thing — and in fact the stream module already has these convenient definitions available. So S.fromList directly creates a stream from the list of 1 to 10, and S.sum eliminates it. You don't have to call unfold and fold in most cases; but in some cases they are required, as we'll see — when you merge streams you need to compose multiple generators, and when you split a stream you need to split it into multiple folds. That's when you need Unfold and Fold.

Now let's come to more real-life, more complicated examples. We've looked at pure streams; let's look at effectful streams. We'll work with these modules. File-system related combinators are in the FileSystem hierarchy: we have a Handle module and a File module. Handle works on file handles — file descriptors, if you like, in C terms, wrapped into a handle — while the File module works directly on file names rather than handles. Similarly, the Socket module works on socket handles, and the Inet.TCP module works directly on addresses, just as you would work directly on file names instead of handles. The Unicode.Stream module handles Unicode encoding, decoding, et cetera — the text processing.

So let's write a cat program. This is all the code needed to write cat in Streamly: you take the input file, inFile, and File.toChunks creates a stream of chunks from it; then putChunks folds those chunks to standard output. You create a chunk stream from a file and fold that stream to stdout — that's the whole definition of cat. If you want to write cp, then instead of standard output you say File.fromChunks outFile: it folds the chunk stream into the file named outFile. These examples are available in the streamly-examples repository on GitHub, so you can run them, and they have the same performance as — in some cases better performance than — the core utilities you will find on your system.
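As a sketch, those two programs might look like this. The File and Handle modules were internal in 0.7.0, and I'm following the slide's names — in particular, putChunks writing a chunk stream to standard output is as named in the talk, so treat its exact module and signature as assumptions:

```haskell
import System.Environment (getArgs)

import qualified Streamly.Internal.FileSystem.File as File
import qualified Streamly.Internal.FileSystem.Handle as FH

-- cat: stream the input file as chunks, fold the chunks to stdout.
cat :: FilePath -> IO ()
cat inFile = FH.putChunks (File.toChunks inFile)

-- cp: the same chunk stream, folded into another file instead.
cp :: FilePath -> FilePath -> IO ()
cp inFile outFile = File.fromChunks outFile (File.toChunks inFile)

main :: IO ()
main = do
    [src, dst] <- getArgs
    cat src
    cp src dst
```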
Next, a word counting program. wc -c counts characters: you use toBytes to create a byte stream from the input file and then run the length fold on it to get the length of the stream. That's wc -c.

Now if we want to count lines, we again create a byte stream, and this time we write a fold. Look at countl: it is the accumulating step function of the fold. It takes the existing Int — the count accumulated so far — and the next byte, and increments the count if it encounters a newline. That's how folds work. From this pure function we create a Fold using FL.makePure: countl is the step function of the fold — what to do when a new item arrives in the stream, namely increment the accumulator — zero is the initial value of the accumulator, and we extract the final count using the id function, so the last value of the accumulator is the final count. We call the resulting fold nlines, fold the stream with it, and get the line count of the stream.

Word counting, wc -w, is a little more complicated because you need to detect transitions from space to non-space characters. countw is the code to count the words: the state holds the count of words and whether the previous character was a space — that is all the state you need to count words. The initial state is zero words and space = True; as soon as you encounter a non-space you increment the count. And finally, when the fold is done, fst — the first element of the tuple — gives the final count. We run S.fold nwords on the input byte stream to get the word count.

Now let's come to splitting a stream. Splitting a stream means using the same stream to run multiple folds in parallel — simultaneously you are pushing the stream into different folds. What is an example of that? wc -clw: you want to count characters, lines, and words at the same time, whereas our previous programs counted only characters, or only lines, or only words. So how do we compose this? We take the same folds we created earlier and compose them using applicative composition — this is what the foldl library does, and we do the same thing with streams. Then we fold the stream using this composed fold. The applicative composition creates a single fold containing these constituent folds: one counting lines, one counting words, and one counting length. The stream is pushed to all three folds simultaneously, and finally you collect the results as (Int, Int, Int) — a tuple of the three counts: lines, words, and length. This is what we mean by composability in Haskell. You can't do this kind of thing in C; this kind of composability you won't get in other programming languages.
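Sketched out, the combined counter looks like this. FL.mkPure stands for the slide's makePure — a fold built from a pure step function, an initial accumulator, and an extract function — so take the exact constructor name and module as assumptions:

```haskell
import Data.Char (chr, isSpace)
import Data.Word (Word8)

import qualified Streamly.Internal.Data.Fold as FL
import qualified Streamly.Internal.FileSystem.File as File
import qualified Streamly.Prelude as S

-- Step function for counting lines: bump the count on newline bytes.
countl :: Int -> Word8 -> Int
countl n ch = if ch == 10 then n + 1 else n

-- Step function for counting words: the state is the count plus
-- whether the previous byte was a space.
countw :: (Int, Bool) -> Word8 -> (Int, Bool)
countw (n, wasSpace) ch
    | isSpace (chr (fromIntegral ch)) = (n, True)
    | wasSpace                        = (n + 1, False)
    | otherwise                       = (n, False)

-- wc -clw: one pass over the byte stream, three folds composed
-- applicatively and run simultaneously.
wc :: FilePath -> IO (Int, Int, Int)
wc file =
    S.fold
        ((,,) <$> FL.mkPure countl 0 id
              <*> fmap fst (FL.mkPure countw (0, True) id)
              <*> FL.length)
        (File.toBytes file)
```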
There is another example: a tee. If you want to send the output to multiple files, you create a stream again and use a tee fold, which is a composition of two folds — one writing to outFile1, the other writing to outFile2. You compose that fold and fold the stream with it, so your stream goes to both files. That is the whole program, and it emulates the tee utility on your Unix system.

Now a little more complicated example. It demonstrates that you can confine a state to a certain part of a stream: you introduce a state at a certain part of the stream and use that state for transformation within that part only. Again you convert the input file into a byte stream, and liftInner lifts the inner monad into a StateT monad — it basically introduces a state. Then there is an experimental API, something very new that I created just a few days ago, a different kind of fold — that's why it's called chunksOf2; I couldn't come up with a good name yet. It takes chunks of 64, and the new-handle API creates a fresh handle using the StateT monad, because we want to fold parts of the stream into individual files, so we need to keep creating new file names: from inFile we create outFile1, then outFile2, then outFile3, then outFile4, and so on until the whole file has been split into many files. So this program splits a file into many files. The StateT monad carries the state for creating new handles — creating new file names and handles for them — and the chunks are written to those files. evalStateT starts the state with Nothing, and then it keeps track of which part of the file we are at: zero, one, two, three, four. So this is a pretty simple example of the split utility that you will find on your Unix system.

Now let's look at transformation pipelines. These were folds — we've talked about how you generate a stream and fold it in different ways, and composing folds gives you full composability in splitting streams. Now: how do you transform one stream into another stream? You can filter, you can map — go look at all the combinators in the library — but we'll look at a small example here. This is a word classifier; the original example was by Patrick Thomson for the streaming library, and I adapted it to Streamly. It's pretty simple: you classify the words in a text file, counting how many times each word occurs — which words are most frequently used in that particular file — and print the words in descending order.

What we do here is create a byte stream from the input file and decode it using Latin-1 decoding (we could use decodeUtf8 if we wanted). Then we map toLower over it so that we are not case sensitive: a word in capital letters and the same word in small letters count as one. Then we use the words combinator to get the words out: S.words takes a fold, and toList is the fold we give it, so each word coming through the stream is collected into a list — a String. So at this point you have a SerialT IO String. Now we filter these words, keeping only those whose characters are all alphabetic — we use isAlpha for that filtering — and finally we fold the stream into a hash map.
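In outline, the pipeline is this sketch. I've simplified the last step to a pure Data.Map fold instead of the slide's hash map with a mutable count (discussed next), and the talk says S.words — here I take words from the internal Unicode stream module, which is an assumption:

```haskell
import Data.Char (isAlpha, toLower)
import Data.Function ((&))
import qualified Data.Map.Strict as Map

import qualified Streamly.Data.Fold as FL
import qualified Streamly.Internal.Data.Unicode.Stream as U
import qualified Streamly.Internal.FileSystem.File as File
import qualified Streamly.Prelude as S

-- Count how many times each word occurs in a file.
classify :: FilePath -> IO (Map.Map String Int)
classify file =
      File.toBytes file        -- SerialT IO Word8
    & U.decodeLatin1           -- SerialT IO Char
    & S.map toLower            -- case insensitive
    & U.words FL.toList        -- split into words, collect each as a String
    & S.filter (all isAlpha)   -- keep purely alphabetic words
    & S.foldl' (\m w -> Map.insertWith (+) w 1 m) Map.empty
```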
In that hash-map fold we use an IORef — a mutable count — to be more efficient, so that we don't generate too much garbage in the hash map modifications. And that's all the code for the word classifier: you convert the hash map to a list, sort it, and print the top entries.

Here is another example: a word-size histogram. Again you convert the file into a stream of bytes, then S.words splits it into words, and we give the length fold to words, so each word is converted into its length. The bucket function you see above puts those lengths into buckets: lengths up to nine go into their own buckets, and anything greater than nine goes into one common bucket, so all the long words are counted together. Then we fold using the classify fold, which is nothing but a fold that converts a key-value stream into a map, and within each bucket we apply the length fold to the incoming stream — so we count how many words landed in each bucket. Print that map and you get a histogram of the word lengths encountered. Just a simple four-line program to get the histogram. That's how you can use streams to model real-life problems in a very simple, stateless paradigm.

If you want to debug a pipeline, there are trace and tap combinators: if you want to print, or log to a file, you can do that in the middle of a stream — here we used print, so you can say S.trace print within this pipeline. By the way, if you don't understand what this ampersand (&) means: it's just like the shell pipe. The stream that comes from the first stage is fed to the next stage — the first expression is given as an argument to the next function, whose output is given to the next one, and so on. It's the opposite of the dollar ($) operator.

So we have looked at combining folds and at transformations. Now let's look at how we compose generators — generating streams. We have seen how to generate a single stream, but if you are generating multiple streams, how do you combine them into one? There are simple combinators to merge streams, zip streams, or append streams — you can look them up in the library and use them quickly — but we have a slightly more complicated example here: how do you append n streams when you don't even know n statically? If you are familiar with the shell, this is cat dir/* > outfile: you cat a number of files, concatenate all of them, and they go into one single file. This is the opposite of split — we saw split cut a file into n chunks; this reads all those files, all those chunks, and combines them into a single file. In the Dir module there is a toFiles combinator which gives you all the files within a directory — dirname here is the directory — so we get a stream of file names.
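The slide's program is roughly the sketch below. Dir.toFiles, concatUnfold, and File.read-as-an-Unfold are all named as in the talk; they were internal APIs, so the paths and exact names are assumptions (note also that toFiles as sketched is assumed to yield usable paths):

```haskell
import Data.Function ((&))

import qualified Streamly.Internal.FileSystem.Dir as Dir
import qualified Streamly.Internal.FileSystem.File as File
import qualified Streamly.Internal.Prelude as IS

-- cat dir/* > outFile: concatenate every file in a directory
-- into a single output file.
catDir :: FilePath -> FilePath -> IO ()
catDir dir outFile =
      Dir.toFiles dir             -- stream of file names (the seeds)
    & IS.concatUnfold File.read   -- File.read is an Unfold: each name becomes a byte stream
    & File.fromBytes outFile      -- fold all the bytes into the output file
```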
With that stream of file names, we use the concatUnfold function. File.read is an unfold: from a seed it creates a stream. Here the seeds are the file names — File.read reads each of those files, concatUnfold concatenates them into a single byte stream, and then fromBytes writes that stream to the output file.

We'll go a little quicker here because we are running out of time. Streamly is also a better replacement for ListT and LogicT — you can do logic programming with it. It's just nested loops: if you know nested loops from imperative programming, ListT is just nested loops.

Now, declarative concurrency — this is an example of concurrency. Suppose you want to look up the meanings of the words cat, dog, mouse. This is the serial version: you do a mapM fetch over the word list, where fetch, defined here, gets the meaning from a server — here the server is Google search. fromListM executes those actions and gives you the results; we map show over them and print them to standard output. If you want to do it asynchronously, you just add the asyncly combinator, and all of them will be fetched concurrently, with the results given to you as they come — first come, first served. With another combinator you get speculative concurrency: instead of asyncly you say aheadly — the name comes from look-ahead. You run them all concurrently, but the results are presented in the same order as you specified in the stream: ordered just like serial, but with concurrent actions.

And this is a whole word-lookup server, fully concurrent. The control flow starts here, at main. You accept connections on port 8090, which gives you a stream of sockets — serial up to this point. Then we have specified asyncly, so this mapM is asynchronous: we mapM a function called serve, defined here, over the sockets, so every socket is now served concurrently. serve reads data from the socket and serves results back; what it calls on each socket is lookupWords. lookupWords reads the stream from the socket, decodes it using Latin-1, and converts it into words using the words combinator we saw earlier. Then we mapM fetch with aheadly — the aheadly applies to this mapM, while serially applies to everything that came before it — so all the words coming in that stream are fetched concurrently but presented in the same order, because we want to present them to the user in order. (You could use asyncly instead if you're happy to present them in the order they arrive.) Then we intercalate the results with newlines — sprinkle newlines between them — and write them back to the socket. That's all: a fully concurrent lookup server.
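Stripped down, the word-meanings example from a couple of slides back looks like this — a sketch in which fetch is simulated with a delay instead of a real request to a dictionary server:

```haskell
import Control.Concurrent (threadDelay)

import Streamly (aheadly, asyncly)
import qualified Streamly.Prelude as S

-- Stand-in for the real lookup: pretend to ask a server.
fetch :: String -> IO (String, String)
fetch w = do
    threadDelay 100000  -- simulate network latency
    return (w, "meaning of " ++ w)

wordList :: [String]
wordList = ["cat", "dog", "mouse"]

main :: IO ()
main = do
    -- Serial: one request at a time, results in order.
    S.mapM_ print $ S.mapM fetch $ S.fromList wordList
    -- Concurrent, first come first served:
    S.mapM_ print $ asyncly $ S.mapM fetch $ S.fromList wordList
    -- Concurrent but speculative: results presented in input order.
    S.mapM_ print $ aheadly $ S.mapM fetch $ S.fromList wordList
```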
All the words are looked up concurrently — the requests go to the server concurrently — and you can specify how many threads you want, how much buffer you want, how many words should be looked up at a time, using the maxThreads and maxBuffer combinators. Beyond that, you can also control the rate. If you don't want to put too much load on Google — it has a limit on how many requests you can send per second — you can just specify that using maxRate. Similarly, you can limit connections per second.

You can merge the word streams together using concatMapWith — that's a very powerful combinator; go look it up, and you can run these examples from the examples repository. You can do recursive directory listing: this is the whole program — these are complete programs — to recursively list a directory concurrently. It works concurrently; see this aheadly combinator and this concatMapWith ahead here — that does all the magic. And concurrency is demand-scaled: no threads are created if you are not consuming anything, and if the consumer consumes at a faster rate, more threads are created and it works faster. maxThreads and maxBuffer can be used to control how concurrent you are.

You can also write concurrently to multiple destinations. Say you have a stream coming in and you want to send the same stream to multiple servers — logs, for example, going to two different processors, two different nodes. You can do that easily using tapAsync, which is a concurrent folding combinator. ListT is concurrent too: you can do non-determinism concurrently, in different ways, using aheadly or wSerially, asyncly or wAsyncly. There are multiple combinators, so you can interleave the loops in many different ways, like you would with OpenMP in C — in fact, we have more facilities than OpenMP.

Streaming plus concurrency gives you reactive programming, so you can write GUIs and games. We have an AcidRain game example in the package, and this is the CirclingSquare example showing animation. If you want to refresh at 40 Hz or 60 Hz, you just create frames and use maxRate 60 or minRate 60 — you need just one combinator, and the stream keeps refreshing at that rate.
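All of these controls are just combinators layered on the stream — a sketch with arbitrary numbers (someAction is a hypothetical stand-in for real work):

```haskell
import Control.Concurrent (threadDelay)

import Streamly (asyncly, maxBuffer, maxRate, maxThreads)
import qualified Streamly.Prelude as S

main :: IO ()
main = S.drain
    $ asyncly
    $ maxThreads 10   -- at most 10 worker threads
    $ maxBuffer 100   -- keep at most 100 results buffered
    $ maxRate 60      -- at most 60 elements per second
    $ S.mapM someAction
    $ S.fromList [1 .. 1000 :: Int]
  where
    -- Stand-in for a real effectful step (hypothetical).
    someAction :: Int -> IO Int
    someAction n = n <$ threadDelay 1000
```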
Now, performance. We have a production deployment at Juspay, and they say they got a 20x performance increase: they moved from forty 8-core machines to a single 16-core machine after switching to Haskell and Streamly — earlier they were using PySpark. They run real-time and batch log analytics over millions of payment transactions every day, and it's just a few hundred lines of code doing all the real-time analytics. The good thing is they can change the code fearlessly — whenever they want something, they do it and put it in production the same day. That's the benefit of Haskell's type safety.

Let's do some performance comparisons, starting with lists. Streamly streams are lists as well — pure streams, if you use the Identity monad — so we compared them with lists across all these operations. When you compose operations many times, lists are pretty slow: this drop-then-map, composed four times over the same stream, performs 151x worse than Streamly. Streamly is worse in two operations, concatMap and append — in those, lists perform better — but in all other operations Streamly performs better.

Next is the comparison with the streaming libraries currently available in the Haskell ecosystem. We compared with streaming, conduit, and pipes, and in all operations Streamly is faster; in some cases these libraries are slower than Streamly by a factor of a thousand. These benchmarks are available in the streaming-benchmarks repository — if you find any problems with them, you can always raise an issue and we'll look into it. These are micro-benchmarks, by the way: we measure each operation individually, and mostly the problems get worse when you compose multiple times — when you repeat the same operation, or combine scan, map, filter, and so on, the other libraries degrade, while with Streamly that works pretty well. On memory: streaming has the same memory consumption as Streamly, while pipes and conduit have cases where they take a lot of memory compared to streaming and Streamly.

Comparison with C. This is a C program to count words. This slide shows all the state the program maintains — line count, word count, whether the last character was a space, et cetera — and this is the logic: two nested loops, the outer while loop reading buffers and the inner loop processing each buffer to produce the counts — line count, word count, and character count. And this is the same program in Haskell using Streamly: the upper box is all the state, and this is all the logic. The logic is pure — there is no mutable state here: you just take the updateCounts function, start with the initial state, and use a fold. That's the whole program: you unfold, then fold with updateCounts. Let's look at the performance — I'm out of time, so I'll finish in a minute: 2.34 seconds for C, 2.17 seconds for Haskell. Haskell is faster in this particular case, and the two programs correspond exactly to each other in terms of what they do.
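The shape of the Haskell side is roughly this sketch — all the state in one pure record, all the logic in one pure step function, and a single fold driving it (FL.mkPure again stands for the slide's pure-fold constructor, and counting every byte as a character is a simplification):

```haskell
import Data.Char (chr, isSpace)
import Data.Word (Word8)

import qualified Streamly.Internal.Data.Fold as FL
import qualified Streamly.Internal.FileSystem.File as File
import qualified Streamly.Prelude as S

-- All the state: character, line, and word counts, plus the
-- was-the-previous-byte-a-space flag.
data Counts = Counts !Int !Int !Int !Bool deriving Show

-- Pure logic: update the counts for one input byte.
updateCounts :: Counts -> Word8 -> Counts
updateCounts (Counts c l w wasSpace) byte =
    let ch    = chr (fromIntegral byte)
        space = isSpace ch
        l'    = if ch == '\n' then l + 1 else l
        w'    = if wasSpace && not space then w + 1 else w
    in Counts (c + 1) l' w' space

-- Unfold the file into bytes, fold with the pure step function.
wc :: FilePath -> IO Counts
wc file = S.fold (FL.mkPure updateCounts (Counts 0 0 0 True) id) (File.toBytes file)
```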
So can Haskell be as fast as C? You can go through the slides for the details, but we have seen that it can even be faster. GHC can perform global optimizations because of equational reasoning, and purity is the source of that: because the language is pure, GHC can do whole-program optimization. What are the downsides? There are some, because the optimization doesn't always work reliably. So we have written a GHC plugin to make stream fusion work reliably in the cases where we want the whole loop to fuse and all the intermediate constructors to be eliminated, so that it works just like C. Pranay here created that plugin: we mark the constructors that are part of the stream, and the plugin makes sure GHC inlines the relevant function — the join point, the join binding that GHC creates — irrespective of its size, so those constructors can fuse.

Some stats about the project: there are 25K lines of code, 16K of documentation, 95 files. It's very high quality, well tested, production capable. We have a lot of tests and benchmarks — benchmarks are our business; we are into performance, so there is a huge number of benchmarks and everything is measured, every combinator. There is work in progress on stream parsers, concurrent folds, and splitting and merging of transformations.

In the roadmap we have shared concurrent state — the concurrency I showed was shared-nothing concurrency, so we'll have shared-state concurrency as well — persistent queues, vector instructions (SIMD support), distributed processing, and a lot more. There's a lot more in the library; this was just an overview. You can create pipelines the way you want, the way you imagine — you will have all the facilities, and if you don't find something, we'll create it.