So, thanks for welcoming me here in India. It's a pleasure to be here and to present this library. We'll talk about machines, which is a Haskell library for data streaming. Just a few words about myself: I contribute to some open source projects. You might have seen my name in some Scala stuff, like Scalaz, or in Haskell, where I wrote a few tools. I'm working in Switzerland for a company called Bestmile, where we are building an operating platform for autonomous vehicles. It's basically a way to take these autonomous vehicles and make a transport service out of them. So if working in Switzerland sounds interesting to you, just get in touch with me; we are actively hiring. So what we will do today is explore the fascinating design space of data streaming, in the context of a purely functional programming language, which brings interesting challenges. First, we'll see a little history of the different solutions developed over the years and the trade-offs between them. Then we'll see how to use machines in practice. The story is to motivate why I chose machines, because there are a lot of different alternatives in this design space. The fundamental problem is that we want to express a program that processes a stream of data while interleaving effects, running in constant space and time, and achieving strong composability and reusability. It's actually quite hard to keep these three things together. To understand better what they mean, we'll look at a few examples where we remove just one of them and see what we can do with the other two. In Haskell, obviously. If we remove the interleaving of effects, we can use Haskell lists, which provide both constant space and time and strong composability. Here is an example: we create an infinite list of integers.
We have two functions: inc, which simply increments an integer by one, and dec, which decrements it by ten. Then we compute the result of our operation: we take that list of integers, we fmap dec over it, then inc, and then we just take the first ten elements. It works pretty well. Even though the xs list is infinite, the program actually terminates thanks to lazy evaluation, and it actually fuses the different phases together. And we have nice composability: we can just chain these fmaps together and compose our operations. That's great, but we cannot mix effects into this. All these functions are pure; there's no IO there. So we have those two properties, but we don't have interleaving of effects. What if we keep interleaving of effects but lose constant space and time? The Haskell core libraries give us some functions for working with IO that help interleave effects: one is mapM, another is foldM. We take the same example, but this time our increment and decrement functions work in the IO monad because they perform a side effect. Here I don't actually do any side effect; I just lift the result into IO using return. But I could, for example, print a line on the console or do any other IO operation. To do that, I basically use mapM, as you see in the result definition: I call mapM twice, then I call take, and I get the result. I keep nice composability: I can chain my two steps together. Well, that's nice, but sadly the program won't terminate. If you run it, it will get stuck at the first mapM, because mapM will force xs, which is an infinite list, and it will never finish. So that's not good. Now let's see what happens if we drop the composability aspect.
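As a sketch of the two variants just described (the names inc, dec and xs follow the talk; the slide code itself is not reproduced here, so this is an approximation):

```haskell
-- An infinite list of integers, as in the talk.
xs :: [Int]
xs = [0 ..]

inc, dec :: Int -> Int
inc = (+ 1)
dec = subtract 10

-- Pure version: lazy evaluation lets us take 10 from an infinite list,
-- and the fmap stages fuse together.
resultPure :: [Int]
resultPure = take 10 (fmap inc (fmap dec xs))

-- The same functions lifted into IO (they could print to the console).
incM, decM :: Int -> IO Int
incM = pure . inc
decM = pure . dec

-- Effectful version: this diverges, because mapM forces the whole
-- (infinite) list before take gets a chance to run. Do not run it!
resultIO :: IO [Int]
resultIO = do
  ys <- mapM decM xs
  zs <- mapM incM ys
  pure (take 10 zs)
```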
We can do that by writing a highly specialized loop. I keep the definitions of inc, dec and xs the same as on the previous slide, and now we have to write the loop by hand. Basically we do a recursion, and we encode by hand the logic that we had before. We can still compose our two functions together, but we have to call them inside the loop, do the recursion ourselves, and implement the take logic ourselves. So we really lose composability: we're still able to compose those two functions, but everything else is highly specialized. At least it terminates and it works, but it's not ideal; we would like to keep composability as well. So what are the solutions to this problem? There are a lot of them. Well, actually, at first there was lazy I/O. You might have heard of that: it's a way to process big files in Haskell while still being constant in space and time. But there are a lot of issues with this approach. First, you have unpredictable resource handling. If you have ever used lazy I/O, you probably hit the issue where you read the content of a file into a lazy data structure, the next operation closes the file before you have consumed all the data from that resource, and when you actually try to use it, it breaks: you get an exception. It also breaks equational reasoning. It makes us feel that we are using a pure data structure, but in fact it is not, because that lazy byte string, for example, will actually trigger a read on the file system, and that's a side effect. And as Brian was showing earlier, that breaks equational reasoning. So the first real solution appeared in 2008: iteratees, and their dual, enumerators. Then we have seen the conduit library, which has a lot of combinators and is widely used. Then there was pipes, and then machines.
Since I only started coding in Haskell in 2014, I had to make a choice between all these different possibilities, and I will try to explain why I chose machines. First, let's talk about iteratees. It was the first practical solution to this problem, developed by Oleg Kiselyov. The thing I don't really like about it is that it uses specialized abstractions for the different stages, and that makes them harder to compose. You have iteratees to consume data, which are equivalent to a sink in the other libraries, enumerators to produce data, and enumeratees to transform data. But there is no common type between these specific things, so it's harder to compose them together. Another aspect is that the resource handling is arguably not optimal in the implementation. A few alternatives were developed, enumerator, iterIO and others; they solved some of the problems, but they were not really convincing. Then conduit appeared. It was designed by Michael Snoyman for the Yesod framework, so it has a strong emphasis on being practical for that framework, with, for example, exception handling and resource handling built in. The library evolved quite a bit over the years, and I would say it took some ideas from pipes and machines and folded them back in: initially it used mutable state to be performant, but later it was rewritten to use CPS. My concern with it is that the core abstraction really mixes a lot of different concerns, some of which I think should not be built in; we'll see later how they can be handled separately. Then there was another take on the problem from Gabriel Gonzalez, called pipes, which is really elegant, because this time he really unified all the different types under a single one, making composition really nice: all these abstractions form a category, so you can compose them like you compose functions, or like you compose anything.
But unlike conduit, it does not handle termination, and sadly, the way the core is built, it's not something you could add later. You simply cannot do it. And then came machines. Machines was designed by Edward Kmett, Rúnar Bjarnason and Dan Doel. The thing with machines is that it does not try to handle resource management. It leaves that to the user, and I think that's a good idea. It gives you a bit more work, but I think that today we don't have a good answer to that problem. Unlike any of the other solutions we've seen before, this one allows you to deal with complex topologies: for example, having multiple inputs to your stream, multiple outputs, and things like that. It does that by nicely parameterizing the input language; we'll see a bit later what that means. Conceptually it's also a different metaphor. Edward Kmett was saying that the pipe metaphor does not convey that it processes data, so he preferred machines, and the machine metaphor also conveys the idea that you can deal with multiple inputs. Something to know about machines, though, is that the design is not really complete, in a way. There are some abstractions which are currently not unified, although we know it's in theory possible; people have tried, but we still don't have a good solution for it. Still, I think that compared to what the other libraries give you, it's a really neat and simple design for such a complex problem. So in summary: if we take out resource handling, and I think we should because it makes things nicer, there is nothing you can do with the other libraries that you can't do with machines. As I said before, the conduit library mixes too many concerns together, so the main advantage of using conduit today is that you have a massive ecosystem around it. You will find a lot of combinators and integrations with different libraries, but that advantage is mostly temporary.
It just means that if people start contributing, things will get better for machines too, and hopefully you might contribute after this talk. So let's get into the library, and yes, here come the types. Let's see the core abstractions defined in machines, and later we'll see how to use them. The library is really split in two parts: machines and plans. First we'll see the definition of what a machine is, and then we'll see the other one. The core algebra, which is actually what you use to run the thing, is simply a Step with three constructors: Stop, which means the machine has finished its job; Yield, which means it produces an output value, the o; and Await, which means it awaits an input value, described by the k in this case. I won't explain all the type parameters for now; you don't have to understand all of them yet. This algebra is used by the root of all the types in machines, MachineT, which is a monad transformer. You have m, the monadic context you want to work in; k, the input language (unlike in the other libraries, k is a higher-kinded type, but we can ignore that for now); and o, the output type the machine produces. And we see that running the machine means getting a Step in that monadic context: we have to run that monadic context to get the step out of it, to know what the next thing to do will be. There are a lot of aliases in machines, which is pretty nice, though sadly GHC does not do a good job of using them when showing you error messages. One of them is Machine, built on MachineT, which lets you work in any monad: it uses universal quantification precisely to hide that specific monad. Now let's see the other types available to you. They're all defined with type aliases, which I think is quite neat.
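Restated in code, the core types look roughly like this. This is a simplified sketch, close to but not verbatim from the library source:

```haskell
{-# LANGUAGE GADTs, RankNTypes, ExistentialQuantification #-}

-- The step algebra: a machine either stops, yields an output o,
-- or awaits an input described by the input language k.
data Step k o r
  = Stop
  | Yield o r
  | forall t. Await (t -> r) (k t) r  -- continue with an input t, or fall back

-- A machine is a monad transformer: taking one step happens in m.
newtype MachineT m k o =
  MachineT { runMachineT :: m (Step k o (MachineT m k o)) }

-- A Machine works in any monad: universal quantification hides m.
type Machine k o = forall m. Monad m => MachineT m k o

-- A type-equality witness used to pin the input language to one type.
data Is a b where Refl :: Is a a

type Process  a b   = Machine  (Is a) b
type ProcessT m a b = MachineT m (Is a) b

-- A Source ignores its input language entirely.
type Source  b   = forall k. Machine k b
type SourceT m b = forall k. MachineT m k b
```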
So we have Source, which discards the input language k and just outputs values of type b: it's something you get data from. Then there is another type, Process, which restricts the input language to a single type: it uses a type equality so that you encode just a single input type instead of a fully higher-kinded one. So a Process takes values of type a as input and produces values of type b as output, and there is also a monad transformer version to work with. So where is the sink? Actually, there is no sink defined. You could define one, but in practice it's not so useful, because it makes it harder to pass around, and if you later want to transform it into something else, that can be a bit harder. Still, there is a sink type defined in one of the companion libraries, machines-io, which helps you deal with everything related to IO: a SinkIO, specialized to MonadIO, which is typically used if you want to send data to a file or things like that. Now comes the plan. This dichotomy between plan and machine is what I mentioned before about the library not being completely finished, because in theory you could unify the two. But for now, we have to live with it. A machine is usually constructed from a plan, and ideally you would like to think about a plan like this. It looks a bit like the Step algebra: you have Done, which gives you a final result, the result of the whole process, what you get at the end; Yield, which produces a value and gives you what to do next, the plan to continue your work; Await, which as before gives you an input value plus the plan to continue working; and otherwise you can actually Fail. But it is not written like this in the source code. So don't be scared.
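The idealized plan algebra just described, next to the continuation-passing encoding the library actually uses, might be sketched like this (the non-CPS data type is hypothetical, for intuition only; the library does not define it):

```haskell
{-# LANGUAGE RankNTypes, ExistentialQuantification #-}

-- Hypothetical, idealized (non-CPS) rendering of a plan.
data Plan k o a
  = Done a                      -- the final result of the whole plan
  | Yield o (Plan k o a)        -- emit a value, then continue
  | forall z. Await (z -> Plan k o a) (k z) (Plan k o a)
                                -- consume an input, with a fallback plan
  | Fail                        -- give up

-- Roughly what the library defines instead: the same algebra in
-- continuation-passing style, for performance.
newtype PlanT k o m a = PlanT
  { runPlanT :: forall r.
      (a -> m r)                                      -- Done
      -> (o -> m r -> m r)                            -- Yield
      -> (forall z. (z -> m r) -> k z -> m r -> m r)  -- Await
      -> m r                                          -- Fail
      -> m r
  }
```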
We'll see how it is really written: it is written using continuation-passing style. So it is actually written like that. It can be a bit scary to understand, but when you actually use the library, you don't see all this stuff; this is just to explain the design behind it. Without understanding all the implementation details, what is important is that you can simply use these functions to create your plan when you want to yield or await a value. So why is there this separation? It's actually quite critical to achieving the best performance possible. Something I did not mention yet is that in terms of benchmarks, machines is the fastest kid in town, and this is due to this split. Trying to avoid it is an interesting subject of research. There was a paper recently called "Reflection without Remorse", by Atze van der Ploeg and Oleg Kiselyov, which shows a nice technique that we thought would be useful here, but it actually doesn't work. We still have hope of doing it, though. What this dichotomy means for you in practice is that a plan cannot change its processing logic during its execution. Think, for example, of a non-context-free parser, where you have to understand the context to know what to do next: you have state to keep in your process, and that makes your logic change along the way. If that's a bit hard to understand, you can simply follow this simple advice: always start by building a plan. You get a much friendlier API; it's much nicer to use. And if you run into one of these limitations, you will realize it quickly, and you can then switch to writing a machine directly. So now let's see how we can use all of these types to do actually useful things. All the code for the demos shown here is available in a Git repo, and you can simply clone it if you want to follow along. Let's start by rewriting the initial example I showed at first. To do so, we'll need a few combinators.
So let's see what they are. We have taking, which is basically like take for lists, but for a machine: you give it the number of elements you want to take, it will simply await and yield that number of times, and then it's done. Then we have a nice and useful function called autoM. It applies a monadic function to each element of a process: basically it takes a function from a to m b and lifts it into a ProcessT m a b. It's a nice way to take any effectful function and just make a machine out of it. Another very useful combinator is source. All of these combinators are in the core library. source simply takes a Foldable and constructs a source out of it: it will yield each element of that Foldable along the way. These machines are actually constructed using a plan: they use await and yield, which produce a plan, but then we have to transform that into a machine, because a Process is a machine. For that we have construct, which basically takes a plan and builds the machine, and repeatedly, which does the same but keeps repeating the same plan over and over; I'll show the definitions later. So now we can rewrite that sample program. We have our inc, dec and xs functions, and now we create our pipeline of operations. Here we use the ~> operator, the operator to compose machines together. You can have it in one direction or the other; it's up to you. I like to write it like this because I like to see the data flowing in that direction until it gets to the end. So here we source our xs list: we transform that Foldable of infinitely many integers into a source. Then we add our increment function: here we use autoM, which transforms that Int -> IO Int function into a ProcessT working in the IO monad. We do the same for decrement, and then we apply that taking combinator from before, and it works.
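Assuming the machines package, the pipeline just described might be sketched like this (incM and decM stand in for the effectful increment/decrement from the slides):

```haskell
import Data.Machine (runT, source, autoM, taking, (~>))

-- Effectful variants of increment/decrement; they could do real IO.
incM, decM :: Int -> IO Int
incM = pure . (+ 1)
decM = pure . subtract 10

main :: IO ()
main = do
  -- source turns the infinite Foldable into a Source; autoM lifts the
  -- effectful steps; taking 10 stops the pipeline after ten elements.
  ys <- runT $ source [0 :: Int ..] ~> autoM decM ~> autoM incM ~> taking 10
  print ys  -- ten elements, despite the infinite source
```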
We get the three things we wanted: we have composability, it's constant in space and time; we have everything we want. So now let's see a slightly more sophisticated example. Oh yes, first let's see how we actually transform a plan into a machine. These are the two combinators we used earlier in that definition; maybe I can just show them again. Here are construct and repeatedly: they were used to build those machines out of plans. So what do they do? construct basically calls that runPlanT function, and if we go back to it: when you want to run a PlanT, you have to provide how to deal with each of the constructors, what to do on a yield, what to do on an await, and so on. It's basically like giving an interpreter for it. So here it is: we convert the plan algebra into the machine algebra, constructor by constructor. And repeatedly is basically constructing the plan forever: it runs the plan, and once it's done, it runs it again, infinitely. Now let's see how we can answer a boring interview question using machines: the typical fizzbuzz. We create a process from Int to String: it will get Ints as input and produce one or multiple Strings. If you're not familiar with fizzbuzz, the idea is that if a number is divisible by three you have to print "fizz", if by five "buzz", and if by both, you have to print "fizz" and "buzz". You could say, oh, I could just yield a single string with "fizzbuzz" in it, but this way it's neater, because it means that for one integer we might produce more than one string, and that's where the DSL actually comes in handy to express this. So let's see how we do it. First we call repeatedly, because this is an infinite process.
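In code, the fizzbuzz process described here might look like this sketch (assuming the machines package; when comes from Control.Monad):

```haskell
import Control.Monad (when)
import Data.Machine (ProcessT, repeatedly, await, yield)

-- One Int in, zero or more Strings out: a number divisible by both
-- three and five yields "fizz" and then "buzz" as two separate outputs.
fizzBuzz :: Monad m => ProcessT m Int String
fizzBuzz = repeatedly $ do
  n <- await
  let fizz = n `mod` 3 == 0
      buzz = n `mod` 5 == 0
  when fizz                 (yield "fizz")
  when buzz                 (yield "buzz")
  when (not (fizz || buzz)) (yield (show n))
```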
There's no reason we would want to stop our machine. So what do we do? First we await a value, which will be an Int; we get it, then we check if it's divisible by three or five, and then we simply use the when function from Control.Monad, which allows you to run a monadic action only if a specific predicate holds. So we say: if it's divisible by three, I yield "fizz"; if by five, I yield "buzz"; and if neither, I just show the number. I think it's a pretty nice way to express this problem. Then again we have an infinite list of integers: we simply source our xs as before, we put fizzBuzz in the pipeline, and then we can just put a print at the end to see what happens. If you run this, it will never terminate, but it works: it will fizzbuzz to infinity. It's pretty neat: constant time and memory, and it composes nicely. You have a good DSL to express what you want. So let's see a few more quite useful combinators; there are a bunch of them. The documentation is not very extensive, but still, by looking at Hackage you can find most of the stuff. One which is particularly useful is asParts. For example, if you have a process that produces lists of something, it will flatten them, so that a downstream process that takes one element at a time can consume them: basically it turns a stream of lists of a into a stream of a, producing the elements one after another. Then there's another combinator, largest. You can think of it as the maximum equivalent, but for streaming. And in case you want to work with file contents, there is that other library I mentioned before, machines-io, that provides a few more combinators for that. One is sourceHandle, which takes the handle of your resource and feeds the data into a source. Something to note here is that it takes a handle directly.
So you have to take care of closing the handle later, and so on. That's where, you know, machines avoids mixing resource handling with the streaming itself. And that's actually fine, because as we'll see, we have other ways of dealing with resources. You also have to say how you want to get your data: it can be by chunk or by line, and there are other combinators available too. So now let's use these new combinators to, say, find the largest word in a given file. Here comes the resource handling. For that we'll use withFile, which is provided directly by the base library and ensures that the handle is closed once we finish: it opens the handle for us and it closes it for us. We say: I want to work with this file, I just want to read it, and we provide a function that receives the handle and returns something, and that something is given back to us at the end. Then we prepare our pipeline, using the combinators I mentioned before. We say: I want to source that handle line by line; I give it the handle, and it gives me a source of strings. Then I can directly use the words function from Data.Text, which breaks a line into words. But that function takes one string as input and returns a list of strings, so we use asParts here to flatten that list into single elements. Then we use length to turn each word into its length, and finally we use largest to keep only the biggest one. And it works. So it's quite nice; I think it's not bad at all to keep resource handling out of the streaming library. Now let's look at another library that gives you a bunch of combinators for working with the file system, called machines-directory. I won't show the implementations here; we'll look at the implementation of one of them later, as it's a bit more tricky. So for example you have files.
It's a process that works in IO, taking FilePath as input and producing FilePath as output: feed it paths and it passes along the files found there. Then you have the same for directories, which is called directories. directoryContents lists all the contents found at a given path. And then you have directoryWalk, which allows you to recursively go through a directory and enumerate everything in it: it's a way to, for example, process all the files under a given folder. We'll get back to the implementation details later. So now we can try to write a complete program with everything we've seen so far. What we'll try to do is write a program that takes a list of folders, enumerates all the files in those folders, and finds the largest word in them. First, you can see that the code at the bottom is exactly the same as before: it's the pipeline that finds the largest word in a given file. Then, starting at the beginning: first we get the arguments from the command line; we expect them to be a list of file paths. We source them, and then we use that directoryWalk combinator, which goes through all these directories and lists everything in them. Now, directoryWalk returns you everything, directories and files, but we just want the files, so we use files to keep only that type of content. And then we can simply lift our function, which is itself built with a machine, and again use largest to compose it and get the result at the end. So here we see that we can compose these things together quite nicely: we even run a machine inside a step of another machine, and everything fits together. So far, everything we've done was by using combinators and putting them together.
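Putting the pieces together, the whole program might look roughly like this. This is a sketch assuming the machines, machines-io and machines-directory packages; the exact sourceHandle/byLine types are approximated here, and plain String with Prelude's words is used instead of Data.Text for simplicity:

```haskell
import System.Environment (getArgs)
import System.IO (IOMode (ReadMode), withFile)
import Data.Machine (runT, source, auto, autoM, asParts, largest, (~>))
import System.IO.Machine (sourceHandle, byLine)        -- machines-io
import System.Directory.Machine (directoryWalk, files) -- machines-directory

lastOr :: a -> [a] -> a
lastOr d [] = d
lastOr _ ys = last ys

-- Length of the largest word in one file: the inner pipeline.
-- withFile opens the handle and guarantees it is closed afterwards.
largestWordIn :: FilePath -> IO Int
largestWordIn path = withFile path ReadMode $ \h -> do
  lens <- runT $
       sourceHandle byLine h  -- stream the file line by line
    ~> auto words             -- break each line into words
    ~> asParts                -- flatten the word lists
    ~> auto length            -- word -> word length
    ~> largest                -- keep the biggest length seen
  pure (lastOr 0 lens)

main :: IO ()
main = do
  dirs <- getArgs
  lens <- runT $
       source dirs
    ~> directoryWalk          -- recursively enumerate entries
    ~> files                  -- keep only the files
    ~> autoM largestWordIn    -- run the inner pipeline per file
    ~> largest
  print (lastOr 0 lens)
```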
We did implement something a bit more sophisticated, where we actually built a plan, but we were always working at the plan level. As I was saying, once you need to adapt your logic based on your inputs, you cannot use a plan anymore. So here I just want to show a quick example, maybe without explaining all the details, of when it can be useful to use a machine directly. When we walk a directory, if we want to do it in constant time and memory, we have to be careful: obviously you could, for example, list all the directories, and then list all the files of all the directories, but that wouldn't be incremental. So we have to switch between two modes of operation: one is actually listing the directories, and the other is yielding the contents. We don't want to list all the directories and then yield all the contents; we want to do it step by step. To do that, we have to write a machine directly, a raw one. As you can see, it's not as nice as writing a plan. The key is that with this syntax we can choose what the next step will be. For example, we can define an f function that keeps two lists as state: one is the list of directories we have enumerated, and the other is the list of files we have found. We say: if both are empty, well, we have to call ourselves again, because we want it to be recursive. If we have some files, we simply yield them. If we don't, but we have some directories, we list the contents of a directory and add them to that state so we can enumerate them later. As you can see, it's a bit more complex, but hopefully you won't have to go that far for your own logic. It's good to know that machines contains a lot of facilities for writing state machines, and it can be used that way as well. So that's it. You can find all the code online.
A few references where I got some good information. There's the pipes tutorial by Gabriel Gonzalez, which explains really nicely how he designed pipes and some of the reasoning behind the design choices. There is a blog post by Michael Snoyman that describes the differences between pipes and conduit; he explains why he made some decisions, and really motivates the resource handling, which can make sense in his context, but is not to my taste. And then, obviously, the source code of machines, which shows some great stuff. So I hope that gave you an intuition about what you can do with this library. And, I mean, if you have questions. Yes? Ha, that's a very good question. No, it doesn't. Sorry, the question was: does it handle back pressure? It does not, by design. Something else machines doesn't do is deal with concurrency. But it's actually not a big deal to achieve it. In a nutshell, there are in machines-io a few combinators that allow you to easily build a sink out of a channel, for example, or out of a structure that you use with STM. By using, for example, a queue in STM, or even a TVar, you can get back pressure: you bind your streams together, and one will wait until the TVar is empty before going on. So there's no direct way of doing it with machines, but you can achieve it by using other libraries, especially STM, Software Transactional Memory. You have a question, yeah? Sorry? The Clojure language? Oh, yes, yes. Yes, it's very similar. So the question is: how does this compare to Clojure transducers? I remember when transducers came out, there were a few discussions on Twitter, and from what I got, I think transducers are a slightly specialized version of machines. But it's very close; it's a really similar concept. Yes. So the question is, as we've seen before... where was it? Here.
We had to use that taking combinator to get the same functionality as take on a list. So the question is: do we have to rewrite everything that we have for lists, but for streams? It depends. In a lot of cases, you can just use autoM, or auto without the M, which takes a pure function and lifts it into a process. So there is a lot you can do like that. But for everything that is implemented on the list structure itself, yes, usually you need specific combinators. They are relatively easy to find, because there is a naming pattern: you have take, taking, and so on. So you can hopefully find them easily. For example, for filter you have filtering. That's it, thank you guys.