Functional effects and streaming systems in Clojure. I'm going to explain what functional effect systems are about. We're going to implement one from scratch. Then I'm going to explain why it matters. I'm going to talk about processes, supervision trees, streams, and signals, and then I'm going to present missionary, which is the Clojure implementation of these concepts. I'm going to show what makes it unique and how to use it to solve actual problems.

This is the problem space. We want to make reactive programs. Reactive programs maintain a continuous interaction with their environment, at a speed which is determined by the environment, not by the program itself. So we're going to have to deal with the imperfection of the real world. We want to stay in sync with it, and we want the computations to be triggered by events from the outside. This is very close to what Rich used to call situated programs, and this is what he was saying eight years ago about the use of functional programming in this context: we actually need a program that's more like a machine, and we're trying to use tools that are for functional programming to do that, and it's a bad fit. So that's exactly what we're going to do. We're going to use functional programming to make situated programs. I'm going to show you that Rich is wrong about this: functional programming is actually a very good fit for situated programs, and we're going to see why.

So what is a functional effect system? We're going to start with an example. We have a remote service: it implements a REST API and provides information about products. We can give it an ID and it will return information about the product. This is the imperative way: we write a function that takes the product ID, constructs the request, calls the service, waits for the result, and returns the result.
So to get information about a product we just call the function with the product ID, and at some point we get the product information. It's an impure function: it's impure because it performs effects, and that's an imperative program. This is the functional version. We make two changes here: first we remove the bang, and then we add this character, and now it's a pure function. So when we call this function, we have referential transparency. What do we get when we call the function? We get an effect, and an effect is a description of something to be done. That's the essence of functional programming: instead of doing, we describe. In this case we represent the effect as a function. It's a zero-argument function, a thunk, and functions are immutable values. So effects are values, and we can use pure functions to transform effects.

So let's define operators: pure and bind. This is the bread and butter of Haskell programmers. Pure takes an arbitrary value and turns it into an effect: it returns the effect that always terminates with this value. Bind is about sequential composition: we want to do something and then do something else. It takes an effect and a function. The function is called the continuation, and we're going to construct an effect. The effect will first run the input effect; we get the result; we pass the result to the function; the function computes the next effect; we run that effect, and the result of that effect becomes the result of the bind.

Here is an example. Now we want to call two services one after the other. We have a product service and a supplier service, and we want to get extended information about the product. So we're going to call the product service, and we're going to use bind, and we're going to pass a continuation that will extract the supplier ID from the result.
We're going to construct the effect to get the supplier information, and we're going to use bind again, and the continuation will return a pure operation. Pure means there's no more work to be done: we can compute the result. In this case we want to merge all the results together to construct the extended information about the product.

This is another operator, par. Now we want to do things concurrently. We take an arbitrary number of effects, and we want to construct the effect that will run all of these effects simultaneously and complete when all of them terminate, with the vector of results. This is a naive implementation. It's not a good implementation — we're going to see why later — but it's good enough for now. We can take the sequence of effects and eagerly map a future call over it, which will run each of the effects in a new future, and then we can map over the result again with deref to join all the results.

So here's an example. It's the same as before, except now we want to call the supplier service and the warehouse service. We have two services, they are independent, so they can be run simultaneously, and that allows us to save time. We construct the two effects, then we use par to aggregate these effects and return the composite effect. And then, the same as before, we use bind, and in the continuation we're going to destructure the vector and merge all the results together.

So that's it. That's a functional effect system. What we've done here is replace actions with descriptions. Instead of doing, we describe. We represent effects as values, we define pure functions to transform these effects, and we build our program as a succession of effects, as a composition of effects. So now you want to ask: what's the point of that? What are the benefits? To understand the benefits, we need to talk about processes. Processes are about doing. It's about using resources to compute results.
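To make the effect system sketched above concrete, here is a minimal version in plain Clojure: effects as zero-argument functions, with `pure`, `bind`, and the naive future-based `par`. The service functions and the map keys are made up for illustration; they stand in for the remote REST calls.

```clojure
;; An effect is a zero-argument function (a thunk): calling it runs the effect.

(defn pure [v]
  ;; An effect that always terminates with v.
  (fn [] v))

(defn bind [effect f]
  ;; Sequential composition: run `effect`, pass its result to the
  ;; continuation `f`, then run the effect that `f` returns.
  (fn [] ((f (effect)))))

(defn par [& effects]
  ;; Naive concurrency: run every effect in its own future, then wait
  ;; for all the results. (No supervision yet — we'll see the problem.)
  (fn [] (->> effects
              (mapv #(future (%)))
              (mapv deref))))

;; Hypothetical service calls, standing in for the remote requests.
(defn get-product   [id] (fn [] {:product id, :supplier-id 42, :warehouse-id 7}))
(defn get-supplier  [id] (fn [] {:supplier id}))
(defn get-warehouse [id] (fn [] {:warehouse id}))

(defn extended-product [id]
  (bind (get-product id)
        (fn [product]
          (bind (par (get-supplier (:supplier-id product))
                     (get-warehouse (:warehouse-id product)))
                (fn [[supplier warehouse]]
                  (pure (merge product supplier warehouse)))))))

;; Nothing has run yet — `extended-product` only builds a description.
;; Calling the resulting thunk actually performs the effects:
((extended-product 1))
;; => {:product 1, :supplier-id 42, :warehouse-id 7, :supplier 42, :warehouse 7}
```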
So when you ask the operating system what's currently being done, what you get is a list of processes. That's how the operating system reasons about resource consumption. And actually what we get is not a list of processes, it's a tree of processes, and that's very important. That's what we're going to talk about now.

We're going to look at the previous effect, the composition of services, and see what happens at runtime when we run the effect. Here, boxes represent processes, and a process is a value that is currently being computed. The top-level process is a bind, and a bind runs in two parts. We spawn a child process to run the first part of the bind, so we get the product information. At some point it will complete with a result, and now we can run the continuation with that result to get the second part of the bind. At this point we can spawn a new child process, and this child process is a nested bind. So again, we run the first part of the bind as a child process, and now it's a parallel composition, so we're going to spawn several child processes: we get the supplier information and we get the warehouse information. At some point we get one of the results, and we need to wait for the second one. Now we can construct the vector of results and run the continuation on that. And now it's a pure computation, so there's no work left to be done: we can provide the result to the parent process, and then again to the top-level process.

Okay, so everything went well here. But the interesting part is what happens when things go wrong. We're going to go back in time and focus on the state of the processes when we are running the two child processes simultaneously, and we're going to assume the warehouse service is down, so it's going to complete with a failure. What should we do here? Well, we need to propagate the failure. We need to report it somewhere.
We're going to propagate it to the parent process, but we cannot do that immediately, because we still have a pending process: we are still trying to compute the supplier information. And it turns out that at this point it's not needed anymore — the parent is going to fail anyway — so we are wasting resources for nothing. So we need to cancel this process, and the process at this point must release its resources. At some point it's going to terminate with a special result indicating it could not make it to the end because it was interrupted. Only now can we propagate the error to the parent process. And then there is no point running the second part of the bind, so we can propagate the error to the top-level one, and now we are ready to do something meaningful with the error.

So that's the behavior we want, and that's not what our implementation of par is doing. Our implementation of par is not doing the right thing, because it doesn't discard the pending process when the other one fails. In imperative programming, it's very easy to not do the right thing in the face of failures and cancellation. We could write a version of par that does the right thing, and in fact that's exactly what we want: we want to defer the supervision strategy to the operators. That's why we want to use functional programming. Functional programming is about building supervision trees.

The idea of supervision is that each process must have an explicit parent. A process that interacts with the real world may crash at any point, and when that happens, you want the error to be processed by another process. This process is the supervisor, and it is in charge of making a decision about the error. And there's always a right thing to do with an error: at the very least, if you don't specify anything, you want to propagate the error to the parent supervisor, and so on, and maybe, ultimately, crash the entire program. Effect combinators are supervisors.
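As a sketch of what a par that "does the right thing" could look like, here is a supervising variant under the same thunk representation: when a child fails, the pending siblings are cancelled before the failure is propagated to the parent. This is an illustration, not missionary's implementation; it relies on `future-cancel` interrupting the child's thread, and it observes children in order, so a failure in a later child is only noticed once the earlier ones finish — a production combinator reacts to the first failure immediately.

```clojure
(defn par [& effects]
  ;; Supervising version: run every child effect in its own future; if any
  ;; child fails, cancel (interrupt) the pending siblings, then propagate
  ;; the child's error upward to the parent.
  (fn []
    (let [children (mapv #(future (%)) effects)]
      (try
        (mapv deref children)                            ; wait for all children
        (catch java.util.concurrent.ExecutionException e
          (run! future-cancel children)                  ; discard pending work
          (throw (.getCause e)))))))                     ; report the failure
```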
When you combine two effects together, the combinator must define the supervision strategy. That's something we don't have in imperative programming. Imperative programming makes it very easy to spawn new processes, and it's very easy to lose track of the supervision hierarchy. In functional programming, it's impossible to do that: you get structured concurrency by default. That is, you are forced to build your program in such a way that the supervision tree is properly structured.

So all effects need supervision. In practice there are many kinds of effects, but they have some common properties. What we've seen so far are elementary IO actions: short-lived effects that terminate spontaneously. In practice, most useful effects are long-lived, and they produce multiple values over time. We're going to talk about event streams and signals. All of these are effects that consume resources and interact with the real world. So we want them to have a life cycle, to eventually terminate and release resources. We want them to be able to report errors. We want them to be cancelable. And we want resource safety: that is, we want processes to know how to clean up after themselves.

Okay, so event streams. A stream is a succession of similar events. An event is a piece of data that describes something that happened in the real world, and it's data that is currently being processed, to be eventually stored in some kind of knowledge system. An event stream is not defined when events don't happen. It's discrete time: it's only defined when an event happens, and it has no intermediate states. Losing an event is really bad. You want to process all the events: if you miss an event, you will end up with partial knowledge of the world, and in the end you will make a bad decision. So to represent streams as effects, the effect representation must implement back pressure. If you don't have back pressure, you're going to overflow the buffers.
Your machine is going to crash and you're going to lose events. Some examples of event streams: mouse clicks, log entries, database transactions. All of these have no intermediate states. They are things that need to be processed, and they are ordered over time, but they are essentially discrete in time: they don't exist between events.

Signals. Signals are completely different. A signal represents the state of an identity. As Clojure programmers, we are usually familiar with this pattern, because it's the basis for the epochal time model. We have reference types in Clojure — atoms, agents, refs, and vars — and they represent identities. They all share common properties: they share the read API. All of these references, you can deref them at any point; they are always defined. At any point in time you can take a snapshot and get an immutable value of the state. You can also watch them to get the successive values of the state. Derived computations share the same properties: if you derive a computation from a signal, what you get is also a signal, also defined at every point in time. Only the latest value matters: when the state changes, the new state is the new reference, so you can discard the previous state — it doesn't matter anymore.

If you want to represent signals as effects, you want lazy sampling. What that means is, when the state changes and some readers want to react to it, you don't have to send the new value of the state to the readers immediately. You can just tell them the value has changed, without sending the value, and then the reader can decide when is the right time to sample the value. The reason we can do that is because it's continuous: we know it will still be defined in the future. So we can introduce laziness, and that allows us to save work. Very interesting. Some examples of signals: the mouse position.
A mouse is always somewhere, so you can always ask for its position. Physical time: you can always ask what time it is. Spreadsheet cells: if you want to know the state of a cell, you can just look inside. Databases: databases are signals because they always have a state. That's why we use databases — we want to accumulate knowledge about the world, and we don't know in advance how that knowledge will be used, so we want to be able to poll the state in the future. We want the database to be always defined. Derived computations are signals as well: the queries of the database are derived computations, so they are always defined.

All right. So now let's get more practical. This is missionary. It's a Clojure library that works in Clojure and ClojureScript, and it's a collection of purely functional operators that work on effects. There are two kinds of effects: tasks and flows. Tasks are effects that produce a single value, and flows are effects that produce multiple values. You can use flows for discrete time or continuous time. In discrete time, we get back pressure; in continuous time, we get lazy sampling.

Language extensions. This is how missionary solves callback hell. Here is a pretty simple example of sequential composition, and we see the beginning of callback hell: there is too much nesting. There are only two successive tasks and it's already too nested, so if we increase the complexity, it will quickly become unmanageable. It's a syntax problem. There are many solutions to this problem, and all of them are variations on the idea of inversion of control: we want a better syntax to express this. The solution is to extend the language. So we extend Clojure with another operator — that's the idea of async/await, but now it works on functional effects. We have a parking operator that allows us to split the computation, and syntax rewriting will turn this code into a chain of callbacks.
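To illustrate, here is what the two-call composition might look like with missionary's `sp` (sequential process) macro and its parking operator `m/?`. The service tasks are hypothetical stand-ins, built with `m/sp` and `m/sleep` to simulate latency.

```clojure
(require '[missionary.core :as m])

;; Hypothetical tasks standing in for remote calls: a task is an effect
;; that produces a single value.
(defn fetch-product [id]
  (m/sp (m/? (m/sleep 10))                ; simulate network latency
        {:product id, :supplier-id 42}))

(defn fetch-supplier [supplier-id]
  (m/sp (m/? (m/sleep 10))
        {:supplier supplier-id}))

;; Sequential composition, written flat instead of as nested callbacks:
(def extended
  (m/sp (let [product  (m/? (fetch-product 1))  ; park until the task completes
              supplier (m/? (fetch-supplier (:supplier-id product)))]
          (merge product supplier))))

;; A task is still just a description; running it takes a success
;; callback and a failure callback:
(extended prn prn)
;; prints {:product 1, :supplier-id 42, :supplier 42}
```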
So we pass a task to this operator, and we're going to park the computation until the task terminates and we get a result. It's a functional version of async/await, and you have all the features of the language — you just have an explicit operator to park the computation. It works pretty much the same as go blocks; the underlying abstractions are not the same, but the syntax rewriting is the same idea.

Now, can we await on flows? That's the interesting part. Remember, a flow is an effect that produces multiple values. So what does it mean to await on something that produces multiple results? Well, it's been done before, but it's surprisingly unpopular, which is a shame, because it's a very interesting idea. Here's an example. We want to reuse the effect from before that aggregates information about products — it's this one — and now we want to have a variable input. The product ID will be stored in an atom, and we're going to change the state of this atom. What we want is to keep the product information in sync with the product ID, which is the variable input.

So we're going to use an ap block. We're going to build the flow that produces the successive values of the atom with watch, and we're going to use a forking operator, this one, and at this point the rest of the computation will be run multiple times: whenever we get a new state of the atom, we run the rest of the computation. And what we want is, if we get a new value of the state and the previous value is still being computed, we want to interrupt this previous computation and start the new one. We are only interested in the latest value, so we want to discard the previous computation. So we're going to call the extended product information effect here, we're going to park on it, and we're going to wrap all of that in a try-catch to recover from the cancellation exception. In this case, cancellation exceptions are not failures; they are part of the normal process.
So we want to recover from them, and in this case we want to produce nothing — that's the point of this operator, to return nothing — and at this point we backtrack to the forking point, which is here, and run the computation on the new state. And that's it. It's a functional version of await. It's a language extension, so you can use all the features of the language: you have macro expansion, you have try-catch, you have conditionals, you have loop/recur. And all of that is wrapped in a functional expression: it returns a discrete flow, so it's a pure, referentially transparent expression. If you use this flow in a pipeline, you're going to get back pressure. You will also have error handling for free, and you will have cancellation — so you get all the benefits of supervision due to functional programming, and you have all of that in a syntax that looks like Clojure. That's very powerful. There are more operators, but those are more standard, and we are running out of time, so we're going to see that another day.

So that's it. This is a functional effect and streaming system for Clojure and ClojureScript. What makes it unique is language extensions. It's an alternative to monads: it's just as powerful, but it's much more expressive, so you can express the same things without callbacks, without losing track of the control flow — much more maintainable, much more efficient. Discrete versus continuous time: we need both. Many streaming frameworks support only one of them, usually only discrete. In practice you want both, because some effects are inherently discrete and others are inherently continuous, and they don't have the same requirements. So you need to be able to switch from one to the other, and it's useful to have an effect representation that supports both.

So that's it: try it, give feedback, spread the word. Currently the API is stable; the current effort is about smoothing the rough edges and improving the documentation. It's part of the Hyperfiddle stack. We've been using it extensively over the last year: we've been implementing a fully reactive dialect of Clojure on top of it that also works distributed. We are very excited about our findings, so we're going to communicate more about that in the future. Stay tuned.
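As a recap, here is a sketch of the reactive loop described earlier: an `ap` block that watches an atom, forks on each new product ID with `m/?<` (the forking operator that cancels the previous branch), and recovers from cancellation by producing nothing with `(m/amb)`. The `extended-product` task is a made-up stand-in, and I'm assuming current missionary, where cancellation surfaces as `missionary.Cancelled`.

```clojure
(require '[missionary.core :as m])

(def !product-id (atom 1))

(defn extended-product [id]
  ;; Hypothetical task aggregating product, supplier, and warehouse info.
  (m/sp {:product id}))

(def product-info
  ;; A discrete flow that stays in sync with the variable input.
  (m/ap
    (let [id (m/?< (m/watch !product-id))]  ; fork on each new state,
      (try                                  ; cancelling the previous branch
        (m/? (extended-product id))         ; park on the aggregate task
        (catch missionary.Cancelled _
          (m/amb))))))                      ; cancelled: produce nothing,
                                            ; backtrack to the forking point

;; Usage sketch: run the flow, e.g. accumulating results with m/reduce
;; (the flow never terminates, since the watch never does):
;;   ((m/reduce conj product-info) prn prn)
;;   (swap! !product-id inc)   ; triggers a new computation
```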