This talk is really about my journey into distributed programming. I'll talk about distributed programming at a high level — not the low level of implementing a distributed system, but using existing frameworks like Storm and Akka. I'm going to focus on real-time distributed programming. You may have heard of batch processing, where you use something like MapReduce to run big jobs; I'm not going there. I'm going to focus on real-time processing, and you'll see what that means as I go along. The other thing I'll do, and leave for the end, is ask: what is the relationship between distributed and functional, if anything? I'd like you to understand Bruce's t-shirt by the end of this talk, because it really captures the essence of what I'm going to talk about. It seems almost like a reckless thing to say coming from the old paradigm, but it's actually a very sensible and responsible thing. We're going to be talking about fairly abstract things, so I'm going to ground it in a very practical, simple example — and in fact this is the first distributed thing I did. I have an existing journals database. By journals I'm just giving it a bit of domain texture: think of them as financial transactions in an organization. I want to churn through those journals and analyze them — transform them, put some things in an analytics database and some things in a search database. So really I want to transform my data. To give it even more texture: for every journal in this database, I'm enriching it in some way — maybe running off to another database, getting a bit of data, and adorning the journal with it.
Then I'm going to transform it, save it to the analytics database, and index that journal into a search database. That's the simple use case: think of this long list of journals that we're going to churn through and move over to other databases. If there were, say, a few hundred thousand of them, I wouldn't be talking here today. But let's say you have millions — say a billion journals. Then running it in sequence just isn't going to work: you can write it very easily, but you'll spend far longer waiting for it to finish than you did writing it. The first thing you could do to make it more efficient is to go parallel. In languages like Clojure and Scala you can quite easily use a parallel map that parallelizes your loop over multiple cores. I'm assuming here that the order of execution of journals doesn't matter at this stage — that I can parallelize this process and do the analytics and search independently. In the extreme case, imagine doing all billion at the same time; more realistically I'll do maybe 10 or 50 or 100 or 1,000 at a time. Parallel execution is great — much faster than sequential, I mean, not better. But it still doesn't solve your problem if you have a billion journals: you're not going to do that on one multi-core machine. What you want is to parallelize it over many machines. So let me set the scene like this and say that this is what I mean by distributed programming: we're going to take this job and fan it out over a whole bunch of virtual machines.
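To make that concrete, here's a minimal Python sketch of the idea (the journal fields, the enrich/transform logic, and the worker count are all made up for illustration): the per-journal pipeline, run as a parallel map over a pool of workers instead of a sequential loop.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-journal pipeline: enrich, then transform.
def enrich(journal):
    # Pretend we fetched extra data from another database.
    return {**journal, "enriched": True}

def transform(journal):
    return {**journal, "amount_usd": journal["amount"] * 2}

def process(journal):
    j = transform(enrich(journal))
    # save_analytics(j) and index_search(j) would go here, independently.
    return j

journals = [{"id": i, "amount": i} for i in range(1000)]

# Sequential would be: [process(j) for j in journals]
# Parallel map over a pool -- fine because the order of execution doesn't matter.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(process, journals))

print(len(results))  # 1000
```

This is the single-machine version of the fan-out; the rest of the talk is about doing the same thing across many machines.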
Quite likely it will be a cloud setup, where I can easily create new virtual machines and spread the job over them. We're going to talk about how to do this in two ways. The first thing we'll look at is Storm. You may have heard of Storm, and if not, that's fine — I'll introduce it. This is the tag line, the selling points of Storm. It's a real-time processing system. Real-time means it's not like Hadoop MapReduce, which does one massive job that begins and ends; real-time means it doesn't end. Why would you want real-time — why would you run things that never end? For instance, if you saw that talk from INDICS yesterday, they were doing quite serious stream processing: working with continuous streams coming in that never end. Imagine financial transactions — journals just keep coming in. They're not going to stop; your business is running. That's what I mean by real-time: this thing runs continuously. The second selling point on Storm's front page is fault tolerant. For now I'll just say it means Storm has a plan for when things go wrong. You're actually going to tell your software how to deal with failures — and I mean at quite a fine granularity: if this happens, do this; if that happens, do that. Fault tolerance doesn't mean you won't have faults; it means you can be more resilient to some classes of faults, anyway. The third thing is scalable. Storm claims to be a scalable solution to the distributed real-time problem, and what that typically means is that it can be spread over multiple machines, not only multiple cores. So let's take my very simple little job of enrich, transform, save, index.
I could represent it as a little picture, and I think that picture should be quite self-explanatory: I can draw this sequential piece of code as a graph. Get me the next journal; enrich it; transform it. At that point I make a design decision: I can save to analytics and index to search in parallel, which is why I've forked after transform. So I'm literally just drawing, as a graph, the kind of sequential code all of us have written tons of. Having done that, these are the building blocks Storm gives you for drawing this kind of graph. In Storm, the source of data is called a spout. My next-journal spout is just going to pop out one journal at a time — like a little fountain that, over time, pops out all billion journals sequentially. A spout in Storm says: I'm going to emit stuff. It's a little widget that can emit data. Then Storm has a concept called bolts; a bolt is just like a spout except it can also take something in, as well as emit. So our graph now looks like that: we have one spout, and the rest are all bolts. The data model in Storm looks like this: what gets passed between these little blocks of code is called a tuple. It's really just a tuple as in any language you know — a fixed-length data structure. In this case I might design it and say that in my tuple I want the journal ID, and then some journal data in a hash-map object. That's what you see on the right there: after I've enriched it, there's a new amount — I've done something to it already, and a new kind of tuple is being emitted.
And then after the transform, there's a new value that I've added. So this thing keeps changing, and what those tuples look like is my design decision; in this case I keep it very simple. Tuple fields can be any primitive type, as long as it's serializable, and you can write your own serializers for your own structures. I just want to show two Clojure slides. These are really meant for anyone here who knows Clojure, but if you don't, what I want you to see is just how it works — that it's really just a little piece of code. It's a function — not a pure function, but a function. So I define a spout like that; there's a special DSL for it. Storm runs on the JVM, so you can use any JVM language; I'm using Clojure, and a good deal of Storm itself is written in Clojure, so it's a nice language to use with Storm. The Storm runtime is going to keep calling the spout to pop these little journals out of the fountain — that's the next-tuple part. It just says: every time the runtime runs me, run this piece of code and emit a journal. I'll leave it at that. The bolt looks like this: again, just a piece of code. Every time the bolt receives a tuple, the Storm runtime calls the execute block, and the bolt emits the transformed tuple. That transform is just a regular piece of code you could write as-is, even in the sequential version. (Answering a question from the audience:) Yes — I'll give you a very short answer and keep moving; we can pick it up later. The sleep is there to create a pulse, because the Storm runtime is going to run this thing continuously — I want a bit of a pulse so that I have a more predictable flow. It's quite common to create a little pulse like that.
So it's a fountain, but I'm saying sleep a bit between emits — in fact you would more likely sleep after the emit, but okay. You can create a pulse quite easily like that; it's very idiomatic in Storm. So that was our bolt, and I really just wanted to show it to give you some texture: it's just a little piece of code, nothing special. In Storm, then, I have this graph, and I have to arrange it into what is called a topology. The topology is the graph — the connection of these spouts and bolts — and it's something you have to specify. The first thing that's quite different from what we're used to: in Storm I can set the parallelism (that's what p stands for). I'm saying for next-journal, run just one spout, but run three enrich bolts for me, because enrich is a heavier piece of code than simply emitting a journal. This is a design decision, and a significant one: it's not obvious what degree of parallelism is optimal for your problem — it's trial and error and experience. Likewise I'm saying transform is a tough job, run three of those in parallel; and for the others, run five in parallel. Now, the moment I do that — just think of the step from next-journal to enrich — the spout has three bolts it can emit to, not just one. Which one should it emit to? This is called grouping: you have to say how you want to distribute over these parallel bolts. Shuffle is one way: from the next-journal spout, shuffle randomly over those three bolts, so that I get an even distribution. I'm giving the runtime maximum freedom to run this optimally.
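Shuffle grouping is easy to picture outside Storm, too. A tiny Python sketch of the idea (the three worker lists stand in for the three enrich bolts — this is not Storm's API): each tuple goes to a randomly chosen worker, so the load spreads evenly with no affinity logic at all.

```python
import random

# Three stand-in "enrich" workers; shuffle grouping just picks one at random,
# so over many tuples the load spreads evenly and the runtime stays free to
# schedule work however it likes.
workers = [[], [], []]

def shuffle_emit(tup):
    random.choice(workers).append(tup)

for journal_id in range(9000):
    shuffle_emit(journal_id)

# Each worker ends up with roughly a third of the tuples.
print([len(w) for w in workers])
```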
There are other groupings besides shuffle. You can say: take all journals with an ID less than a thousand — this is a terrible example — or take all customers in this class and keep sending them to that bolt, and all customers in that class to this other bolt. So you can, and have to, design the affinity per bolt. The moment you do that, you're making the parallel execution less efficient, because you're making it harder for Storm to maximize throughput across all the bolts. Given the data — say all customers with this flag should go to this bolt — your data will actually shape the flow through the distributed graph, and that can go quite pear-shaped. So you have to be quite aware of the relationship between your data and your grouping. With shuffle you don't have to think about it, and I'm going to park it there; but with Storm you can do quite rich in-memory crunching using the other field groupings. It's a big topic I'm omitting here, because I want to show you more things. So now I have one spout running; I have three enrich bolts; three, five, five; and now something goes wrong in one of the analytics saves — one of them bombs. This is where we have to start talking about fault tolerance. What should Storm do when something goes wrong? And something can always go wrong in distributed programming. The interesting thing is that if something takes too long, it is considered an error — and you must decide how long is too long. I might say that if the analytics save doesn't happen within 500 milliseconds, fail: something probably went wrong. These are design decisions. A timeout and a hard failure — a bug in your code, say — can look quite similar, and in general, in distributed programming, you often can't tell the difference between a real failure and simply not hearing back from something.
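That timeout-as-failure idea can be sketched in a few lines of Python (the `save_analytics` function and the 500 ms budget are illustrative, not Storm's API): give the call a deadline, and treat not hearing back exactly like any other error.

```python
import concurrent.futures

# Stand-in for the real save; imagine it might hang on a lost connection.
def save_analytics(journal):
    return f"saved {journal['id']}"

def save_with_deadline(journal, timeout_s=0.5):
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(save_analytics, journal)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            # From out here a hang and a crash look the same: we simply
            # didn't hear back in time, so we call it a failure.
            return "failed"

print(save_with_deadline({"id": 1}))  # saved 1
```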
So what's going to happen here with Storm? If that failure happens, Storm is going to rerun the whole thing from the source tuple for that journal — and you can tell it to rerun a few times. Obviously, if there's a bug in your code, you can rerun it until you're blue in the face and you'll get nothing from it. This rerun strategy is useful for when you've simply lost a connection, or a path to another server, for a bit — and the rerun will actually solve it. That's the kind of resilience you get when we speak about fault tolerance. Now, if you're letting the Storm runtime rerun things, this is out of your control — you've relinquished control of it. So of course you have to be very careful: when I write the analytics save, or the search index, or any of these spouts or bolts — the bolts particularly — I must make them idempotent. I must make them so that if they rerun, they don't redo something. Doing that can be quite complex; you quite often need unique identifiers, and that might not be so simple. But in my use case it was pretty trivial to make things idempotent: you just make sure you don't save something again if it's already there. So with Storm we can parallelize the whole thing. But in some modeling you'll find there are pieces that you want to sequence, not parallelize — you have all this parallelization and then you want to join it all back through one single pipe again. Earlier I said Storm will rerun your code as much as you like — typically a few times within a given time period. That's called at-least-once semantics.
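The idempotency requirement mentioned above is simple in shape once each journal carries a unique identifier. A hypothetical Python sketch (the in-memory `saved` dict stands in for the analytics database):

```python
# Keyed on a unique journal ID, a replay of the same tuple is a no-op,
# so it is safe to let the runtime rerun it after a timeout or failure.
saved = {}

def save_analytics(journal):
    key = journal["id"]
    if key in saved:          # already there: the rerun does nothing
        return False
    saved[key] = journal
    return True

save_analytics({"id": 42, "amount": 100})
save_analytics({"id": 42, "amount": 100})  # replayed after a failure
print(len(saved))  # 1
```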
Meaning it may run a few times, but not less than once: Storm is going to keep trying until it has run at least one time. If you can't afford that — if you can't have idempotence — then you can move to exactly-once semantics. You can set up Storm to say: run this only once, don't run it a few times. And you can choose to have stronger ordering in parts of your topology. But the moment you go exactly-once, you're reducing performance, because it's literally more expensive for the runtime to make sure things run only once. So this is a design choice, a dial you turn: the more transactional you go, the more sequential you go, and the less performance you get, to some extent. Zooming out a bit: what you typically do with Storm is have a bunch of different kinds of topologies and connect them with queues. The topology is your meta-building-block, and you arrange topologies into a meta-topology, in a way. Stream computing — Storm in particular, but stream computing generally — is typically pitched at this kind of problem space: running through a billion journals, doing something to them, and moving them over is what I would call stream processing. Storm was originally written to do that kind of thing for Twitter — trending, top tens, what's trending — and that would be real-time in-memory computation. And then you can also do distributed RPC: if you have a web app with a very heavy job, you can take an HTTP request, fan the work out over a whole bunch of virtual machines, get called back with a single result, and return it to the client. That's quite a powerful thing in some cases. Zooming out: I started using Storm about two years ago. I used it for one project, but since then this space has just exploded.
I'm sure some of you know as many products in this space that I haven't added — there is so much happening here. This is obviously something to pay attention to; it's a problem we're collectively trying to figure out right now. One thing that has crystallized out of it is what's called the lambda architecture, named and coined by the author of Storm, Nathan Marz. The basic idea is this: real-time processing is all very well, but you do the main bulk of the work in a batch job. New data goes both into the batch system, where it gets processed MapReduce-style, and into the real-time system. So the real-time system has the latest, provisional entries, and the batch system has everything — but it will be a bit stale, because batch takes a while too: as long as the batch process takes to run, that's how stale it becomes. That's the strategy — you split between real-time and batch — and then you have a merged view to do your queries. You don't query only the batch: you query batch and real-time together, and you somehow have to create that merged view. Over time the real-time data gets fed into the batch, so everything eventually ends up there, but there's always potentially a bit of real-time data that hasn't been batch-processed yet. This is a deeper topic, and I'm not going any further into it. By the way, I met Siva yesterday, and lambda architectures made it to the cover of Healthy Code's October edition — check it out. I'm going to leave the Storm discussion there and look at Akka. You get the same kinds of terms being thrown around: Akka is real-time, it's fault-tolerant, it's scalable. Real-time, like Storm: it runs continuously.
You could use it differently, but it can certainly be used like this. It's fault-tolerant — and the fault tolerance is where the difference comes in. And it uses the actor model. The actor model was first published in 1973 — it's been a while. It found its way into Erlang, and Elixir is a more recent descendant of that line; Akka has become quite a strong contender in the actor-model space. And scalable — same thing; the terminology here is scale up onto cores, scale out over VMs. Akka is also quite elastic: you can throw VMs at it and it will fill the space. As big a space as you give it, it will fill. To describe the actor model, I'm going to start with OO as we know it. Imagine we have a class Account with a balance, and I can add or remove — deposit or withdraw. It's just a class you've seen a million times. And imagine with that piece of code I do add, add, then remove. Now, if I wanted to run this in a parallel, distributed way — or just concurrently, for starters — these adds and removes would happen in an indeterminate order; you don't know which one hits first. And as you can see, classes are not made for concurrent access like that: the moment you get concurrent access, you have to work for it — you have to make it safe explicitly, manually. What we really want is to sequence these operations. In a concurrent setup we don't want deposits and withdrawals happening in random interleavings, and we don't want to protect that balance variable with some kind of mutex or semaphore — we want to outsource that. That's where the actor model comes in. So here's the actor version. This is Scala — sorry, I didn't mention that: Akka is written in Scala and uses quite broad and deep aspects of Scala, so it's really nice to use from Scala rather than anything else.
So the class Account becomes a Scala class Account, but it extends Actor. I still have a balance; it's still a variable — that's interesting, so we have mutation in a sense; we keep the variable we had in the class. But instead of an add method and a remove method, we have a receive method that responds to an Add message and a Remove message. Where with an object I would call the add method, with an actor I send it the Add message. That's how I would model it in the actor model. Notice I've changed the syntax: account ! Add(100) — account, bang, Add. A message can be absolutely anything immutable — it must be immutable, so you can already see you need to do this kind of thing in a language that can guarantee immutability when you need it. So I'm saying: account, send message Add(100); account, send message Remove; and so on. These get queued into the actor's mailbox. With the class or the object, I could access it raw and there was no protection — I would have to write the protection explicitly. Here, all the messages are queued: every actor has a little mailbox, and you can hit it from a thousand directions and those messages will all be nicely lined up for the actor. Now, if we wanted to do in Akka the kind of thing we did in Storm, I'll show you how. Before I do, I want to say: you wouldn't do it like this — you absolutely wouldn't. This is a naive solution, but I want to show you the low-level way of addressing this kind of problem in Akka, and then we can talk a bit more about how one might actually do it. So: going from the journal generator — the next-journal source — to enrich.
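The mailbox mechanics just described can be sketched as a toy in Python — this is not Akka, just the shape of it: private state, a queue of immutable messages, and a single thread draining that queue, so that concurrent senders never touch the state directly and every message is processed in sequence.

```python
import queue
import threading

# A toy account actor: local state (balance), a mailbox, and one thread
# draining the mailbox -- so all sends are serialized through the queue.
class AccountActor:
    def __init__(self):
        self.balance = 0                 # local state, never shared
        self.mailbox = queue.Queue()     # fire-and-forget messages land here
        threading.Thread(target=self._run, daemon=True).start()

    def tell(self, msg):                 # the "!" send: returns immediately
        self.mailbox.put(msg)

    def _run(self):
        while True:
            kind, amount = self.mailbox.get()
            if kind == "add":
                self.balance += amount
            elif kind == "remove":
                self.balance -= amount
            self.mailbox.task_done()

account = AccountActor()
account.tell(("add", 100))
account.tell(("add", 50))
account.tell(("remove", 30))
account.mailbox.join()      # wait for the mailbox to drain (demo only)
print(account.balance)      # 120
```

Note the sender never waits for an answer: `tell` returns at once, and the mailbox guarantees the updates are applied one at a time, in order.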
So if I wanted to mimic that idea — one source of journals, three enrich pieces of code, and random distribution over them — this is how I could do it. I'll keep it at a high level; the detail isn't important. Our spout has become an actor. Instead of just saying shuffle, I have to set up a router, because I'm distributing to several actors down the line; in the last line I receive the next-journal message and distribute it randomly to one of the enrich actors. Let's leave it like that. Further on, the enrich actor receives an enrich message with the journal and does the enrichment — that might be exactly the same code we used in the Storm topology — and then sends a message on to the next actor; I might have every enrich actor create its own transform actor. So at every step of the way, in the actor model, I choose exactly how I want to create this topology — and, a bit crazily, that would be the whole thing: randomly distribute over the enrich actors, have each enrich actor create its own children, and round-robin over the actors further down the line. To keep this high level: you get the same kind of routing — where Storm had grouping, we have routing. And I can build a hierarchy instead of a graph. This is a hierarchy starting with next-journal, with kids, with kids, with kids. So I have a hierarchy of actors, and things will go wrong. Let's say again that the analytics save bombs for one of the actors down the line. Now there is a concept of supervision: any actor that creates actors supervises them — it is responsible for the actors it creates. So the transform actor might be the supervisor.
And so transform is the supervisor for that analytics actor, and any supervisor must decide what it's going to do when things go wrong. It can do several things: it can resume as if nothing happened, hoping that just carrying on will do the job; it can restart the actor; it can stop it; or it can escalate and say, I don't know what to do with this — deal with it higher up. This is your design decision, and it's a whole aspect of design you'll be doing in this kind of distributed programming. There are two strategies as well. You might say: if any of my kids fails, do this one thing to that one kid — resume it, restart it, or stop it. Or you might say, given the problem space: if any of them fails, fail all my kid actors — because they might be entwined enough, sharing enough, that you feel if one fails you'd better just restart them all. At a high level, I've chosen one-for-one here: if one fails, keep it at that, don't kill all the kids. And if it was this exception, resume; if it was that exception, restart; if it was our custom not-found exception, stop. So you're constantly, at quite a low level, deciding how you want to deal with each exception. I'm going to pause the Akka details there, zoom out a bit, and talk through the differences. In the object-oriented world, with classes, we have state and behavior, and they're quite entwined — and we have shared state. In the actor model you have state — we had that balance variable in the account actor — but it's local state, completely isolated. One of the premises of the actor model is: share nothing. You have your own state. So it's somewhere between functional and OO: you're in a place where you do tolerate state, but only at a very local level.
Which makes for a very powerful model. With objects we communicate with methods; here we communicate with messages. Methods have no protection in terms of synchronization; messages to actors are synchronized through the mailbox. A synchronous call in OO becomes what's called fire-and-forget: if you send an actor a message, you return immediately — you don't call an actor and wait for a result. In OO you ask and wait for an answer; with actors you tell: do this, end of story. If you want something back, you can simulate a synchronous call, but you'd more likely just have the actor send a message on to some other actor, which then deals with it. In OO you pretty much always know where your object is going to run — which VM, which machine. Actors have addresses, and they are location-transparent — in fact you want to make them as location-transparent as you can, so they can run anywhere, on any of the VMs in your little cloud. Designing with actors is in some ways quite similar to OO design. You're still thinking about single responsibility; you're still keeping things focused and tight like you would in OO; you're still trying to find the right granularity — but for different things. Now we're dealing with messages: do I have one coarse message, or more specific ones? It's the same question as with exceptions — do you have one RuntimeException, or much more specific exceptions? You want to find the right level of granularity for your problem. Same with hierarchies: the actor hierarchy is the scope for failure, and for dealing with failure. So you're constantly thinking about how to create that hierarchy, because failure zones come out of how you design your hierarchies.
And even within a hierarchy you can define failure zones — tighter or broader. So these are the kinds of things you're going to have to think about. With actors, then: first of all, actors are a much lower-level thing than Storm, but we can do the same kinds of things with them. We can do work distribution, which is what I just did — I said I don't want to run this on one machine, one core; I want to distribute this job — and the actor model is great for that. But people also write domain-driven apps with actors. Entities get modeled by actors — I'm using domain-driven design terms here. Aggregates would be modeled by little actor hierarchies, and if you did an event-sourcing kind of model, you could use messages to model domain events. Zooming out: stream computation. Storm does just stream computation, and it's very high level — and all those other stream-computation frameworks we saw are much higher level than the actor model. In Storm I just said shuffle; in Akka I had to set up a router and distribute randomly. So if you're just doing stream computation, you do want to go a bit higher level than the actor model. But of course the actor model is lower level and much more powerful and general in its capability. Topologies in Storm are relatively static. I mean, you can change one, but you have to redeploy it, and it can take a few minutes — not a big deal. But once you create a topology, it's like an engineering piece: you're taking fluid from this pipe and splitting it into these other pipes — almost a physical structure that's going to stay like that for your runtime. The actor model is much more dynamic: you create an actor here, completely ad hoc and arbitrary, and it's up to you how you want to arrange things.
And where the Storm graph was directed — it only flowed in one direction — actors are much more general: you can go two-way. It would be like a bolt sending a message back to a spout, which wouldn't make sense in that context, but that's what I mean. And with spouts and bolts, you generally wouldn't have gazillions of them: you make your little graph, and you might run a whole bunch of them in parallel, but even then you wouldn't have millions. Actors, though, are very lightweight, and you typically have many, many more actors than you would bolts and spouts. Then there's this thing called the Reactive Manifesto, which came out of the Typesafe crowd — I think Jonas Bonér originated it. It's a manifesto stated beyond any of the frameworks I've talked about: even though it came out of Akka, it's a generalization, saying let's write things that are like this. It's another way of saying scalable and fault-tolerant, plus real-time — and responsive is perhaps the interactive part. So we can see there's a broader thing going on here. One of the first things to come out of this is what's called Akka Streams. It's quite new and fresh, and I don't know if it's quite ready for commercial use, but if I were to write a streaming application today, I would try it — I would try it right now. It's an attempt at creating a JVM standard for async distributed stream processing, and it's quite opinionated about blocking; you can look it up if you're interested. So, let's talk about time. I haven't explicitly talked about time, but when we talk about time in distributed programming — or ever, really — we're not so much concerned about time per se as about the relative order of events. What we're really trying to do in these styles of programming is minimize the enforced order.
So I want to parallelize as much as I can afford, and I want to minimize reducing down to single pipes and having to sequence through them. When we talk about state, then, it's the change of state, and the scope of that change, that is the problem that hurts the most. So the first thing is that we want to minimize the change, and in the actor model, for instance, we keep the scope of that change incredibly tight. The messages are immutable, and the mutability has receded down to the boundary of the actor. And fault tolerance: the general idea is to embrace failure. What was that T-shirt of yours? Basically that, right, Bruce? Failure becomes a first-class citizen. We don't treat it as an exception; I mean, literally we call these things exceptions in the old paradigm, but here it becomes, hopefully not the norm, but not something we don't expect. So if we talk about distributed and functional programming, I thought it could be interesting to look at it through this lens: concurrency-oriented programming languages. This is something I came across in Joe Armstrong, one of the Erlang creators; in his thesis he writes about this, zooming out a bit from Erlang and saying this is the broader thing we're going for: we're trying to create concurrency-oriented programming languages. So if we look at that versus the functional programming paradigm: functional programming is very helpful with the concurrency bit, because of immutability and because of the rich concurrency tools you have (that's my time, almost done). Functional programming is quite useful for concurrency, and for the scalability part, because you've got parallelization going on there. So functional programming is addressing and helping us with the first and the third one, and then with fault tolerance, the actor model has its view of things and these stream computation engines have their own views of things.
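The point about the scope of change being kept incredibly tight can be made concrete with a toy actor, assuming nothing beyond the JDK. This is not Akka; the class and method names are made up for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy actor showing "mutability recedes to the actor boundary":
// immutable messages go into a mailbox, one thread drains it, and the
// mutable total is only ever touched by that single thread.
public class CounterActor {
    private final BlockingQueue<Integer> mailbox = new LinkedBlockingQueue<>();
    private long total = 0; // mutable state, confined to the actor's thread
    private final Thread loop = new Thread(this::run);

    public CounterActor() { loop.start(); }

    private void run() {
        try {
            while (true) {
                int amount = mailbox.take();
                if (amount < 0) return; // negative value used as a poison pill
                total += amount;        // the only place the state changes
            }
        } catch (InterruptedException ignored) { }
    }

    // "tell": asynchronous, fire-and-forget, like sending an actor a message
    public void tell(int amount) { mailbox.add(amount); }

    public long stopAndGet() {
        mailbox.add(-1); // processed after everything queued before it
        try { loop.join(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return total;
    }

    public static void main(String[] args) {
        CounterActor actor = new CounterActor();
        for (int i = 1; i <= 10; i++) actor.tell(i);
        System.out.println("total = " + actor.stopAndGet());
    }
}
```

Many threads may call `tell` concurrently, but only the actor's own thread ever touches `total`, so the state itself needs no locking at all.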
So functional programming is not necessary for distributed programming. In fact, distributed programming preceded functional, and there's a lot of distributed programming and frameworks that have nothing to do with functional. But it's pretty clear that functional has a lot to offer. A language alone isn't going to do it, though; any single functional language is not going to do it, not yet anyway. We need, for instance, the actor model, which stands for something beyond the language. The Storm runtime is something beyond Clojure or the JVM. So for distributed, we need something more than just the language; we need some kind of more structured opinion about things. Finally, I wanted to end with this slide, and actually with my own ignorance. This is something that I'm very interested in but know very little about. This paper was published in 2009, and I'm sure there was earlier work on this, but the idea is that you have data types that eventually converge without any explicit concurrency management. This is very akin to what Kishore mentioned in his talk yesterday with the monoids: they were using these monoidal structures, which were converging in a way that didn't get you into the same trouble. It isn't useful for everything, but for some problem spaces, these kinds of tools could take us to a place where we have to worry even less about concurrency. We just say the data type has these properties, send it out into this distributed world, and it will all converge. So that's where the tag ACID 2.0 comes from: ACID meaning the ACID from the transactional database world. Of course we don't have ACID in distributed systems, because we're always converging towards some kind of consensus, but ACID 2.0 is an attempt at speaking about a more mature kind of ACID. So that's it. Thank you. Any questions? [Question about doing a transaction.]
If you wanted to do a cross-actor transaction, then you're going to be in a different kind of space where you're going to have to solve it a bit differently. With Storm and the like you can have your little transactions, but their scope is going to have to be much tighter, and there are some problems that aren't going to be easy to solve in this way. [Question about an irrigation use case.] Yeah, yeah. I don't understand your use case well enough, but my gut feel wants to say yes. I might be wrong, but I want to say yes because it's so flexible; I would be surprised if you cannot do that. What kind of tools do you use? Maybe Vagrant? I didn't mention that, but with distributed programming one is essentially flying blind. So you need glasses, you need flashlights, you need all the help you can get. Monitoring becomes the number one priority. You would be watching those mailbox sizes like a hawk, and if a mailbox grows because messages aren't being processed, you would want to jump in there and do something about it. So yes, you need to watch this, because for me it's a bit like our code growing up when it goes distributed. Local code you can keep on a very short little leash, and you can redeploy it quickly and so on, but distributed code is like a dog on a very long leash: it can get up to much more trouble. You're giving it a lot more agency, and so monitoring becomes majorly important. Whichever tool you use here, you can be sure that monitoring has, not so much been addressed and taken care of, but at least been taken seriously. I'd say the bolts and spouts are bigger, but in total I wouldn't say that Storm is heavier; the building blocks are bigger, so you would typically just use fewer of them. In your experience, what is the memory footprint like with the actor model?
Well, with the actor model, I think the published figure is a few hundred bytes, around 300 bytes per actor, as the basic footprint. Storm is more; not terribly much more, but I can't actually remember, it's a few times more. Are both based on a thread pool mechanism, or is there a different mechanism used on the JVM? Sorry? Do both of the frameworks use a thread pool mechanism, and can we control the thread pools? It's just the last word I'm not getting. Thread pools. Oh, the thread pools, yes. So with actors you can get much more explicit about how you want to play with the threads. Actors share threads; bolts and spouts typically don't, and that's also why they're heavier. A bolt will typically have its own thread, while actors share a pool, and even there you can tune things. So actors are much more lightweight in that way also. Thank you. Cool.