So, well, it would be unfair to have a conference on failure at scale and not talk about microservices. I've been sitting here wondering why no one is talking about it — so we will. I'm going to talk about microservice architectures, the design patterns you should consider while building a system using microservices, and how a framework that we call Gilmour helps you do all of that. First of all, who am I? I'm Piyush Verma. Sadly, I'm not Walter. My Twitter handle is @meson10; you can drop me an email, and in case you want to know more, you can check out my website as well. So, let's start with this: how many of you are involved in microservices, or intend to use microservices in your architectures? Okay, that's fairly a lot of people here. It should make a good talk then. So, what exactly are services? This is a quick recap for the rest of you who didn't raise your hand. Actually, first, let me see how many people can raise their hand in general? Oh, cool, everyone can raise their hand here. I just wanted to make sure the math works out. So, what are services? Services are logical entities that run as separate code units. Think of them like processes in your operating system: the idea is that they don't interfere with each other, each one narrowed down to the smallest possible thing. I was having a word with Anand yesterday and he gave me a very good definition of this. He said: we as software engineers never really have an end to our software; we keep writing it and realize, okay, it doesn't end. But the microservice paradigm has changed that — we go the other way around: how much can we trim it down, until it does nothing beyond a bunch of lines, and that's where we call it an end. So that's what a service would be. And, as I said, they're independent of each other.
A system can be composed of one or more services: there could be a service that does addition, another for multiplication, another for subtraction, or one which does all of them put together by calling the other services. Services span multiple servers for load and fault tolerance — and this is the part we've been talking about most at this conference: how do you handle multiple servers, where a service comprises many servers put together? You could have tens or hundreds of notification servers, and one service responsible for all of them put together. Services also help you scale and stay flexible. What do I mean by that? Primarily, because a service spans servers, it becomes scalable; but also you can build it in multiple languages. One service could be written in Go, another in Java, another in Ruby, and all of them together make up your one system. That makes a service-oriented architecture really easy to adopt, and really lucrative as well. This next bit is something I found on the internet and was struck by — an amazing definition of why we need services. Everyone fails; we've talked about that repeatedly over the past 36 hours. But from the management perspective, we need to know who to blame. That might sound funny — the managers are not the bad people here — and I'll explain why this comes into the picture. Let me actually do it the other way around and tell you what the management problem is. Say you have a payment service. Usually the way we design our systems is a monolith comprising a bunch of interactions, and we get by saying: we're a small team, we can't really afford multiple distributed servers, so we'll build it as one single thing. So there are three people all contributing to the same service.
Let's call it a payment service. It's 1 a.m. and something crashes — I bet you it will, one day or the other. Who do you call? Which of the three people is responsible for it? Avoiding conflicts like this is where I say microservices architecture has a lot of business value. This comes from the Netflix school of thought as well: you, or a few of you, own one component of the system. You are entirely responsible for scaling it, deploying it, maintaining its uptime, and for troubleshooting and fixing it. If there's something you cannot do, you might as well offload it as another service to someone else who will be responsible for it. If your service is not spitting out errors when it should, that's your problem. This accountability is why I personally prefer microservices architecture above all else. Infrastructure problems can be taken care of — of course, maybe not at the scale of large companies like Facebook or Google, but most of us can tackle them; there are dozens of tools and frameworks out there to help us. But accountability is the one problem where we otherwise fail every time. And then the software problem is scalability; I'll cover later in the talk what exactly I mean by scalable architectures. Now, just like any other computer science problem, we have design patterns for microservices architecture. Let's go through them one by one. One of the foremost important patterns is the request-response pattern. This is what we have traditionally known from HTTP. How many of you have used Go kit, or gRPC in some form or the other, in production? Okay, some DigitalOcean folks, and a fair few other hands. So request-response is primarily, as the name suggests: you request something, and you get a response.
Let's see what a sample looks like. I have a caller and I have a service. I send "hello" and I get "aloha" — there's a response back — or I could get "wrong number". Whatever the case, I'm always assured there will be a response, be it good data or bad data. Now comes the tricky part: how do you differentiate a failure? When I got a response here — first "aloha", then "wrong number" — how do I tell what is good and what is bad? The essence is that every response comes along with a code. This is what HTTP status codes are known for: anything that's 200 is good; anything greater than or equal to 400 is an error condition. This is important because we don't have to parse the output every time to see whether it's really JSON, or a hash, or an array, or a string. Now, this is where I start using some code samples. Most of these are written in Go, but the framework is available for Ruby and Java as well, and most of this is pseudo-code-ish Go, just to make it easy to follow. Every data transport that happens using Gilmour always has a sender ID — I'll touch on its importance later, but it uniquely identifies every single message that is sent out — and it also returns a code: if the code is greater than or equal to 400 we know it's an error, otherwise 200 is good. And then there's the data, which is any interface you intend to send back from the service. A typical request-response exchange done using Gilmour looks like this: I create a new message, attach data to it, create a request, and send it when I do g.Request. Over the course of this talk, every time I say a small g, that means a Gilmour instance, and a capital G means the namespace, or class, whichever you want to call it.
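A minimal sketch of that message-plus-code convention, in illustrative Go — these are not the actual Gilmour types, and the field names here are my assumptions:

```go
package main

import "fmt"

// Message mirrors the transport just described: every exchange carries a
// unique sender ID, an HTTP-style status code, and an opaque data payload.
// Names are illustrative, not Gilmour's real types.
type Message struct {
	Sender string      // unique ID for this message
	Code   int         // 200 is good; >= 400 is an error
	Data   interface{} // whatever the service sends back
}

// IsError applies the convention from the talk: anything >= 400 is an
// error, so callers never have to parse the payload to find out.
func (m Message) IsError() bool { return m.Code >= 400 }

func main() {
	ok := Message{Sender: "msg-1", Code: 200, Data: "aloha"}
	bad := Message{Sender: "msg-2", Code: 404, Data: "wrong number"}
	fmt.Println(ok.IsError(), bad.IsError()) // prints: false true
}
```

The point of the convention is exactly what the talk says: the caller branches on the code, never on the shape of the payload.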
When I do g.Request, the first argument is "echo" — echo is a topic. All communication exchanges, unlike HTTP-based transports, are topic exchanges. This is one thing that's inherently different: conventionally, when we say microservice architecture, we say there is some URL or endpoint we're going to hit — some IP address, and a path under it. With Gilmour we don't do that. We do it over a purely non-HTTP transport, and that's where Redis comes into the picture: we use Redis underneath, though you can have your own choice of backend as well — it's really pluggable. So the choice becomes one of a topic: echo is a topic where you send your data and get a response back. On the other side — the server side, or, since there is no server, the service-handler side — you create a new handler and say: I'll reply to echo with this function; and the function basically just sends back data saying "pong". I'll do a demo of this later, so you can just quickly skim this code. One of the most important patterns apart from this is the asynchronous, or signal-slot, pattern. How many of you have used Qt, one of the core KDE libraries that used to be around — still around, but was fairly popular? Okay. How many of you use JavaScript? Okay, good. Every time you see the event pattern running in your browser, it's pretty much the same concept. There are signals and there are slots; a slot is a receiver for a message. Someone who has to get something done emits a signal, and has almost no control over whether the signal will be received by a slot or not. Sounds like there could be a potential loss of data here — but this is one of the most flexible and scalable architectures, and I'll explain why. Let's design a sample shopping cart. Gilmour is the engine here.
The web could be anything — a web client, or another service calling this one. For the sake of convenience, assume this sits somewhere in your back end where, after a purchase is made, you're supposed to send out notifications. So an item was purchased, and now you have to notify the user: there's an SMS notification to be sent out and an email notification to be sent out. And suddenly, a few weeks or months later, your management decides: we're making this app mobile-only, so now you have to add a push message server as well. Why would you do that? I don't know — that's a business decision. Now, each of these services you're bringing up listens on a topic, item.buy — and there's a star next to it: item.buy.*. The star is pretty important. Every time you're talking about UDP-like systems, there is a notion of broadcast, and that translates to wildcard topics in publisher-subscriber patterns. Each of these services is listening to item.buy.*, so if you purchase an item — say item.buy.500 — all of them get it, because they subscribe to a star pattern. Now, say there's an online carnival going on, and you have to attach a new service, responsible for payback points, or free delivery, or a freebie — something unique to a particular set of items. So you attach a new service, VIP item purchase, and this one only works on item.buy.420. The next item purchased, if it was again item.buy.500: since the topic does not match, this new service will not handle it, whereas the other three services currently running out there continue to work. Now suppose there's too much traffic going to the push notification service, and you have to scale this push message server up.
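The wildcard rule above — item.buy.* matching item.buy.500 but the exact topic item.buy.420 matching nothing else — can be sketched like this. This is an illustrative matcher I'm writing for the explanation, not Gilmour's internal implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// MatchTopic sketches the wildcard rule from the talk: a subscription
// ending in ".*" behaves like a broadcast listener and matches any topic
// under that prefix; anything else must match exactly.
func MatchTopic(subscription, topic string) bool {
	if strings.HasSuffix(subscription, ".*") {
		// "item.buy.*" -> keep "item.buy." and prefix-match against it.
		prefix := strings.TrimSuffix(subscription, "*")
		return strings.HasPrefix(topic, prefix)
	}
	return subscription == topic
}

func main() {
	fmt.Println(MatchTopic("item.buy.*", "item.buy.500"))   // true: wildcard subscriber sees every purchase
	fmt.Println(MatchTopic("item.buy.420", "item.buy.500")) // false: the VIP service only sees item 420
}
```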
So you keep adding more and more servers, and since all of them continue to listen to item.buy.*, every time you buy an item the message is relayed to that service. However, this raises a challenge: if you notice, with three servers there, this would result in three push messages being sent out. I still have to trap that. How do we do it? We introduce a concept called exclusion groups. How many of you use Kafka here? So you must be aware of what consumer groups are. The promise here is that within a group — say I tag these servers with the group push_message — only one of them is going to receive a given message. Even if I scale it to, say, a hundred nodes, only one of them will process that message. This is an example of how you achieve function-level scaling through asynchronous patterns. If I wanted to scale the SMS notifier, I'd do the same by just adding another server there. This is a typical asynchronous pattern implemented using Gilmour. The signal part is fairly easy: you call g.Signal, specify the topic you're sending to, construct a new message, and set the data, which is a string. On the receiver side you have a g.Slot which says: I'm a slot for example.log — and there's a handler next to it. Notice carefully that there is no response here. Since there could be a broadcast happening, with many servers processing the same message, a response doesn't make sense. If you have to think signal-slots, think UDP and how you would design a system like that. The question that arises now is: which pattern do you use more in your application? Would you prefer signal-slot, or request-response? The usual answer applies: it entirely depends on your application's use case.
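The exclusion-group behaviour can be sketched in-process like this — a toy delivery function, not Gilmour's real dispatch, with names I've made up for illustration:

```go
package main

import "fmt"

// Subscriber pairs a listener with an optional exclusion group, as
// described above. Grouped subscribers share work: only one member of a
// group sees any given message. Ungrouped subscribers all see it.
type Subscriber struct {
	Name  string
	Group string // empty means no group: broadcast semantics
}

// Deliver returns which subscribers actually receive one message:
// every ungrouped subscriber, plus exactly one member per group.
func Deliver(subs []Subscriber) []string {
	seen := map[string]bool{}
	var got []string
	for _, s := range subs {
		if s.Group != "" {
			if seen[s.Group] {
				continue // another member of this group already took it
			}
			seen[s.Group] = true
		}
		got = append(got, s.Name)
	}
	return got
}

func main() {
	subs := []Subscriber{
		{"push-1", "push_message"}, // three scaled-out push servers...
		{"push-2", "push_message"},
		{"push-3", "push_message"},
		{"email-1", ""}, // ...and one ungrouped email listener
	}
	fmt.Println(Deliver(subs)) // prints: [push-1 email-1]
}
```

Only one push server handles the message, while the ungrouped email listener still gets its own copy — which is exactly the scaling story from the slide.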
But let's start with request-response. Where would you use it? Where you want confirmed delivery: you're sending a message, and you want a response for it. The response could be an error, or the data — a valid return — but there has to be something. A request-response pattern is purely HTTP-like, so whatever you've traditionally thought of as a server architecture, where you call a URL and get data back, is where you'd use it. Also, the delegation of responsibility for errors defines it best. If your caller is responsible for carrying out an action based on an error condition — say you tried to make a payment and the payment failed; what happens next? is the caller supposed to retry, or call an alternate? — then the responsibility lies with the caller, and you'd use request-response. Where would you use signal-slots? Well, it's absolutely UDP-like if you think about it: it supports broadcasting. If you have multiple receivers for the same message and you want to fan it out, you'd use signal-slots. Wildcard topics, when you want utmost scalability. There's a caveat, though: because no response is sent from a slot, you can never really guarantee whether the message was lost in transit, or the server just died, or something else happened. So if you believe that, at an infrastructure level, you can do a good job of keeping servers constantly alive, you'd use signal-slots. Unreliable delivery: when you're okay with the fact that someone might not receive this data, you'd use it. And when the receiver is responsible for errors — say I'm relaying messages, but the service sending them doesn't really care what happens to them. Let's take the example of log forwarding.
From your application servers you're emitting logs, and what you're using there is mostly UDP. How many of you use rsyslogd, or services like Papertrail? Okay — the rest of you should use something; logs are important. If you're using UDP, your application doesn't care whether your log reaches its eventual destination or not; all it does is keep spitting out log messages. There is a receiver at the other end which decides what to do in case a log was not delivered — do I buffer it? do I retry? All that responsibility sits there, and hence you'd use a signal-slot. Another challenge, and these days a pretty hot topic — any HashiCorp folks out here? There was one yesterday. There's a tool by them called Consul, and then there's ZooKeeper — how many of you use ZooKeeper somewhere? All right — or Route 53 for load balancing and discovery. Service discovery and load balancing is a great topic to talk about. Let's see how service discovery actually works. Say you have a notification server, and now a load balancer on top of it. The first step is that you have to tell the service discovery: hey, I'm around. This could happen using anything — anyone ever use fire DNS here? A decade-old DNS tool, pretty much your in-house Route 53 replacement. You register your server and say: hey, I'm around, I'm supposed to do this, this is my responsibility, this is my path. Or you could use something like Consul, which effectively does the same thing internally. Beyond this point, there'll be a constant exchange between the load balancer and the notification server — it will keep asking: are you healthy? are you alive? And the replicated load balancer itself is something you have to scale up and maintain the uptime of. This is for any of the ops people out here.
You would know that there are tools to actually handle all of this, but it's separate infrastructure you have to maintain yourself. Now, if I add another notification server, I run through pretty much the same process, which constantly has to be listened for. And in case notification server two dies and I have to bring up notification server three, I run through the same process again. Now I have a caller service which can exchange data with the load balancer. But wait — how does it get the IP address? I still have to pass that in, or hard-code it in the application, and that address could change if the load balancer is replicated. So how would you do this using Gilmour? We realized this is actually one of the biggest pain points — a reason for not building that big an architecture. If you're not running at a scale which really requires that much discovery machinery, why do it? In Gilmour, since everything is a topic exchange, let's redo the same problem. There's a notification server which comes up on a topic, say manager.notification — you can call it anything you want. Now I can have more such manager.notification servers. And if I have a caller service that wants to send data to manager.notification — it could be either of the patterns, request-response or signal-slot, it doesn't matter here — the message still has to be routed to all of the servers. So it is sent to all of them. This is the important part — remember the groups I introduced earlier; here's how that actually works. The message is sent to each of the servers out there, and once it's delivered, the next thing each of them does is try to acquire a lock. It's a distributed lock, and each of the servers wants it. The lock identifier is the group that we define — in the earlier example it was push_message; that's the lock identification.
But the lock shall be granted to only one of the servers. So, say, notification server one acquires the lock and processes the message for you. This guarantees that you're not going to process the message three times over when it should only be done once. And notice what happened here: there was no service discovery, because there were topics. When you sent the data, you didn't care who you were sending it to. There's just a function call, or a path, that you care about; you don't care which server is going to process it, because you don't need to know. There is no querying of a discovery service. Effectively you're saying: I'm going to call a service, not an individual server. It's like a wallet: if I ask whether you have 500 bucks in it, the answer is yes or no — you don't particularly care which note it is, or its serial number; it's only the denomination you care about. Servers are volatile — here today, gone tomorrow; they're going to crash, they're going to be replaced. The service is what you ultimately care about. You don't care whether your payment service is running on a hundred servers; you care whether it can handle payments or not. No load balancing either: as long as there is capacity, the message will be served. A message is delivered to one and all, and the fittest node — whoever acquires the lock first — wins; it's Darwinian. You don't have to arbitrate it. Now, another important thing about most of our distributed systems is errors. Errors happen all the time. You must have the ability to detect when an error has happened and decide what to do with it. You can't just ignore them.
So Gilmour provides you, at the framework level, multiple ways of handling these errors. The most important errors you care about are actually timeouts — server-side and client-side. A server-side timeout is where, when you register a slot or a request handler, you say: I'm only going to serve this request for, say, ten minutes. Imagine a video encoding job: you want to run it for a certain amount of time, but if it fails, you don't want to waste any further resources. A client-side timeout: imagine you're making a payment and you expect a response within a minute or so — whatever timeout is comfortable for you. If either timeout is not met, the request fails. The other error detection that is really critical to your system is confirmed subscribers. If I'm sending a request right now — since there is no notion of a server here, you don't know whether anything is actually going to process it. So while sending out a request, there's a switch where you say: I want confirmed subscribers to be true; the library itself will then raise a 404 error in your application code in case no service out there was listening for this. Now let's walk through a sample of how this error flow works. Take the signal-slot example: the signal was sent and the slot hit an error — it forwards it to an error reporter. The error reporter is a small component built into Gilmour itself; I'll explain what exactly it does. The same goes for the request-response pattern: in case of a wrong number, the service forwards it to the error reporter. Now, every error should carry as much information as possible, and once you use Gilmour, irrespective of the language you use it in, the error structure looks like this.
It carries the topic on which the request was sent; the request data — what the request carried; user data, in case you want anything extra sent along; the sender, that ID which is unique to every message; a timestamp; a backtrace, which is optional and present when a traceback is available (although if you use Gilmour, we always give you that); and a code. Now, once this is sent to an error reporter, what can you do with your errors? These are the patterns we've primarily seen in systems: you either ignore your errors — how many of you actually believe in that? okay, no one — or you publish your errors, or you queue them. This is unique to each system; you could do it either way. So, once you start a Gilmour server in your application, you can set the error policy it's going to use. If you use publish, we use the same mechanism as everywhere else: every error message is sent as a signal to the topic gilmour.error. This is where the interesting part comes in: in your own application architecture, you can have an error receiver — basically a slot on gilmour.error — and every error that is sent out is received there. And just to show you how we do this in production — can anyone identify what this is a screenshot of? Do you guys use PagerDuty? Anyone? Okay, yeah, it's a screenshot of PagerDuty. We have a component listening on gilmour.error all the time, and any error that comes in is instantly paged out to whoever is responsible for taking action on that particular service. Now, monitoring. Monitoring is another really interesting topic in a distributed system. Traditionally we talk about server monitoring, where we have log forwarding and aggregation.
Every log that is forwarded, we'd like to mine later, draw patterns out of it, and see if there's an anomaly. If you use Papertrail, or Logstash, or Loggly, or any such service, you can define grep- or regex-based patterns and raise tickets out of them. Error reporting, all the errors you have on a server — these are the basic, de facto features of any server-based system; all servers should religiously have them, so I'm not going to dwell on those, because pretty much everyone talks about them. From a service standpoint, what do you care about? As an architect, I care only about: are my services running right now? Are my services capable of handling the capacity? Do my services respond well in time? This is where Gilmour gives you a component called the health bulletin. The bulletin is written in Ruby, so it's a separate process you run in your system. What it does is this: every server of yours that registers slots or request handlers via Gilmour generates an ident for itself — a UUID — and that is sent to the health bulletin. The health bulletin then sends a message back every however-many minutes it's configured to, and sees whether the server is alive or not. This part replaces the server-side monitoring — whether this particular server is alive — which is important from a system administrator's point of view. Now, if we go one step higher, what does an architect care about? Is my service listening on all its topics? Is my service able to serve well, in time, right now? The health bulletin does this too: you define the crucial services in your system — a request on item.buy.420 could be one example; then item.buy.500, item.purchase.anything — crucial services that should never go down.
The health bulletin will look for any active subscribers on each of those topics, and if there are none, it raises an error. The error goes through gilmour.error too — we eat our own dog food; we use the same pattern everywhere. Every log message that Gilmour processes is sent out on gilmour.log, and a gilmour.log listener emits messages like this one — this is a message from Papertrail. If you look at it: the first field is a host ident, in case we really care what the host was — here it's manager0.datascale.io. Then the topic, gilmour.health, and then every message that comes out carries a unique message ID. All messages you emit within the scope of one request carry the same unique ID. This is very crucial, because in a distributed logging system you otherwise don't know which request went where and what belongs to what; the unique identification lets you club all those messages together. Now, beyond this — we believe in following the UNIX philosophy very obediently, and one of its principles is: every time you define a service, think of it like a UNIX command line. Command-line tools can be clubbed together beautifully because they exchange data as plain text. (Now you know why you hate systemd — it doesn't do that.) Using composition, a command should be able to send data to another command, call other commands in parallel, or pipe into them. Let's see what Gilmour allows you to do here: you can have a service 1 which pipes its output to service 2, which pipes its output to another composition. A composition could be any of these things — people familiar with bash do this in their sleep; it's basic tooling for any bash person. You have the and, the or, and the parallel — parallel being really the exception there. So how do we translate that to software?
Let's take a sample example. This is what a composition looks like. Look at the middle line: it's a new pipe. The first leg fetches — example.fetch is a topic — and its output is sent to example.words. I'll give you a working demo of this; when we're talking about distributed systems, it would be unfair not to have a map-reduce problem that does a word count. The output of example.words is piped to example.count, and that is piped to popular. Popular is another composition up there, which runs two services in parallel: find the most popular three-letter words, and the most popular four-letter words. To make all of this work, you create new data — the data being the URL of the file — and you just call execute on the message batch. Any of these compositions can be restructured; think of them like Lego blocks. Now, here's where scalability and flexibility come into the picture again. If tomorrow I have to change this — say I add a stop filter, because the words coming out usually include "a" and "the" and that sort of thing — I just add a new service to the composition, example.stopfilter, and, in the parallel, add a new parallel service for the popular five-letter words. Same code, and now I can do much more with probably a three-line change. Here's another example, which I'll demo later. It fetches the weather data for a particular city for the past one year, then groups it by month. The next step creates a lambda function which pulls out the weather data for the month of January, and then, in parallel, finds the minimum and the maximum temperature in that month for that city. It will give you the hour for that as well.
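The pipe-plus-parallel shape of that word-count composition can be sketched in plain, in-process Go — this is not Gilmour's composition API (whose stages are topics on a transport), just the same Lego-block idea with functions standing in for services:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// StopFilter is the stop-word "service": it drops filler words.
func StopFilter(words []string) []string {
	stop := map[string]bool{"a": true, "the": true, "and": true}
	var out []string
	for _, w := range words {
		if !stop[w] {
			out = append(out, w)
		}
	}
	return out
}

// Popular returns the most frequent word of exactly n letters,
// like the popular.three / popular.four / popular.five branches.
func Popular(n int) func([]string) string {
	return func(words []string) string {
		count := map[string]int{}
		best := ""
		for _, w := range words {
			if len(w) != n {
				continue
			}
			count[w]++
			if best == "" || count[w] > count[best] {
				best = w
			}
		}
		return best
	}
}

// Parallel fans the same input out to several stages concurrently and
// collects every result, preserving stage order.
func Parallel(words []string, stages ...func([]string) string) []string {
	out := make([]string, len(stages))
	var wg sync.WaitGroup
	for i, s := range stages {
		wg.Add(1)
		go func(i int, s func([]string) string) {
			defer wg.Done()
			out[i] = s(words)
		}(i, s)
	}
	wg.Wait()
	return out
}

func main() {
	text := "the cat and the dog saw a big cat near the barn door"
	// The "pipe": fetch -> words -> stop-filter, then the parallel branch.
	words := StopFilter(strings.Fields(text))
	fmt.Println(Parallel(words, Popular(3), Popular(4))) // prints: [cat near]
}
```

Adding the five-letter branch really is a one-line change — append `Popular(5)` to the `Parallel` call — which is the flexibility point the talk is making.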
Yeah, this is another composition, and if you see, it's a new parallel, which does it for the months of January and February in parallel. Well, now let's see a few working demos. It's okay if you cannot read this part, because it's not really that important. It's a basic demo of an echo server and a client. On the left-hand side I start one server, which is an echo server; then I start another one in another tmux session. On the right-hand side I have a client that starts sending requests — one request every millisecond — and both servers share the load; this is where the locking comes into the picture, so that both of them don't handle the same message. I could have more servers on the left-hand side — tens or hundreds of them — and still only one would handle each request. Now, the text example that I showed earlier — what does the problem statement look like? I'm going to give an S3 URL which has a lot of random text — trust me, a lot. The goal is to eliminate stop words from that file and find the most popular three-, four-, and five-letter words, and the idea is to do this in parallel. The output should be the popular three-letter words, the five-letter words, and the four-letter words. This is a working demo of it. On the left-hand side I have a server again. This is all done in Go, but there are Ruby examples as well if you go to the GitHub repository. Then I have another one; on the bottom right-hand side I'm starting a log server, which is just a slot receiver for gilmour.log. Once I send in the first message, it fetches the file, and after the file has been fetched, it sends it on to the next service.
One of the services does a word count on it, groups it, then eliminates the stop words, and eventually you get the output: the most popular three-letter words in this file were "way" and, for the four-letter words, "text" and "copy", and then "blind".

Now let's take the bigger example of the weather data I talked about a little while back. You're given weather data of this form — basically a JSON hash of day, month, hour and degrees. Degrees are in Fahrenheit. It could have been Celsius as well, but I opted for Fahrenheit because it shows a wider spread of temperature differences; Celsius doesn't do much justice there. The goal: for the months of Jan and Feb, find the maximum and minimum temperature recorded, and if possible do the tasks in parallel. The output should look pretty much like this — the temperatures with the dates — and here's a demo of it. I think this should be much more readable. Same thing: left-hand side, two servers; right-hand side is a log client, which is just receiving the log messages. It pretty much does the same job here: it sends the work to the two servers, computes the data, and fetches the output, which is on the top right-hand side.

So if I do a quick recap now of what Gilmour does: it's a library, not an external process, so if you don't have a problem which requires the scale of Kafka, this is where it fits in. There is no load balancer needed, because you don't need one. There's no service discovery, because we do topic exchanges. There's no HTTP, no server endpoint to be remembered. It supports broadcast and wildcard topics for the function endpoints, scalability, exclusion groups to ensure that a message is only processed once within a group, a synchronous request-reply pattern (the traditional HTTP model), and error and health monitoring through an external plugin called health bulletin.
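The weather task itself is simple enough to sketch. Below is a hedged Python version of the problem as stated: given records shaped like `{"day": d, "month": m, "hour": h, "degrees": f}` (field names are my assumption from the description), find the max and min temperature for Jan and Feb, with the two months computed in parallel as the composition does. The real demo fetches a year of data from a service; here a few inline records stand in for it.

```python
from concurrent.futures import ThreadPoolExecutor

records = [
    {"day": 3,  "month": "jan", "hour": 14, "degrees": 41},
    {"day": 9,  "month": "jan", "hour": 2,  "degrees": 18},
    {"day": 21, "month": "feb", "hour": 13, "degrees": 47},
    {"day": 22, "month": "feb", "hour": 4,  "degrees": 25},
]

def extremes(month):
    # Group by month, then reduce to the recorded max/min for that month.
    temps = [r["degrees"] for r in records if r["month"] == month]
    return {"month": month, "max": max(temps), "min": min(temps)}

# Mirror the parallel composition: both months computed concurrently.
with ThreadPoolExecutor() as pool:
    jan, feb = pool.map(extremes, ["jan", "feb"])
```

In the Gilmour version the two `extremes` calls would be separate services on separate workers rather than threads in one process, but the data flow is the same.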
This is what you can configure; it's available in the GitHub documentation as well. Failure detection through timeouts and confirmed subscribers. Compositions — this is the real microservices story, as we say, where everything should be treated like a function. Now, the good part is that with all these microservices, you don't know what language they are written in; you are oblivious to that. One of them could be written in Ruby, another in Python — like the log-message receiver — or all in Ruby. There is message persistence; it's just the transport that you care about.

Credits: datascale.io, that's where I work, and we use this heavily in production. We are a hosted-and-managed Cassandra, Kafka and Spark shop, and I can assure you that all of this — all the error reporting, management, everything around it — is done using Gilmour. For the repository, you can go to github.com and check out gilmour-libs. Yeah, that brings my talk to a conclusion. If you have any questions, I'm open to them. Yes, please.

So I had some questions around the composition bit. — Yeah, please go ahead. — So you can basically design workflows using compositions. — Correct. — I wanted to know how you would handle errors that happen in the middle of one, because I cannot do timeouts: in my use case, at least, I can run either two calculation engines or a hundred thousand, and that could take a few days. So I can't really do it using requests; I do it using signals. I was wondering what the rollback strategy and the error-handling strategy would be.

Right. Good that you brought this up, because I forgot to cover it in my talk. The compositions are reusable. If a composition errors out at any point, whatever is left of the composition can be executed again. It's an interface — a composition has execute on it. So let's say you had a composition which is add, then subtract, then multiply.
Now, if it failed at add — because you passed in a string, or for whatever reason — whatever is left of it, you can re-execute, or maybe pass it to another composition. At what point the error happens could be a value error in your case; it cannot be a timeout. You don't set a timeout, because in that case the job would run indefinitely — if you can afford that, you let it be that way. But in case there was a 500 internally in one of the services, you would be able to reuse that composition very easily.

What we are working on next is being able to send a composition over the wire. If you use Spark, you have a fire-and-forget kind of thing; you should be able to do that here too. Right now, in case the caller — which holds the context — dies, the composition itself pretty much becomes meaningless. How we are solving this is that you should be able to send the composition itself over the wire. Of course, in that case lambdas will not work, because there's no way to serialize or pickle a function — and even if there is, it's a challenging problem and not worth going there. So it will raise an error that your composition has a lambda and cannot be sent over the wire; but anything else can be JSON-serialized and sent to another service to run. And in case it's a really long-running job which fails eventually, one of the deployed services will be able to pick it up and continue from there. Does that answer your question?

Yeah, pretty much. Just to confirm: you can send compositions as JSON over the wire?

When you say composition.execute, you would pass remote equal to true, and Gilmour internally will send it over the wire by serializing it to JSON. We have a notation for that; we're working on it. — Thank you.

Hi. Yeah, I can see you now. I have a question regarding the operations side of it.
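The two ideas in this answer — re-executing the remainder of a failed composition, and refusing to serialize a composition that contains a lambda — can be sketched together. This is a hypothetical illustration, not Gilmour's implementation: the `Composition` class and `to_wire` method are invented here.

```python
class Composition:
    def __init__(self, *steps):
        self.steps = list(steps)

    def execute(self, data):
        for i, step in enumerate(self.steps):
            try:
                data = step(data)
            except Exception as exc:
                # Hand back the unexecuted remainder as a new composition,
                # so the caller can retry it (possibly with fixed input).
                return {"error": str(exc),
                        "remaining": Composition(*self.steps[i:])}
        return {"result": data}

    def to_wire(self):
        # Named functions could be referenced by topic name on the wire,
        # but a lambda cannot be serialized, as the talk points out.
        for step in self.steps:
            if step.__name__ == "<lambda>":
                raise TypeError("composition contains a lambda; "
                                "cannot be sent over the wire")
        return [step.__name__ for step in self.steps]

def add(x):      return x + 2
def subtract(x): return x - 1
def multiply(x): return x * 10

pipeline = Composition(add, subtract, multiply)
```

With this shape, `pipeline.execute("oops")` fails at `add` and returns the full remainder, which can be re-executed with corrected input — the "whatever is left can be executed again" behaviour from the answer above.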
I was just thinking — can you come a little closer? Yeah. So, on the operations side of it, wouldn't your pub/sub service be a single point of failure in this case?

Yeah, that's a fairly interesting point you've raised. We use Redis right now as a backend. If you check out the source code, the backends are swappable; you can pretty much use any backend. But Redis, I think, is one of the most resilient and best transport layers out there, and using Sentinel you can actually go cross-geography as well. The stats are out there on the Redis pages on the volume of messages Redis can handle, and it's really tremendous. I'm not saying this beats Kafka scale — I specifically mentioned that if you have that sort of scale, you probably would not use this, or you would use this on top of a Kafka backend itself, though Kafka would not support wildcard topics and stuff like that. But yes, fair point. I doubt, though, that Redis in Sentinel mode would fail to handle the sort of load we're talking about.

What's the sort of load that we're talking about? I'm just asking up front. — Okay. All right. So we are pretty good for a fairly large range.

Have you also looked into RabbitMQ or ActiveMQ or ZeroMQ?

Right — the stats say Redis actually beats RabbitMQ hands down. Interestingly, the project started with a RabbitMQ backend, but it wasn't scalable enough, and hence we moved over to Redis. We still have it somewhere in a branch.
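The "backends are swappable" answer describes a pluggable transport seam. Here is a hedged sketch of what such a seam can look like — the `Backend`/`Bus` classes and method names are invented for illustration, not taken from Gilmour's source; an in-memory backend stands in for the Redis (or RabbitMQ) transport.

```python
class Backend:
    """Transport interface: Redis, RabbitMQ, etc. would each implement this."""
    def publish(self, topic, payload):
        raise NotImplementedError
    def subscribe(self, topic, handler):
        raise NotImplementedError

class InMemoryBackend(Backend):
    """Test double standing in for the Redis-backed transport."""
    def __init__(self):
        self.handlers = {}
    def subscribe(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)
    def publish(self, topic, payload):
        return [h(payload) for h in self.handlers.get(topic, [])]

class Bus:
    """Application code talks to the Bus; the transport is injected, so
    swapping Redis for RabbitMQ is a one-line change at construction."""
    def __init__(self, backend):
        self.backend = backend
    def reply_to(self, topic, handler):
        self.backend.subscribe(topic, handler)
    def request(self, topic, payload):
        # Synchronous request-reply, as in the talk's recap.
        replies = self.backend.publish(topic, payload)
        return replies[0] if replies else None

bus = Bus(InMemoryBackend())
bus.reply_to("echo", lambda m: f"echo: {m}")
```

Because services only ever see the `Bus`, the single-point-of-failure question becomes a property of whichever backend is injected (e.g. Redis with Sentinel for failover), not of the application code.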