Good morning, everyone. My name is Anumit, and I want to present something I've been working on for the past couple of years as part of a startup I've been building with my friend. We present an API gateway on top of decentralized, distributed blockchain protocols, and this is a problem we ran into around last year while trying to keep up with a very high-throughput blockchain protocol. Heading into the talk, I'd like to let you know that I might speed through a few sections: we will look at a few fundamental blockchain principles, but what I really want to do is dive into the multiprocessing code we wrote for this specific purpose. Just bear with me as I introduce a few concepts so it becomes easier for you to follow the rest of the code.

Everyone is talking about blockchain, but what I have to say is that it's something most of us have already built in some basic computer science course: a linked list, for example. You've all done it. A blockchain is literally a logical chain of blocks; that's all there is to it. The interesting part is that the blocks also contain transactions. The most basic form of transaction we understand today is debit and credit: a debit from your bank account and a credit into someone else's account. Think of it like that, though it can also be a bigger cycle, say someone receives a credit and then pays a vendor. The blocks are linked in a certain sequential order, and the transactions are picked up from those blocks and applied in that order: debiting from A to B, crediting from B to C, and so on. This gives us what computer science theory calls deterministic state transitions: from a certain beginning state, we move to state one, then state two, and so on. If you're familiar with the term, this is a deterministic finite automaton, also known as a state machine (I'll show a tiny sketch of this in a moment). From state A, I can move to state B, and the deterministic part means I can also reverse back from B to A, or move from B to C.

How are the blocks linked? Never arbitrarily. There are peers participating in a network where they, quote unquote, gossip with each other about what exactly the sequence of blocks should be, and how this is decided goes by the technical term consensus. Consensus protocols form the core of most blockchain protocols. The sequence of blocks, also known as the global state, does not exist on a server. It's not what we're used to with databases, where there's one database on one central server; it's continuously agreed upon by the peers that participate in the network. That's what makes blockchain decentralized: distributed, yes, but decentralized too. New blocks are proposed to the network by peers through mechanisms like proof of work or proof of stake. Proof of work is what's popularly known as mining these days. It uses a lot of energy, so it acts as a way to limit entry into the network before you can start proposing blocks. In other words, if you're rich and powerful, you get to propose blocks. That's the problem with Bitcoin, or at least that's what most people say.
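To make the linked-list and state-machine analogy concrete, here is a tiny sketch of mine, not from our code base: each block carries its parent's hash, and applying a block's transactions is one deterministic state transition.

```python
import hashlib
import json

def block_hash(block):
    """Deterministic hash over a block's contents."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

genesis = {"number": 0, "parent_hash": None, "transactions": []}
block_1 = {
    "number": 1,
    "parent_hash": block_hash(genesis),  # the "link" in the chain of blocks
    "transactions": [{"from": "A", "to": "B", "amount": 10}],
}

def apply_block(state, block):
    """One deterministic state transition: debit the sender, credit the receiver."""
    new_state = dict(state)
    for tx in block["transactions"]:
        new_state[tx["from"]] = new_state.get(tx["from"], 0) - tx["amount"]
        new_state[tx["to"]] = new_state.get(tx["to"], 0) + tx["amount"]
    return new_state

print(apply_block({"A": 100}, block_1))  # {'A': 90, 'B': 10}
```

Because the transition is a pure function of (state, block), replaying the same blocks from the same starting state always lands in the same end state, which is exactly the property the network relies on.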
But what happens after you get into the network and propose a block? That block, which contains transactions, should carry the maximum computational value: a simple debit and credit between two accounts doesn't have as much computational value as, say, five parties settling mutual invoices among each other. So we try to pack as much computational value into the block as possible and then propose it as an addition to the list of blocks. Specific to the example I'm going to talk about today, the mechanism is known as the Greedy Heaviest Observed Subtree protocol, the GHOST protocol. The peers participate and communicate via a very low-level protocol known as the Ethereum wire protocol, which works over UDP. They are always negotiating which list of blocks they should agree upon as the final, central truth. In this graphic, the blocks in black carry the most computational value, and if you look at the ones in purple, some clients happen to hold that kind of chain of blocks. Consensus has to be reached so that finally they all agree to discard the purple blocks, because the chain of blocks in black carries the maximum amount of computation.

Consensus, as you can figure out, is a dynamic landscape. As peers and protocols keep evolving, consensus gets better, but if your network has latency issues, even if you're running on a cloud platform, there are problems communicating and synchronizing among peers. Here's an example I wanted to show you. Say we have these blocks in green, and all is well: every peer talking to your node happens to agree. But then a subset of the network discovers a different chain of blocks with more computational value. To make it schematically easier to understand, I've drawn the same block here but with a different hash, which means it contains a different set of transactions, and the one in red happens to have more computational value. So there's a change at block number 623: the hash has changed, and so have the subsequent linked blocks (I'll show a small detection sketch shortly). Of course this is a simplified way to show it, but from here the effect starts trickling down to the rest of the network, though it still has not reached, say, the peer you have participating in the network. Now you're faced with a fork in the version of the truth. This is a high-level view; it's not maintained in the code base exactly like this, but you get the idea. After further synchronization, we decide that at 623 we will accept the block with hash 14. Going back a little: instead of the block with hash 4, I took the block with hash 14, the one with the greater computational value.

But say your application logic, an ERP application for example, was reading from the original transactions. It has certain logic working on them and has pulled certain data out of them. Now what do you do? The transactions have changed. Are you going to roll back? How are you going to roll back, and what are you going to read after that? A lot of changes have to happen in your application logic, and we'll take a look at an example of that. Ideally it should look something like this: it's fine that you have a peer participating in the network, and within the peer this can happen, but what about your application logic? There is a revert of the transactions that were applied.
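As a minimal sketch of what detecting such a reorganization can look like, assuming we simply remember the last hash seen at each block height (illustrative, not our actual code):

```python
seen_hashes = {}  # block number -> hash we last saw at that height

def detect_reorg(number, new_hash):
    """Return the fork point if the chain we stored diverges from the network."""
    old_hash = seen_hashes.get(number)
    if old_hash is not None and old_hash != new_hash:
        # The block at this height was replaced (e.g. hash 4 -> hash 14 at 623):
        # everything from this height onward has to be rolled back and re-read.
        return number
    seen_hashes[number] = new_hash
    return None

detect_reorg(623, "0x04")  # first sighting at this height, returns None
detect_reorg(623, "0x0e")  # hash changed at 623, returns 623 as the fork point
```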
If you go back to the state machine example, you can revert. You can roll back, go back to a previous state, and start applying the new transactions. That works fine at the level of a peer; not so fine at the level of an application. Let's see why. If you're familiar with what a purchase order is: usually there's a purchase requisition, a request form that is approved by someone, maybe from the finance department, which triggers a purchase order. So I have something like this, deadbeef, a transaction ID for the confirmation of a purchase form, and it triggers another transaction which references the previous one as its confirmation. The transaction I'm hovering on right now happens to be the creation of a purchase order, and it references the previous transaction as the confirmation. Now say the fork I showed you happens in the network. You get these kinds of inconsistencies: either the transaction was not included in the later chain of blocks, or it now has a different ID. In either case, at the level of the application, you're now referencing a transaction that has changed. It does not exist in the same location in the chain of blocks. Not a good thing.

So what does this have to do with us? What we're trying to give you is what distributed computing terms a transparent layer. The transparent layer is something you talk to as if it were just a web service API, REST being the model where all business resources are mostly accessed as REST resources, and we did not want to disrupt that model. No matter how popular GraphQL gets, there are legacy libraries that are hard for people to suddenly port over; you've got ORMs and things like that. Anyway, what we take care of is that all the transaction churn you just saw should not be your headache. You shouldn't be duct-taping protocol logic. Ideally, all you should be bothered about is: send transaction, receive confirmation. That's it (a tiny illustration follows in a moment). You can focus on tightening the screws of your business logic instead.

Now, when people hear "API gateway" for a centralized system, they think it's just another relay. What's the big deal? The big deal is: try maintaining state against a decentralized network. That's where the challenge is. Schematically, your application looks something like this. There's a transparent API layer: you make the usual API calls over HTTP, you've got webhooks, WebSockets, the whole gamut of integrations, Zapier and the like. It's already a mature ecosystem; I don't need to go into that. The specific protocol I want to dive into is called ThunderCore; you can check it out. They support really high transaction throughput, and I want to show you that in the demo at the end. ThunderVigil is what we built on top of the ThunderCore protocol. That's our branding: we call the gateway Vigil, prefixed with the protocol we're building on. And it's powered by Python. Now we're getting into the Python part. The microservices handle the challenges I introduced earlier: you take snapshots of the chain at different points in time, you detect the reorganizations I showed you, you extract information and push it to higher-order logic, and you can replay, because the chain being a state machine, you can go back and forth as you wish.
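To illustrate that send-transaction, receive-confirmation model, here is a purely hypothetical sketch; the endpoint, payload, and field names are invented for illustration and are not our actual API:

```python
import requests

# Hypothetical gateway endpoint and payload, purely illustrative.
resp = requests.post(
    "https://gateway.example.com/v1/transactions",
    json={"to": "0xvendor", "data": {"purchase_order": "PO-1042"}},
)
tx = resp.json()  # e.g. {"tx_id": "0x...", "status": "pending"}

# The confirmation (or a reorg-corrected transaction ID) arrives later via
# webhook or WebSocket; the application never implements fork handling itself.
print(tx["status"])
```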
The code for this is on GitHub; if you go to the talk proposal link, you can find a link to my repo. As an example, I have a process monitor and coordinator which launches certain child worker processes and coordinates with them over a synchronized queue instance. If a child process exits, it reports the state at which it exited via the managed synchronized queue.

Why not threading? People have asked me. This is the problem with the global interpreter lock, and I really don't have the time to go into it; we have Mr. Beazley here, so you can go have a chat with him later if you really want the details. Threaded code can look fine logically, but there is always a single context of execution, and multiple threads start competing with each other to gain access to the Python interpreter. If your code has mixed CPU-bound and IO-bound needs, the threads compete, and that can totally mess up our stateful information. We cannot mess up the stateful information, because that is our guarantee to the client applications, correct? In a good situation it looks like this: you wait for the lock to be released, you acquire it, you do your work while the other thread continues along. But then this happens: with two CPU cores and two CPU-bound threads, you suddenly run into a starvation problem. As you can see, one thread sits idle most of the time, and when it tries to acquire the lock, it keeps failing. It's horrible. We can't maintain a stateful API gateway on top of that. Since we have different IO-bound needs, monitoring and extracting data, plus certain cryptographic operations, we decided not to get into that mess at all. Instead we launch multiple processes and synchronize them over managed queues. What about the overhead, people ask, the memory overhead of spawning all these child processes? You have to keep records of them too. I'll show you a graph later from our own demo, and you'll see the overhead isn't much.

This is the part I hope interests you, and I hope we can discuss it later in the day. I'm going through the demo code now; this is not exactly our code base. You have a process monitor coordinator, which is launched from the main process, and this coordinator does the work of launching child worker processes. The hierarchy looks something like this: a very simple example with three child processes launched from the process monitor coordinator. Main also launches a synchronized manager server process. For this example, I'm sharing two data structures: one is a queue, the other is a dict, a mapping. The coordinator listens on that synchronized queue for state updates and crash notifications, the idea being that worker processes shouldn't bother handling crashes themselves; they should just report the state at which they crashed. So I've introduced a message-passing format for when irrecoverable crashes happen (there's a sketch of it after this section). I don't know if you recognize what the name on the slide stands for; you can Google it later. And here's a quick note on why not just multiprocessing.Manager(): why am I using a SyncManager, right?
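As promised, a hedged sketch of what that crash-report message can look like. The field names here are my assumptions for illustration, not necessarily the exact ones from the repo:

```python
import time

def report_crash(state_queue, worker_name, exc, resume_state):
    # Workers don't handle crashes; they only describe them.
    # The coordinator consumes these messages and decides how to respawn.
    state_queue.put({
        "type": "crash",           # message kind
        "worker": worker_name,     # which child process this came from
        "message": repr(exc),      # what went wrong
        "timestamp": time.time(),  # when it happened
        "state": resume_state,     # stateful info needed to resume work
    })
```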
In either case, it starts a server process, and that lets me share the manager across different processes on different computers over a network, which allows me to horizontally scale our aggregation logic. It's very difficult to do that otherwise. You manipulate the shared data structures through proxies. As an example, here's a very simple function I'm going to launch in a child process. All it does is append some data to the shared list, like here. I start a manager instance, pass the shared list into a process target function, this basic function, and start the process. I wait for the process to end, which is join, and then I want to access the shared list. When you press Ctrl-C, when a KeyboardInterrupt arrives, I want to run some cleanup logic on the shared list. And this ends up happening: an AttributeError, no attribute 'connection'. Why is this happening? If you look at the stack trace, the exception is raised while I'm trying to access the shared list. What went on here?

The docs tell us that multiprocessing.Manager() returns an already-started SyncManager, which is to say there are no explicit signal handlers set up for the server process. When you press Ctrl-C, SIGINT is delivered to main and to every other child process down the tree hierarchy. The exception is handled by main, but I have no control over the manager server because it's already started, which is why, when we try to access the shared list, the KeyboardInterrupt goes unhandled and the server process has died. So instead we create the manager explicitly and configure it to ignore the SIGINT signal, something like this: I have an init method callback, and I start the synchronized manager with this initializer, which just ignores SIGINT as if it never happened (there's a complete sketch at the end of this section). Now we pass the data structures from the manager, the queue and the dict, into the process monitor this way: I create them in the main process and pass them into the coordinator, so they get passed down the process hierarchy. As a side note, you should definitely add more handlers where possible, SIGTERM for example, and this behavior gets inherited by all child processes spawned from the coordinator.

Once more, a quick look at how it works together. I keep track of spawned child worker processes in a process directory, and this is how crashes are reported. This is my child process, called a fetcher process. If there is some error, I put a message on the queue saying which process it is, what the message is, and a timestamp: stateful information, basically, while my process coordinator keeps listening. It's a very simple blocking listen, and that's okay: it's running as a process in its own right, so there's no asyncio necessary here. As I keep listening, I handle each message; if the queue is empty, I continue along. I also respawn child processes if they have crashed. In the respawn logic, I'm basically getting the process initialization arguments from a registry and starting the process again. There is a certain time limit which is configurable; you can check it out later in the code base. Once the respawn threshold has passed, I call the respawn method. And this is how the child processes update the shared mapping.
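Before the demo, here is a minimal runnable sketch that puts the manager pieces together; the function names are mine, not the repo's:

```python
import signal
from multiprocessing.managers import SyncManager

def ignore_sigint():
    # Runs inside the manager server process before it starts serving.
    # Ignoring SIGINT means a Ctrl-C in main won't kill the server,
    # so cleanup code can still reach the shared structures.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def start_manager():
    # Unlike multiprocessing.Manager(), which returns an already-started
    # SyncManager, instantiating it ourselves lets us control startup.
    manager = SyncManager()
    manager.start(ignore_sigint)
    return manager, manager.Queue(), manager.dict()

if __name__ == "__main__":
    manager, state_queue, shared_map = start_manager()
    try:
        pass  # launch the coordinator here, handing down state_queue and shared_map
    except KeyboardInterrupt:
        print(dict(shared_map))  # the shared state is still reachable for cleanup
    finally:
        manager.shutdown()
```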
I've given a demo of the shared mapping getting updated from a fetcher, as if it were actually fetching blocks of data. I also save state through the well-worn approach of pickling: you open a cache file in binary write mode and dump a pickled object, the shared mapping, into that file.

State rollback is possibly the most interesting part. This is how I've structured the demo: among the three child worker processes, there's one state-rollback notification listener. When you pass a message to it, it passes that stateful information back to the process monitor coordinator. The coordinator listens for this and begins the rollback by bringing down all the child worker processes. Once they're down, it respawns the workers with the state information from which they should resume operations. For this basic example, I have something called a rollback index, because that's what the fetcher process works from: it fetches based on an index into the data. The spawned child workers are brought down by first delivering a SIGTERM signal, and if that does not work on them, a SIGKILL. The code is there; go give it a try, play with the signal handlers, and see what happens, because you want to avoid memory leaks from zombie or orphan processes. At the code level it's two phases. Phase one: reap the children through psutil by calling terminate() on them. If that doesn't work and there are processes still alive, phase two: send them SIGKILL. SIGKILL should finish them. The code looks something like the sketch below.

Now I want to show you the demo. Damn it, how much time do we have? Can I take five more minutes, maybe skipping the Q&A? You can ask me questions after the session; that's okay. Can you give me a quick thumbs up if you can see the terminal? I can't tell from here. Thank you. This demo has a block producer, as if there is a chain of blocks being produced constantly, and here I'm going to start it; first I remove the cache so it starts fresh. As you can see on the terminal, it's running a fetcher process and it's fetching blocks, one after another. I can query the current state it stands at: these are all the blocks it has fetched so far. I also get to see the process directory I'm maintaining here: the fetcher is running on this PID, and so on. What I've done in this example is make it crash after fetching block 20, just to show you what can happen. It brings down the fetcher and starts respawning as well. You can see it crashed after fetching 19, at block 20, and once it recovered, it started fetching again from 20. And this is where I want to show you the rollback. I have this rollback command, so I tell it: hey, I want you to roll back; start fetching again from, say, index 5. I send this command and, as you can see (is this visible?), it receives the command and starts bringing down all the child processes. Then the respawning happens after a while, and you can see this is the begin index it started fetching from again, because that's what I told it: roll back to this index, right?
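Here is that two-phase reap as a sketch. It follows the standard psutil recipe; the function name and the timeout value are my choices:

```python
import psutil

def reap_children(timeout=3):
    """Bring down all spawned children: SIGTERM first, SIGKILL as a fallback."""
    children = psutil.Process().children(recursive=True)
    # Phase one: ask politely with SIGTERM.
    for proc in children:
        proc.terminate()
    # Give them a grace period to exit cleanly.
    gone, alive = psutil.wait_procs(children, timeout=timeout)
    # Phase two: anything still alive ignored SIGTERM, so force it with SIGKILL.
    for proc in alive:
        proc.kill()
```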
So that's what happens. Going back: over a very long run, this was the memory and CPU usage. (What the hell, is the network slow? Sorry.) You can take a look at it later; you can see the graph. I left it running for about 45 minutes to an hour to show you long-run usage, and how the CPU and memory usage drop while the respawning happens.

And the 1200-transactions-per-second part: I want to show you a live demo of it. What I have here is our development environment, in which I'm going to make a bunch of transactions on the ThunderCore protocol I told you about. Folks, I think the network has become slow and I'm having a little trouble opening the live demo. I have two minutes left; I'll try connecting over my hotspot. Please stay with me, this is the most interesting part. What do you call it, Murphy's law, right? Unbelievable. Oh God, why do you do this to me? Okay, the hotspot is on, thank goodness, and I still have a minute and a half to go. Is the hotspot connected? Yes, it's connected. Great. So, please, please work. How will we Make in India like this? Look at this. Okay, this is our development dashboard, totally made by absolute geeks. It looks shabby, I know, but that's okay. I'm going to run a load of transactions; I'll share this code too, it will go up, don't worry. So here I am running a bunch of transactions, and I'm also going to open a WebSocket client for this. Yeah, that's my WebSocket client. In about fifteen seconds, assuming our connection stays alive... yes. It's about to make 16 transfers, like this, and you'll soon see updates coming in here as our extraction logic runs. Do understand: this is not happening on a centralized server. This is happening on a blockchain protocol, and our gateway extracts information as it goes along. This is our API gateway, our WebSocket solution, which starts extracting data as the transactions are sent out, as you can see. When the transactions are sent, I start getting updates on my dashboard, just to show you how fast we're able to coordinate with this blockchain protocol. Each of the transactions you see here is one of the ones being sent out: I'm sending them and catching them as soon as they go. So we are able to keep up with this high-throughput transaction protocol, on an actual decentralized protocol, not a centralized server. And Python multiprocessing, with the approach I outlined earlier, is what allowed us to do this. So that was a dive into what I've been able to build with my team. Yeah, that's about it. Thank you.