So the title of the talk is scaling the FVM with multi-threaded execution, and what does that actually mean? It is simple: it means parallel execution of smart contracts. This is our main goal, and we are working together on that topic.

More precisely, let's assume that you have a block that contains a few transactions, two in this picture, and they access smart contracts. So you have TX1 and then TX2 in the block. The usual way of executing those transactions is to serialize them: the miner executes TX1 first and then TX2, and it is the same for all validators. The reason is simple: this ensures that the state is consistent. This is what you see at the bottom of this figure, where X initially equals zero and, at the end of these two transactions, X equals 20.

The idea would be to execute those transactions in parallel, both at the miner and at the validators, leveraging the multiple threads that are available on modern multicore machines. The problem you can observe in that picture is that now X is no longer the same at the miner and at the one validator that is displayed.

So why would we want to do that, what is the expected benefit? Obviously, you can guess: it is to improve the blockchain throughput, and maybe the latency, although that is less obvious; it is not certain that latency will improve. We do this by leveraging, as I said before, the multiple cores available at miners and validators. The challenge, as you have seen on the previous slide, is to ensure the consistency of the blockchain despite this parallel execution: we do not want X to equal 3 at the miner and, for instance, 20 at the validator.

So how can we ensure that? A first, simple idea would be to use static analysis to detect conflicts between transactions a priori. These static analysis techniques could help build an execution schedule, a concurrent execution schedule, that would then be used by miners and validators to execute transactions in parallel while yielding the same result at all nodes. That is what we show here: we have a bunch of transactions, five transactions, and they read and write two variables, X and Y. By observing the sequence of reads and writes, we can derive a graph that is used to define conflicting transactions and to make sure that when we execute them, we end up with a consistent result at all nodes.

The problem is that static analysis is not efficient with languages that are Turing complete and that allow dynamic references. For instance, that is the case on this slide, where transaction three actually uses a function F to access, read and write, portions of the state. So it is very difficult to analyze that code and to know whether transaction one, for instance, conflicts with transaction three or not.
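To make that first idea concrete, here is a minimal Rust sketch of how a conflict graph could be derived once per-transaction read/write sets are known. The `AccessSet` type and the string state keys are assumptions made purely for illustration, not part of the FVM; the hard part, as noted above, is obtaining those sets at all for Turing-complete contracts.

```rust
use std::collections::HashSet;

/// Hypothetical per-transaction access summary: which parts of the state a
/// transaction reads and writes (declared up front, as Solana asks
/// developers to do, or derived by a static analyzer when that is possible).
struct AccessSet {
    reads: HashSet<&'static str>,
    writes: HashSet<&'static str>,
}

/// Two transactions conflict if one writes something the other reads or writes.
fn conflicts(a: &AccessSet, b: &AccessSet) -> bool {
    a.writes.iter().any(|k| b.reads.contains(k) || b.writes.contains(k))
        || b.writes.iter().any(|k| a.reads.contains(k))
}

/// Build the conflict graph as a list of edges: an edge (i, j) means
/// transactions i and j must not run concurrently.
fn conflict_graph(txs: &[AccessSet]) -> Vec<(usize, usize)> {
    let mut edges = Vec::new();
    for i in 0..txs.len() {
        for j in (i + 1)..txs.len() {
            if conflicts(&txs[i], &txs[j]) {
                edges.push((i, j));
            }
        }
    }
    edges
}

fn main() {
    // Illustrative batch: TX1 writes X, TX2 writes Y, TX3 reads both, so TX3
    // conflicts with TX1 and TX2 while TX1 and TX2 are mutually independent.
    let txs = vec![
        AccessSet { reads: HashSet::new(), writes: HashSet::from(["x"]) },
        AccessSet { reads: HashSet::new(), writes: HashSet::from(["y"]) },
        AccessSet { reads: HashSet::from(["x", "y"]), writes: HashSet::new() },
    ];
    println!("{:?}", conflict_graph(&txs)); // [(0, 2), (1, 2)]
}
```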
So the outcome is that static analysis is not adequate for contracts written in languages such as Solidity. Just a side remark: you might know that some blockchains already use that kind of technique, for instance Solana, and they actually use a simpler form of static analysis. In Solana, the smart contract developer has to declare the accounts that will be touched, written and read, by transactions, and using those declarations the system can produce non-conflicting parallel executions.

The second solution, if we do not want to use static analysis because we claim it cannot be used with languages such as Solidity, at least in an efficient and general way, is to detect conflicts at runtime using runtime instrumentation. This is what Yann has been working on during his internship, and what he will describe next. The operating principle is the following: we pre-execute to create a schedule, which might be a fork-join schedule or something similar. The idea, again, is to have a definition of conflicting transactions and a graph representing them, and then to use it to speed up execution at the nodes. Note that a block will later be accepted only if it comes with a valid fork-join schedule. A valid one is one that does not yield conflicts when executing the transactions, and whose final state, the one reached when the miner produced the block, is the same as the state observed by the various validators. I now let Yann describe what he did on the FVM to implement that solution, and the first results.

Good, yes, thank you. So first, before I tell you about my solution, I just wanted to quickly talk about the FVM architecture so that we have a common understanding of what we are talking about. This is the FVM picture that you might have seen already, but it is a bit complicated, so I will not spend too much time on it. You can just note that a transaction comes into a Filecoin node; the node itself has access to the state store, to randomness, and to cryptography functions. The transaction then enters the FVM, which is the first boundary, and there is a second boundary where an invocation container is spawned and where the Wasm code of the actor is executed. Actors therefore have to cross both boundaries again if they want to access the state.

For our purposes, I have here a simpler description of what is needed to understand my architecture. Again, we have a batch of transactions that we would like to execute faster; this is our goal. It enters a machine that has a call manager; the call manager spawns the kernel and spawns an invocation container for each transaction, or for each actor that is executed, and passes it the kernel so the actor can run. The advantage of having these containers is that technically they are all local: they can only access the outside world through the kernel and through system calls. For example, they can access the state, crypto functions, and so on. A container can also spawn new containers if a send call is invoked, and the kernel is passed along.
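Since actors only reach the state through kernel system calls, that boundary is the natural place to hook in the runtime instrumentation described above. Here is a rough sketch of the shape such a hook could take; the `StateKernel` trait and its two syscalls are heavily simplified assumptions, not the real FVM kernel interface, which exposes many more syscalls. The point is just that a pre-execution wrapper can forward every syscall and record which blocks the transaction touched.

```rust
use std::collections::HashSet;

/// Heavily simplified stand-in for the syscall surface an actor sees
/// through the kernel; the real FVM kernel is much richer than this.
trait StateKernel {
    fn block_read(&mut self, cid: &str) -> Vec<u8>;
    fn block_write(&mut self, cid: &str, data: &[u8]);
}

/// Pre-execution wrapper: forwards every syscall to the inner kernel and
/// records which CIDs were read or written, yielding the transaction's
/// access set for the dependency graph.
struct RecordingKernel<K: StateKernel> {
    inner: K,
    reads: HashSet<String>,
    writes: HashSet<String>,
}

impl<K: StateKernel> StateKernel for RecordingKernel<K> {
    fn block_read(&mut self, cid: &str) -> Vec<u8> {
        self.reads.insert(cid.to_owned());
        self.inner.block_read(cid)
    }

    fn block_write(&mut self, cid: &str, data: &[u8]) {
        self.writes.insert(cid.to_owned());
        self.inner.block_write(cid, data);
    }
}
```

After pre-executing a transaction with such a wrapper, the recorded read and write sets can feed a conflict graph like the one sketched earlier.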
Right, so how do we come in, and how do we add parallel execution to that architecture? This is what we looked at first. So, as a reminder, Vivian told you this already: we have a batch of transactions. In the first phase we want to work out the dependencies between them, and that is what the miner does; in the second phase we want to execute them faster. In this quick example, we see that transaction three depends on transaction one and transaction two, but transaction one and transaction two are independent and can be run concurrently on two threads, whereas transaction three comes later.

How we do this in code is that we start with the transactions. Note that this is a graph, but these are not the dependencies yet; these are just senders and receivers. We take these transactions and run them through the normal pipeline, but we capture all the system calls, so all the dependencies that they could have. Some system calls do not create dependencies, some do, and we need to be careful to capture the right ones. We use this to create either a fork-join schedule, which gives the exact dependencies, or, in our simplified case, just a set of the dependent transactions and a set of the independent transactions. This is the first phase, which we do with this wrapped kernel.

Once we have done that, at the validator we do not just receive a batch of transactions; we also receive this additional dependency graph, which is inside the block. What we can do now is that, instead of having just one environment that executes transactions, we spawn many of them in different workers, and each worker executes a different chunk. We know that all the chunks are independent and will not conflict. However, every thread only has access to a local state and not to the global state, so in a second step we have to merge: we do not write to the state directly, we just buffer all the writes, and in a second step we merge them, we flush them, into the global state, as sketched a bit further below.

Right, so this is still work in progress. Nonetheless, we want to have benchmarks as quickly as possible to get an idea of what the performance gains could be, and for that we need a realistic workload. To understand what the workloads could be in the future, we can look at current EVM transactions. On the left you see an Ethereum block, where every vertex is an account or a smart contract, and you see that there are quite a lot of dependencies between them. You could think that no parallelization can be done because it looks like one big cluster, but actually this is only at the account level. Since we use the FVM and we have the CIDs, it is very easy for us to capture fine-grained dependencies. So even if two transactions access the same smart contract, they might not have dependencies between each other. And we think that even contracts we cannot change, for example liquidity pools, routers, and so on, might not create real dependencies at all. So we think we might be much closer to the picture on the right for the workloads that we expect, and even this is without incentives, right? So we might do even better than that.
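As mentioned above, here is a minimal sketch of that second phase: chunks of independent transactions run on separate workers, each buffering its writes locally, and the buffers are flushed into the global state afterwards. It assumes a plain key-value state and a hypothetical `Tx` type; the real FVM state is content-addressed, which is exactly what makes this merge step more subtle (see the discussion at the end of the Q&A).

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical transaction: applies some updates to named state keys.
struct Tx {
    updates: Vec<(String, u64)>,
}

/// Execute chunks of independent transactions on separate workers.
/// Each worker buffers its writes locally; the buffers are merged (flushed)
/// into the global state only after all workers have finished.
fn execute_parallel(global: &mut HashMap<String, u64>, chunks: Vec<Vec<Tx>>) {
    let buffers: Vec<HashMap<String, u64>> = thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| {
                s.spawn(move || {
                    // Local write buffer: no worker touches the global state.
                    let mut local = HashMap::new();
                    for tx in chunk {
                        for (key, value) in &tx.updates {
                            local.insert(key.clone(), *value);
                        }
                    }
                    local
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Merge step: safe here because the schedule guarantees that chunks are
    // disjoint in the keys they write.
    for buffer in buffers {
        global.extend(buffer);
    }
}

fn main() {
    let mut state = HashMap::new();
    // Two independent chunks, as produced by the pre-execution phase.
    let chunks = vec![
        vec![Tx { updates: vec![("x".to_owned(), 10)] }],
        vec![Tx { updates: vec![("y".to_owned(), 20)] }],
    ];
    execute_parallel(&mut state, chunks);
    println!("{:?}", state); // {"x": 10, "y": 20} (in some order)
}
```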
All right, so this is one option to get a workload: replaying existing transactions or existing blocks, which would be nice, but we also do not know exactly how things are going to evolve. The other option is to use synthetic data, which allows us to parameterize and simulate potential future workloads. So we try to have a mix, and I can quickly explain the workload we implemented for these preliminary benchmarks.

We have a bunch of accounts and actors; the blue dot is an actor. The accounts all create transactions that invoke the actor, and the actor simply hashes a certain value with the caller ID. For reasons I can quickly explain: this way we exercise crypto system calls and reads and writes to the block store, which allows us to easily verify the correctness of our implementation by comparing the value of X at the end of a sequential and a concurrent execution, for example. We also invoke other actors, so we test the send system call. And because these are all Rust actors, and the Rust compiler can optimize code a lot, we make sure that the workload cannot be optimized away by the compiler.

Right, so before we changed anything we wanted to have some quick benchmarks, and this is the first kind of benchmark that we got. On the x-axis is the batch size, from zero to a thousand transactions in a batch, and on the y-axis the average time it takes to execute the batch. We see a very modest improvement for the concurrent solution. So we went back to the drawing board and looked at what takes time. In these blue regions on the left you see that the system calls take up quite a bit of time, as does the Wasm code, the actor code itself, but we realized that there was also a lot of overhead in our own code that we could remove.

This is the current preliminary benchmark, where we see that there is some advantage to having multiple cores. We see diminishing returns, but with four workers, for example, and a thousand transactions, we get a bit more than a 2x improvement, and about ten thousand transactions per second execution speed with this arbitrary workload.

So what do we plan to do in the future? We want to complete this implementation with the merging of the states. We want to verify that our execution is indeed consistent and that we did not make any errors. Most importantly, we want to compare it with different two-phase protocols: at every step of the way that I showed you there are different ways of doing things, and it is pretty clear that there are other options. We can also compare with one-phase protocols that optimistically execute transactions. In the end, the big goal is to choose the best fit for the FVM. That is it from us; thank you very much for your attention.

So how do you quantify the maximum available parallelism in these kinds of workloads, what are your thoughts on that? What are the natural bottlenecks, things like that?

Sorry, the bottlenecks in the workload?

Yeah, I mean, what is common. I think if you base your observations on EVM workloads, well, for now
that is probably the best we can find, right, in any case.

Yeah, I think so too. And in the EVM, and this is some joint, ongoing work from my PhD, we indeed think that it is mostly DeFi that creates dependencies, and NFT marketplaces, for example: marketplaces are designed in such a way that a lot of the time the same fields are accessed. So this could be a bottleneck. But again, because this is brand new, there are basically no incentives for programmers to care about these things, so it is unclear to me what the bottleneck is really going to be. For now it seems like it is DeFi and NFTs, for the most part.

So one question: when you mention workers, you first do a pre-execution to understand the dependencies, and how do you parallelize that? I got lost; I see that you have a different number of workers, but how does this map onto the architecture, or onto the different stages?

So the pre-execution is not parallelized at the moment.

So it runs sequentially, and the miner would just need to have better hardware? But then the parallelization is in the execution?

Yes.

So you pre-execute, you understand the dependencies, and then all of these workers that you were plotting are the number of, let's say, threads in the actual execution in the VM, right?

Yes, I take the block, I pre-execute, and then I can launch or spawn different threads, the number of workers, for the execution.

Yeah, okay.

This is why we call it two-phase. A one-phase approach would work differently: it would optimistically execute transactions in a parallel way, whereas we use a pessimistic approach where we first create a graph and then use that graph to parallelize. These are the two kinds of approaches, and this is why Yann said that in the future work we might want to look at one-phase protocols too. By the way, the two-phase approach might also be done with a parallel execution in the initial phase.

Exactly, the initial phase can also be parallelized; we just did not choose to do that here.

Yeah, thanks. I mean, the design space is large; you need to pick one point.

So I have a question regarding this picture. If I understand it correctly, on the left it is per smart contract,
so each circle is a smart contract and we see a lot of seemingly dependent transactions, and on the right every circle is kind of an object, or is it the same representation?

It is the same representation, except that we remove some edges, because we assume that, for example, routers can be removed, since those contracts can be optimized.

So you kind of remove them based on the type of contract?

Yes, and we are thinking that some contracts currently create dependencies, but they would not need to create them if they were programmed correctly, programmed with concurrency in mind.

So these are dependencies that we cannot safely remove today?

Actually, in the future the idea is not to remove those dependencies; it is more to think about how to give incentives to smart contract developers to avoid creating those artificial dependencies in the first place.

Yeah, so I am wondering how this would work with content-addressable data, because if you modify something there, then you inevitably have a different root. If I understand correctly, each actor in Filecoin has its current state represented as a root CID, and if you change anything there, you change the root, and then you effectively have a conflict: if two transactions touch the data of the same actor, then they both have a conflict.

You mean that just the fact that they touch the same actor means they will conflict, even if they are not touching the same part of the state?

Yes, yes.

You are right in the sense that the root CID will change, but still there is no real conflict, so there is something we need to handle. Yeah, you can probably merge. As Yann was saying, the merging is still ongoing work, and indeed, the transactions will not conflict because they do not touch the same part of the state, but they will yield different CIDs, so they might be seen as conflicting transactions. This needs to be taken into account carefully, and you are right.

Thank you so much.

All right.