Cool. So hi everyone. I'm Luke Marsden. This is my colleague Kai Davenport. We've done some prototyping so far and we're going to continue developing Bacalhau with all your help, hopefully. I wanted to start by saying: let's make this a discussion. For me, a goal for this summit is to make what we're building better because we've got all of your input. Juan, your presentation was helpful already in terms of giving us more ideas, and I know there are so many smart people in the room. So please jump in and suggest different ways of doing things compared to the ideas we're presenting. Nothing is sacred.

With that, we've already talked a lot about vision, but at a high level: compute is a major missing part of the web3 stack. We think we've got a good crack at solving it in a way that also creates a framework lots of other people can be successful in. Ultimately, long-term, I think we want an open, trustless market for compute, but we're not doing that from day one, as everyone has said already. That's the eventual goal, and we're in the process of laying the foundations for it.

To come back to Juan's triangle, there's this triad of trustless compute, and it's a bit like the CAP theorem in distributed systems: you can really only choose two. And like Juan said, there are going to be lots of different approaches that fall in different places in this space. For example, there's cryptographic verifiability: being able to generate a proof, by doing some intensive computation, that you did a piece of work without disclosing anything about the work. I don't understand the maths, but it's very impressive; of course, it has performance implications. Then there's homomorphic encryption, where you can encrypt data on the client, ship it to the server, the server does some computation on the encrypted data and sends back an encrypted result, without the server ever even seeing what the data was. Crazy, in my opinion, but really cool maths; again, it has performance implications. And then there's the alternative approach of optimistic verifiability, which is more like: you let people do the work, you assume they're doing it correctly, but you also sometimes check, sometimes later, sometimes quickly, and you use that as an economic incentive to catch people out.

These are all things we want to support in the framework we're creating with Bacalhau, but we are not going to try to solve all of them from day one. We want to create a space where people like you can come and contribute: provide implementations of the interfaces, and iterate with us on the interfaces, because we're probably not going to get them right from day one. Build stuff with us and we'll make it work; we'll change whatever needs to be changed to get that stuff working, and then we'll build this framework for all of these pieces. So, like it says on the slide, these are advanced topics; we've got some much more basic stuff we need to get right first. And as I've already said, the goal is to be a framework: we want our code to be reusable, and we want the network that we deploy to eventually be runtime extensible.
A bit like libp2p. Connie should really be the data developer, not the consumer, which means I should probably change her name, but never mind: Connie is the consumer. She just wants a way, with a great developer experience, to specify a job and get it to run with no hassle, eventually pay a fair amount of money for that work to be done, get the result back, and be confident the result is correct. Pru the provider wants to be able to deploy Bacalhau onto a bunch of servers where they're already running IPFS and/or Filecoin, and efficiently accept jobs where they already have data locality, because as Juan said, data has gravity. Eventually, Pru would want to generate an additional revenue stream by running this adjacent to their Filecoin nodes, as in running our server on the same machines they're running Lotus and so on. And then there's Blue Bell the blockchain developer, and that's probably you lot: people developing solutions in this space who want to collaborate so you don't have to solve every problem yourself. In the early days of web servers, Apache had to re-implement so much stuff in C to do basic things like forking processes, shelling out, and scaling, and these days there are frameworks like Golang that do so much of that work for you. If you wanted to implement the Apache web server now, you wouldn't have to go and implement all that stuff yourself in C; you'd be able to build on the shoulders of giants. That's the kind of framework we want to enable here.

Yeah, at a high level, here's the conceptual overview of what we're building. There's libp2p at the core, at the very lowest level; that's just how the network manifests and how the peers talk to each other. Then there's IPFS as a layer on top of that, with IPLD, which as everybody here probably knows is a super useful protocol for moving things around and doing content-addressable storage. And then the piece that we're going to build, and this is really the focus of the project, is the core scheduler, which can be extended with the various pluggable verifiers plugged in on top.

Going a little deeper into the interfaces and the pluggability: I'm not going to read every word on this slide; we have a whole other talk on pluggability after this one, which will probably be shorter than this talk and lead into discussion, hopefully. But here are the high-level pieces. The scheduler is the core piece, which allows the user to express "I have a job, it has this job spec, I want to run it", and then coordinates with worker or compute nodes that have spare compute and are available to do that work. There is a scheduler interface in our code, but the scheduler itself is core. The interface might one day be used for swapping the scheduler out for, say, a smart-contract-based implementation, but the scheduler itself is not designed to be runtime pluggable, whereas the parts of this diagram that have a little asterisk on them are. That's an important distinction. So everything talks to the scheduler; the scheduler is the heart of the system.
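To make that concrete, here's a minimal sketch of the shape of a job spec and the core scheduler sitting above it. All names here are assumptions for illustration, not the real Bacalhau API:

```go
// Sketch only: type and field names are illustrative assumptions.
package jobs

// JobSpec is roughly what "I have a job, it has this job spec" expresses.
type JobSpec struct {
	InputCID string   // content address of the input data in IPFS
	Engine   string   // e.g. "docker", "firecracker", "wasm"
	Command  []string // what to run against the mounted input
}

// Scheduler is core, not runtime pluggable; the interface exists so the
// implementation could one day be swapped, e.g. for a smart contract.
type Scheduler interface {
	SubmitJob(spec JobSpec) (jobID string, err error)
}
```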
We're going to hang a reputation system off the scheduler. Initially, like Dave said, we're just going to make that a reporting-based reputation system: it will publicly keep track of which nodes behave well according to the verification protocol. Then we've got the pluggable verifiers, pluggable compute, and pluggable storage. Pluggable compute is probably a reasonable place to start, and we'll go over this again later, but the idea there is that we're primarily focusing on WASM, and a deterministic subset of WASM, because non-determinism is hard; I'll talk about that in a minute. We also already have pluggable compute implementations that support a Docker runtime and a Firecracker runtime for VM-based isolation, which is useful when Pru the provider is being asked to run untrusted user code: Pru really wants to know that the untrusted code will execute in a sandbox, so that she's not likely to get rooted. The pluggable compute will interface with the pluggable storage interface. Initially, we're just supporting IPFS for storage; eventually, we'll also have an implementation for Filecoin, and there's no reason why we shouldn't have implementations for consuming storage from other systems as well. And then there are the verifiers. The verifiers are interesting because the way you do verification will vary a lot depending on which of these approaches, or some other approach, you're using. So we'll talk a bit later about some proposals we have for decoupling the verifier and the compute interface, so that they're not tightly coupled: every compute interface shouldn't need to know about every verifier and vice versa. More on that in a bit.
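As a rough sketch of what those runtime-pluggable pieces could look like (again, hedged: these names are illustrative, not the actual interfaces):

```go
// Illustrative only; all names are assumptions.
package plugins

import "context"

// StorageProvider abstracts where input data lives: IPFS first,
// Filecoin and others later.
type StorageProvider interface {
	HasCID(ctx context.Context, cid string) (bool, error)
	Fetch(ctx context.Context, cid, localPath string) error
}

// ComputeBackend abstracts how a job runs: Docker, Firecracker, WASM.
type ComputeBackend interface {
	Run(ctx context.Context, image string, cmd []string, mounts map[string]string) (resultsPath string, err error)
}
```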
So, we've already developed a prototype over the last few months; you'll have seen that in the weekly reports and the Slack channel. What we did there was implement a scheduler on top of libp2p, just very simple libp2p GossipSub. All the nodes can connect to each other, discover each other, and chat to each other. Then we have a scheduler implementation which allows you to submit a job. At the moment, the nodes just record requested jobs in memory: when a message comes in over GossipSub, and Kai will talk about exactly how this works, it's recorded in memory. We have a CLI, and I'll talk about how that interfaces with the rest of the system in a second. We've got the Docker and Firecracker backends. And then we went down the rabbit hole of non-deterministic execution in the prototype, and I feel that was useful because we learned a lot from doing it, and I can report back on some of those findings. I'll talk about non-determinism in a second, but the way we did it was with a trace-based verifier. What you can see on this graph is what we're calling, in a non-deterministic context, "evidence of work". It's not proof of work; it's not cryptographically sound proof that you definitely did the work; it's the trace of the CPU and memory profile of executing a job. The idea behind this is a kind of variant of the halting problem: without running some code, it's impossible to know whether it will terminate. The corollary is that, without running the code, it should be very difficult to predict what its CPU and memory trace will look like, unless you've already run it on the same data and seen how the execution proceeds. This is just an example of a program we made: it allocates a bunch of memory, reads the data file into memory, and then I think it just runs sed ten times in a loop with a five-second sleep in between. It was indicative of the fact that you can get these useful traces out of the system, but the traces are noisy. Therefore, in order to compare them, you end up with a signal-processing problem, and actually a machine-learning problem, which makes gaining confidence that things were run in an equivalent way a hard problem with tolerances around it. We've got to the point of saying: that's one way of doing it, we've established that it's definitely hard, and we've found the areas of research we'd need to pursue to get support for non-determinism into production. I'll talk about determinism a little later.

Just to take a step back and give the big picture of what Bacalhau is: it's really three things that we're building. There's the CLI, which is the interface to the scheduler mesh. The CLI is really important because that's where a lot of the developer experience is going to be, and we need to make sure it's a joy to use. There's the scheduler, which is responsible for placing work across the network. And then there's the actual job runner, which executes the work. So at a very big-picture level, there are these three conceptual pieces.

The way they work in practice, and thanks to the control plane team for this diagram, is that the CLI communicates with the Bacalhau server over JSON-RPC. We assume that a client is also willing to run a Bacalhau server, in order to have that server act on their behalf for a long-running job. That's the requester side; the compute node side is the actual execution of work. They both subscribe to the scheduler. The requester node submits a job; the compute node, which would probably be running on a different server, bids on that job; and, as Kai will walk through in more detail, the requester node then has to explicitly accept that bid. So there's already a protocol for negotiating whether someone's going to do the work or not. Then the compute node uses the runtime to spin up the job, downloads the data from IPFS, although in many cases it won't need to because the data is local, and submits both the evidence of the work and the actual results of the job back into IPFS so they can be consumed.
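That negotiation can be pictured as a small set of events flowing over the scheduler. A hedged sketch; the event names are assumptions, not the actual wire protocol:

```go
// Illustrative event types for the bid negotiation; names are assumed.
package protocol

type EventType int

const (
	JobCreated       EventType = iota // requester broadcasts a new job
	BidPlaced                         // compute node offers to run it
	BidAccepted                       // requester explicitly accepts the bid
	ResultsSubmitted                  // compute node publishes results + evidence
)

type JobEvent struct {
	JobID      string
	Type       EventType
	NodeID     string // which node emitted the event
	ResultsCID string // set on ResultsSubmitted
}
```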
In a little more detail, this is what it would look like on a single server, which is roughly what production would look like. You've got the CLI talking to the Bacalhau server. The Bacalhau server right now shells out to IPFS to read and write things, and to check whether the local IPFS node actually has a copy of the data; I'll show how that self-selection works in a second. The IPFS server keeps track of the data and copies it around if it needs to. The important part here is that the Bacalhau server then spins up a Docker or Ignite runtime. Ignite is a project from a company called Weaveworks in London that wraps Firecracker in a Docker-like CLI, so you run Ignite, it spins up a VM, and it gives you a Docker-like experience. So it either spins up a Docker runtime, which is super useful in development and CI because you don't need hardware virtualization support, or the Ignite runtime, which will be useful in production where you potentially want a VM boundary for isolation. It spins up a Docker container or a VM; in both cases, we call that the runtime. Then it starts another IPFS daemon inside the runtime. The reason is that we've found IPFS is a super useful way to move data around, including on localhost. This inner IPFS daemon isn't long-running, so it won't already have a copy of the data, but it can stream it from the host into the VM, which makes it a very efficient way of getting data over the VM boundary. That means you don't have to copy the input data from the host into the VM, which is nice, because avoiding copying that data is the whole point. The user code can then just read the data over a FUSE mount; IPFS will stream it, because the IPFS daemon inside the VM peers with the IPFS server running on the host. It's all over localhost, so it'll stream at ten-gigabit-plus speeds. Then the user code finishes executing, and we gather the results and write them back.
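The "spin up a runtime" step can be as simple as shelling out. Here's a minimal sketch, assuming the docker CLI is installed and the inputs are already available at a local path; the function and mount layout are illustrative, not the actual implementation:

```go
// Sketch: run user code in a Docker container with inputs mounted read-only.
package runtime

import (
	"context"
	"os/exec"
)

func RunInDocker(ctx context.Context, image, inputDir string, args ...string) ([]byte, error) {
	dockerArgs := append([]string{
		"run", "--rm",
		"-v", inputDir + ":/inputs:ro", // mount rather than copy the input data
		image,
	}, args...)
	return exec.CommandContext(ctx, "docker", dockerArgs...).CombinedOutput()
}
```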
I also wanted to put up this scribble we made of how DevStack works. If you've tried the README, you've probably tried spinning up DevStack. DevStack is a super nice little tool; it's part of the CLI, actually, and it allows you to spin up a three-node cluster locally on your laptop. It spins up a libp2p network that's private, so it doesn't go out and try to talk to the public DHT. That's really useful because DevStack is used in our integration test suite: it's a way to get fast spin-up of a stack you can test, both for interactive testing and demos and also for CI. All of the Bacalhau servers are just assigned random ports on the host, and they're all connected by peering with each other. It also spins up a bunch of IPFS servers, also just running as processes on the host, and gives each one a unique IPFS path so they act like independent IPFS instances. And then it can spin up multiple Ignite and Docker runtimes, of course, when you actually submit jobs. Those pieces are a microcosm of the production picture. TL;DR: we've got this cool thing called DevStack that lets you spin up a demo stack of Bacalhau and run it entirely locally on your laptop, or in GitHub Actions, which is where we wanted to do CI.

Okay, so I promised I'd talk about determinism, so I'm going to have my little determinism rant. I actually believe, and I think Juan alluded to this already, that for what we're doing to be generally applicable to the software engineering community at large, we are going to have to support non-deterministic execution, simply because so much code is non-deterministic: pretty much all the code people write outside the crypto world. Even the most trivial Golang program I write is non-deterministic out of the box, because of the order in which you iterate over maps. Sometimes I feel like it does that just to screw me up, but it's actually intentional: it tries to screw you up in CI so that you don't end up relying on Go map iteration order in production. It was a design decision by the Go team. As soon as you have multiple things happening in parallel, any concurrency in the system, you can end up with race conditions. If you support multiple threads and can run them on multiple cores, the operating system scheduler will start running your threads at different times, and you can't control when that happens, so different executions of the same program on the same data will proceed down a different code path every time. Now, we have testing frameworks, CI, unit tests, and integration tests to bound the behavior of the code within a certain envelope, but the code is going to run in a different way every time you run it. And I don't think we're going to replace the big cloud providers with this thing unless we can support that eventually. Another example is stochastic rounding with floating-point numbers: in machine learning, you will often have the hardware generating entropy to randomly round your floating-point numbers up or down when they're exactly halfway, because that results in better machine learning algorithms; that entropy exists on the hardware. And then there are even simple things, like logging the date in your timestamps. So, I guess this is me banging the drum: we are going to need to support non-determinism eventually. That's why we went down the non-determinism rabbit hole, but it has big implications for how you do verifiability.
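The Go map point is easy to see for yourself; run this a few times and the printed order will differ between runs, by design:

```go
package main

import "fmt"

func main() {
	fruit := map[string]int{"apple": 1, "kiwi": 2, "pear": 3}
	// The Go runtime deliberately randomizes map iteration order,
	// so this loop prints the keys in a different order across runs.
	for name, n := range fruit {
		fmt.Println(name, n)
	}
}
```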
Right, so I'm going to show a very quick demo of the prototype that we have. This is DevStack; Luke talked a bit about DevStack. It's basically useful if you're developing on the tool, which I hope some of you in this room begin to do: how do I get some form of realistic network running on my laptop so I can change the code and quickly iterate? That's DevStack. The main job it does is save you manually typing out all the commands you'd need to bootstrap the network: allocate random ports, allocate a tmp directory for IPFS isolation, and give us a three-node network. So let's make haste and see what that looks like. We spin it up, and it's busy creating three libp2p nodes, all interconnected with each other, while at the same time allocating random folders in /tmp. We could start adding files to IPFS, but the whole point of Bacalhau, the "data has gravity" sentence, which I love, is: let's send a job out into the network and only run that job on a computer that is local to the data. If you don't have that CID on your actual disk, don't attempt to run this job, would be the too-long-didn't-read. To test that out, we've got three nodes in our cluster; let's add a CID to one of those nodes, submit a job, and check that the node with the CID on its disk is the one that runs the job. Hopefully we're then getting around the fact that data has gravity by sending the compute to where the data is; you get the message by now, I imagine.

First of all, let's look at what file we're going to test on. I'm going to treat grep the same as sed here; to Dave's point earlier, most data scientists use existing tools that are decades old, and grep and sed are interchangeable for the purposes of this demo: they're existing tools that work for data scientists. If we have a look at what's in that file, it's some fruit; nothing big, nothing complicated. We're going to grep that file for some fruit: that is the job. Let's add the file to one of our three IPFS nodes. Just to show what's going on here: we have an IPFS folder per node. It's all on the same computer, so node zero hasn't actually got any VM-level isolation, but let's treat IPFS path zero as node zero and pretend it does. So let's add this file, and we should get a CID back. There we go: we now have that data in IPFS on one of our three nodes.

By the way, this README I'm going through is in the repo, so you should be able to pull the repo and follow this README just as I'm doing; I'm kind of cheating, I'm basically copying from the README, but the upside is you can do this yourselves. Let's submit the job. If we look at this command, it's basically saying: run bacalhau, which here is the equivalent of go run . because we're in development mode, and target the JSON-RPC server of the first node. We could change that out; in fact, let's live on the edge and submit the job to a different node, one that doesn't have the data, to demonstrate that the job still runs where the data is. What is the job? It's grep for the word "kiwi" in our file; nothing special, but it should run where the data is. So let's run that and see what happens. Oh, I know what I've not done. Okay, yep, that's fine: this bunch of text is the environment variables you need to export for everything I've just described to work, because we've got three different nodes, so three different JSON-RPC ports and three different folders; I'd skipped that step. If we paste all of those in... right, go again: we're submitting the job to node number one, grepping for the word "kiwi" in our file. Now, if you notice at the top there, we should see some compute nodes kicking in. Hmm. Okay, we know what we're going to do: we're just going to follow the README. I tried to live on the edge. It's not the screen that's the problem, it's me. Okay.
Right, just ignore me while I do the obvious thing. Okay. Step number one: we start a DevStack. Here we go. That prints out all our environment variables for connecting to the three different nodes; that's these right here. We export those in our environment down here. We add the file to node number one, we check "do I have a CID?", yes we do, and we go and run a simple job like this. Okay, that looks much better. One node should have picked that job up and should be running it; the other two should have entirely ignored it. That's because of self-selection, and it's a simple condition: "do I have the CID locally on my disk or not?" is what we're calling self-selection. I'll talk more about self-selection in a bit, because it's an important component of the system: avoiding a dedicated scheduler node, avoiding another kind of entity in the network, is why we're doing self-selection.

If we now look at the state of the network in terms of jobs, we have a job and it says it's complete. Only one node has run it, not three. We have a results CID. We can copy the ID of the job and say "results list" for that ID, and we see we have a folder containing the artifacts of what that job has done. All of that data was retrieved over IPFS. You'll notice there's a natural IPFS link here; that wouldn't work right now because we're in DevStack mode, so we're not on the public IPFS DHT. If we were running actual nodes, we'd obviously be able to view the results of the job on IPFS, but this thing isn't connected to the global IPFS DHT. If we just look inside our local folder at what happened: here are the artifacts the job produced. We've got standard out, which we can look at, and I'm fully expecting to see the word "kiwi" in this file, because that was the entire job. There we go: "kiwi is delicious". These two lines here are to do with how we're wrapping the verification engine; we'll talk about that in a second.

So, basic demo done. We've said: I've added a file to IPFS; one of the three nodes has that data locally; I submit a job that uses that file; and the node that had the data locally has run it. That's the general concept of respecting data gravity. Let's make it slightly more interesting now and run a job on all three nodes, which is essentially the same as what we've just done, except that before submitting, we add the file to all three nodes. First, add it to node zero. Okay, let's actually get into the folder. Add it to node zero, then add it to the other two nodes as well. Okay, that prints out our CID. There we go. And here's the key difference between this job and the last one we ran: this time, we're going to say concurrency three. Imagine a world where the network is a thousand nodes and 200 of them have the CID, and we say "everybody who wants to do it, run it". The concurrency setting says: actually, we only need three nodes to run it, because of the verification engine we're using. How many nodes we want to run the job will be a function of which verification engine we're using. Right.
What we're basically saying is: we want three nodes to run this job. The confidence setting is how many of them should agree, for us to determine that those nodes were acting faithfully and the others weren't. So essentially what we're doing here is an example of knowing which verification engine is suitable for the job we're submitting. That's a crucial aspect of this entire project, and why we're all here: to think about the different jobs that data scientists want to submit, the different ways we'd want to verify them, and to have a library, a bit like libp2p, that enables people to very quickly and easily implement those different verification strategies and surface them back to the people who want to submit jobs. An example: I've got a deterministic workload that will always produce the same result; therefore I'm going to use some kind of hash-based verification, hash the results of the job, because those hashes absolutely should line up for a deterministic workload. Another type of job, which Luke alluded to: I'm going to train an ML model. There's no way in a month of Sundays that's ever going to be deterministic. How am I going to keep people honest? We need, let me use the phrase, "more interesting" verification modules at that point. So what we're looking at here is an example of a trace-based evidence-of-work verification system, which is what we're using in this demo. It's essentially profiling the memory trace of the work being done. How many nodes should attempt to run the work, and how many should agree, are the two settings that we use.

Right, so let's try it and see what happens. Let's run that. It's essentially the same command, grepping for "pear" this time, because why not, and we're saying run it three times and two nodes should agree: the two settings on the end. At the top now we can see things getting a bit busier: three different Docker containers starting, each running a job. Obviously in the real world this would be three completely separate computers, but here we get to see the entire thing happening in front of us. It looks like something's happened, so let's have a look at what. Let's list the jobs now; we have a whole bunch of results, and the job, I think, is this one. It's not the easiest of tables; there's definitely some UX work we can keep doing to improve the output of the CLI, entirely acknowledged. I think we spent about ten minutes plugging that table in, to be clear. "What would good UX look like?" is a discussion topic for the unconference. Let's have a look at the results for that job; that's the interesting thing. We have the same outputs as before, as in "hey, I've run the job, I've put the results somewhere", and we have these three lovely green ticks, because everybody was faithfully trying to do the work and being honest. Where do those green ticks come from? We're comparing the memory traces and doing some k-means clustering on them; the details don't really matter, because we're going to develop an interface for the verification system and you could come along and roll your own version of verifying these results.
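As a concrete example of "rolling your own": the simplest strategy mentioned above, for deterministic workloads, is to hash every node's output and check agreement. A minimal sketch, with assumed names:

```go
// Sketch of hash-based verification for deterministic workloads.
package verify

import (
	"crypto/sha256"
	"encoding/hex"
)

func hashResult(output []byte) string {
	sum := sha256.Sum256(output)
	return hex.EncodeToString(sum[:])
}

// Agree returns the node IDs whose output hash is shared by at least
// `confidence` of the submissions; for a deterministic workload the
// hashes absolutely should line up.
func Agree(outputs map[string][]byte, confidence int) []string {
	counts := map[string]int{}
	for _, out := range outputs {
		counts[hashResult(out)]++
	}
	var accepted []string
	for nodeID, out := range outputs {
		if counts[hashResult(out)] >= confidence {
			accepted = append(accepted, nodeID)
		}
	}
	return accepted
}
```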
But for this example, we're taking memory traces, running k-means clustering, and asking which, if any, are the outliers; in this example the answer is there are none, everybody did the work. Interesting questions get thrown up: what if all three nodes lied? Well, now we're accepting all three of their erroneous results. So there's a lot of interesting discussion to have around verification engines and the different pros and cons that come with them.

The final part of this demo makes things slightly more interesting. What we're going to do is start another DevStack, but this time with bad actors set to one. So what does bad actors do? It allows you to test what it would look like if somebody was trying to mess around. What's just happened there? Oh, it's just switched the screen off, hasn't it, which has thrown my computer into... oh, we're back, we're back, okay, cool. Just ignore that bit. Bad actors says to a node: when you receive a job, you bid on the job as though you were going to run it, but when you get to actually running it, just sleep for ten seconds; ha, nobody's ever going to find out, right? So let's do that. To do that, we have to start DevStack again, because at the moment we're running in everybody-is-honest mode. DevStack has a flag for this: I could say DevStack with 20 nodes, five of them bad actors; it's up to you what sort of numbers you play with. Let's copy these three lines so that we have our environment set up as it should be. Again, let's add the CID to all three nodes, because we want all three nodes to self-select this time; we're not doing an example of self-selection, we're doing an example of bad actors, so all three need the data. That gets us to there. Now let's run this job. This time we're running a grep job, but we're throwing in a little extra that actually allocates some memory, so the traces vary; basically, I want you to see some pretty graphs, and I'll explain when we show the actual pictures, but the core of it is that it's the same job as before: a grep job. So let's paste that, and hope our bash coding... there you go, it's all working. Same as before, it's running three Docker containers with a job happening inside; you can see two of them have done the grep thing, because we can see the phrase "pear is nice". Let's have a look at the network and see what it's saying; hopefully it should be close to complete. There we go: we have what looks like a successful job. Three nodes have completed and submitted their results. Let's have a look at the results: results list for that job ID. And this time we get... oh, that's interesting. Two nodes have done what we asked, but one of them looks a bit sketchy. Why? Well, we internally know why, because we started DevStack with bad actors one, so we know that sketchy node has done no work whatsoever and still tried to claim the reward. If we look in the folder now for the output, we can see something interesting. First of all, let me make that small just so I can copy it, then make it big again. Right, so here is the output of a job that was working correctly, and we can have a look at that standard out log.
There we go: "pear is nice". That's the same as before: we ran the job and it produced the correct output. Let me copy the bad actor's folder; before I do that, let's open the graph that we get. Right, so that is the graph of memory allocation for a node that actually did the work; it's essentially just visualising the tracing we've plugged in to capture the memory profile. We'll talk more about how that actually works when we get to the interfaces section. If I zoom out again so I can copy the bad folder... here we go. Right, and then xdg-open metrics.png. Notice how this looks very different: the numbers along the side are the notable thing compared to the other graph. Wildly different numbers is what this graph is conveying.

Now, looking back at the results we were seeing for that whole job: each job uses a particular implementation of what we could call the verification interface. There's a tool called psrecord which essentially says: given any command, start it with psrecord and it will record the memory metrics for what happens during the execution lifetime of that command. The graphs we're looking at are a visualisation of those memory traces. That evidence, and this is the phrase to remember, it's not proof of work, it's evidence of work, gets transported back to the client. In production it would be encrypted, because of course if I'm a bad actor sitting on the network and I see that Luke just finished the work and brought the results back to the client, I'll just copy his results and submit them as my own; so we have to encrypt the evidence between the node that did the work and the client. That evidence is then sent back to the requester node. The requester node, in this whole stack, and I'll talk through this in the diagrams, is the custodian of a job. It's the thing the CLI submits the job to; it then broadcasts the job out to the network, and it's a long-running process. As servers bid on the job, the requester node is the thing that says "hey, I've got five bids, I'm going to accept these three". Those three nodes are the ones that run the work. Having run the job, the compute node sends the evidence back to the requester node. The requester node combines all three sets of evidence, running k-means clustering to determine that one of those nodes was lying, because the other two agree with each other.

So the whole point of that entire system is that it's one opinion, one implementation of what we're calling the verification interface. If instead we knew the workload was deterministic, it would look very, very different: we wouldn't need memory profiling, we'd be hashing the outputs, because we know they should be the same. Each job is going to have a different approach. The whole point of this project, I think, is to build something a bit like libp2p but for verification engines: a pluggable framework that saves you all of the hassle. Everything I've just described, the library should do for you, and you're just thinking: here's how I wrap a job, here's how I produce some artifacts, here's how I compare artifacts from different compute nodes to decide who is lying or not. And I think there are all sorts of interesting implementations we could look at; this was a demo of one of them.
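To give a feel for the requester-side comparison, here's a toy stand-in for the trace analysis. The real prototype feeds psrecord traces into k-means clustering; this sketch just summarises each trace by its mean memory use and flags nodes far from the median, which would be enough to catch the "slept for ten seconds" bad actor. Everything here is illustrative:

```go
// Toy outlier detection over memory traces (MB samples over time).
package verify

import "sort"

func mean(trace []float64) float64 {
	var sum float64
	for _, v := range trace {
		sum += v
	}
	return sum / float64(len(trace))
}

// Outliers returns node IDs whose mean memory use deviates from the group
// median by more than tolerance (e.g. 0.5 means +/-50%).
func Outliers(traces map[string][]float64, tolerance float64) []string {
	var summaries []float64
	for _, t := range traces {
		summaries = append(summaries, mean(t))
	}
	sort.Float64s(summaries)
	median := summaries[len(summaries)/2]

	var bad []string
	for nodeID, t := range traces {
		if m := mean(t); m < median*(1-tolerance) || m > median*(1+tolerance) {
			bad = append(bad, nodeID)
		}
	}
	return bad
}
```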
What did we just demonstrate? It's the lower-level model: some work happened; the compute node produces an artifact of having done that work and transports it back to the requester node; the requester node looks at all the artifacts and makes a decision. That lower-level model should be the thing all of the verification implementations make use of. It's not an opinion about how the compute is verified; it's the transport mechanism for getting those artifacts from A to B to make a decision. So that's the useful stuff that would otherwise have to be rebuilt for every verification implementation, and that's the important part. The actual opinion, how we should verify these compute loads, is what we need to make pluggable and modular, so that lots of people can submit different approaches.
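A sketch of that split, with assumed names: the framework moves artifacts around, and each pluggable verifier only supplies the two opinionated ends:

```go
// Illustrative verifier interface; not the actual Bacalhau API.
package verifier

import "context"

type Verifier interface {
	// Runs on the compute node after the job executes: wrap the results
	// (output hashes, psrecord traces, ...) into an opaque artifact.
	ProduceArtifact(ctx context.Context, jobID, resultsPath string) ([]byte, error)

	// Runs on the requester node once enough artifacts have arrived:
	// decide which submitting nodes to accept and which to reject.
	CompareArtifacts(ctx context.Context, artifacts map[string][]byte) (accepted, rejected []string, err error)
}
```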
A question that came up: how do I keep a database of key-values in a distributed way, with no centralized server? There's a lot more detail to how it actually works, but the essence is: how do I keep state across a whole cluster of machines where no one machine is more important than the others? Is that what a DHT is? Yeah; one example is the IPFS DHT, which answers "given a CID, which nodes have that CID?". So it's exactly that: how do we keep that state across the whole network? The DHT does that job.

And how would dishonest bids play out? Good question. One case is: I bid on a job and I don't have the data; I've just cost myself a massive ingress bill. The other case is: I never intend to even run the job; there, we rely on the verification engine catching them out. So self-selection based on where the data is is more of an efficiency mechanism than a verification-that-something-happened mechanism: I can still do the job, I'd just have to pull the data. There are a couple of strategies we're looking at for this, and they're longer term, down the roadmap, because the goal is really to ship something explicitly insecure as a first pass and then iterate on it in public. You somehow distribute the verification jobs evenly across the entire network, and if you have a way of establishing verifiability, you can, on average, catch people out over time. That addresses the point that running the job on only three nodes is a small subset of the total network, so a 51% attack on that small number is easy, which I think is what you were saying; if we can evenly distribute the work across the entire network, that's a strategy to help solve that problem. And there are a couple of other approaches which I'll just mention without going into detail: a reputation system, and also staking. But those are, again, down the road.

Can a node be just a requester or just a compute node? Either, or a combination: the requester role is "I'm responsible for submitting and tracking the lifetime of a job that's done somewhere else"; the compute role is "I just want to run jobs, I don't really care about submitting them". Each Bacalhau node is essentially a peer to every other one. There's no central requester server, and the whole point of that is: if I'm a user with a CLI who wants to submit a job, I throw the job into the void and at some point it will be done. There is a certain amount of long-running state, though: I need some long-running process that listens for bids coming in, listens for results coming in, and combines them as soon as the results meet my concurrency setting. You can't do that in a CLI. So that's all the requester node really is: a long-running custodian of a job submitted by the CLI. But it's the same as every other Bacalhau node; it's not some centralized server with elevated permissions.

To be clear, it should be very, I'm going to say possible, not easy, for me to think "I've got a different approach to verification", implement a thing on the compute node that produces some artifact, and implement a thing on the requester node that compares those artifacts. Because it's just an implementation of the interface: what we're doing is saying "here's the interface", and then somebody goes "I've got a great idea for a really expressive way to do verification". Hopefully lots of people implement lots of versions of this interface, and we get a whole ecosystem of expressive ways to decide what's verified or not. And I think the point of today, just to really reiterate, is: let's not go down each and every rabbit hole of the demo we've just seen. That demo is a really rubbish way of actually verifying anything; it just demonstrates the transport layer between the compute node and the requester node, getting artifacts from one to the other and comparing them to make some kind of decision. The point of the system is that that flow remains the same irrespective of which verification implementation you use.

I'll skip through this quite quickly; I was thinking that if ever I was in a room where I didn't need this slide, it's probably this one, but just for anybody who didn't know: at the moment, the prototype is a libp2p network, and the scheduler interface, which we'll speak about soon, means we could in concept replace this with a smart contract. Essentially, call this the transport layer for the system: how do we get a message from A to B? At the moment, GossipSub is how we do it. This is how GossipSub works: there's a whole network of peers using libp2p; you put a message out and it slowly propagates through the system, and everyone can hear about it, depending on their subscriptions. Honestly, "I want to learn how to do animation in Google Slides" was the point of that slide, and I think I pulled it off.

Right, now we can look at what we just saw happen in the demo. Every Bacalhau node, or more precisely every requester node, has a JSON-RPC server. So how do I actually interact with the system? There's the CLI, or I can write code in any language I want and submit messages over JSON-RPC to get stuff done. A message comes in via the JSON-RPC interface to the requester node; messages go out over libp2p; the answer comes back; and the CLI displays something on the terminal, like we saw in the demo.
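Since the server speaks JSON-RPC, you could in principle drive it from any language without the CLI. A hedged sketch using Go's standard library codec; the address, method name, and argument shape here are invented for illustration and almost certainly differ from the real API:

```go
package main

import (
	"fmt"
	"log"
	"net/rpc/jsonrpc"
)

func main() {
	// Assumed: a local node's JSON-RPC server listening on this port.
	client, err := jsonrpc.Dial("tcp", "127.0.0.1:1234")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	args := map[string]string{
		"cid": "QmExampleCid",          // hypothetical input CID
		"cmd": "grep kiwi /ipfs/input", // hypothetical job command
	}
	var jobID string
	// "Requester.SubmitJob" is a made-up method name.
	if err := client.Call("Requester.SubmitJob", args, &jobID); err != nil {
		log.Fatal(err)
	}
	fmt.Println("submitted job:", jobID)
}
```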
What is a Bacalhau node? This alludes to what we were just talking about: three main components, where the JSON-RPC server is almost an afterthought, a necessary thing to get requests into the other two. The requester node is the thing that says "I am responsible for the lifetime of a job": I am essentially the client, the one this job is being done on behalf of. The reason the requester node exists is that there's a lot of back and forth going on. There are bids happening on jobs: nodes marking a job as "I'd be willing to do this if you want me to". Then there are results coming back, lots of messages arriving from the network about a job you're interested in, which you need to react to when they arrive. So the requester node is the long-running process that listens to the rest of the network for events about jobs it cares about, and reacts to those events by confirming bids, rejecting results, and doing the other things you need to do as the custodian of a job. That's the requester node. The compute node is the other end of the system, the other end of the pipe if you like: "hey, somebody wants something done, I'm interested in doing it, let's bid on that job; hey, my bid got accepted, let's actually run the compute, submit the work, submit the verification artifact back to the requester node". So when we say a Bacalhau node, it's those three things. I can very much run a Bacalhau node just in compute mode, never really intending to submit jobs via the requester side; and in reverse, I could run one just in requester mode, never intending to run jobs myself. Whether we actually ship a flag that says "only start the requester" is an interesting UX discussion. The important concept here is that those three components combine into what we're calling a node, and everything is peer to peer; there's no central server, otherwise this would be a different conference.

Right, self-selection. The requester node just puts a job out there: it says "I want this to be run". There's no checking that somebody actually has the data before they select themselves to run the job. The whole point of self-selection is to say: here's a job that will use the following data; do you want to run it? There's no enforcement that you actually have the data; it's a convenience hook to ask "if I ran this job, how much gravity would I be fighting?", to put it another way. So let's see what happens when I don't have the CID. A job comes in; I'm a compute node; I see that the job has arrived; I ask my local IPFS server, "hey, do you have that CID?"; the answer is no; so I just ignore that job, or put another way, I choose not to bid on it. The other case: a job comes in saying it will use some CID; I look in my local IPFS repo; I see that I do have it; therefore I'm not going to incur a massive ingress bill, and I do want to bid on that job. Self-selection is an interesting concept because it removes the need for a global brain deciding who should run a job. That's not what's happening: you just throw the thing out there and see what bites. So, starting with "data has gravity", look at this as a naive but initial poke at filtering on how much gravity I'd have to overcome if I chose to run this job.
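The prototype shells out to the ipfs CLI, so the self-selection check can be a thin wrapper around it. A sketch, assuming per-node repos as in DevStack; running block stat in offline mode fails when the block isn't already local, which is exactly the signal we want, though treat the exact flags as an assumption:

```go
package selection

import (
	"context"
	"os"
	"os/exec"
)

// HaveCIDLocally reports whether this node's IPFS repo already holds the CID.
func HaveCIDLocally(ctx context.Context, ipfsPath, cid string) bool {
	cmd := exec.CommandContext(ctx, "ipfs", "--offline", "block", "stat", cid)
	cmd.Env = append(os.Environ(), "IPFS_PATH="+ipfsPath) // per-node repo, as in DevStack
	return cmd.Run() == nil // exit 0 => block is local => worth bidding
}
```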
That's the focus of this step, not "let's protect against lots of malicious actors". This is about "how expensive would my ingress bill be if I chose to take this job on?", and I'll stop the gravity metaphor there, rather than protecting against any kind of attack.

So let's look at the flow of everything we saw in the demo, real quick. A job comes in from the CLI via the JSON-RPC server. It arrives at a requester node, and the requester node broadcasts the job out to the rest of the network using GossipSub. At that point, any compute node that is active hears "hey, this job is up for grabs" and can choose what to do with it. This is where the self-selection kicks in: all the compute nodes are busy asking "do I have this CID locally or not?". Some say yes, some say no, at which point the network is filtered down to those that would consider running the job. The bids are then placed; this is the concept of a bid, which is "I would be willing to run this job if you want me to". The bids come back to the requester node, and the requester node, at some point later, when we have financial incentives and a reputation system and all the stuff you'd probably need for this to work at scale, might decide that the reputation of one of those nodes is below my threshold, so I'm going to deny them the opportunity to run this job. That's up to the requester, because it's the requester who would end up paying, if we're looking at financial incentives. As soon as those bids are accepted, the nodes whose bids were accepted run the job and produce some artifacts. How they produce those artifacts is very much up to the verification implementation in use for that job, which, as we've learned already, is open for debate and absolutely should be pluggable, with lots of options. As soon as the results of that job come back to the requester node, the requester node, using the verification engine we've chosen, compares those results to each other, discovers whether one of them is an outlier or otherwise erroneous, and says "I'm going to accept the results of these two nodes and reject the results of that one". And there we go: that's the high level. There's a lot of stuff we've ignored in that flow, but in terms of what should stick, that is the proposed flow.
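The requester's accept/deny step could eventually look something like this; reputation and the exact policy are future work, so treat all of it as a hypothetical sketch:

```go
package requester

type Bid struct {
	NodeID     string
	Reputation float64 // placeholder for a future reputation system
}

// AcceptBids confirms up to `concurrency` bids, skipping nodes whose
// reputation falls below the requester's threshold.
func AcceptBids(bids []Bid, minReputation float64, concurrency int) []Bid {
	var accepted []Bid
	for _, b := range bids {
		if b.Reputation < minReputation {
			continue // deny them the opportunity to run the job
		}
		accepted = append(accepted, b)
		if len(accepted) == concurrency {
			break
		}
	}
	return accepted
}
```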
And what do we replace? We replace things like storage engines, compute engines, and verification engines; those should be easily replaceable with a different opinion. The scheduler and the flow of the job isn't necessarily exactly what we saw in that diagram; as a result of this conference, we should probably come to some consensus about what it should be, and then that's the system we propose out to the world, with the various pluggable hooks we've presented. It's the same as Luke said earlier: the scheduler will have an interface, but it shouldn't really be pluggable; it just is. The interface means that at some point soon we could replace the libp2p GossipSub implementation with a smart contract implementation; that's the point of the scheduler having an interface. The compute, storage, and verification interfaces, by contrast, are absolutely designed for lots of different pluggable modules, and hopefully everybody will write lots of variations on them to suit different use cases. Yeah, cool.

What I want to talk about next: we've shown you the prototype and what it does, and we've already started poking holes in it; there are lots of ways to poke holes in the prototype, and that's why we built it. The plan now is to continue developing the Bacalhau project towards the goal of operating a production compute system. The initial goals for that system are to be scheduler-focused, reliability-focused, and scale-focused. In particular, there will be no incentive or payment system in the first version, maybe even never in this specific project; there might be another project later to develop that. So, in order to make this distinction natural between incentives and the plumbing that sits underneath a system that might have incentives, I propose that we rename the project. Part of this is because no one can pronounce it; no, I'm just kidding. The main reason I propose to rename the project is as follows: I propose we call it IPCS, because IPCS will be for compute what IPFS is for storage. In particular, there's this little two-by-two matrix up here that should make sense: IPFS is a volunteer network for storage, and Filecoin is the persistence layer on top that adds incentives; IPCS should be the volunteer network for compute, and there will not yet be an incentive layer on top of it, although there is space for that in the future. The goals of IPCS, then, are a rock-solid scheduler, high reliability, high scale, and some Byzantine fault tolerance, up to a point; I'll talk about that as we go through this. It's actually harder in some ways to do Byzantine fault tolerance when you don't have incentives, because some of the game-theoretic approaches to enforcing behavior rely on things like staking and slashing, and without those, you have to use different approaches.

So here's the master plan. And please do actually tell everyone about this; the "keep it secret" bit is a joke, please do tweet it. We're going to use this old product chestnut; I don't know if everyone's seen it, but it came out of Spotify. The way to ship software products is not to try to build the whole car up front; it's to build a skateboard that's a little bit useful and learn from that, then put a flag on it for some reason to make it shiny, then ship a bike, then a motorbike, then a car. The point is that at each stage you've got something useful, rather than having to wait nine months before you've got anything useful at all. The master plan is described in those terms. The first release is going to be a basic system that only supports deterministic execution; then we add some scale; then some reliability; then support for multiple files rather than just one file at a time; then more scale; then we address performance; then some Byzantine fault tolerance; then we extend the system to support pipelines and DAGs rather than just single jobs, the kind of pipeline system alluded to earlier; and we'll attempt to add more Byzantine fault tolerance in phase nine.
And then, if we have time, we'll finally tackle my favorite chestnut of non-deterministic execution, plus Filecoin integration, although we may end up moving the Filecoin integration piece earlier in the project. I'm not going to go into detail about how we're going to do each of these steps, but I'm going to tell the story of what it's going to feel like at each step. It's intentional that there are some negative adjectives in here, because it helps you understand what this journey is going to be like as we go through it.

We're going to start by building a system for unreliably running a single deterministic program on a small file that's already in IPFS, assuming everyone is trustworthy and there are only ten people on the network. And then we're going to put that thing public. We're going to test whether we can do this by trying to run cloud detection on a single Landsat image, because we've already got the Landsat data set, hopefully in IPFS; if it isn't there, we'll just publish it there ourselves. We'll get the result back, and you just verify it by eye. The goal here is that this is already a useful thing for someone who wants to process data on the Landsat data set. They may well be able to verify by eye, because they might only be doing it on 100 images and they can just scroll through and check they all look like they got grayscaled. So that's step one.

Step two is trying to scale from 10 nodes to 100 nodes and be able to process over 100 terabytes of data in IPFS. At that point, we might have an error rate, and we should be able to measure it. What that might feel like is that we throw another 90 nodes into the network, or other people show up and add nodes, and at first things break, but throughout the course of this phase we release new versions that make things incrementally better. By the end of the phase, the network's working well, although it might now feel a bit slow, and there might still be an error rate, even though it will be less than it was.

We're then going to try to make it more reliable: we're going to try to make it work 99% of the time. We should be able to test that by submitting 10k jobs and observing that, on average, at most 100 of them fail. It might at this point be kind of slow; it might take a while to resolve each job. By the end of this phase, the goal is that job execution is significantly more reliable. To give you some insight into the kind of reliability issues we're expecting to see, and how we might solve them: at this point, the system will have no way of stopping nodes from becoming over-full, as in taking on more work than they actually have CPU and RAM for. So we'll have to put CPU and memory limits on the nodes to stop them running out of memory and crashing. That's how we potentially see this unfolding.

Then we're going to add support for multiple files, so that you could, for example, submit a job on a directory in IPFS, and it would shard that work across all of the constituent files, or batches of them; that's all TBD in this section, still using data locality where possible. Then we're going to try to scale the system up to 1000 nodes. I understand IPFS is currently at 15,000 nodes, is that about right? It's 350,000? Okay, that's news. Cool.
Okay, that's helpful. So anyway, we're going to be a small subset of the scale of IPFS. But when we get up to 1000 nodes, you might start finding that the amount of metadata about the jobs becomes significant, and if there are a lot of jobs coming in per second, then we're going to need to do some optimizations around the data structures there, and shard things out into multiple libp2p gossipsub channels and so on. At this point the idea is that many users can run Landsat workloads in parallel, and there might be some other public data sets that people start using on the network; maybe there are some biomedical images, or nine other use cases, who knows. Everyone being able to do that without the network failing is the goal of phase five. Phase six is trying to make that thing fast. By the end of phase six we should be able to do hundreds of job executions per second, and resolve those jobs within a few seconds rather than minutes. That's the broad idea there. Phase seven gets kind of interesting. This is where we want to add support for up to 10% of the nodes in the network being malicious, and we are explicitly bounding that at 10% in order to make our lives easier. I won't go into all the details because we're running out of time, but we've got some ideas on how we can tolerate some percentage of failures or bad actors, by distributing the verification jobs evenly across the network and then looking at hashes of the outputs, as opposed to the trace data, because we're still assuming everything's deterministic at this point. Then I think this pipelines and DAGs thing is going to be quite interesting. I'd actually love to hear more from Phil and Rico on how useful this will be to the data scientists. You can imagine a Landsat job that isn't just one job: it's a sequence of steps where you grayscale the image, then you do downscaling, then you do something else, or you fan out and fan in; there are lots of interesting things we could do there. By the way, the end of phase eight is tentatively around October. That might be a good time to do another big ta-da. We will have been releasing continuously throughout, at the end of each phase, but at the end of phase eight in October we're going to have a pretty fast, pretty reliable public network that supports pipelines and all this fancy stuff, and at that point it's worth maybe doing some kind of big announcement around it. Then phase nine adds more Byzantine fault tolerance. And then my favourite, non-deterministic execution, which would be a good proof that the system is pluggable, in terms of verification strategies at least: if we say we've got support for non-deterministic execution, and we've got this trace-based verification, it forces us to demonstrate that the interfaces are valid. And phase 11, Filecoin integration, does what it says on the tin, but we would have to give IPCS nodes access to a Filecoin wallet that they can spend to get data out of Filecoin and put data back into Filecoin programmatically. So out of those phases, that's the journey we see to production, the journey over the next few months.
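To sketch the phase-seven idea of catching bad actors by comparing output hashes across nodes that ran the same deterministic job, a simple majority vote is one plausible shape: with at most 10% of nodes malicious and enough replicas, the majority hash should be the honest one. This is our illustration, not the planned implementation.

```go
package main

import "fmt"

// outlierNodes takes the output hash each node reported for the same
// deterministic job and returns the nodes whose hash disagrees with
// the majority hash.
func outlierNodes(hashByNode map[string]string) []string {
	counts := map[string]int{}
	for _, h := range hashByNode {
		counts[h]++
	}
	majority, best := "", 0
	for h, n := range counts {
		if n > best {
			majority, best = h, n
		}
	}
	var outliers []string
	for node, h := range hashByNode {
		if h != majority {
			outliers = append(outliers, node)
		}
	}
	return outliers
}

func main() {
	fmt.Println(outlierNodes(map[string]string{
		"node-a": "abc123", "node-b": "abc123", "node-c": "deadbeef",
	})) // prints [node-c]
}
```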
Obviously there's a bunch of unconference topics that fall out of that, so we'll put those on the board later. So that wraps up that talk, but I will warn you, we have another, much shorter talk that we're going to try and do in the next 30 minutes, unless people want to take a break and come back in five or ten minutes. Yeah, lunch is after 30 minutes, and obviously, if people need to take a break at any time, go ahead. So, I'll do the first two slides and then hand over to you, if that's okay. I feel like we've already said this a lot today, but we're not building this all ourselves. We want your help. We want to build these foundational pieces that enable other teams to launch other projects that share this code, and we want to help make everyone in the space successful; a rising tide lifts all boats, that sort of thing. In order to deliver on that goal, we need to make things pluggable, so coming back to the octopus of doom, or whatever you want to call this diagram: the pluggable pieces are the verifier, the compute and the storage, which we want to be pluggable at runtime; then the scheduler, which is core; and an optional reputation system. So I'll hand over to you, Kai, to do the rest of this. As we said, the scheduler is the core of the system; it's the mechanism by which requester nodes and compute nodes communicate with each other: I want this job run; I want to bid on the job; I accept your bid; here are some results; I accept those results. All of that back and forth that we saw in the animations goes via the scheduler. Now, an important concept is things that change the state of the world versus hooks by which you hear about some change in the world. This is where I would have shown you the code, but I'll just talk about it. So submit job is a function on the interface: a job was just submitted, and that leads directly on to self-selection, as in, do I want to bid on that job at this moment in time? Do I have the CID locally? But there could be lots of other parameters that affect that decision, especially once we have economic incentives: what's the value of the job, is it even worth me thinking about doing it, what's the CPU and memory profile of the job, how long do we think it's going to take? There's a whole bunch of stuff that comes into play when, as a compute node, I hear about a job arriving on the network and want to decide whether to bid on it. How do we build an interface for that? It's going to be hard. My approach is essentially a subscription call: in the compute node there's a subscription callback that says, hey, a job just arrived on the network, do you want to bid on it or not, and then there is a function on the interface that says bid on job. What happens between those two events, a job arrived and now I'm going to bid on it, can be arbitrary user code that only that compute node plugs in. How do we implement that? It could be a hook: a job has arrived, call out to my custom function, which makes a decision based on all of the parameters I have authored for my specific use case, for my compute node. That could be, I don't know, "is it a Tuesday? Therefore bid on the job". It should be absolutely up to the compute node operator what logic decides whether I bid on that job. So the core approach of the scheduler interface is to have these two halves, sketched below.
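Here is a rough Go sketch of those two halves: functions that change the state of the world, plus a pluggable hook for the self-selection decision. All names here are illustrative; the actual scheduler interface in the repository will differ.

```go
package scheduler

// Job is an illustrative job record, not the project's real type.
type Job struct {
	ID  string
	CID string // content address of the input data
}

// Scheduler is the state-changing half: calls that move the protocol
// forward once a decision has been made.
type Scheduler interface {
	SubmitJob(job Job) error
	BidOnJob(jobID string) error
	AcceptBid(jobID, nodeID string) error
}

// BidStrategy is the pluggable self-selection hook: arbitrary
// operator-supplied logic deciding whether to bid on a job.
type BidStrategy interface {
	ShouldBid(job Job) bool
}

// LocalDataStrategy mirrors what the prototype does today: bid only
// if we already have the CID locally (a la `ipfs refs local`).
type LocalDataStrategy struct {
	HaveLocally func(cid string) bool
}

func (s LocalDataStrategy) ShouldBid(job Job) bool {
	return s.HaveLocally(job.CID)
}
```

A "bid on Tuesdays" strategy, or one weighing job value against CPU and memory cost, would just be another implementation of the same one-method hook.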
These two halves of it really are: call these functions to change the state of the world the moment you've made a decision, so there's a function, bid on job; and there's a subscription event system through which you hear that things have happened that you may want to make a decision about. That happens at both ends of the equation. On the compute node, the example we just walked through is: a job has arrived, I want to bid on it, so I'm going to make a decision. In the prototype that decision is literally "do I have this CID locally", a.k.a. `ipfs refs local`; there is no pluggability for that decision at the moment, but there can be, because of the way the interface works. The other side of the equation is: a bid has just arrived from a compute node. Do I want to accept that bid? That comes down to the reputation of that compute node, how many other compute nodes have bid on that job, and so on. Again it's a good example of listening to an event that's just happened, deciding what to do with it, and turning back around to call the accept bid function on the interface. We've tried very hard to walk the line between having enough opinion that the thing's useful and not having so much opinion that you can't have your own; that's always the challenge of any interface, where to draw the line in the sand. Maybe at some point in the next day and a half I'll grab my laptop, or it's on GitHub, and let's have a look at the scheduler interface that does exist, and walk through in our minds what kinds of scenarios it doesn't fit very well, or what we've not thought about; it would be gold for us to hear that from all of you. Hooks for self-selection and bid acceptance, that's what I've just described. So here's an interesting concept: compute verification. As in, how do we make an interface for running a compute job, producing some artifacts, comparing those artifacts across lots of compute jobs run on different nodes, and coming to some decision about that? And how do you make that generic enough that you can write your own implementation fairly easily? Again, it's quite a hard thing. This is the dashed line in this diagram: how do you decouple the verification and the compute interface without every verifier needing to know about every compute implementation, and vice versa? One approach that we're taking, and I say that with heavy double quotes around "one approach", because we could always take a different approach, and I can't stress this enough: it's not "here's what we're going to do", it's "here's what we're thinking of doing, please tell us if we're wrong". That would be a good thing for the next day or so. Here is the idea as it stands: there is an artifact format that the compute engine needs to produce. That could be a hash of the output; it could be a memory trace (in the demo we've seen, the compute implementation produces a memory trace in the form of an artifact). That's one half of the equation: the compute engine knows how to produce that format. The other half of the equation is: given a whole collection of those artifacts, come to a determination about who is being honest and who is not.
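One way that split could be expressed in code, purely as a sketch: the compute side produces an Artifact in a declared evidence format, and the requester side consumes a batch of artifacts and decides which nodes look dishonest. These interface names are ours, not the project's.

```go
package verifier

// Artifact is a sketch of the evidence a compute node produces after
// running a job: a hash, a CPU/memory trace, a proof, and so on.
type Artifact struct {
	NodeID       string
	EvidenceType string // e.g. "output-hash" or "cpu-mem-trace-csv"
	Data         []byte
}

// ComputeVerifier is the compute-node half: produce evidence for a
// completed job in a declared format.
type ComputeVerifier interface {
	EvidenceType() string
	ProduceArtifact(jobID string) (Artifact, error)
}

// RequesterVerifier is the requester-node half: given the artifacts
// from every node that ran the job, decide which nodes are outliers.
type RequesterVerifier interface {
	EvidenceType() string
	FindOutliers(artifacts []Artifact) ([]string, error)
}
```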
So let's think of a deterministic equivalent of the demo we've seen; it's a lot simpler. In the demo we wrap the job with psrecord, producing memory metrics. That's the artifact side: that's the compute node saying "I can produce some evidence that I did a thing". The other side of the equation, the thing happening on the requester node, is the k-means clustering over those artifacts to decide who the outliers are. Put those two things together and you've got what we could call a verification implementation. The deterministic version is quite a lot simpler: the compute node needs to produce a hash as its artifact. Which hash algorithm do we pick? Well, that's down to your choice of implementation, but it's a hash of some kind that needs to line up across all of the compute nodes that actually performed the work; there's no way you could know what that hash is if you didn't do the work. The artifact would be encrypted between the compute node and the requester node, so nobody can just copy what you've produced. On the requester node side, the deterministic comparison of the hashes is fairly simple: whoever has not got the correct hash is determined to be an outlier. With zk-SNARKs, for example, the compute node needs to produce a proof. How does it do that? That's up to the implementation we plug in for that verification: you'd need to produce a zk-SNARK from having done the work, and at the other end it's comparing the SNARKs. We can continue in that vein: it should be fairly trivial to implement whichever verification strategy we need. The core thing about this slide is how the compute node and the requester node interact with a verification strategy, and the concept is: the compute node produces some known format of evidence, a hash or a memory trace or a zk-SNARK or whatever else we plug in, and the requester node knows how to compare those artifacts. How we manifest this as an interface is a very interesting question. This is, I think, where we are in this room today and tomorrow: let's talk about what it should look like, and we will then go and implement all of that. That's where we're at in the project. The prototype you've seen has no interface for verification; it just wraps the job in psrecord and uses k-means clustering at the other end. We need to extract that into an interface and then hopefully write lots of other implementations. Let me give a concrete example of these evidence types, to help clarify the idea. Suppose you have a Docker runtime and a wasm runtime, and you have a trace-based verifier. The trace-based verifier operates on CPU/memory traces, so it declares that it wants CPU/memory traces. The Docker implementation of how you trace CPU and memory might be very different from the way you do that in the wasm runtime, but because the evidence type specifies the exact format, say, a CSV file where each line is a one-per-second sample of CPU and memory, they can both produce an artifact that can be consumed by the same verifier, without the trace-based verifier needing to know any specific details about how wasm and Docker work. That's the best way I have to describe the point of this decoupling. So yeah, hopefully it's a useful idea; I'd love to get people's feedback on it.
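Continuing the sketch from before, the Docker and wasm runtimes could each collect the declared CSV trace format in their own way, and the trace-based verifier consumes either without knowing which runtime produced it. Again, these are illustrative names with stubbed data, not real implementations.

```go
package trace

// TraceProducer is implemented differently per runtime, but both emit
// the same declared evidence format: CSV rows of
// "seconds,cpu_percent,mem_bytes", one sample per second.
type TraceProducer interface {
	ProduceTrace(jobID string) ([]byte, error)
}

// DockerTracer might sample the container, for example via container
// stats, and write the samples out as CSV.
type DockerTracer struct{}

func (DockerTracer) ProduceTrace(jobID string) ([]byte, error) {
	// Stubbed sample data standing in for real per-second samples.
	return []byte("1,12.5,1048576\n2,13.0,1050000\n"), nil
}

// WasmTracer might instrument the wasm runtime's own counters instead,
// yet it emits the exact same CSV format.
type WasmTracer struct{}

func (WasmTracer) ProduceTrace(jobID string) ([]byte, error) {
	return []byte("1,3.2,204800\n2,3.1,204900\n"), nil
}

// A trace-based verifier only ever sees the CSV bytes, so it works
// with either runtime without knowing how the trace was collected.
```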
And then this is somewhat simpler: the compute and storage interfaces. Like I mentioned, IPLD programs are very interesting: let's express the program that's going to be run as IPLD. If we've done a good job of these interfaces, that should be an equivalent implementation to "I want to run the following Docker image that's got my work in it", versus "I've already compiled the wasm binary and it lives here, run that for me", versus "I want to start a Firecracker VM and just run arbitrary commands". It should all be possible, depending on the nature of your workload and which verification engine you're choosing to use with it. To heavily underline that clarification: it should be possible for this network to support a variety of compute interfaces for actually running the job. Now, in our first iteration we might choose to stay with, say, wasm with deterministic workloads, but the interface should make it possible to do any other version of compute that you want. The same is true of storage: we're going to start off with IPFS. The actual interface is an interesting concept, though, because when we say storage interface it's more about how we manifest the storage to the compute job. How the prototype works at the moment, as Luke showed in a slide, is that we use an IPFS mount to get at the files that exist on the same IPFS host. We need to manifest those files inside the VM, so okay, we just run `ipfs daemon --mount` inside the VM and we can now get to all of the files on the host. That's one example of what we'd call the storage interface; it's less "hey, here is a type of storage" and more "how do we actually manifest that storage and its data into the compute job, to be read from its current location". Another example would be the concept of volumes, a bit like how Docker volumes work: I've got this storage somewhere on my host, I want it mounted at this location inside the workload, and then as far as the workload's concerned it has no idea where that data came from on the host; it's just mounted at a known file path, and the job expects the data to be at that file path. The storage interface looks after how that actually happens, and it's not the easiest of things to implement, because, for example, with our prototype, how do we bootstrap `ipfs daemon --mount`? The compute node needs to think about that, because it needs to happen before the job runs. There's a whole bunch of different approaches we can take; with Docker, for instance, there's a simple way where we can have an IPFS container that's almost like a sidecar to the job. My brain is now going into "I want to implement this interface for one use case"; I think the whole point would be to collaboratively take a step back, and for everyone else to say "but I've got this other type of storage you've not even thought about; does that work with your interface?" I think this would be the discussion to have in the unconference, because that puts all of our brains into a pot and stirs them around, and if we come out with some interfaces from that, it means we've done a good job.
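Here is a rough Go sketch of that storage idea: the interface is not "what kind of storage is this" but "manifest this data at a path the job expects, and clean it up afterwards". The method names are hypothetical, a minimal sketch assuming a CID-addressed world.

```go
package storage

import "context"

// StorageProvider is a sketch: given a content address, make the data
// appear at a known path inside the workload, however that happens
// underneath (an `ipfs daemon --mount` FUSE mount, a copied Docker
// volume, an IPFS sidecar container, ...).
type StorageProvider interface {
	// PrepareMount makes the data behind cid readable at mountPath
	// inside the job's environment before the job starts.
	PrepareMount(ctx context.Context, cid, mountPath string) error

	// CleanupMount tears the mount down after the job completes.
	CleanupMount(ctx context.Context, cid, mountPath string) error
}

// The job itself only ever sees the mount path, e.g. /inputs/data,
// and has no idea whether that path is backed by IPFS, a local
// volume, or something we haven't thought of yet.
```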
So yeah, some examples, I think, of how this fits together. With zk-SNARKs, the artifact that the verification module on the compute node produces is a SNARK, and the requester node knows how to consume the SNARK to check who has performed correctly and who has lied. Homomorphic encryption, I think, is a good example of something that should just work rather than something we have to build an implementation for, because in my mind (I might be wrong, and I'm happy to be corrected) the data on the storage driver is already encrypted, so we're just mounting that data into the compute job, and the compute job already understands it's working with a homomorphically encrypted storage volume. Whether there's any actual hardware needed to do homomorphic encryption, I'm not sure; not necessarily. So the point I suppose I'm making is that homomorphic encryption should be possible at the user layer, as opposed to us having to do anything specific to support it, but it's an interesting example to throw up on the board, because maybe I'm wrong when I say that, and it would be interesting to check whether it's true: do we need to do anything to support homomorphic encryption, question mark. Optimistic verification is a good example of what we're already doing: we just assume everyone's going to be correct, and we check afterwards whether they were or not, as opposed to the deterministic hash-based system, which we could almost say isn't optimistic, because we can immediately check if people are lying by their hashes not lining up. The point of all this, at the risk of being a broken record, is to add to this list of working examples, do a sanity check on whether these examples fit with what we're planning to do with the interfaces, and emerge with a high level of confidence that our interfaces cover all of the use cases. There you go, that's basically it. That's right, yes, yeah.