Hi everyone, good morning. Thanks for coming at this early hour; hopefully no one's too hungover. I'm going to talk about Enigma, and specifically about our decentralized data marketplace protocol, which is part on-chain on Ethereum and part off-chain.

The background to Enigma starts at MIT in 2015. We launched the Enigma project from there, where we conceptualized a decentralized computation platform that can guarantee both privacy and correctness, with better scale than blockchains themselves allow. How many of you here have seen that white paper or are familiar with the Enigma project in that sense? Alright, not bad. It's been cited quite a lot; I think it's one of the more cited white papers in the space, and it's downloaded quite frequently, so I guess there are a lot of people who share our vision.

A few words about our motivation. Data is everywhere; it's the most valuable asset of the 21st century. But right now only a handful of companies can really capitalize on that. They're hoarding it and monetizing it, and the open public can't take part. That is what we set out to change with our protocol.

Here's a quick outline of my talk. First, what is a data marketplace at a high level? How does the on-chain part work with the off-chain part, who are the stakeholders, and how are we thinking about designing this? At that point I won't yet cover how the network stores data or how computation is done; that's the second part of the talk, where we go deeper into it. Given time constraints, this will stay pretty high level. Finally, I'm going to give you a taste of Catalyst, an application we're developing that is going to be the first one to live on the data marketplace.

At a high level, the Enigma data marketplace has three main stakeholders, which are fairly obvious. You have data providers, those who want to sell data, and data consumers, those who want to buy data. And off-chain you have worker nodes, which are nodes that have no stake in the data being bought or sold, but which are the ones actually storing the data, doing the computation, answering queries, and so forth. My focus right now is going to be mostly on the contracts between data providers and data consumers.

Starting with the data provider. A data provider has a data set they want to sell. First of all, they provide enough context to the off-chain network; they basically need to register the data set. That can happen in one of two ways: either they upload the data to the network, and I'll show later how that's done securely, or they store it on some of their own nodes and just provide enough context to the network about where that data lives. The network then computes a permanent address, kind of like a DNS entry, that it can use to route consumers to that data set.
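To make that registration step concrete, here's a minimal sketch assuming a content-addressed scheme. The talk only says the network computes "some kind of permanent address," so the enigma:// prefix, the field names, and the choice of hash are all hypothetical:

```python
import hashlib
import json

# Hypothetical sketch: derive a permanent, DNS-like address for a registered
# data set by content-addressing its registration metadata.
def dataset_address(owner: str, name: str, location_hint: str) -> str:
    record = json.dumps(
        {"owner": owner, "name": name, "location": location_hint},
        sort_keys=True,  # canonical key order keeps the address deterministic
    )
    return "enigma://" + hashlib.sha256(record.encode()).hexdigest()

addr = dataset_address("0xProviderPubKey", "btc-orderbook-feed", "provider-node-17")
print(addr)  # a stable identifier the network can route consumers to
```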
The data provider then takes that unique address, submits a transaction to the on-chain network, sets the price of the data along with some other metadata, and also puts down a deposit, which I'll touch on a bit later. Now that the data set is registered on-chain, with a reference to it off-chain, a data consumer can come in and send a message to the blockchain saying, hey, I want to subscribe to this feed of data. They obviously need to provide payment, and they also need to provide a deposit, which is there in case the consumer defaults on their payments in the future.

One note about the incentive structure here. What Enigma is really trying to do is capture the value of data in a token. We're trying to take something that is implicit today, stuck behind walls, and make it open and very, very explicit. There are two ways the incentives work. The first is the value of data directly, an explicit metric: a data provider sets a price for what their data is worth, and the market either buys into that or not. That's the explicit value of data. The provider then has to share their earnings with the worker nodes in the network that are doing the computation and the storage, and that is where the implicit value of data is also reflected.

Another note, about discovery and ranking. We expect to eventually have many thousands of data sets of all kinds in our system. We can employ statistical methods and we can tag data sets, but at the end of the day we need some mechanism to rank data sets that fall in the same bucket. The best way we see to do that is to directly tie a data set's rank to the economic incentive locked into it. What I mean by that is simply the amount of money that people have paid into that data set, plus some factor of the deposit that the data provider has put in. The idea behind the deposit is that data providers need to put their money where their mouth is. That sets the rank of a data set and breaks ties.
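A minimal sketch of that ranking heuristic, assuming a simple linear weighting; the talk doesn't specify exactly how the deposit is factored in, so the weight here is purely illustrative:

```python
# Hypothetical ranking heuristic: a data set's rank grows with the money
# subscribers have locked into it, plus a weighted share of the provider's
# own deposit ("put your money where your mouth is").
def dataset_rank(total_subscription_value: float,
                 provider_deposit: float,
                 deposit_weight: float = 0.5) -> float:
    return total_subscription_value + deposit_weight * provider_deposit

# Two data sets with equal subscription revenue: the larger deposit breaks the tie.
print(dataset_rank(1000.0, 200.0))  # 1100.0
print(dataset_rank(1000.0, 500.0))  # 1250.0
```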
This is a good point to transition to the off-chain computation and storage. Again, this will be pretty high level, but I want to go through the main ideas, because this is really where the heavy lifting happens.

The main idea, and it's the same main idea we set out in the 2015 white paper, is as follows. Global consensus is expensive; I think all of us here are aware of that. It's not scalable, and it's very pricey. We need to figure out a way to segment the network so that for different computations, and for the different data sets and data blocks we're storing, only some of the network is utilized. And we need to do it in a way that's secure enough. It won't be as strong as global consensus, but it will be good enough for decentralized needs, and scalable enough.

You probably know this as sharding; that's the term that has been popularized in the space. I'm going to use the term quorums, which is more of an academic term: you have a large network, you select a committee, and that committee is in charge of some portion of the data, some portion of the state, some portion of the computation. So for every data block, every data set that comes into the network, the off-chain network runs a randomized protocol where the nodes essentially flip a coin and use that to randomly select a subset of the network: big enough to be secure, but small enough to be scalable. There are several techniques for selecting that quorum. Most of them use threshold cryptography or secret sharing, which is very similar. You can also use random beacons on the blockchain, which are very fast but have some trade-offs in terms of the entropy you can get. With this kind of black box, for each piece of data you can select only a handful of nodes to be in charge of it, and that's where you get your scalability properties.
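Here's a toy sketch of beacon-based quorum selection, assuming every node sees the same random beacon value (say, a recent block hash): hashing each node ID together with the beacon and the data set's address yields a committee that is deterministic after the beacon is revealed, but unpredictable before. The function and parameter names are illustrative, not Enigma's actual protocol:

```python
import hashlib

# Toy quorum selection from a shared random beacon: score every node by
# hashing its ID with the beacon and the data set's address, then take the
# k lowest scores as the committee in charge of that data set.
def select_quorum(nodes, dataset_addr: str, beacon: bytes, k: int):
    def score(node: str) -> bytes:
        return hashlib.sha256(beacon + dataset_addr.encode() + node.encode()).digest()
    return sorted(nodes, key=score)[:k]

nodes = [f"node-{i}" for i in range(100)]
quorum = select_quorum(nodes, "enigma://abc123", b"block-hash-as-beacon", k=5)
print(quorum)  # the handful of nodes responsible for this data set
```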
But that raises two questions. Once we have a quorum for a data set, we need to make sure that the quorum always runs computations over that data set correctly, and we also need to make sure that the quorum itself cannot see the underlying data if that data is sensitive. These are the tricky questions, and I'm going to talk about them now.

This is not an exhaustive list, but it's meant to give you a quick overview of the different techniques in the literature for outsourcing computation with both correctness and privacy. Let me go over it very quickly. First there's the blockchain itself, which is fully decentralized and gives us very strong integrity and consistency, but it's really bad at keeping secrets: there is zero privacy in it, and it's not scalable at all. Then there's something I like to call partial encryption. That's not a scientific term, but it's an umbrella for things like order-preserving encryption and deterministic encryption: encryption methods that aren't perfect but give you some confidentiality for the data, with zero integrity. And then we get to the interesting stuff, the really heavy guns of cryptography. There's fully homomorphic encryption, which would be the best if we ever figure out a way to do it fast enough; right now we don't know how. There are SNARKs, which are great if you want to prove a statement once and then verify it many, many times, but which are unfortunately limited for other use cases. And there's multi-party computation, which we feel is a good trade-off: with MPC you can compute essentially anything with privacy and correctness, under some assumptions. It's more scalable than the other cryptographic solutions because it only uses symmetric cryptography, but I do want to emphasize that it is still much slower than hardware-based solutions, and of course than computing over unencrypted data.

This is why at Enigma we're focusing on both of these technologies: MPC and secure hardware, meaning things like Intel SGX and trusted execution environments. We feel that MPC is great for applications like identity, where you want zero trust. But if you're okay with some minimal trust in the vendor, which is what secure hardware requires, you can get full functionality, great scalability, and privacy.

With that, I think I have enough time to walk you through the basics of MPC. MPC is an amazing technology, something we've been working on for a while, and it also helps to illustrate its strengths and weaknesses. MPC says something like this: imagine an ideal world where there's a God-like computer that we can trust, and we can outsource every computation to it. That trusted machine would never leak the data, no one can breach it, and we can always trust it to run the computation correctly. In the real world that's not possible, so MPC says: let's simulate that trusted machine with the network, which is, by the way, the same argument that blockchain makes. The statement, in simplified form, is that as long as there is at least one honest node, at least one node that is not a bad actor, you can make sure that every computation is correct and that no one in the network can see the raw data. That basically means data remains encrypted end to end.

Let me walk you through a simple example. Imagine we have a very small network of three nodes, with a data provider on one side and a data consumer on the other. I'm ignoring the blockchain here; I'm just talking about the process of off-chain storage and computation. Say the data provider wants to store some number X. Locally, it runs a protocol called secret sharing, which splits that data into encrypted shares and sends one share to each node: node one holds X1, node two X2, and node three X3. These shares are completely encrypted; no node can learn anything about the raw X. We do the same with Y, and then the data provider can go offline; the data now lives in the network. The state of the network is that the nodes collectively hold X and Y, encrypted as shares.

Now a consumer comes in and wants to compute, say, X plus Y. Because of the properties of secret sharing, that comes for free: each node can add its local shares, which gives it a share of the encrypted sum, and then all the nodes send their shares back to the consumer. One thing about secret sharing is that if you have all the shares in one place, you can reconstruct, that is decrypt, the data. But the important part to realize is that a computation happened in this network, and we can extend that network as much as we want, yet none of the nodes has actually seen the raw data. They have operated on fully encrypted data the whole time, which is pretty amazing.
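Here's a minimal sketch of that addition example, using additive secret sharing over a prime field. That's one simple variant; the talk doesn't commit to a specific scheme:

```python
import secrets

P = 2**61 - 1  # prime modulus; all arithmetic happens in the field Z_P

def share(x: int, n: int = 3) -> list:
    # n-1 uniformly random shares plus one correction share, so the shares
    # sum to x mod P. Any n-1 of them together look completely random.
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares) -> int:
    return sum(shares) % P

x_shares = share(5)   # node i receives x_shares[i]; no node learns X = 5
y_shares = share(7)
# Addition is free: each node adds its own shares locally, no communication.
sum_shares = [(xi + yi) % P for xi, yi in zip(x_shares, y_shares)]
assert reconstruct(sum_shares) == 12  # only the consumer, holding all shares, sees X + Y
```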
So let's run another example, with multiplication. I'm going to go fast over this one; it's more complicated, and I just want to give you the main ideas. Multiplication requires communication between the nodes; they can't just compute it locally. But we know that if all the shares end up in one place, the data can be decrypted, so we need a trick to avoid that. What happens in MPC, here and really in any protocol you implement, say if you're writing a VM, is that you use the same trick over and over, and the trick is simple: whenever you need to share some information with other nodes, you first re-encrypt it using something like a one-time pad, and only then send it. So that's what the nodes do here: they mask their values, communicate them with each other, and then do some more local computation. If you plug the algebra in, you'll see that the resulting Z's end up being exactly the encrypted shares of the product XY. The nodes send those back to the consumer, who, having all the shares, can pull them together and decrypt the result. And again, the network saw nothing. That's the point.
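To make that mask-then-send trick concrete, here's a sketch of one standard way to do secret-shared multiplication, Beaver's multiplication triples. The talk doesn't name the exact protocol Enigma uses, and the trusted dealer here is a simplification standing in for an offline preprocessing phase:

```python
import secrets

P = 2**61 - 1  # same prime field as the addition sketch

def share(x: int, n: int = 3) -> list:
    s = [secrets.randbelow(P) for _ in range(n - 1)]
    s.append((x - sum(s)) % P)
    return s

def reconstruct(shares) -> int:
    return sum(shares) % P

def beaver_multiply(x_shares, y_shares):
    n = len(x_shares)
    # Preprocessing: a random triple (a, b, c = a*b) is secret-shared ahead
    # of time. A trusted dealer here stands in for an offline MPC phase.
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    a_sh, b_sh, c_sh = share(a, n), share(b, n), share(a * b % P, n)
    # Each node broadcasts only masked values, the "one-time pad" trick:
    # the opened d = x - a and e = y - b reveal nothing about x or y.
    d = reconstruct([(xi - ai) % P for xi, ai in zip(x_shares, a_sh)])
    e = reconstruct([(yi - bi) % P for yi, bi in zip(y_shares, b_sh)])
    # Local post-processing: z_i = c_i + d*b_i + e*a_i (plus d*e added once)
    # sums to c + d*b + e*a + d*e = x*y mod P.
    z = [(ci + d * bi + e * ai) % P for ci, ai, bi in zip(c_sh, a_sh, b_sh)]
    z[0] = (z[0] + d * e) % P
    return z

x_sh, y_sh = share(6), share(7)
assert reconstruct(beaver_multiply(x_sh, y_sh)) == 42  # nodes never saw 6 or 7
```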
Now, why did I show you addition and multiplication, other than them being relatively simple? There's a nice theorem in computation theory saying that if you have addition and multiplication, you can build any circuit; essentially, you can compute anything you want. So these are the building blocks we need. There are far more complicated protocols, of course: how do you compare two numbers, how do you do floating-point and fixed-point operations? Obviously I won't get to most of these here. A lot of our own work has actually gone into improving these protocols; we got between 10x and 100x speedups on quite a few foundational protocols, but unfortunately I won't have time to discuss that here.

So that's it on the hard stuff. I want to finish with something simpler and kind of nice. We feel that in order to bootstrap a data marketplace, we also need to introduce some use cases: we need to seed the demand side and bootstrap the network with some interesting data. Given the space we're in, what we're interested in, and the state of everything related to blockchain and cryptocurrency data, we decided to build an investment platform for cryptocurrencies, where all the data that comes in and is stored will be kept on the Enigma data marketplace. That boots up the network and also lets us stress-test the protocol as we build it out. Catalyst is still centralized, because the protocol is not yet live, but Catalyst itself is live and operational. If you're interested in algorithmic trading, in using crypto data for research, I really welcome you to try it out; it's on our website. One cool example: someone from our community has actually used Catalyst to build a Markowitz portfolio optimization model on crypto assets, which I think is fascinating. We have a few other examples like that on our blog, and I welcome all of you to look into them.

And with that, I'd like to finish. If you're interested in off-chain computation, the future of data, data marketplaces, anything I discussed just now, please come talk to us. I'm here, and John, my co-founder, is also here; John, show your hand. And if you're excited about this and really want to be at the forefront of solving some of the most interesting and hardest problems, we're hiring, so come talk to us about that as well. We'd love to hear from you. Thank you.