Hi, everyone. My name is Dmitri. I'm from Ocean Protocol. I've been working around decentralized data for a few years now. We started out looking at intellectual property and creator rights in ascribe. Then we moved on to how we can create open source decentralized databases like BigchainDB. And it came down to what is now Ocean Protocol, where we're looking more at intelligence. Data without humans, without consolidation, isn't worth much. So we're looking into public intelligence networks and incentive systems around them. It's early days for many of these things, but the paradigm shifts we're employing are interesting and promising, and at least it's something else. I have a background in machine learning for, let's say, microelectronics modeling. I used a lot of predictive models to check the actual real-world performance of microelectronics designs. Some of these things would take years to compute once you take into account all the process variations, so we would build models of them to accelerate that. And we were always looking for a better model, but often just putting in more data would have given us the same results, without agonizing over whether it's a support vector machine, a neural net, or some Bayesian network. It turns out that if you just feed more and more data to AI, it becomes quite powerful. This data revolution has been going on for 15, 16 years, and there are a few entities in the world that are really good at it, so they get most of the benefits. Just to give you an idea, we're working on something around autonomous driving. There's the MOBI platform that tries to pool a lot of car manufacturers, fleet managers, you could say anyone with a dashcam. And it's all about the amount of data required to create models that are safe enough to deploy in autonomous cars. It turns out that it's about 5 billion miles, maybe more.
And that would take maybe 40 to 50 years to acquire for a single company. So pooling between companies might be interesting. Data pooling by itself is very interesting, but opening it up to an intelligence network of data scientists is even better. So we come to a point where we see that there is a lot of data out there, but it's locked up, and only a few enterprises also have the data scientists to consolidate that data and extract the value from it. There are a lot of AI startups, typically coming out of big companies, but they don't have access to the data. So in the middle there are just a lucky few, who obviously sometimes screw it up. Well, you could say "don't be evil" is already something from the past. Data carries a lot of value and a lot of money, and it leaks and gets breached, and we've kind of gotten used to all these things. But I think a more equitable model for this is more sustainable for the future. That's what we try to implement in Ocean Protocol because, well, it's an open source community. It's not really an enterprise. We're not out there to make money. We're out there to make something sustainable that can give you an alternative to the existing data monopolies or AI monopolies. It's all about democratizing access to data and AI, or intelligence. Some core principles are self-sovereignty, being the owner and controller of your data assets; attribution and provenance; and of course privacy. So disintermediating between these two worlds doesn't only need something like monetization in the middle, but also something around the commons. And the commons is something we tend to forget still exists. If you work in open source, open data, or open science, you could say the commons is the open source code, maybe the peer review, everything related to digital commons and media art. But it's something that companies don't really tend to implement.
Sometimes they create an open source project and a service layer on top of it, and that's quite nice. But can we create something that's an incentivized commons? That's basically one of the things we like to think about. So rather than siloing resources into centralized economies, we're looking at pooling resources and decentralizing communities. One of the first things we did, one and a half years ago, was creating a data sharing tool. It was about data access sharing. I'm just going to take a bit of water. This was a project together with Toyota, and it was more exploratory, looking at how you can open up or connect multiple data providers without actually leaking their data. One of the big problems is that companies want to share data, but they don't want their data to escape. As was mentioned in the previous talk, that is possible, and you can use technologies like multi-party computation, homomorphic encryption, or trusted hardware. But you're still relying on a lot of components. And I think if you look at data sharing, it's not the sharing you want, it's the consolidation of the data. So typically you're looking at: how can I prove that I ran somebody's algorithm in my data center, how can I prove that I actually did it and that I ran the algorithm on the correct data, and can I also deliver a cryptographic proof or some form of trust that vouches for my actions? Many of these things were encapsulated in this project, and that's kind of the start of Ocean Protocol. We're working with a company in Munich called Connected Live, and privacy is of course critical there: it's health data, health monitoring data. It's very important to look at how we can preserve privacy, negotiate privacy, and make sure there is no leakage of personally identifiable information. Sorry, I'm having some problems with my... Yeah, I don't know how it happened. All right, give me a second. Okay.
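The proof question above, showing that a specific algorithm ran on specific data and delivering something cryptographic that vouches for it, can be sketched minimally. This is not Ocean Protocol's actual proof format; it only illustrates the idea of binding the dataset, algorithm, and result hashes into one signed claim, with an HMAC standing in for a real signature scheme, and all names are hypothetical:

```python
import hashlib
import hmac


def attest_run(dataset: bytes, algorithm: bytes, result: bytes, secret_key: bytes) -> dict:
    """Produce a claim that `algorithm` was run on `dataset`, yielding `result`.

    The claim binds the three content hashes together with an HMAC over
    their concatenation, so changing any one of them breaks verification.
    """
    claim = {
        "dataset_hash": hashlib.sha256(dataset).hexdigest(),
        "algorithm_hash": hashlib.sha256(algorithm).hexdigest(),
        "result_hash": hashlib.sha256(result).hexdigest(),
    }
    payload = "|".join(claim[k] for k in sorted(claim))
    claim["signature"] = hmac.new(secret_key, payload.encode(), hashlib.sha256).hexdigest()
    return claim


def verify_claim(claim: dict, secret_key: bytes) -> bool:
    """Recompute the HMAC over the hashes and compare it to the signature."""
    keys = sorted(k for k in claim if k != "signature")
    payload = "|".join(claim[k] for k in keys)
    expected = hmac.new(secret_key, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, claim["signature"])
```

In a real deployment this would use a public-key signature anchored on-chain, so anyone, not just a holder of the shared key, can check the claim later.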
So Connected Live is a company in Munich. They're looking into DNA and other types of data, mainly related to Parkinson's. All the data they can gather is useful, but they don't need to know which human it came from. They just need the data and the labels around the data, so data scientists can exercise their algorithms, build better predictors, and automate that process. Looking at these things, you could think of something like, well, a fund for data science. Maybe I'm getting a hard time from my presenter today. Well, let's look at another use case. Here we have something called the universal recommender. Think about all the data you generate by labeling what you like and what you don't like, the suggestions you get from Netflix or LinkedIn. All that personal preference and opinion is now scattered everywhere. But if you spent just one minute of your digital life on this daily, what could you have in an AI assistant that says: what would be the most interesting thing for you to do now? What would be your preferred action or the best service to connect with? So can you gather all these personal preferences into your own digital self-sovereign sphere and look at your personal universal recommender? It could be trained on humanity, but personalized toward yourself. Think of it as a form of AI assistant or coach, learning from all the actions you take, but staying under your control, with self-sovereign access. All of these things are quite promising. But then, why would you use a blockchain here? I mean, it's about self-sovereignty, it's about privacy-preserving tools; a blockchain is just something that connects these things. I have three ways of looking at a blockchain that explain the aspects we think are interesting to include in this data-sharing economy. First of all, a blockchain is a chain of blocks, which is all about provenance.
Provenance of things that happened in the real world: claims and proofs, backed by consensus power. That means the proof of work that the Bitcoin blockchain does, or the proof of stake of other blockchains. Sometimes I compare it a bit to carbon dating: it's basically a forensic tool for the digital age that lets you say somebody made this claim at this point in time, and a majority consensus has agreed on it. In Ocean Protocol, our applications are data-service supply chains. They're basically a combination of data, algorithms, and compute. This means an algorithm can run on data in some compute environment and then store the result for inspection, or maybe for annotation. And what we want to do is create time-stamping processes in the middle that actually say: this person delivered data, this entity delivered an algorithm, this compute did its job, it combined the data and the algorithm and stored the results somewhere else. All these actions get recorded as a provenance tree on a blockchain. So we're not only looking at single providers; for each of these roles there's a full network, you could say an inter-service network. It can combine any type of service for data, algorithms, or compute into a full network. So you can always choose: what's the cheapest, what's the most decentralized, what's the most privacy-preserving, best encrypted, what have you. Another way to look at a blockchain is as a world computer. There we have smart contracts running as a kind of unstoppable code on a world computer. For us, it's about creating agreements. And agreements around computation come down to: where is the actual computation going to take place? Is it on my side, am I going to download the data? Or am I going to send my algorithm out to a trusted provider, to the data owner? We have to have this negotiation.
And that negotiation goes into the conditions of a smart service agreement, or let's say a smart contract containing a service agreement. This is quite interesting because it gives you a bit of mutual protection between sides. If you're a data consumer, you can say: I'm only going to pay if you actually show good behavior, good behavior being, let's say, running my algorithm on your data set. While the data owner could also say: here's a little bit of warranty, and if I behave badly, you get part of my stake in the game. So here you can play with negotiations and agreements between parties. And if we're flexible here, we can encompass a whole range of use cases, not only privacy-preserving data but also public data. Now, data science is a bit tricky. We have to combine at least three ingredients to get intelligence: data, then human work in an algorithm, and a compute layer. Obviously, not every configuration is perfect. Private data doesn't want to leave: no data escapes. But big data also doesn't really want to leave, because it's just too big. You sometimes can't push it over a wire. There's this thing, I think it's called FedEx bandwidth, and basically it says that at some point, when the data becomes too big, it's cheaper to pull out the hard drives and move them across the country in a FedEx truck than to send the data over the wires. So assuming that big data doesn't want to move means you have to bring the algorithm to the data side. That's something you see in a lot of hardware manufacturing as well: NVIDIA is optimizing to put more compute facilities around the data storage site. So these are the negotiation patterns you have. For us, a service agreement, well, it's a smart contract, and it has three parts. One of them is a service that you can discover in a metadata store.
Then you have conditions, which are cryptographic challenges that need to be solved. And then you have a reward section for payouts, where the payout is according to the service quality that's being validated. Each actor in the system might have its own conditions to bring to the agreement and its own terms of payment. So creating a flexible platform is part of the Ocean contracts we're providing. Many of these conditions are, well, ideally cryptographically secure, but sometimes they're more subjective, and then you have to have things like human curation and dispute resolution, whether legal or not. One of the use cases we have, and we created an infrastructure around it, is where there is a secure sandbox on the data provider's side. The data always remains with the data provider. Data scientists can submit algorithms to this data science sandbox, and the sandbox generates proofs that it actually ran this algorithm on a data set. Those proofs are what goes into the Ocean network. And the access control for allowing the algorithm into the sandbox is what happens in the service agreement. We're creating a few interfaces around this, like for data discovery, but also for how you would use Ocean in a Jupyter notebook. We just want data scientists to use their own tools, but with this additional superpower to connect to more data sources, find and discover data, monetize data, what have you. One of the things we're looking at, and have also been working on a lot, is how we can signal the relevance of data sets. There is this interesting concept of curation, or tokenized curation, that allows people to lock up some portion of money, saying: I think this data set will perform well in the future, it will become popular. So I'm putting a bet on this data set's popularity, and if it gets used more, I get a bit of interest from that bet.
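The staking idea just described, locking up tokens as a bet on a data set's future popularity and earning interest as it gets used, could be modeled roughly like this. It's a toy sketch under simplified assumptions (a fixed interest amount per recorded use, paid out pro-rata to stake), not Ocean's actual curation-market mechanism, and all names are hypothetical:

```python
class CurationMarket:
    """Toy model of betting on one data set's popularity.

    Stakers lock tokens on the data set; each recorded usage distributes
    a fixed amount of interest pro-rata to the locked stakes.
    """

    def __init__(self, interest_per_use: float = 1.0):
        self.stakes: dict[str, float] = {}   # staker -> tokens locked
        self.rewards: dict[str, float] = {}  # staker -> accrued interest
        self.interest_per_use = interest_per_use

    def stake(self, staker: str, amount: float) -> None:
        """Lock `amount` tokens as a signal that this data set will be popular."""
        self.stakes[staker] = self.stakes.get(staker, 0.0) + amount

    def record_usage(self) -> None:
        """A usage of the data set happened: pay interest pro-rata to stake."""
        total = sum(self.stakes.values())
        if total == 0:
            return
        for staker, amount in self.stakes.items():
            share = amount / total * self.interest_per_use
            self.rewards[staker] = self.rewards.get(staker, 0.0) + share
```

With alice staking 30 tokens and bob staking 10, a single recorded usage at 1.0 interest pays alice 0.75 and bob 0.25, so larger bets on a data set that turns out to be popular accrue proportionally more interest.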
The last way of looking at a blockchain is as an incentive machine, where we ask: how can we have a blockchain that rewards people by minting new tokens and assigning them to people with good behavior? One of the earliest examples is the Bitcoin network. Here people have been adding compute power, hashing power, to the network because they got rewards for it: they got bitcoins. And what you get now is this huge supercomputer that basically has custom ASICs, custom chips, with people operating them in mining farms, and a lot of people acting around this network just to get a Bitcoin reward. So you could say this is quite an effective incentive machine. Now, it's also a bit of a dangerous incentive machine, because it only has one incentive: add more compute to the network, which does secure the network more and more. But then you can end up with something like, well, here's a paperclip optimizer, showing that maybe we should be more careful in designing these optimization functions, these objective functions. So what would it be for open data, open science? I think there's a lot of debate you can have about incentives for open and common science. What would be good incentives, what would be bad incentives? I don't think a single objective function would fit, but you can look at the core values of this network: there is self-sovereignty, there is attribution, there is privacy; incentivizing maybe relevance, quality, openness. So everything that can be proven, or at least vouched for, in the real world can become a lottery ticket to a reward function. That's an interesting mechanism, because then you can earn tokens through good behavior and use those tokens to consume more services or get a bit more governance or control in the network. So if I look at my own behavior, like most humans I have two actors in my brain.
One of them is a rational decision maker, the secondary part of the brain, and then there's this primary monkey in our brain that's all about instant gratification. Both of those have to be taken into account when creating an incentive system. Now, for Ocean we have something called network rewards. Basically, a big portion of the Ocean tokens are locked up and released over the operation of the network toward what we think is good behavior in the system. Of course, what counts as good behavior is subjective, and that's governance. For us, the initial design function is that we want to maximize the supply of relevant data services: data access, storage, algorithms, compute, curation, all part of that. So if you can prove that you delivered one of those services in the network, you get a lottery ticket, and every ten minutes or every hour there's a chance you get drawn from the lottery and receive tokens as a reward. This means that governance becomes more and more important, because it's about how you steer the network function, the objective function of the network. And the way we're thinking about this is: there will be a form of global network governance for protocol updates, but if you have specific, let's say, data pockets or knowledge tribes, they have different missions and visions. They might also have different types of incentives, tasks, bounties, and things to be solved. So grouping these into more clustered tribal systems, as we call them, but it's basically communities, is part of our design. As a last slide, I just want to raise a few things that might be interesting for discussion. For science and research: provenance, creating trusted knowledge graphs, curated knowledge graphs; maybe think about a dynamic paper where, if you click on an experiment, you can automatically see the data flow from previous experiments back to the source. Public funding and rewards.
Currently it's institutionalized, but I think many people would benefit from a kind of more open funding system. Governance around research projects and publications: what gets accepted and what doesn't, by the communities. But also just signaling of emergent problems, and locking up a commitment that says: if this gets accepted as a project, it will actually release my signaled funds, hence promoting the emergent problems and things that people care about, and not only what institutions or governments care about. So feel free to have a chat with me; maybe we can flesh out one of these things. Yeah, that's it. If you want to know more about Ocean, there are a lot of things you can do to help, or you can just check out some of the code we're writing. We also have a big bounty system. We've already completed a few interesting ones, but you can also just suggest a bounty, saying: this would be cool to do with Ocean Protocol, can you provide some tokens for it? All of that runs through the bounty infrastructure we're using. So that's me. If you have questions, go ahead. So Conrad promised us it would be a complex world, but thank you very much for that thoughtful, well thought out explanation of the inner workings. I think I've seen a hand up at the back. Thank you. Thank you very much for the technical details here. How far are we away from playing around with this? I mentioned before that I couldn't really find something that was truly useful, where we could actually put something into practice. How far are you away from this? Let's assume I have a small use case, an academic use case, and I want to run it. Can we do this tomorrow, or do we have to wait? Where are we standing here? I think you can do a lot today. We have a thing called Fit Chain, which has the on-prem compute with containers.
So basically what you do is put your algorithm in a container and ship it to an on-prem compute, and those two services are enabled. In two weeks we'll release the second version of our test network, so you also don't have to care about running Ethereum instances or anything like that. And then you can plug in some of the Jupyter notebooks; we have a few modules around that as well. I mean, just check the GitHub; there are a lot of things popping up every day. So that's cool. Thank you. Thank you. How do you go about preventing rich multinational organizations or enterprises from dominating these open source systems? Yeah, that's a good question. If you create decentralized systems and then a few whales enter, they can basically centralize everything again. Well, there are a few ways. You can have different incentives, saying that to participate in the system you have to make things free and open for the commons, and that's already a deterrent for many multinationals. Another way could be saying: well, we have a tribe and we set up a local governance system, and in order to become a member you have to be part of an open public society rather than part of a company. What we want to avoid is any type of threshold for people; at the bottom it has to be a permissionless system. When you create the Internet, you also don't exclude the big companies, because they basically built it. So the protocol itself is agnostic; local governance systems can help choose which incentive schemes you want and which you don't. Thank you for your presentation. At some point you mentioned a secure sandbox for the nodes that actually execute. How do you ensure that there is no data leakage at that point? And does this sandbox impose some sort of constraint on the complexity of what you can run inside it? I think you summarized the trade-offs really well.
We've been looking into things like SGX and MPC, and I must say SGX has a memory limit that's just not feasible for big data. I think it's 256 megabytes you can run in the enclave, and that's just not going to work. As for no data escaping: for example, if you send in a one-nearest-neighbor model, it basically maps the real data onto its model parameters and sends them out again. That would be an attack through the model. I think it's just common practice to say that if you are sending data, or allowing access to data in your own secure sandbox, make sure it's anonymized, and maybe already clustered or aggregated. What you can do, and a few of the things we are doing, is curate templates of the typical types of models that at least enforce an aggregation function, so that no raw data escapes: first make sure that the final layer of your model at least has an aggregation function, so the raw data points aren't exposed in the end. I think it's your responsibility, and also your right, to choose where your data goes and how it's treated. And at least the sandboxes run on the provider's side, so if you don't want the data to leave your premises, you just block it. Thank you very much. And I want to quickly thank all speakers for their thoughtful insights on their data work. Thank you.