Hi, great to be here. I'm Marcus Jones. I work at BigchainDB, which is implementing the Ocean Protocol project, so I've been at BigchainDB, working on Ocean directly, for about a year and a half now. I'm a data scientist, and that's why I was brought into Ocean: to help flesh out the concept for data science. And what does that mean for Ocean Protocol?

Just a quick survey: who in the audience has heard of Ocean Protocol? Okay, quite a few people, awesome. I also have a question I usually ask. Actually, I give a lot of talks to data science audiences or crypto audiences, so it's really fun to have a different type of audience — some people coming from scientific publishing, for example. I always start with a question: who here has used a dApp in the last week? And was that dApp your own dApp, or a different one? Okay, that's interesting. That's really interesting. Why? The Ethereum network, for example, has been around for so long, and people have promised huge adoption, that it's going to change everything, but actually there isn't a huge amount of real usage. We can come back to that in a bit.

So why is this interesting for me, especially here at Blockchain for Science? Well, I've had some interaction with scientific publishing. I was quite fortunate at university: my mother was in the library department, and after I graduated she still gave me her access. After graduation I was working at a national research agency in Austria, and funnily enough, they could not afford access to a proper — I guess when I was at school it was called Compendex, for example — one of these big compilations of papers. My research agency, the national research agency of Austria, simply could not afford the subscription fee for access to scientific papers. A lot of people would come up to me, and I would give them my mother's password for this account. It was not quite above board, but it seemed to work at the time — until she retired and I lost access. That was a really dark period for me. That's when it really hit me: getting access to scientific knowledge should basically be a basic human right. So I was quite interested to prepare something for this talk, and just yesterday I hacked something together quickly, so there's going to be a bit of a demo at the end too.

Okay, so I'll talk a little about the motivation — why does Ocean Protocol exist — then we'll go over what Ocean Protocol is, talk about what it means to be an asset in Ocean Protocol, cover some research topics, and then show that demo at the end.

We know that data is valuable, and there have been a few studies, but we don't really know what this data economy is. What is it, really? Where do you draw the boundaries? There are companies like Google that are fully data-driven — let's say they're "AI first", which is another term that's become common: first maybe you digitalize, and eventually maybe your company becomes AI first. What does that mean? It means all of your business processes have to be in support of artificial intelligence. So, some numbers: let's say many trillions of euros globally.
I don't really know, and of course one of the reasons it's hard to define is all of these steps, from generation to processing to exploitation. How do you exploit the data? How do you generate value from it? Well, one huge driver is artificial intelligence. We have this result from almost 20 years ago now, which showed that when you put more data into a model, the model improves. At the time that was quite surprising, especially to the people researching these algorithms, because they were paid to increase performance by a few percentage points by spending a year squeezing out that last bit of performance — when actually you could have just collected more data and, instead of paying that salary, added more data. This is especially true for deep learning. There's a study where they went to a NIPS conference and asked the researchers there — about 300 people in the study — when they think deep learning, or AI in general, will replace certain tasks. We've already seen image recognition exceed human performance, and of course those amazing results from the game of Go, where we're also exceeding human performance and beating grandmasters. Part of that is due to the huge amount of data we're putting into these systems. So data is valuable.

Another challenge is the concept of orchestrating all this data. We're producing data just by existing — with our smartphones, smart devices, IoT devices — and then of course in corporations and enterprises, and just everywhere, there's exponential growth of data. But you also need to organize and orchestrate it. Orchestration really means automating the flow from raw data to getting value out of it, and there are a lot of issues. For example, provenance: how do you track how a data set is transformed through your business, and who do you blame if it goes wrong? All of this leads to a loss of potential value. Then of course we get to the next level, which is AI orchestration. Now, on top of the original data-orchestration issue, try to track the models as well. When a model goes wrong, how do you fix it? Why does it break? Just the versioning of models, the versioning of the data used to train them, the new incoming data — it's a very complex system. That's part of the motivation for us. For this AI orchestration — or the data economy — I like to use the term AI ecosystem. We don't really know what it looks like yet, but it's one of the key motivators for why Ocean Protocol was started as a project.

So before I go on, in a nutshell: Ocean Protocol is a decentralized protocol. It's not an application; it's literally a protocol at the core level, and that protocol enables you to accomplish some of these orchestration goals for the AI ecosystem. One key aspect of that is the concept of a static asset, so we're going to start there. As a data scientist, this is a simplified view of basically your entire life: you start with data, you create some model pipeline — let's say it's a Jupyter notebook — and you have a bunch of scripts that take the raw data, transform it, and then train a model.
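To make that picture concrete, here is a minimal, generic sketch of such a pipeline — plain scikit-learn with synthetic data, nothing Ocean-specific, and the model choice is arbitrary:

```python
# Minimal, generic data-science pipeline: raw data -> transform -> train -> saved model.
# Purely illustrative; the data is synthetic and the model choice is arbitrary.
import numpy as np
import joblib
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))             # "raw" feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy labels

pipeline = Pipeline([
    ("scale", StandardScaler()),          # the transformation step
    ("model", LogisticRegression()),      # the training step
])
pipeline.fit(X, y)

joblib.dump(pipeline, "trained_model.joblib")  # the "trained weights" artifact
```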
That training is happening on some execution server — compute. It then produces the trained weights, or the actual model. Then you take that model into production and you generate some value. That's the whole purpose of existing as a data scientist: you want to eventually get some value out of the result. And of course this is very simplified, because you do it in a very iterative way: you go back, evaluate the metrics, and say, oh, this didn't work, let's do it all over again with a different transformer on the data or a different model.

Okay. The first thing you're probably aware of, if you've heard of Ocean Protocol, is that you can register data on Ocean Protocol — and that's indeed the current status; you can definitely register data onto Ocean. When I look at it as a data scientist, I see more: you can actually register any type of static asset. A static asset is simply a thing that is not dynamic; you access it once, you get it, and that's basically the end of the story. There's a reason I'm using the term static asset: I'm going to contrast it with a dynamic asset, the generalized asset. Another interesting thing is that this whole workflow could also be a static asset; it's just a composition of several of these DIDs. The DID is a core concept in Ocean Protocol: a decentralized identifier, a Web3 standard — I think it's still in alpha — but we use it extensively. It's basically just the URL for an entity in the blockchain network; it's the identifier. And that DID resolves to something called a DDO, a DID document, which describes the asset. By the way, this is what a more realistic workflow looks like: many, many steps — little transformations, then getting the model, then a bunch of training and cross-validation at the end — and then we actually do all of that ten times in a row.

Okay, so again: a static asset has a DID, and that DID is what's registered on chain; it's the pointer to the asset. Interestingly, in Ocean Protocol we also assign it an owner, which is a public address. Ownership is, of course, another important concept in decentralized tech: how do you manage the rights you have as the owner? In a sense it represents control, and your private key gives you the cryptographically guaranteed right to do certain things. We also want to be able to check something on chain, so we store a checksum there: you could change the underlying data, but if you do, someone is going to know, because they'll recalculate the checksum and say "you changed that". The actual asset content itself can be stored basically anywhere — we're quite agnostic to that, so we have drivers for IPFS, Azure, S3, and on-premise storage. The DDO is the next thing: the description of the asset, the metadata, which is just a JSON document where we store all the metadata.
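To make the DID, owner, and checksum idea concrete, here is a rough, hypothetical sketch of what registering a static asset might involve. The field names and identifier format are illustrative only — this is not the actual Ocean metadata schema or client API:

```python
# Hypothetical sketch of a static-asset record: a DID, an owner address,
# a checksum of the underlying file, and a small DDO-like metadata document.
# Field names are illustrative, not the real Ocean Protocol schema.
import hashlib
import json
import uuid

def file_checksum(path: str) -> str:
    """SHA-256 of the underlying file, so any later change is detectable."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

did = "did:op:" + uuid.uuid4().hex                      # decentralized identifier
owner = "0x0000000000000000000000000000000000000000"    # placeholder public address

ddo = {
    "id": did,
    "owner": owner,
    "checksum": file_checksum("dataset.zip"),           # what gets anchored on chain
    "metadata": {
        "name": "Amazon deforestation satellite images",
        "author": "Jane Doe",
        "license": "CC-BY-4.0",
        "files": [{"url": "ipfs://<cid>", "contentType": "application/zip"}],
    },
}
print(json.dumps(ddo, indent=2))
```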
The whole process of accessing this data then comes down to taking the DID of the asset you want — you could search through the metadata and decide you like this one — then you request access to the underlying asset, and the server sends it to you. That's how static assets work right now.

These are the main components, at a very high level. We have the user interacting with some library, which could be the Python library — we have a very detailed Python library, and a JavaScript library for the front end, and they're both in line with each other, so the basic terminology is the same. Then there's our metadata storage, and a component acting basically as a proxy: it hides the data and controls access after you purchase, or fulfill the conditions to access, that underlying asset.

So, a bunch of examples — let me switch over really quickly. I might need some help with the Mac; remember to just click this first, and maybe just be on standby, just in case. Yeah, okay. This is one of our actual live production networks — well, it's actually one of our test nets. I can take a step back further and say: Ocean Protocol is a nonprofit foundation, and all of the work we do for Ocean Protocol is open source; it's all on GitHub. This is what we call the Commons Marketplace. The Commons Marketplace is a way to register assets that have a price of zero — they're commons, just free data sets. Why do we do that? Well, you can go to our GitHub page, take this exact website — all the JavaScript — fork it and build your own marketplace: change the logo and you've got your own marketplace; change the price from zero to whatever you'd like and you've got a marketplace running on Ocean.

In this marketplace I've added a bunch of sample assets for different data sets, and we can look quickly at how the publish flow works. So let's create a test asset. Here's one thing that was just released in September, which is really amazing: we now fully support IPFS. You take your data set — let's say it's a zip of a bunch of CSV files — you drag it on here, and it's uploaded to IPFS in the background. So you're no longer responsible for making that underlying asset available at an endpoint yourself — you don't have to put it in S3 yourself, for example. That was one place where we saw a huge amount of friction from users. The other source of friction we saw: again, when I go to a data science conference I ask how many dApps people have used, and the other question is, do you have MetaMask? And people say, what's MetaMask? I've never heard of that. So this is another key thing we identified for onboarding people: avoid MetaMask as the first point of contact. So on every login we actually give you a burner wallet, and that acts as your proxy. Of course you shouldn't use that forever, but for a commons marketplace it's totally fine. Then, if I had a data set, I would drop it in there, add the metadata as text, and it would be registered and show up in the marketplace.
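Behind that drag-and-drop, the upload is essentially an IPFS "add". Here is a minimal sketch of the same idea from Python, assuming a local IPFS daemon and the ipfshttpclient library — purely to illustrate what happens, not the marketplace's own code:

```python
# Minimal illustration of pinning a data set to IPFS and keeping the resulting
# content identifier (CID) as the file reference in the asset's metadata.
# Assumes a local IPFS daemon listening on the default API port.
import ipfshttpclient

client = ipfshttpclient.connect()      # defaults to /ip4/127.0.0.1/tcp/5001/http
result = client.add("dataset.zip")     # upload the file to IPFS
cid = result["Hash"]                   # content identifier of the upload

print("File reference for the asset metadata:", f"ipfs://{cid}")
```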
Let's just take this first one here. It has a category, a date, and an author, and then again this DID, which is the unique identifier for the asset. The idea is that you search for something, you find it, and you get a file. Then it starts interacting with the blockchain and checking certain things: do you have enough money? Can we lock that into escrow for you? That's another key aspect. And finally, once everything is confirmed, it starts streaming the file. That's a very simple example of what we call a service execution agreement, which can be a very complicated composite of conditions to fulfill — including things like "check that this data set has been verified by three nodes", and so on. But that type of functionality is not in this Commons Marketplace.

So, to continue: some key concepts at Ocean. It's all about unlocking data. How do we get there? There's a lot we could go into detail on, but some key principles for us: we want to be open source, decentralized, permissionless, governed by the community, and we want to support all sorts of different incentivization mechanisms. Again, coming back to this AI ecosystem — we don't really know what it will be, so we want to provide underlying technology that enables the community to decide how to build on Ocean. This community governance is also very important: right now we're a nonprofit registered in Singapore with a board, and the idea — the dream — is to make that board completely obsolete and hand it over to the community, governed by a DAO. That's the target.

Okay, so there are lots of developer resources at oceanprotocol.com. You can also contribute — what do you want to see next? You can propose a spec for some cool feature. We have a lot of test networks and public networks as well, including the Commons Marketplace I showed. We also have a version of that where you are actually able, today, to change the price from zero to whatever you want, use your blockchain identity, and then sit back while people pay you if they like it.

The next thing: we talked about static assets, but in a sense, getting a static asset is a service, right? You're calling a service, and the service streams you a file. The more general way to look at it is that you get access to an endpoint, and that could be, for example, a REST API that you buy a token to access. And once it's a REST API, you've basically unlocked all of Web 2.0 — so now all of Web 2.0 is potentially at our fingertips. One of the first targets we have is compute. Some examples: you could say, I'll give you SSH access to my GPU instance for some price per hour. Or: I have a model — you send me an image, I give you back the classification of that image. Let's say there are satellite images of the Amazon rainforest and you want to know whether an area has been deforested or not; that could also be an endpoint on Ocean, priced at some amount of Ocean tokens per image. Or streaming IoT data. Or training on data itself.
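To give a flavour of that "model behind an endpoint" mode, here is a minimal sketch using Flask, with a toy stand-in classifier — entirely hypothetical, and not Ocean's actual gateway code. The point is that the model stays on the server and callers only ever receive predictions:

```python
# Toy "prediction as a service" endpoint: the model never leaves the server;
# callers only ever receive predictions. Purely illustrative Flask app.
from flask import Flask, request, jsonify

app = Flask(__name__)

def classify(features):
    """Stand-in for a real model, e.g. a deforestation classifier."""
    return "deforested" if sum(features) > 1.0 else "forest"

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()          # e.g. {"features": [0.3, 0.9]}
    label = classify(payload["features"])
    return jsonify({"prediction": label})

if __name__ == "__main__":
    app.run(port=5000)
```

In a marketplace setting, access to an endpoint like this would sit behind the same kind of service execution agreement: pay per call, then receive the prediction.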
Now, that last one — training on data — is even more interesting. If I train a model, and I'm doing it on your server, and the result I get back is the model, then I've actually never seen your data. That's extremely interesting. For us, it's actually one of the killer apps of Ocean Protocol: you can train on data that you never see.

Take, for example, data in an enterprise that's segregated across the globe. They have operations in China and in the US. The Chinese data science department wants to gain insight by training on their business process — let's say they're a mobility company and they want to learn how urban mobility works in general. They have no data for the US, and they want to use the US data. Then the lawyers get involved and you've got a huge compliance issue. What if the Chinese team were able to send their pipeline — their Jupyter script — to the entity in the US, train it there, and then receive the model back? A very interesting mode of operation. And what about genetics data, for example? That's potentially even more interesting and more impactful. This is all very interesting stuff that we're currently researching: we have a client project on this exact topic and a lot of back-end support for it in our next version, so stay tuned.

There are several different modes, a lot of different things we can do. For example, you could say: but if I send you my training script, then you have my training script. Okay, then maybe you trust a third party, so you have a third party acting as a double blind. Or federated learning — we can enable federated learning using this. Maybe you want to enhance that with homomorphic encryption. Or, as in that first case, I don't even receive the model back; all I get back is access to an endpoint serving the predictions of that model. So I don't get the model back, and there's no model escape either. There's a lot of research on this stuff.

I want to wrap up with something. Just yesterday I was thinking about what I could show, and I'm really interested in this topic of science and scientific publishing. Right now what we have is PDFs, let's say, and that's amazing, quite frankly — it's a lifesaver for me: I use arXiv all the time. And I thought, okay, this could be really interesting: what if we had Ocean in the mix as well? Then I was thinking about something like this. We have this paper, "Deep Image Retrieval". It's a data science paper with four authors. What if I could click on it and get the source data? What if I could click somewhere else and get the model they trained — their actual model? How cool would that be as a reviewer: to click on something and get not only the source data, the model, and even the pipeline in the notebook, but the compute server too. So I click that, and maybe it looks like this: I click, I go through some purchase or access-control process, and I start up an instance in the cloud with a Jupyter notebook running. This is JupyterLab.
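A quick aside on the federated-learning mode mentioned a moment ago: the key point is that only model parameters ever travel between sites, never the raw data. A minimal, purely illustrative sketch of federated averaging, with made-up numbers:

```python
# Minimal federated-averaging illustration: each site trains locally and only
# the model parameters (here, plain weight vectors) are shared and averaged.
# Numbers are purely illustrative.
import numpy as np

# Pretend these weight vectors were trained locally at three different sites,
# e.g. regional business units or hospitals; the raw data never leaves them.
site_weights = [
    np.array([0.9, -0.2, 0.4]),
    np.array([1.1, -0.1, 0.3]),
    np.array([1.0, -0.3, 0.5]),
]

global_weights = np.mean(site_weights, axis=0)   # the "federated" model update
print("Aggregated model parameters:", global_weights)
```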
At Ocean we also have another front end — I can't really demo it right now because you need to log in — but it's running JupyterHub, which gives you your own instance of JupyterLab in the cloud, already pre-loaded with all of the author's libraries. It has the data, it has the pre-trained model, and that's all enabled right at your fingertips. That would be very cool.

How would that work in the back end? In Ocean Protocol we would have one asset which would be — well, it's like a web, right? All these things connect to each other: an attribution network, as Zarkam described earlier. So these are the authors; at the edge you would have each author clearly tagged. Everything in Ocean is a DID and a DDO — that's how we conceive of it, and that's what enables this. It's a really basic primitive we're building on, and it enables all of this. Then you would have the core research, and that would be linked to the pre-trained model — some type of composite asset. The actual implementation would be a Jupyter notebook. In fact, I went online and searched, and I found somebody who took this paper and implemented it for his PhD, and he has a Jupyter notebook — so this PhD student could link directly to this research. What if somebody went ahead and said, actually this is very cool, we'd like to offer this type of compute server? That compute server is what I just showed you: it has the data, the notebook, the pre-trained model, and the PDF of the research — the whole body of knowledge — and it's organized. Let's say it's owned by a person; it could also be a DAO. It could be Blockchain for Science: we all get together and say we want to support this, we donate to it, or maybe we run it at cost — we pay the Amazon bill ourselves — and it's governed by a DAO. That would be pretty cool.

And of course we can go on and on. Provenance: this is version two, so let's store every previous version and link them together. Or: this model was VGG-16, from the Visual Geometry Group in Oxford; they trained it. But there might be models out there that aren't open source, that maybe we'd like to purchase, and maybe a model was trained by a group and we want to delegate payments to everybody that helped train it — so there may be many owners receiving partial payments. Also, this notebook could live on the same instance as the data itself, so you don't need to transfer the data. One aspect is just the inertia of the data: maybe it takes two hours to download, or you can't download it at all because it's sitting on premise — so instead, bring the compute to the data. This notebook might not be free; you could charge per hour. This implementation is curated, this author has upvotes, so maybe you're more inclined to go and use this author's notebooks. These assets I published just yesterday, and they're now in the Commons network. And of course, with decentralization, this could be anybody in the world with access to the internet, with no censorship, etc. — you all know that. So it's very cool stuff.
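Purely as an illustration of how such a composite asset might hang together — all DIDs and field names below are made up, not the actual Ocean DDO schema — the linked structure could look something like this:

```python
# Hypothetical composite asset linking a paper, its data, model, notebook and
# compute service by DID, with attribution, a provenance link and a revenue split.
# Every identifier and field name here is invented for illustration.
composite_asset = {
    "id": "did:op:paper-deep-image-retrieval",
    "type": "research-bundle",
    "links": {
        "sourceData":      "did:op:landmarks-dataset",
        "pretrainedModel": "did:op:vgg16-retrieval-weights",
        "notebook":        "did:op:phd-reimplementation-notebook",
        "computeService":  "did:op:jupyterhub-instance",
        "previousVersion": "did:op:paper-deep-image-retrieval-v1",  # provenance link
    },
    "attribution": [
        {"author": "Author A", "did": "did:op:author-a"},
        {"author": "Author B", "did": "did:op:author-b"},
    ],
    "payment": {
        "pricePerHour": 2.0,  # e.g. Ocean tokens for the compute service
        "split": {"did:op:author-a": 0.4, "did:op:author-b": 0.4, "did:op:dao": 0.2},
    },
}
```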
So, the road ahead. Again, the core mission is to unlock data: we want to allow and incentivize this discovery of and access to data — that's, let's say, the current status. Right now these composite assets are supported in principle; it's just a matter of changing the structure of the DDO. Our structure for the DDO — again, the description of the DID, or the metadata if you like — is completely flexible. It's JSON, and you could add an attribute saying, for example, "previous version" and then link to it — so again, it's an attribution network, completely flexible. The current focus of the core dev team is compute-to-data, which is very exciting, and then, longer term, a permissionless ecosystem. Incentivization covers a lot of the research topics that have been presented by Paul and Zarkam and others — you should join the conversation in our office; it gets pretty interesting at times, talking about NFTs and DAOs and bonding curves. It's awesome.

Then the bigger goal: fully permissionless and ownerless. What does that mean? It means Ocean as a substrate, a public utility for data, so that anybody has the right to access it and use it as they wish. On the right-hand side, again, there's so much — take any one of these topics, apply it to Ocean Protocol, and you'll get another 30-minute talk out of us. There are a lot of blog posts on many of these topics as well. We've had full-time researchers for at least a year on just curation and provenance, and a bunch of blog posts on bonding curves. I'm currently really invested in the federated learning aspect, so I'm now training models, for a client, on data that I don't have access to. So again, that's Ocean Protocol. At BigchainDB, where my contract is, it's maybe a little different, because for us Ocean Protocol is a project, and what we're looking for really comes back to that first question I asked — who has used a dApp? Where is the problem? That's another thing we can talk about, of course. And that's the end of the presentation, so I'll take some questions. Thank you.

Very inspiring and interesting. Just on the very first part, on data ownership: if you don't put it behind that kind of restricted access — if you can just open a file and see the data — well, we've seen this with Google: we all thought a journal publication could only be accessed through the journal, and then Google started scanning them and bringing out digital copies. So once somebody has access, and has written an algorithm to digitize it — if it's as simple as a screen capture — suddenly it can be transferred to other databases, right? So have you looked into that problem: if you enable one access, how do you prevent the data from getting out of your database and spreading in an uncontrolled, unwanted way to others?

Yeah, this is core to what we're doing.
Absolutely — I skipped over the slide, but yes: escape of data, escape of model. The issue is that, compared to physical property — a thing I can take with me that you'd have to fight me to get back — digital data, or digital assets, which is exactly what Ocean deals with, can be duplicated easily and at no expense. So basically my opinion is that "no data escape" is, in the extreme case, the way to solve it. That's the answer, essentially: once the data is on your local machine, it's yours, and you can put it on BitTorrent if you want. That's exactly why the no-data-escape mode is the most interesting. That would be the process where, again, I go to the marketplace, I download a sample asset — which has been anonymized, say, or is just small — I build my pipeline on that, and then I get access to training on the data on your premises, and I am never able to see the original data set. That's one solution.

Maybe a second question: data manipulation for certain purposes. Let's say I'm the president of Brazil and you do this deforestation analysis, and I give you photoshopped data so that a seemingly independent organization would come to a conclusion that is favorable for my policy or my economic prosperity. How would you address data validation, so that somebody can't game or manipulate databases in a way that leads to the outcomes they prefer?

So it's a question of trust in the end, and there are many ways to enhance trust. One way would be some type of verification of that data — that would be another interesting mode. Even going back to the first point, it could simply be curation, which would be reputation-based — upvotes, for example. Then you can get into things like staking and slashing as well. It's all research, so I'm not going to give you the answer — we don't know — but these are all things this protocol enables. The other interesting thing — and it's getting away from the protocol, because you're talking more about manipulation of the raw data itself — is whether you can automate detecting that. I don't think so. If you're really good at photoshopping, or you just take satellite pictures of a different forest, then the question becomes how you oracle-ise that on chain, and in that case, unless you control the source, there's not much you can do in the end. But one question we're quite interested in is the quality of a data set. The quality of a data set is actually something you can measure, and you can do it automatically, just by having a model train on that data: if the data has a signal in it, then theoretically you can train a model that performs better than random chance. So that's something you could automate, and that's something we're building as well.

Thanks for your talk. I was wondering if you, or anybody in your team, have already spoken about this to actual scientists and data scientists, and what was their response? Did you see any interest from them? Because we're also facing similar issues.
So yeah — lots of beautiful ideas, lots of complex workflows, but are they going to use it? Will they even know that they might really enjoy it? Of course we've talked to lots of people, and a lot of people are interested. The question is: are they actually willing to really use it? Expressing interest — "yeah, it's very cool" — versus actually using it: that's exactly the point we're at now. Again, going back to the first question — have you used a dApp? No. There are a lot of barriers, and I think a lot of them are UX issues: just the fact that you still need to explain to people what a private key is, what a public key is, what an address is. These are all challenges, but we do see traction, absolutely. At BigchainDB we have some clients lined up for interesting collaborations to help solve actual business problems, and that's the most exciting part, definitely.

You mentioned homomorphic encryption. Are other things coming up on the roadmap, like trusted computing environments? Can you give a statement on where you are in terms of these technologies, for example to prevent data leakage? That's very interesting for privacy as well.

Yeah. On our GitHub page there's a lot of really intense work on compute-to-data. We have two tracks on compute-to-data, because we have clients that are interested in it now. There's a simpler version, which is the ability to basically call that endpoint — that endpoint wasn't a difficult thing to pull together — and it means the endpoint receives a script, does some training, and sends back the model. Basically Web 2.0. What gets more interesting is the fuller version — I'm just looking for the OEPs; yeah, there we go — what we're calling V2, which is exactly this full implementation. I think it's in a branch. It's really exciting because it's built on the concept of Kubernetes operators: you specify an operator, and that's the engine driving the compute behind the scenes. It's a very generalized concept: have any type of transformer, chain them together — mutating things — and finally publish the result back to Ocean. So stay tuned.

For the compute-to-data use case: does it assume that the parties are generally trustworthy toward each other — that they have a faithful, good relationship — or could there also be real public users, with one being an attacker using, say, time-based side-channel attacks? Do you have to protect against these advanced attackers as well?

Good question. For the first steps we have to assume a trusting environment — trusting parties, let's say. But of course the target is really to unlock things like genetics data. There are these amazing repositories of genetic data in different countries where you have to go into a cold room: you walk in, you place all your devices outside, and you use the terminal there just to look at the data. That's the target, and of course then you have to make sure you're working within that antagonistic framework, and how to guarantee that is, yeah, a research topic. But it's the goal. Yeah, cool. Okay, thank you.
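To make the "script in, model out" flow described above concrete, here is a rough client-side sketch — the endpoint URL, route, and response format are entirely hypothetical, not Ocean's actual compute API:

```python
# Hypothetical client-side view of a "compute-to-data" call: send a training
# script to a provider-controlled endpoint and get back only the trained model.
# The endpoint URL and response format are made up for illustration.
import requests

COMPUTE_ENDPOINT = "https://provider.example.com/compute"  # hypothetical

with open("train_pipeline.py", "rb") as script:
    response = requests.post(
        COMPUTE_ENDPOINT,
        files={"script": script},
        headers={"Authorization": "Bearer <access-token-from-agreement>"},
    )
response.raise_for_status()

# Only the trained model artifact comes back; the raw data never leaves
# the provider's infrastructure.
with open("trained_model.joblib", "wb") as out:
    out.write(response.content)
```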
What do you think — how much time would they need for it? Any guesses? Ten minutes? Seven hours? Okay, so that's pretty close: I found an average value of 13 minutes, but obviously it depends on how complex the case is. The next question: how much do you think a pharma company would pay to get access to your DNA data, and maybe to ask you a few questions about your health or your lifestyle? Any guesses here? Fifteen? Nothing? Fifteen? A hundred? Okay — for most people the range is between 100 and 200 US dollars, but there are also cases where it's much more. Parkinson's patients, for example: their data is traded for up to 20,000 US dollars.

So what I'm going to talk about today is the role of data and AI in health care, the current struggles researchers have using AI in health care, and then I will explore how a blockchain-based system could help. Then I'm going to present three privacy-preserving technologies that are interesting in the context of the convergence of AI and blockchain, and I'll talk about current and future research projects in our group.

So how does data serve AI to improve health care? We have a lot of data in health care — we talked about genomics and medical imaging. There's also an emerging field of mobile health: with our smartphones we can measure how many steps we take each day, but we can also measure our pulse using the smartphone camera. And as we just heard from Marcus, data is essential for AI. There are several approaches to building AI systems; the most common nowadays is machine learning, but there are also other approaches, such as knowledge bases — maybe some of you have used Wolfram Alpha, which at the core of its system uses a knowledge-base approach. An AI system can then generate positive impact in health care: depending on the use case, it can detect diseases, recommend therapies, or even be used to develop new therapies.

So what's the struggle? It sounds good, but there are also struggles. Health data is obviously very sensitive. In general, we do not like to share our health data — we don't want to get ads on social media, for example, because we shared our DNA. So we are really careful and cautious when it comes to sharing health data. Sharing health data also has implications for the privacy of your relatives: in genomics, your DNA is very similar to the DNA of your parents, your siblings, or your children, and if you share your data, it may hurt not only your own privacy but also theirs. Obviously there have been hacks and misuse of data — think of Equifax or Cambridge Analytica in other IT fields — so people are getting more and more aware of data privacy issues. There are also legal barriers to sharing data: it's hard, even in Germany, to share data across hospitals within Germany, and it becomes even harder if you want to share data across hospitals internationally. There's also limited availability of data on rare diseases.

Now, the second category I'm going to talk about is a bit less obvious: the availability of computing resources. As some of you may know, machine learning can require large amounts of computing resources when we train models, so they are expensive, energy-intensive, and partially limited in availability. There are even tasks in machine learning, or in computational biology in general, which are really hard to do computationally and where there's barely enough computing power available. One example is
protein folding. There's the Folding@home project from Stanford University, which is a very large distributed network of users doing protein-folding calculations. Nowadays the Bitcoin network is obviously a larger distributed network, but Folding@home is very interesting from a distributed-systems point of view.

So the question here is: can blockchain help to incentivize data sharing and computing-resource sharing? How would such a scenario look? The idea is that we have hospitals that hold patient data, and they upload the data either onto the blockchain or at least onto a distributed file-storage service, and manage access rights through the blockchain. It's not just one hospital but several, and then this data can be used — for example by other hospitals which have a new patient: they measure some data, maybe genomic data or medical-imaging data, and then they get a recommended therapy, with the calculations done in a decentralized and privacy-preserving way. It could also be that university researchers like myself want to make use of that data, or maybe even pharma companies want to use it, and perhaps they then pay some tokens, which are distributed to the people whose data is used, or to the hospitals.

So what would be the benefits of such a blockchain-based system? There's broad collaboration across hospitals, researchers, and industry, so much more data is available, which improves the AI systems — as a result we get higher accuracy. Ideally we do it in a privacy-preserving and self-controlled way, where the people who share their data have the ability, for example, to revoke their data sharing and to control who has access to their data. We can use financial incentives for the data sharing and also outsource computation.

Could we build such a system on tools that are available today? The obvious candidate is Ethereum: it's a distributed computing platform, it provides smart-contract capability, and it's the second-largest cryptocurrency by market cap. If we check whether it would be feasible today: well, Ethereum is immutable and has global reach, and we can easily realize financial rewards on Ethereum — there's an emerging decentralized-finance ecosystem around it. But there are two issues. One is data privacy: cryptocurrencies and cryptocurrency systems are pseudonymous, but not really anonymous. The other is scalability, with regard to computing power but also data-storage capacity. So we need scalable and privacy-preserving computation technologies to bring blockchain and machine learning together.

What's out there — what computation technologies could we use? Traditionally we have used cloud computing when we need large amounts of computing power, but that's not really what we want here: when we think of decentralization, of trusting nobody, basically — in cloud computing you have to trust the cloud provider. Maybe they are certified, but you still need to trust them. What's interesting here is the field of privacy-preserving computation technologies. There are three categories, one of which I call edge computation and