I would like to talk a little bit about how we can hopefully generate open knowledge based on closed data, and what kind of new deals are out there to make this happen. A small disclaimer in this regard: I will mention some companies later on that seem to have solutions for this. I have no connections to them; I am coming more from the perspective of a bioinformatician who crunches a lot of data on a daily basis and enjoys doing this in the open. So I speak as an open science enthusiast, and John mentioned this already in the morning and actually laid the foundation for this talk in a way: in science everything, the data, the source code and clearly in the end the paper, has to be open. Everything else would not be good scientific practice. I often like to avoid the term open science because, as John also already said, this is just pure science. The foundation of our scientific process is openness. And there's a small "but": whenever I give workshops on open science and how we can implement it, somebody raises a hand and says, well, sometimes there is data that is linked to persons, that is linked to patients. So there is sometimes a higher good, which is called privacy. If I am a patient and I have cancer, I would maybe like to donate my genome data, for example, for research purposes, but what I do not want is that this is somehow linked to myself. And in principle this is only one small example, because we have a lot of data that we should share but maybe should not share, because it has implications for how we can live. Take behavior data: many people have fitness trackers, for example, that track behavior, or what I am eating. All these kinds of things can be stored and can be used in science to see general trends. Socioeconomic status is also something that in principle can be a subject of research, so you can collect this data and try to make scientific statements based on it.
But as an individual I would not like to have this in the hands of other people. With health records, we just saw the example before by Dennis: sharing this data can be useful for the research process, for researchers, but for the individual this might not be so beneficial, at least for a certain amount of time. And then we come to the core of ourselves: the genomes, or exomes, or at least some SNPs. These are maybe rather biological expressions, but I think everybody is aware that we carry the blueprint of our body in us, in every cell, so the genome, the DNA, is an important description of who we are and how we develop and how we live. Today we actually have technologies making this easily accessible. This is a chart that you find very often. The first human genome cost around 2.7 billion US dollars and took 10 years. The draft was then available roughly in 2001. And when you then started to sequence a new genome, it cost just 100 million. Cheap in comparison. Today, and this is due to different technologies that arose roughly in the mid-2000s, I can sequence a genome in roughly one or two days for around 1,000 dollars. This is definitely a game changer, and I'm pretty sure many people are interested in having access to this data for their own purposes, for medical reasons, for example in order to have personalized medicine. This is a very powerful tool that will definitely change how we can do medicine and how we can do science. And then we are not only our body, but we are actually an ecosystem, and we are carrying a lot of bacteria in us, the so-called microbiome: gut microbiome, skin microbiome. All of this can be tracked today; again, sequencing technologies are helping here. This also says something about me, and this too can be very interesting for researchers.
And then you would like to even connect all this: if you know how people behave, or if they have certain traits, how these are linked to the genome. This is the interesting question that, for example, GWAS analysis can address. So in principle, as a researcher I have a lot of interest in this type of data; as an individual I am actually not very interested in sharing it, because this may expose a lot of weak points to others. Scientifically speaking, we have a lot of interest in having these data for a large population, and it would dramatically impact how we can do science, how we can do research, if we had access to that. But on the other hand, it can be a real problem if this kind of information leaks and is accessible, because this can lead to systematic discrimination due to political, ideological or even commercial interests. Maybe the health insurer does not accept me, or I cannot sign a contract because they know I have a certain disease, or I have to pay more for it, for example, or take another one. And this is kind of questioning our solidarity system. Maybe at some point people are discriminated against because they carry a certain allele of a gene. So in principle there are good reasons to make this data open, but there are also a bunch of good reasons not to make it open. So we clearly have a moral dilemma here. Should we protect individual rights or should we push scientific progress? This was mainly about medical data, but clearly this kind of situation also exists in many other fields. For example, financial data of organizations. You could really do interesting research on top of that, but the individual organization, like a company, might not be interested in sharing it.
Energy consumption of devices, for example, would also be interesting in order to engineer devices differently. Location data of vehicles. All these kinds of data from different domains have a similar problem: they might be very interesting if we can have access to a large population, but this brings issues for the individual or the organization. So how can this be solved? Is there the possibility that we can generate open knowledge, ideally, on closed data? Can we have kind of black boxes where we maybe cannot see the full data, but where we can at least reproduce a result: if somebody claims something, we can go back to the same locked or hidden data, run our algorithm and get the same results? Can we maybe train machine learning models on top of that data and use them for analysis later on? Or can we at least make predictions on top of this closed data that can then be confirmed in different ways, so using this more as a hypothesis generation machinery? How is this done currently? Just an example: Genomics England is a state-owned company that aims to have 100,000 full human genomes, which is huge. What they do currently is that they have closed data centers to which only certain people have remote access; they can run their algorithms there, and only the results leave this via an airlock. A similar approach is the so-called Personal Health Train. They have these data stations, similar to this, so basically data centers where you can push in some algorithms, small programs actually, and you get some data out. So again, both of them are kind of locked systems, and as always you need to trust these instances, and trust is something that is not always well earned. For example 23andMe, a company that brings genome information, SNP information, to the broad population, and then they sell it. So you went there, you just wanted to know a little bit about your own background, and suddenly they sell your information to others.
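The airlock model described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Genomics England or the Personal Health Train are implemented: the class `DataStation`, the function `carrier_frequency`, the record layout and the minimum group size of 5 are all assumptions made for this sketch.

```python
class DataStation:
    """Hypothetical closed data station: it holds records and releases only aggregates."""

    def __init__(self, records, min_group_size=5):
        self._records = list(records)          # never handed back to the caller
        self._min_group_size = min_group_size  # assumed release threshold

    def run(self, analysis):
        """Run a submitted analysis inside the station; the result must pass the airlock."""
        result = analysis(self._records)
        return self._airlock(result)

    def _airlock(self, result):
        # Release only a dict of numeric aggregates computed over enough records.
        if not isinstance(result, dict):
            raise ValueError("only aggregate summaries may leave the station")
        if result.get("n", 0) < self._min_group_size:
            raise ValueError("group too small to release")
        for value in result.values():
            if not isinstance(value, (int, float)):
                raise ValueError("only numeric aggregates may leave the station")
        return result


def carrier_frequency(records):
    """Example analysis: fraction of participants carrying a variant (1) or not (0)."""
    carriers = [r["carrier"] for r in records]
    return {"n": len(carriers), "frequency": sum(carriers) / len(carriers)}
```

Anyone who submits the same analysis to the same station gets the same aggregate back, which is the reproducibility property asked for above, while the individual records stay inside. A real airlock would need far stronger checks than this type-and-size test.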
Once again, this is nothing you can change. If you lose your key, okay, you lose your bitcoins. But this is your damn genome. You cannot just get it back. Well, there's genome editing, but this is something far away. And this is, by the way, not only your genome. It tells a lot about your family as well. So this has huge implications and this is really a catastrophe. So there must be better solutions, and there are. With this question in mind I looked around a little bit and tried to understand where we stand with respect to that. And there are blockchain-based decentralized data marketplaces that try to help exactly here. The promise is that the data owners, so for example me if I sequence my genome, can put the data somewhere and have full control over what of it is shared and with whom it is shared. One important thing is also a kind of standardization of data. And what the people who want to consume the data have is that they can set incentives to get more of that data. And there it already becomes a little bit critical. As I said, a genome costs $1,000, and this is still too much for many people. But if you put information about yourself into such a marketplace, maybe you have a certain disease, maybe you're healthy, I don't know, the data consumers can contact you indirectly and tell you: okay, if we get your genome, you will get this and this token, which can later on be traded into fiat money. With this they can give an incentive to sequence certain people or to get more information. It doesn't have to be the genome sequence; as I said, it can also be other stuff. But the idea is actually to promote this accumulation, this collection of data, in a standardized way, in an anonymous way, and also to give the power back to the data owner. And this, to be honest, sounds very interesting.
And also for the pharmaceutical industry this is very interesting, because they can have this traceability again. Say they have a bunch of patients, they do this and this trial with them, they get these wonderful results and they can sell their medicine. But under the hood nobody can prove this. With such a system they can always say: okay, here's the data, if you run your own analysis on top of this you should get the same results. And this would then look, very simply, like this: you have the DApp or the marketplace, and the data owners give access to the data to a data consumer. In return they get a token, very simply speaking, and still keep, as I said, all the rights, all the power over their own data. There are certain underlying concepts, some of which were mentioned before. Fully homomorphic encryption is kind of the holy grail there, and as far as I understand it has not really been well implemented so far. Multiparty computation might be a solution: basically you break down the problem into smaller pieces, and an attacker would have to have control over the whole network. Others build on certain hardware concepts, trusted execution environments like SGX from Intel. There are a lot of these things out there, at least as white papers. They're discussing this, and as I said, this would look very roughly like this: the data consumer asks for certain data and might find it already via the blockchain. So the data consumer might find somebody who offers it already, or motivates data owners to contribute their data. The data, importantly, is stored off-chain, so it is outside and not stored in the blockchain. And then you have these secure compute nodes, such as SGX for example: the data owner allows the data consumer to access the data, basically to give it into the secure compute node, and the consumer only gets the results of that computation in the end. So a rather elegant but also rather complicated system, in my opinion.
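To make the multiparty computation idea a bit more concrete, here is a minimal Python sketch of additive secret sharing, one common building block for it. It is a toy under stated assumptions, not the scheme of any protocol named in this talk: the modulus `PRIME`, the three compute parties and the function names are chosen only for illustration.

```python
import secrets

PRIME = 2**61 - 1  # modulus for the shares; an illustrative choice


def share(value, n_parties):
    """Split an integer into n additive shares that sum to the value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Combine shares; the value only appears when all of them are added."""
    return sum(shares) % PRIME


def secure_sum(private_values, n_parties=3):
    """Each data owner sends one share to each party; parties add what they hold."""
    held = [[] for _ in range(n_parties)]
    for value in private_values:
        for party, piece in enumerate(share(value, n_parties)):
            held[party].append(piece)
    # Each party publishes only the sum of its shares, never a single share.
    partial_sums = [sum(pieces) % PRIME for pieces in held]
    return reconstruct(partial_sums)
```

For example, `secure_sum([1, 0, 1, 1])` counts how many of four data owners carry a variant without any single party seeing an individual answer. Any subset of fewer than all parties holds only random-looking numbers, which matches the point that an attacker would have to control the whole network.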
And there are numerous protocols and providers that have kind of a general-purpose solution for this at hand. Ocean Protocol, which we will hear about right after me, I guess. The Enigma protocol: they have the concept of secret contracts, basically smart contracts but in these kinds of encrypted or trusted environments. The Ekiden protocol from Oasis Labs, and OpenMined, clearly with a focus on machine learning, although I think Ocean Protocol also has a focus on this. So there are different potential providers of solutions in this respect, and they are not limited to a certain use case but are rather broad. On the other hand, we also have more specialized providers coming from healthcare, and they come with these kinds of concepts. Quite elaborate already are Nebula and Longenesis. With Nebula, please keep in mind that this is done by George Church, who is a big driver in this genomics field and a strong person in that field. There's also LunaDNA, I'm not sure how to pronounce this, PHAROS, EncrypGen, and all of them, well, unfortunately do not yet offer solutions, but they say that they will offer solutions where you can give either health data, health records, or even genomic data. Nebula is even working together with a sequencing facility in order to generate the data, to store the data, and then to put it into these private pots that can then be managed via a blockchain approach. So there are already a lot of people in the boat who offer solutions that sound at least interesting to me. But now the question is: will these data marketplaces really improve our science? And I would say maybe, and this definitely has potential. As I said, as a bioinformatician this is kind of paradise for me. If we had access to this kind of data, this would be really great.
And I think it's a trade-off with openness. As I said, I'm a very strong open science proponent and I still always have these debates, and they could be settled here, because we can say: here's a mode where we have a kind of trade-off, we give up a little bit of openness, but instead we have access to lots of data, and this will push our knowledge dramatically, in my opinion. But currently, maybe I'm wrong and I'm very happy if somebody can correct me here, I see only a lot of white papers. So I could not test anything here. Maybe I'm wrong and maybe somebody can do a small demo later. I would be very happy to see that. And I completely agree that this should not be out too early in this case. Once again, this is basically your genome. And unfortunately the discussion is happening mostly among companies and not academics. So thank you for organizing this, because here there is a strong representation from the academic field at least, and this is important. I see a lot of potential here, and it would be a pity if this ended up in proprietary protocols and proprietary solutions and we were left out again. And I mentioned this before: it's good that it's not out there too early, in my opinion, because once your genome or at least SNPs of it are out there, you will not get that back. So it is really crucial that we have a rock-solid solution. As I said, with Bitcoin, in the worst case, yes, you lose maybe a large fraction of your money. But your genome you can never, ever replace. If you have a disease and, for whatever reason, somebody makes this accessible, it will be out there forever. And this also affects, for example, your children or your family. So this can go back in your family tree in a way. And there are also other issues: even if you do not have direct access to the data, you can still use certain tools to, well, de-anonymize data. There is a paper from 2013 where they used certain traits and could link them back to last names.
This is, again, a crucial thing, and the data is too precious. And I made this rather simple with my little drawing of how this should work. But the complexity of these systems is dramatic. As I said, this is a multiplayer solution. You have a lot of things to keep in mind here. It's very complex, and this means it might break easily. You might also have problems with different legal systems. Again, we had this, I think, several times before. Clearly it's a global thing and nobody can forbid me to put data there. But maybe in the end I have some issues if I want to use the data. What might also be an issue is that we now incentivize people to, let's say, contribute their genome, but they are actually not aware of these problems. They say: okay, this is good quick money for myself, I put my genome, I put my behavior data in there. And then afterwards they realize that this is a problem, that they cannot get that back. So education is needed and crucial, as always. Thanks, John, again: education is key. And also the data is, as I said, stored off-chain. This is in my opinion kind of outsourcing the problem to others, and it has to be solved. I read in one white paper, for example, that they suggest putting the genome on Dropbox. That's ridiculous. This is the most precious thing. Who would do that? I wouldn't, at least. And who makes sure that the claims that people put into the blockchain, in order to be found by companies incentivizing them, are not wrong? If I'm a poor person, I have debt, and I know that if I belong to a certain group my genome gets sequenced, and I can give this and get money for it, maybe I lie when I fill out these forms. So this is in my opinion not clearly solved. Bottom line: this is super promising. As I said, as a bioinformatician, this is awesome: having this access to data as a trade-off between openness and privacy.
I think there's still a long way to go, but we should go it. We should try to get there and see if this works out for us. So what are your questions? Thank you very much. Yes, please. Hi, thanks for this presentation. You mentioned these entities, companies like the one from George Church and so on. I'm a bit critical about this and wonder whether this is not just creating new silos, because they are actually, to my understanding, all ICO-driven projects that were just there when you could still make money with this last year. They were quick, and George Church is always quick. That doesn't mean that he always has the idea by himself, because Origins 10 might prove that others had many of the ideas before, but he's very loud. That said, okay, now it's there, but I think it just creates new silos, and I don't think that this is actually in the interest of this movement that we are trying to build. I completely agree with you, and this is why I also suggest that we academics have to have this discussion, because it's already ongoing there, and we run the risk that we are outside of it and that this is just a big playground for pharmaceutical companies and not for the general public. Anyone else? No, we'll keep to good time. Thank you, Conrad.