All right, so hi everyone. I'm Dennis Christian, chief scientific officer at Nebula Genomics, and I'm happy to be here today to tell you a bit about our work on combining genomics and blockchain in order to address some of the obstacles that the field of genomics has been facing, and still is facing. The foundation of what we're doing was essentially laid about ten years ago, actually at about the same time blockchain, or Bitcoin, was invented, and that foundation is what's called next-generation DNA sequencing. When the first human genome was sequenced in 2000-2001, the whole process took over ten years and cost three billion dollars. If that price hadn't decreased, we wouldn't be talking about genomics today, but fortunately, over the past 15 years, the price has fallen hyper-exponentially as sequencing technology advanced. We went from three billion dollars to less than one thousand dollars today, which makes personal genome sequencing affordable to many people. This naturally opens up a lot of opportunities for individuals who get sequenced to learn about things like health risks, which medications they should take, whether they're carriers of certain genetic conditions, and so on. But it also creates opportunities for pharma companies, which are increasingly interested in accessing genomic data sets and using that data to facilitate drug development at different stages of the drug development cycle. Because of that, there's growing interest in the industry in getting access to genomic data sets. These are just a few examples of publicly known deals from the past few years. Most recently, just a few months ago, GSK, a big pharma company, announced that it had invested 300 million dollars in 23andMe, currently one of the leading personal genomics companies, to get access to the genetic data of its customers. I briefly touched on the different opportunities that exist.
This is just a more comprehensive overview. For patients and consumers, sequencing your genome lets you learn about your disease risk and, increasingly, take preventive action so you don't actually get sick. You can also learn about pharmacogenomics, meaning which drugs you should take and which drugs you should avoid because you are at risk of side effects. And then there's what's called carrier screening, which essentially means finding out which pathogenic genetic variants you carry that do not cause a disease in you, but might cause a disease in your children if your partner also happens to carry such a variant. For pharma and biotech companies, there are several opportunities as well. Drug discovery is one: you can do rational drug discovery instead of essentially random screens, as is still commonly done today, by identifying the genes that are associated with certain medical conditions and then designing drugs that modulate the activity of those genes. Genomic data can also be leveraged directly during clinical trials. As mentioned during the previous talks, the costs of drug development are very high, and getting higher, because more and more drugs fail in clinical trials. One potential approach to addressing this issue is to recruit for those trials the people who are more likely to respond positively to the drug, that is, the people the drug is more likely to work on. The basic idea is to consider people's genomic data when recruiting them and to select them based on it, so that drugs are really built with people's genetics in mind. Finally, genomic data sets can also be used for what's called post-marketing surveillance, when companies look at which side effects occur and how effective the drugs are in different populations.
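To make the carrier screening point concrete, here is a minimal Python sketch of a single-gene Mendelian cross. It assumes a classic autosomal recessive condition with complete penetrance, where "A" is the normal allele and "a" the pathogenic one; the allele labels and function names are hypothetical, and real carrier screening also has to handle incomplete penetrance, X-linked inheritance and many loci at once.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotype_probs(parent1: str, parent2: str) -> dict:
    """Probability of each child genotype at one locus.

    Genotypes are two-letter strings such as "Aa"; each parent passes on
    one of its two alleles with equal probability (Mendelian segregation).
    """
    counts = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so "aA" and "Aa" count as the same genotype (uppercase sorts first).
        counts["".join(sorted(allele1 + allele2))] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def affected_risk(parent1: str, parent2: str, pathogenic: str = "a") -> Fraction:
    """Chance a child is homozygous for the pathogenic allele (recessive model)."""
    return offspring_genotype_probs(parent1, parent2).get(pathogenic * 2, Fraction(0))


if __name__ == "__main__":
    print(offspring_genotype_probs("Aa", "Aa"))  # two healthy carriers
    print(affected_risk("Aa", "Aa"))             # 1/4
    print(affected_risk("Aa", "AA"))             # 0: carrier x non-carrier
```

Under these assumptions, two healthy carriers have a 1 in 4 chance per child of an affected child, while a carrier paired with a non-carrier has none, which is why the partner's status matters.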
But there are currently a few problems surrounding the generation, sharing, and utilization of genomic data, which are summarized on this slide. First of all, there are problems on both sides: on the side of patients and consumers, as well as on the side of pharma and biotech companies. What we have with patients and consumers today is that not enough people are getting sequenced. So let me ask all of you: have you ever used a genetic test or sequenced your whole genome? Maybe just raise your hand. How many people know their genome? Like two people. And I assume it's probably not proper whole genome sequencing either, but rather one of those cheaper tests that look at only a small fraction of your genome. So that's a problem. It's already affordable today and you can learn a lot of useful information from it, yet you're still not doing it. So we need to fix that. Our main goal with what we're building is really to incentivize more people to get sequenced and to share their data. One core problem is that people still have to pay to learn about their genetics, and most people just don't see the value in it or aren't aware of the opportunities. Another issue is how many personal genomics companies operate today. Their business model revolves around collecting their customers' genetic data and then monetizing it themselves. What this means is that those people, to some extent, lose ownership of their genetic data and have no control over who gets to see it and for what purpose it's used. This lack of control and transparency is just not acceptable to many people. And in that system, it's also often not clear how exactly the data is protected, who's responsible for protecting it, or whether it's protected at all.
And when companies monetize the collected data, they do not compensate the actual data owners, the individuals, which is another issue. All of these things lead to the result of the little poll I just did: very few people are getting sequenced and sharing their data. But this model, which shows the flow of data through personal genomics companies acting as middlemen, also creates problems for pharma and biotech companies. First of all, the consumer issues I just described lead to a general lack of data. This is made even worse by data silos: essentially every personal genomics company becomes a data silo. They don't share data with each other, and they often store it in different formats, which makes it very difficult for researchers to access the data they want. Because of those middlemen, researchers also don't have direct access to patients and consumers, so it becomes more difficult, for example, to ask for additional information that a study requires. And prices increase as well: those middlemen want to make profits as high as possible and charge accordingly for the data. There are also challenges associated with simply handling genomic data, that is, retrieving it, storing it, and analyzing it, which I will briefly talk about as well. So this is a survey we did ourselves, and its results are similar to those of other surveys. As expected, very few respondents had actually used any kind of genetic test, only 2%. About a third, 29%, consider it too expensive or just not worth it for what it offers. Another third have privacy concerns, and the remaining people have all kinds of other issues. In terms of cost, many people really think that $1,000 is too much.
What they're willing to pay is about $100, which is what those cheaper genetic tests usually cost today, but those tests are much less comprehensive and their results are much less useful, both to the individuals and to the researchers who want to use the data. Then there's the data access issue. As I mentioned before, data fragmentation is one problem. For-profit biobanks have their own data sets, non-profit biobanks have their own data sets, and currently there's no sharing between them. Even in cases where data sets are publicly available online, as with the Personal Genome Project, they're often not integrated, and it's difficult to make them interoperable. Then there's data size: raw genomic data, as it comes out of the sequencing machines, is pretty big, about 200 gigabytes per person. So it's really difficult to just send it around over the internet. Then there's a lack of automation. As a researcher who wants to access a larger data set, you have to manually go out to the different biobanks, talk with someone there, ask what data they have, tell them what you need, negotiate prices, transfer payments, and sign contracts. It's a very manual process, which makes collecting data very difficult and slow. When we asked pharma companies how long it takes them to collect data, what we heard is that it can take half a year just to get the data they need. Then there are regulatory restrictions on accessing data. For example, some countries, like China, are quite strict about it and simply do not allow the genomic data of Chinese citizens to leave the country. All genomic data of Chinese citizens has to be stored in China, which obviously makes it difficult for any non-Chinese company to access this data. And of course there are privacy risks as well, whether perceived or real.
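The data-size problem is easy to quantify. This small Python sketch takes the talk's figure of about 200 GB of raw data per person; the link speeds are assumptions, and the calculation ignores compression, protocol overhead and network contention. At an assumed 1 Gbit/s, one genome takes roughly 27 minutes and a 1,000-genome cohort roughly 18.5 days, and a million genomes amount to about 200 petabytes of raw data.

```python
def transfer_time_seconds(size_gb: float, bandwidth_mbit_s: float) -> float:
    """Seconds to move size_gb (decimal gigabytes) over a link of the given speed.

    Ignores protocol overhead, compression and contention.
    """
    bits = size_gb * 1e9 * 8
    return bits / (bandwidth_mbit_s * 1e6)


def cohort_storage_pb(num_genomes: int, gb_per_genome: float = 200.0) -> float:
    """Raw storage for a cohort in petabytes (1 PB = 1e6 GB)."""
    return num_genomes * gb_per_genome / 1e6


if __name__ == "__main__":
    GB_PER_GENOME = 200  # raw sequencing output per person, as quoted in the talk
    for mbit in (100, 1000):  # assumed link speeds
        hours = transfer_time_seconds(GB_PER_GENOME, mbit) / 3600
        print(f"one genome at {mbit} Mbit/s: {hours:.1f} h")
    days = transfer_time_seconds(1000 * GB_PER_GENOME, 1000) / 86400
    print(f"1,000 genomes at 1 Gbit/s: {days:.1f} days")
    print(f"1 million genomes: {cohort_storage_pb(1_000_000):.0f} PB of raw data")
```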
They deter people from participating in any kind of data generation or sharing effort. And assuming you have actually acquired the data you want, when you then want to store, manage, and analyze it, there are a number of other issues you have to deal with. Storage space is one. Genomic data, at least raw genomic data, is quite big. As of now, I believe the biggest sources of data we generate are things like YouTube and Twitter, so video and text data. But some predictions say that genomics is going to take over and become the biggest source of data we generate in the coming decade. It's predicted that by 2020 we'll be talking about exabytes, and by 2025 about zettabytes of data. You need to store this data somewhere, so that's a problem. And obviously you also need compute power to process all that data, since some of the typical algorithms run on genomic data are quite resource-intensive. So in addition to storage, you also need a lot of compute power. And when you analyze large data sets, there's always the question of how to organize the data, how to keep track of what you're doing, how to make all the results reproducible, and so on. So those are issues as well. Now, after talking about problems for a while, I want to tell you about what we think can help address them. This is the model we are proposing and working on implementing. It is essentially a peer-to-peer network that uses blockchain and connects individuals, whether consumers or patients, as well as databases. It brings all of them together in a single network and makes them accessible to researchers at pharma and biotech companies or in academia. Those researchers can query what data is available and figure out what they need by sending queries to the network.
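A query such as "how many people carry this variant?" could, in principle, be answered without revealing any individual's genotype. The speaker later mentions a homomorphic-encryption-based query scheme without giving details, so the following is only a toy Python sketch using the textbook Paillier cryptosystem, which is additively homomorphic: multiplying ciphertexts adds the underlying plaintexts. Each individual encrypts 1 if they carry a (hypothetical) variant of interest and 0 otherwise, the network multiplies the ciphertexts, and only the private-key holder learns the total. The fixed Mersenne primes are for demonstration and give no real security; this is not Nebula's scheme.

```python
import math
import secrets

# Toy key material: two well-known Mersenne primes. Real deployments generate
# large random primes (a 2048+ bit modulus); these are for illustration only.
P = 2**31 - 1
Q = 2**61 - 1


def keygen(p: int = P, q: int = Q):
    """Paillier keys with the common simplification g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu, n)


def encrypt(n: int, m: int) -> int:
    n2 = n * n
    while True:  # a random r coprime to n makes encryption probabilistic
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (1 + n)^m mod n^2 equals 1 + m*n mod n^2
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(private_key, c: int) -> int:
    lam, mu, n = private_key
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n


def encrypted_sum(n: int, ciphertexts) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    n2 = n * n
    total = 1  # a trivial encryption of 0
    for c in ciphertexts:
        total = total * c % n2
    return total


if __name__ == "__main__":
    public_n, private_key = keygen()
    carries_variant = [1, 0, 0, 1, 1, 0, 1]  # hypothetical per-person answers
    ciphertexts = [encrypt(public_n, bit) for bit in carries_variant]
    # The aggregator only ever handles ciphertexts, never the individual bits.
    print("carriers:", decrypt(private_key, encrypted_sum(public_n, ciphertexts)))
```

Who holds the private key is the crux of such a design; in a network like the one described, decryption could itself be split across several parties.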
They can then also compute on that data without moving it off the platform. So it's a distributed computing platform: researchers submit their workflows, make payments, and get results back. This arrangement brings a number of benefits to both patients and consumers and to pharma and biotech companies. For patients and consumers, having a direct connection with researchers means that when a researcher comes in and, let's say, the data he is looking for is not there yet, he can subsidize the sequencing costs, or even fully pay the sequencing costs of that individual. So for people, it creates an opportunity to get sequenced at a much lower price, or potentially even for free. They also stay in control of their data and can decide whom they want to share it with, and when they do share it, it happens very transparently. That's what blockchain can help with, which I will talk a bit more about later. The model of keeping the data in place also contributes to data privacy, since access to the data can be restricted by the individuals themselves. And individuals can also get compensated for sharing data, so the platform can function as a marketplace. We think that many people will share their data for altruistic reasons, simply because they want to support biomedical research and drug development, but they also have the opportunity to be paid for it. All of this, we believe, will increase genomic data availability. For pharma and biotech companies, the system can function as what we call an end-to-end genomic service platform. A researcher comes in, queries the network, and finds the people who are of interest for the study. If the data is already there, he pays for access, then accesses the data, computes on it, and gets results. If the data is not there yet, he subsidizes the sequencing costs, the data is generated, and then he accesses and analyzes it.
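The idea of computing on the data without moving it can be sketched as federated computation. This minimal Python example assumes each storage site (hypothetically, one cloud location) holds genotypes as a mapping from a variant ID to an alternate-allele count (0, 1 or 2), and that a researcher's workflow returns only summary counts, never raw records. The class and function names and the variant ID "rs0001" are placeholders, not Nebula's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# variant ID -> number of alternate alleles carried (0, 1 or 2)
Genotype = Dict[str, int]
Summary = Dict[str, int]
Workflow = Callable[[List[Genotype]], Summary]


@dataclass
class DataSite:
    """Hypothetical storage location whose raw genotypes never leave it."""
    name: str
    genotypes: List[Genotype]

    def run(self, workflow: Workflow) -> Summary:
        # The workflow executes where the data lives; only its summary is returned.
        return workflow(self.genotypes)


def allele_count_workflow(variant_id: str) -> Workflow:
    """Build a workflow that reports allele counts for one variant."""
    def workflow(genotypes: List[Genotype]) -> Summary:
        # A variant absent from a record is treated as 0 alternate alleles (assumption).
        alt = sum(g.get(variant_id, 0) for g in genotypes)
        return {"alt_alleles": alt, "total_alleles": 2 * len(genotypes)}
    return workflow


def federated_allele_frequency(sites: List[DataSite], variant_id: str) -> float:
    """Combine per-site summaries into a pooled alternate-allele frequency."""
    workflow = allele_count_workflow(variant_id)
    alt = total = 0
    for site in sites:
        summary = site.run(workflow)
        alt += summary["alt_alleles"]
        total += summary["total_alleles"]
    return alt / total if total else 0.0


if __name__ == "__main__":
    sites = [
        DataSite("site-a", [{"rs0001": 1}, {"rs0001": 0}]),
        DataSite("site-b", [{"rs0001": 2}, {}]),
    ]
    print(federated_allele_frequency(sites, "rs0001"))  # 3 of 8 alleles
```

Returning only aggregates is not a complete privacy guarantee on its own: summaries over very small groups can still leak information, which is one reason for the additional protections described later in the talk.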
It also enables faster access to data, because what we hope to create is a single network with a single point of access, where researchers can go and get access to a much larger data set, and through that address the issues of data fragmentation and data silos. It also offers direct access to patients and consumers, so it's not just about dead data but about actual access to the person behind the data, and about dynamic data generation. When a researcher wants to know additional things that are necessary to conduct a certain study, he can just run a survey or send a direct message to that individual on the network. Removing the middleman also makes consent management much easier, which is currently quite an issue for health data in general. Blockchain, being an immutable public ledger, is very well suited for managing access control and consent. Individuals can add an entry saying "I allow my data to be used by this researcher for this purpose", which is immediately visible to all participants, and this consent can later be revoked simply by adding another entry to the blockchain. This slide illustrates the genomic data generation part, which is quite important because most people have not been sequenced to date, so the data simply isn't there. Actually generating the data is core to our mission, and we think we can automate and parallelize data generation and make it more efficient through blockchain and smart contracts. Data buyers, that is, researchers, can query the network and identify the people they are interested in, then deposit a certain amount of tokens in a smart contract. Those individuals can accept the offer by executing that smart contract, the tokens are transferred to a sequencing facility, the data is generated, and it is then made accessible both to the individuals it belongs to and to the researchers who paid for its generation.
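The consent and data-generation mechanics described here can be sketched in Python as a toy model. This is an illustration under stated assumptions, not Nebula's smart contracts: the ledger is a hash-chained, append-only list with no consensus, signatures or network, and the escrow is a plain state machine with token balances kept in a dictionary. All class, method and party names are hypothetical. The sketch shows two properties the talk relies on: revoking consent is just another appended entry (the latest entry for an owner, researcher and purpose decides), and editing history breaks the hash chain; and deposited tokens reach the sequencing facility only when the invited individual accepts.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class ConsentLedger:
    """Append-only, hash-chained consent log (toy: no consensus or signatures)."""

    def __init__(self) -> None:
        self.entries: List[dict] = []

    def append(self, owner: str, researcher: str, purpose: str, granted: bool) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"owner": owner, "researcher": researcher,
                "purpose": purpose, "granted": granted, "prev": prev}
        self.entries.append({**body, "hash": _digest(body)})

    def is_allowed(self, owner: str, researcher: str, purpose: str) -> bool:
        # Replay the log: the most recent matching entry wins, so a revocation
        # is simply a later entry with granted=False.
        allowed = False
        for e in self.entries:
            if (e["owner"], e["researcher"], e["purpose"]) == (owner, researcher, purpose):
                allowed = e["granted"]
        return allowed

    def verify(self) -> bool:
        """Detect edits to past entries by re-checking every hash link."""
        prev = GENESIS
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True


class SequencingEscrow:
    """Buyer locks tokens; the invited individual's acceptance pays the facility."""

    def __init__(self, balances: Dict[str, int], buyer: str, individual: str,
                 facility: str, amount: int) -> None:
        if balances.get(buyer, 0) < amount:
            raise ValueError("buyer has insufficient tokens")
        balances[buyer] -= amount  # tokens are now held by the contract
        self.balances, self.buyer, self.individual = balances, buyer, individual
        self.facility, self.amount, self.state = facility, amount, "funded"

    def accept(self, caller: str) -> None:
        if self.state != "funded" or caller != self.individual:
            raise PermissionError("only the invited individual can accept a funded offer")
        self.balances[self.facility] = self.balances.get(self.facility, 0) + self.amount
        self.state = "sequencing"

    def cancel(self, caller: str) -> None:
        if self.state != "funded" or caller != self.buyer:
            raise PermissionError("only the buyer can cancel a funded offer")
        self.balances[self.buyer] += self.amount  # refund
        self.state = "cancelled"


if __name__ == "__main__":
    ledger = ConsentLedger()
    ledger.append("alice", "lab-x", "diabetes-study", True)
    ledger.append("alice", "lab-x", "diabetes-study", False)  # revocation
    print(ledger.is_allowed("alice", "lab-x", "diabetes-study"), ledger.verify())

    balances = {"pharma-co": 500}
    offer = SequencingEscrow(balances, "pharma-co", "bob", "seq-lab", 300)
    offer.accept("bob")
    print(balances, offer.state)
```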
This slide illustrates what the platform looks like and how it works. There are essentially three components. The first one is the storage system. Without going into much detail, we're using a distributed storage system that supports all the major clouds, like AWS, Google Cloud, Microsoft Azure, and so on, so the data can be distributed across all those different clouds, stored there securely in encrypted form, and computed on. That's the second component: a distributed computing platform that manages workflows in such a federated environment, where the data is stored in different places. It uses common standards for running and integrating different bioinformatics tools, and because of that it supports important things like version control and reproducibility. The important thing is that blockchain is used for consent and access management, while the data itself is stored off-chain, because a blockchain is obviously not suitable for storing petabytes of data. The third part is our blockchain itself, which fulfills several functions. As I just said, its main one is consent management: it eliminates middlemen and creates a public, immutable, or rather append-only, database. Through smart contracts, it enables fast, parallelized data purchases. And data access can be managed by multiple parties who hold split keys, so that individuals don't have to rely on a single party to manage access to their data. This essentially distributes trust across a large number of network participants. Once the data is generated, data access can again be purchased through smart contracts, similar to the data generation scheme: payments are made and accepted, and then the so-called validator nodes of the blockchain, each of which holds a key share, collectively decrypt the data and make it accessible to researchers, who can compute on it.
Data protection is very important to us. We touched on the different properties of the system that ensure data protection; here they are again, summarized. One is distributed storage and computing: bringing computations to the data instead of sending the data to someone obviously helps protect data privacy, because the data can be analyzed in a controlled environment. What I haven't spoken about at all today is the privacy-enhancing technologies we're using. For example, right now we're working on a scheme based on homomorphic encryption that enables the data to remain encrypted while it's being queried, and later we're looking to add additional computations on encrypted data, especially genome-wide association studies. Blockchain technology of course also adds to data protection, as it enables a very transparent way of sharing data, multi-party data access control, and also governance of the whole network that incentivizes all participants to govern it in a way that maximizes data privacy and protection. What we're hoping to achieve by doing all this is network growth driven by several factors. First of all, sequencing costs will keep going down, which will make sequencing more and more affordable, to the point where people can easily pay for it themselves, or researchers can come in and pay the costs for them. Then, as the network grows, its value will increase: there will be more data, more data buyers will be attracted from the industry, and more researchers will come in and be willing to pay more and more for the data. And as researchers learn more about human genetics, it will become more and more useful to know your personal genetic data, so more people will be incentivized to get sequenced and share their data. Yeah, that's it, thank you.
All right, thank you very much for this presentation. In the interest of time, maybe one question, and then we'll have Roman.
Yeah, thank you very much. Do I see this correctly that your main audience for the whole thing is actually the industry, or do you see a good connection to doing this in open science as well? Is there a good approach to applying it there too?
Well, I mean, researchers in academia obviously also need genomic data; the research they do is just less focused on developing new drugs and more focused on a basic understanding of human genetics. We will make this network accessible to them as well. The way we think about it is that in most cases, when an individual is approached by an academic researcher, that individual will be compelled to share their data for altruistic reasons, to support research, while in cases where they're approached by pharma and biotech companies, individuals will actually want to monetize their data. So I think the incentives are just a little bit different, but we want the data to be shared with both those parties.
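The split-key arrangement mentioned in the talk, where validator nodes each hold a key share so that no single party controls access, can be illustrated with Shamir's secret sharing. The talk does not say which scheme Nebula uses, so this Python sketch is an assumption: it splits a data-encryption key into n shares such that any t of them recover it, while fewer than t reveal nothing about it. One caveat: plain Shamir reassembles the whole key in one place, whereas a production threshold-decryption protocol would let nodes decrypt jointly without ever assembling the key. The field prime and function names are illustrative choices.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime defining the finite field for the shares


def split_secret(secret: int, threshold: int, num_shares: int):
    """Shamir split: shares are points on a random degree-(threshold-1) polynomial."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in the field")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Constant term is the secret; the other coefficients are random.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares) -> int:
    """Lagrange interpolation at x = 0 over any `threshold` distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = secrets.randbits(120)  # stand-in for a data-encryption key
    shares = split_secret(key, threshold=3, num_shares=5)  # e.g. five validator nodes
    print(recover_secret(shares[:3]) == key)
    print(recover_secret([shares[0], shares[2], shares[4]]) == key)
```

The threshold trades availability against collusion resistance: with 3 of 5, two nodes can be offline and the data stays accessible, but any three colluding nodes could also unlock it.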