Hi, I'm Konstantin. I'm a research associate in Professor Ali Sunyaev's research group Critical Information Infrastructures at the Karlsruhe Institute of Technology. My research focuses on the convergence of blockchain and artificial intelligence, and that's what I'm going to talk about today, with a focus on healthcare research and the benefits.

To begin my presentation, I want to start with some questions for the audience. How much time do you think a medical professional needs to analyze a cardiac MRI scan like this one? It's a video of the human heart, and the medical professional might want to detect certain heart diseases. Any guesses? Ten minutes, half an hour? Okay, that's pretty close. I found an average value of 13 minutes, but obviously it depends on how complex the case is.

The second question: how much do you think a pharma company would pay to get access to your DNA data and maybe to ask you a few questions about your health or your lifestyle? Any guesses here? Fifty, nothing, fifteen, one hundred. Okay. For most people the range is between 100 and 200 US dollars, but there are also cases where it's much more. For example, Parkinson's patients get up to 20,000 US dollars, or rather, their data is traded for up to 20,000 US dollars.

So what I'm going to talk about today is the role of data and AI in healthcare and the current struggles researchers face when using AI in healthcare. Then I will explore how a blockchain-based system could help, present three privacy-preserving technologies that are interesting in the context of AI and blockchain convergence, and talk about current and future research projects in our group.

So how does data serve AI to improve healthcare? We have a lot of data in healthcare. We've already heard about genomics and medical imaging. There's also the emerging field of mobile health with our smartphones.
We can measure how many steps we take each day, but we can also measure our pulse using the smartphone camera. And as we just heard from Markus, data is essential for AI. There are several approaches to building AI systems. The most common one nowadays is machine learning, but there are other approaches, such as knowledge bases. Maybe some of you have used Wolfram Alpha, which uses a knowledge-base approach at the core of its system. Such an AI system can then have a positive impact on healthcare. Depending on the use case, it can detect diseases, recommend therapies, or even help develop new therapies.

So what's the struggle? It sounds good, but there are obstacles. Health data is obviously very sensitive. In general, we don't like to share our health data; we don't want to get ads on social media, for example, because we shared our DNA. So we are very careful and cautious when it comes to sharing our health data. Sharing health data also affects the privacy of your relatives. Think of genomics: your DNA is very similar to the DNA of your parents, your siblings, or your children, so if you share it, you may hurt not only your own privacy but also the privacy of your relatives. There have also been hacks and misuse of data, for example Equifax or Cambridge Analytica in other IT fields, so people are becoming more and more aware of data privacy issues. There are also legal barriers to sharing data: even within Germany it's hard to share data across hospitals, and it becomes even harder to share data across hospitals internationally. Finally, there's limited availability of data on rare diseases.

The second category of struggles, which is a bit less obvious, is the limited availability of computing resources.
As some of you may know, training machine learning models can require large amounts of computing resources. These resources are expensive, energy-intensive, and partly limited in their availability. There are even tasks in machine learning, and in computational biology in general, that are computationally very hard and for which there is barely enough computing power available. One example is protein folding. There's the Folding@home project from Stanford University, a very large distributed network of users running protein folding calculations. Nowadays the Bitcoin network is an even larger distributed network, but Folding@home is very interesting from a distributed systems point of view.

So the question here is: can blockchain help to incentivize the sharing of data and computing resources? What would such a scenario look like? The idea is that hospitals holding patient data upload it either onto the blockchain or at least onto a distributed file storage service, and manage access rights through the blockchain. Not just one hospital does this, but several, and the data can then be used. For example, another hospital with a new patient might measure some data, say genomic or medical imaging data, and then get a recommended therapy, with the calculations done in a decentralized and privacy-preserving way. University researchers like myself might also want to use that data, or even pharma companies, who might then pay with a token that is distributed to the people whose data is used or to the hospitals.

So what would be the benefits of such a blockchain-based system? There's broad collaboration across hospitals, researchers, and industry, so much more data is available, which improves AI systems. As a result, those systems become more accurate.
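To make the scenario above more concrete, here is a minimal, purely illustrative Python sketch of an access-rights ledger. It assumes that the records themselves sit in off-chain storage and that only their SHA-256 hashes are registered. The class `AccessLedger`, its methods, the account names, and the even split of a payment among record owners are all hypothetical choices for illustration; this is not an Ethereum smart contract and not the design of any project mentioned in this talk.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class AccessLedger:
    """Toy in-memory stand-in for a blockchain access-rights registry (hypothetical)."""

    records: dict[str, str] = field(default_factory=dict)      # record hash -> owner
    grants: set[tuple[str, str]] = field(default_factory=set)  # (record hash, grantee)
    balances: dict[str, float] = field(default_factory=dict)   # account -> tokens
    log: list[tuple] = field(default_factory=list)             # append-only event log

    def register(self, owner: str, record: bytes) -> str:
        # Only the hash goes "on chain"; the record itself stays in off-chain storage.
        digest = hashlib.sha256(record).hexdigest()
        self.records[digest] = owner
        self.log.append(("register", owner, digest))
        return digest

    def _require_owner(self, owner: str, digest: str) -> None:
        if self.records.get(digest) != owner:
            raise PermissionError(f"{owner} does not own record {digest[:8]}")

    def grant(self, owner: str, digest: str, grantee: str) -> None:
        self._require_owner(owner, digest)
        self.grants.add((digest, grantee))
        self.log.append(("grant", owner, digest, grantee))

    def revoke(self, owner: str, digest: str, grantee: str) -> None:
        # The data owner can withdraw consent at any time.
        self._require_owner(owner, digest)
        self.grants.discard((digest, grantee))
        self.log.append(("revoke", owner, digest, grantee))

    def has_access(self, digest: str, user: str) -> bool:
        return (digest, user) in self.grants

    def pay_for_access(self, buyer: str, digests: list[str], amount: float) -> None:
        # Split the payment evenly among the owners of the records the buyer may use.
        usable = [d for d in digests if self.has_access(d, buyer)]
        if not usable:
            raise PermissionError(f"{buyer} has no access to any requested record")
        share = amount / len(usable)
        for d in usable:
            owner = self.records[d]
            self.balances[owner] = self.balances.get(owner, 0.0) + share
        self.log.append(("payment", buyer, tuple(usable), amount))


# Hypothetical usage: two patients share records with a pharma company.
ledger = AccessLedger()
h1 = ledger.register("patient_a", b"genomic record of patient A")
h2 = ledger.register("patient_b", b"cardiac MRI of patient B")
ledger.grant("patient_a", h1, "pharma_x")
ledger.grant("patient_b", h2, "pharma_x")
ledger.pay_for_access("pharma_x", [h1, h2], 100.0)
ledger.revoke("patient_b", h2, "pharma_x")  # patient B withdraws consent

print(ledger.balances)
print("pharma_x can still read B's record:", ledger.has_access(h2, "pharma_x"))
```

In this sketch, revoking access only blocks future reads; it cannot undo a copy of the data that was already made.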
Ideally, this happens in a privacy-preserving and self-controlled way, so the people who share their data can, for example, revoke their data sharing and control who has access to their data. We can also use financial incentives for data sharing and to outsource computation.

So could we build such a system with tools that are available today? An obvious candidate is Ethereum. It is a distributed computing platform with smart contract capability, and it's the second-largest cryptocurrency by market cap. If we check whether it would be feasible today: Ethereum is immutable and has global reach, and we can easily realize financial rewards on Ethereum, with an emerging decentralized finance ecosystem around it. But there are two issues. One is data privacy: cryptocurrencies and cryptocurrency systems are pseudonymous, but not really anonymous. The other is scalability, both with regard to computing power and to data storage capacity. So we need scalable and privacy-preserving computation technologies to bring blockchain and machine learning together.

What's out there? Which computation technologies could we use? Traditionally, we have used cloud computing when we need large amounts of computing power, but that's not really what we are aiming for when we think of decentralizing trust, of trusting nobody. In cloud computing, you have to trust the cloud computing provider. Maybe they are certified, but you still need to trust them. What's interesting here is the field of privacy-preserving computation technologies. There are three categories. The first, which I call edge computation and central averaging, is federated learning, which we've already heard about from Markus and which I'm also going to explain. Then there's trusted hardware, and there are advanced cryptographic techniques.
To demonstrate how these technologies work with data and machine learning, I'm going to give an illustrative example from cardiology with a very simple data set. We have the patient's age on the x-axis and the systolic blood pressure on the y-axis. Usually, the older patients are, the higher their systolic blood pressure. If a patient deviates, for example a young person with a very high systolic blood pressure, they might be at risk of heart disease, so you would investigate this patient further and take special care of them. For this very simple data set, we want to build an AI model that predicts, given a patient's age, what their systolic blood pressure should be. We can use a linear model, which we all know from high school, but the question is: how can we compute it in a privacy-preserving way?

First, I want to talk about the edge computation and central averaging approach. Say we have a hospital with three patients, each with a certain age and a certain systolic blood pressure. This hospital can create a local model: it fits a linear function y = mx + b with two parameters, the slope m and the offset b. As we've learned, AI gets better with more data, but at this stage the hospitals don't share any information; they compute their models locally. A second hospital with a few patients calculates its model just like the first one. Then, in this example, they send those parameters, m and b, either to a blockchain or to some central server, where all the parameters are averaged. The result is a new, averaged model that is expected to be more accurate, since it incorporates more data.
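The averaging step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the patient values are made-up toy numbers, each hospital fits its line with ordinary least squares, and the helper names (`fit_line`, `federated_average`, `predict`) are hypothetical. The default is the plain average described in the talk; passing the local sample counts as `weights` gives the sample-weighted average used in the FedAvg algorithm.

```python
def fit_line(ages, pressures):
    """Ordinary least-squares fit of y = m*x + b on one hospital's local data."""
    n = len(ages)
    x_bar = sum(ages) / n
    y_bar = sum(pressures) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, pressures))
    var = sum((x - x_bar) ** 2 for x in ages)
    m = cov / var
    return m, y_bar - m * x_bar


def federated_average(local_params, weights=None):
    """Average the (m, b) pairs; only parameters, never raw patient data, are shared."""
    if weights is None:
        weights = [1] * len(local_params)  # plain average, as in the talk
    total = sum(weights)
    m = sum(w * p[0] for w, p in zip(weights, local_params)) / total
    b = sum(w * p[1] for w, p in zip(weights, local_params)) / total
    return m, b


# Hypothetical toy data: (ages in years, systolic blood pressure in mmHg).
hospitals = {
    "hospital_1": ([30, 45, 60], [115, 122, 131]),
    "hospital_2": ([25, 50, 70], [112, 125, 136]),
    "hospital_3": ([35, 55, 65], [118, 127, 134]),
}

# Step 1: each hospital fits its own model locally; the data never leaves it.
local_params = [fit_line(ages, bp) for ages, bp in hospitals.values()]

# Step 2: only (m, b) go to the aggregator (a central server or a blockchain).
global_m, global_b = federated_average(local_params)


# Step 3: the averaged model is sent back and used at every hospital.
def predict(age):
    return global_m * age + global_b


print(f"global model: y = {global_m:.3f} * age + {global_b:.1f}")
print(f"expected systolic BP at age 40: {predict(40):.0f} mmHg")
```

Averaging local fits is an approximation: in general it does not give exactly the line one would fit on the pooled data. For neural networks, the same averaging is applied to millions of weights and repeated over many training rounds rather than done once.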
This new averaged model can then be sent back to the edge nodes. So only the parameters m and b leave the hospitals; the data itself stays at each hospital. This concept is also known as federated learning. It's used nowadays on millions or even billions of smartphones for keyboard word prediction: when you type on your keyboard, you get suggestions for the next word, and obviously you don't want to share your keystrokes with Google or Apple because it's private data, so they use federated learning approaches here. We also use it for neural networks, that is, for complex machine learning models with millions of parameters. Blockchain-based federated learning uses the blockchain as immutable data storage and as a communication medium for federated learning. This gives you an auditable trail of model updates, lets you share and trade machine learning models, and lets you use financial incentives, for example for people who provide new data that the model has not seen before.

The second category I'd like to present is trusted hardware, which we also touched on earlier today. In this case, we again have three hospitals with their patients' data, and they connect to a trusted execution environment (TEE) hardware device. They submit the data to the device over a secure channel, so this time the data actually resides in the trusted processor, but the hospitals trust that the processor, the CPU, does not leak the data. We'll come back to that shortly. The trusted execution environment now has all the data and can calculate a new model; it could even delete the data afterwards and then send the model back to the hospitals, which benefit from the new model.
Trusted hardware gives us assurance that the code executing on a remote hardware device is the code that is intended to execute. It's isolated from other processes, so you don't need to trust, for example, the operating system of that remote machine, and there's also access control over who can get access to the trusted hardware. It's included in modern CPUs: if you buy an Intel CPU nowadays, for example, it has SGX, Intel's trusted hardware technology. It supports remote attestation, so when you connect to a trusted hardware device remotely, you can be sure that it is the device it should be. However, it's quite difficult to build trusted hardware devices securely; papers have been published that attack trusted hardware, so building trusted hardware securely is a challenge at the moment. Trusted hardware also currently has limited computing power: you can already run inference of machine learning models, but you can only train simple machine learning models inside trusted hardware.

As for trusted hardware for blockchain, we can use the blockchain to manage the data and also the software that runs in the trusted hardware, and we can share and trade trusted hardware computing resources.
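The attest-then-compute flow described above can be simulated in Python to show its logic. This is a toy model, not the Intel SGX API: `ToyEnclave`, `verify`, `HARDWARE_KEY`, and `ENCLAVE_CODE` are hypothetical, the enclave "code" is just a byte string that gets hashed, and an HMAC stands in for the hardware-backed asymmetric signature that real TEEs use for attestation reports. The point is the order of operations: the hospitals check the measurement of the code before sending any data, and the enclave erases the rows after training.

```python
import hashlib
import hmac
import secrets

# Hypothetical stand-in for the key a TEE vendor provisions into the CPU.
# Real TEEs sign attestation reports with asymmetric keys checked through the
# vendor's attestation infrastructure; HMAC only keeps this sketch self-contained.
HARDWARE_KEY = secrets.token_bytes(32)

# Hypothetical enclave program; its bytes are what gets "measured".
ENCLAVE_CODE = b"train: fit y = m*x + b on submitted rows, return (m, b), erase rows"


class ToyEnclave:
    """Simulated trusted execution environment (not a real TEE API)."""

    def __init__(self, code: bytes):
        self.measurement = hashlib.sha256(code).hexdigest()
        self._rows: list[tuple[float, float]] = []  # stands in for protected memory

    def attest(self, nonce: bytes) -> tuple[str, bytes]:
        # Report = code measurement bound to the verifier's fresh nonce.
        report = self.measurement.encode() + nonce
        return self.measurement, hmac.new(HARDWARE_KEY, report, hashlib.sha256).digest()

    def submit(self, rows: list[tuple[float, float]]) -> None:
        self._rows.extend(rows)

    def train_and_erase(self) -> tuple[float, float]:
        # Pooled least-squares fit over all hospitals' rows, then delete the data.
        n = len(self._rows)
        x_bar = sum(x for x, _ in self._rows) / n
        y_bar = sum(y for _, y in self._rows) / n
        cov = sum((x - x_bar) * (y - y_bar) for x, y in self._rows)
        var = sum((x - x_bar) ** 2 for x, _ in self._rows)
        self._rows.clear()
        m = cov / var
        return m, y_bar - m * x_bar


def verify(enclave: ToyEnclave, expected_measurement: str) -> bool:
    nonce = secrets.token_bytes(16)  # a fresh nonce prevents replaying an old report
    measurement, signature = enclave.attest(nonce)
    expected_sig = hmac.new(HARDWARE_KEY, measurement.encode() + nonce, hashlib.sha256).digest()
    return hmac.compare_digest(signature, expected_sig) and measurement == expected_measurement


# Hypothetical toy data: (age, systolic BP) rows held by three hospitals.
HOSPITAL_ROWS = [
    [(30, 115), (45, 122), (60, 131)],
    [(25, 112), (50, 125), (70, 136)],
    [(35, 118), (55, 127), (65, 134)],
]

expected = hashlib.sha256(ENCLAVE_CODE).hexdigest()
enclave = ToyEnclave(ENCLAVE_CODE)
tampered = ToyEnclave(ENCLAVE_CODE + b"; also copy rows elsewhere")

print("genuine enclave accepted:", verify(enclave, expected))
print("modified enclave accepted:", verify(tampered, expected))

if verify(enclave, expected):  # hospitals send data only after attestation succeeds
    for rows in HOSPITAL_ROWS:
        enclave.submit(rows)
    m, b = enclave.train_and_erase()
    print(f"model sent back to hospitals: y = {m:.3f} * age + {b:.1f}")
```

The modified enclave in this simulation still runs on the same simulated hardware, so its report carries a valid signature, but its measurement no longer matches the expected hash, and the hospitals refuse to send it any data.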
Additionally, trusted hardware provides new leader election mechanisms other than proof of work; for example, we can use proof of elapsed time or proof of luck, which rely on special features of trusted hardware. Trusted hardware for blockchain is an active research area in academia as well as in industry and startups: there are projects that try to build new blockchains with trusted hardware and others that try to bring trusted hardware onto existing blockchains.

Now I want to talk about advanced cryptographic techniques. I have three categories here. The first is secure multi-party computation, where multiple parties jointly compute a function over their inputs. I'm not going to explain in detail how it works, because the example wouldn't be as simple. Then there are zero-knowledge proofs, where you can prove to another party that you know a value x without revealing any other details. Finally, there's homomorphic encryption, which is a structure-preserving encryption, so you can compute on encrypted ciphertexts. With partially homomorphic encryption you can perform only some operations on ciphertexts, whereas fully homomorphic encryption allows arbitrary computations. Maybe some of you also know zero-knowledge proofs from cryptocurrencies such as Zcash. Overall, I would say cryptographic techniques are the ultimate goal, because you don't need to deal with side-channel attacks on trusted hardware or with some of the attacks that could be carried out on federated learning, but they need a breakthrough in scalability.

Now I want to talk about the research projects in our group. There are two I would like to present. The first is Block 3, which aims to bring oncology patient data, that is, data of cancer patients, onto the blockchain. We have strong partners from research and industry for that project; for example, the Charité hospital in Berlin is a partner. Together we want to bring patient data onto the blockchain in Germany, and our focus is on blockchain system engineering and evaluation. There's also another project in our group, DLT for Life, which has similar goals: bringing access rights management onto the blockchain, with a specific focus on life sciences and genomics. It's sponsored by the Helmholtz Association. This is just the beginning: we are also exploring new fields and starting projects on federated learning and on trusted hardware for blockchain systems, and I'm happy to talk with you about how we could collaborate. If you have any questions, we can now move on to the questions.

Nice talk. I have a question: regardless of which platform we use, who is the owner of the data storage?

The hospitals, or ideally the patient is the owner of the data. For example, in the Block 3 project we intend to develop an app so that once patients leave the hospital, they can still manage their own data or send updates, like how they are feeling now, to the blockchain and to the system. So either the hospital or, ideally, the patient is the owner of the data.

Thanks for the talk. Did you check the status quo in Estonia? As far as I know, they are quite advanced with their blockchain-based e-health system. Do you know anything about which approach they are using?

No. I know Estonia in general is very progressive with digital technologies, but I don't know about blockchain in healthcare there. It would be interesting to check it out.

I also don't know the details, but I know they are employing it somehow.

Thank you for the presentation. I have a question about trusted execution environments. What is your prediction for mobile devices: will they become a practical environment for blockchain use?
I would say yes. ARM, for example, has a trusted execution environment, TrustZone, and I know Apple is looking into it as well, for example for storing passwords securely on an iPhone in the Secure Enclave; think of the FBI case, where the FBI wasn't able to get the passwords from a suspect's phone. So I would say yes, it's definitely interesting, and it also increases privacy in the mobile space.

I'm just asking because everyone is talking about where the data is hosted. If it's with the patients, could this be done directly on a mobile device?

So you want to store the data on the mobile device and train the model there? That's done, for example, in federated learning, where the model is trained on the phone once you plug it into a power source. So yes, you could train it on the mobile device. But the owner might lose the device, so you might still want to send the data, encrypted somehow, to a cloud server.

Okay, thank you.