From around the globe, it's theCUBE. Covering HPE Discover Virtual Experience, brought to you by HPE. Hi, everybody, welcome back. This is Dave Vellante for theCUBE, and this is our coverage of Discover 2020, the virtual experience of HPE Discover. We've done many, many Discovers, as you know. Usually we're on the show floor. You know theCUBE has been virtualized, and we talk a lot at HPE Discover about, you know, a lot of storage and server and infrastructure and networking, which is great. But the conversation we're going to have now is really, we're going to be talking about helping the world solve some big problems. And I'm very excited to welcome back to theCUBE Dr. Eng Lim Goh. He's the Senior Vice President of, and CTO for AI at HPE. Hello, Dr. Goh, great to see you again. Hello, thank you for having us, Dave. You're welcome, and then our next guest is Professor Joachim Schultze, who is the Professor for Genomics and Immunoregulation at the University of Bonn, amongst other things. Professor, welcome. Thank you, welcome. And then Prasad Shastri is the Chief Technologist of the India Advanced Development Center at HPE. Welcome, Prasad, great to see you. Thank you, thanks for having me. So guys, we have a CUBE first. I don't believe we've ever had three guests in three separate time zones. I'm in a fourth time zone. So I'm in Boston, Dr. Goh, you're in Singapore. Professor Schultze, you're in Germany, and Prasad, you're in India. So we've got four different time zones, plus our studio in Palo Alto, who's running this program. So we've actually got five time zones, a CUBE first. Amazing. Very good. Such is the world we live in. So we're going to talk about some of the big problems. I mean, here's the thing. We're obviously in the middle of this pandemic. We're thinking about the post-isolation economy, et cetera. People compare it, obviously, no surprise, to the Spanish flu in the early part of the last century.
They talk about the Great Depression, but the big difference this time is technology. Technology has completely changed the way in which we've approached this pandemic. And we're going to talk about that. And Dr. Goh, I want to start with you. You've done a lot of work on this topic of swarm learning. If we could, my limited knowledge of this is we're kind of borrowing from nature. You think about bees looking for a hive as sort of independent agents, but somehow they come together and communicate. But tell us, what do we need to know about swarm learning and how it relates to artificial intelligence, and we'll get into it. Oh, Dave, that's a great analogy, using a swarm of bees. That's exactly what we do at HPE. So let's use the example here. When deploying artificial intelligence, a hospital does machine learning on their patient data, which could be biased due to demographics and the types of cases they see more of. Also, sharing patient data across different hospitals to remove this bias is limited, given privacy or even sovereignty restrictions, for example, across countries in the EU. Now, HPE swarm learning fixes this by allowing each hospital to still continue learning locally. But at each cycle, we collect the learned weights of their neural networks, average them, and send them back down to all the hospitals. And after a few cycles of doing this, all the hospitals would have learned from each other, removing biases without having to share any private patient data. That's the key. So the ability to allow you to learn from everybody without having to share your private patient data, that's swarm learning. And part of the key to that privacy is blockchain, correct? I mean, you've been involved in blockchain and invented some things in blockchain. That's part of the privacy angle, is it not? Yes, absolutely. There are different ways of doing this kind of distributed learning, which swarm learning is.
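The weight-averaging cycle Dr. Goh describes, where each hospital trains locally, the learned weights are collected, averaged, and sent back down, can be sketched in a few lines. This is an illustrative toy, not HPE's implementation; the model, data, and function names are all assumptions.

```python
# Toy sketch of a swarm-learning cycle: three "hospitals" each take one
# local training step on their private data, then only the learned weights
# (never the raw data) are averaged and redistributed.
import numpy as np

rng = np.random.default_rng(0)

def local_training_step(weights, local_data, labels, lr=0.1):
    """One gradient step of logistic regression on a hospital's private data."""
    preds = 1.0 / (1.0 + np.exp(-local_data @ weights))
    grad = local_data.T @ (preds - labels) / len(labels)
    return weights - lr * grad

def swarm_average(weight_list):
    """Merge step: average the learned weights from all participants."""
    return np.mean(weight_list, axis=0)

# Three hospitals with their own private datasets (synthetic here).
hospitals = [(rng.normal(size=(50, 4)), rng.integers(0, 2, 50).astype(float))
             for _ in range(3)]
weights = np.zeros(4)

for cycle in range(20):                      # a few swarm cycles
    local = [local_training_step(weights, X, y) for X, y in hospitals]
    weights = swarm_average(local)           # only learnings are shared
```

In a real deployment the merge step would be coordinated over a network rather than in one process, but the core idea, averaging weights instead of pooling patient records, is the same.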
However, many of the other distributed learning methods require you to have some central control, right? So Prasad and the team and us came up together with a method where you would, instead of central control, use blockchain to do this coordination. So there is no more central control or coordinator, which is especially important if you want to have a truly distributed, swarm-type learning system. Yeah, no need for a so-called trusted third party or adjudicator. Okay, Professor Schultze, let's go to you. You're essentially the use case of this swarm learning application. Tell us a little bit more about what you do and how you're applying this concept. Yeah, so I'm actually by training a physician, although I haven't seen patients for a very long time. I'm interested in bringing new technologies to what we call precision medicine. So new technologies, both from the laboratories but also from computational sciences, marry them and then basically allow precision medicine, which is a medicine that is built on new measurements, many measurements of molecular phenotypes, as we call them. So basically we assess on different levels, for example, the genome, or genes that are transcribed from the genome. We have thousands of such data points and we have to make sense out of this. This can only be done by computation, and as we discussed already, one of the hopes for the future is that with the new wave of developments in artificial intelligence and machine learning, we can make more sense out of this huge data that we generate right now in medicine. And that's what we're interested in, to find out how we can leverage these new technologies to build new diagnostics, new therapy outcome predictors, so we know whether a patient will benefit from a diagnostic or a therapy or not. And that's what we've been doing for the last 10 years. And the most exciting thing I have been through in the last three, four, five months is really when HPE introduced us to swarm learning.
Okay, and Prasad, you have been helping Professor Schultze actually implement swarm learning for specific use cases. We're going to talk about COVID, but maybe describe a little bit about your participation in this whole equation. Yeah, thanks. As Dr. Eng Lim Goh mentioned, we have used blockchain as a backbone to implement the decentralized network. And through that, we are enabling a privacy-preserved decentralized network without having any control points, as the Professor explained in terms of precision medicine. So one of the use cases we are looking at is blood transcriptomes. Think of it: different hospitals having different sets of transcriptome data which they cannot share due to the privacy regulations. And now each of those hospitals will train the model on the local data which is available in that hospital, and share the learnings coming out of that training with the other hospitals. And we iterate over several cycles to merge all these learnings, and then finally get to a global model. So through that, we are able to get to a model whose performance is equal to collecting all the data into a central repository and training on it. And when we are doing it, there could be multiple kinds of challenges. So it's good to do decentralized learning, but what about if you have non-IID data? What about if there is a dropout in the network connections? What about if there are some compute nodes which are stragglers, or which are not seeing a sufficient amount of data? That's something we tried to build into the swarm learning framework: handling the scenarios of having non-IID data or, in a simple word, what we could call biases.
For example, one hospital might have, say, a large number of tumor cases, whereas another hospital might have a very small number of cases. So we have implemented some techniques for doing the merging, providing various kinds of weights or tunable parameters, to overcome this set of challenges in swarm learning. And Professor Schultze, you've applied this to really try to better understand and attack the COVID pandemic. Can you describe in more detail your goals there and what you've actually done and accomplished? Yeah, so we have actually really done it for COVID. The reason why we were trying to do this already now is that we had generated these transcriptomes from COVID-19 patients ourselves. And we realized that the signal of the disease is so strong and so unique, compared to other infectious diseases which we looked at in some detail, that we felt the blood transcriptome would be a good starting point to identify patients, but maybe even more important, to identify those with severe disease. So if we can identify them early enough, we basically could care for those more and find, particularly for those, treatments and therapies. And the reason why we could do that is because we also had some other test cases done before. So we used the time wisely, with large data sets that we had collected beforehand. So on other use cases we learned how to apply swarm learning, and we're now basically ready to test it directly with COVID-19. So this is really a stepwise process. Although it was extremely fast, it was still stepwise. We were guided by data where we had much more knowledge, which was with the blood leukemias. So we had worked on that for years. We had collected many data, so we could really simulate swarm learning very nicely.
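Prasad's point about uneven case counts can be made concrete with a sample-size-weighted merge: instead of averaging every site's learned parameters uniformly, each site's contribution is scaled by how much data it saw. The weighting scheme below is illustrative only, not HPE's actual merge algorithm.

```python
# Sketch of a weighted merge: sites with more data pull the merged
# parameters more strongly toward their own learnings.
import numpy as np

def weighted_merge(site_weights, site_sample_counts):
    """Merge per-site parameter vectors, weighted by sample count."""
    counts = np.asarray(site_sample_counts, dtype=float)
    fractions = counts / counts.sum()          # each site's share of the data
    stacked = np.stack(site_weights)           # shape: (n_sites, n_params)
    return (fractions[:, None] * stacked).sum(axis=0)

# Hospital A saw 900 tumor cases, hospital B only 100.
w_a = np.array([1.0, 2.0])
w_b = np.array([3.0, 6.0])
merged = weighted_merge([w_a, w_b], [900, 100])   # lands much closer to A
```

The same idea generalizes to other tunable parameters the speakers mention, such as down-weighting stragglers or nodes with little data.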
And based on all the experience we gained together with Prasad and his team, we could quickly then also apply that to the data that are coming now from COVID-19 patients. So Dr. Goh, it really comes back to how we apply machine intelligence to the data. And this is such an interesting use case. I mean, in the United States we have 50 different states with 50 different policies, different counties. We certainly have differences around the world in terms of how people are approaching this pandemic. And so the data is very rich and varied. Talk about that dynamic. Yeah, for viewers who are new to this, right? The workflow could be: a patient comes in, you take the blood and you send it through an analysis. Our DNA is made up of genes, and our genes express, right? They express in two steps. First they transcribe, then they translate. What we are analyzing is the middle step, the transcription stage. And there are tens of thousands of these transcripts that are produced after the analysis of the blood. The thing is, can we find, in those tens of thousands of items, or biomarkers, a signature that tells us this is COVID-19 and how serious it is for this patient? Now, the data is enormous, right? For every patient. And then you have a collection of patients in each hospital that have a certain demographic, and then you also have a number of hospitals around. The point is, how do you get to share all that data in order to have good training of your machine? The issue is, of course, privacy of data, right? And as such, how do you then share that information if privacy restricts you from sharing the data? So in this case, swarm learning only shares the learnings, not the private patient data.
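The signature-finding task Dr. Goh outlines, tens of thousands of transcript abundances per patient, from which a small discriminating set of biomarkers is learned, can be pictured with a toy example. The data below is synthetic and the feature-ranking method is a deliberately simple stand-in for the neural networks discussed in the conversation.

```python
# Toy sketch: 200 patients x 5,000 transcripts, where only three transcripts
# actually carry the disease signal. We rank transcripts by how strongly
# their mean abundance differs between cases and controls.
import numpy as np

rng = np.random.default_rng(1)
n_patients, n_transcripts = 200, 5000
X = rng.normal(size=(n_patients, n_transcripts))

signature = [10, 99, 2048]                        # truly informative transcripts
y = (X[:, signature].sum(axis=1) > 0).astype(float)
X[:, signature] += 2.0 * y[:, None]               # cases over-express the signature

case_mean = X[y == 1].mean(axis=0)
ctrl_mean = X[y == 0].mean(axis=0)
ranked = np.argsort(-np.abs(case_mean - ctrl_mean))  # strongest separators first
top_biomarkers = ranked[:10].tolist()
```

In the real setting each hospital would compute its learnings on its own patients and only share model parameters, so a biased or small cohort at one site does not have to stand alone.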
So we hope this approach would allow all the different hospitals to come together and unite, sharing the learnings, removing biases, so that we have high accuracy in our prediction while at the same time maintaining privacy. So, I would like to add, that was extremely well explained, and at least for the European Union this is extremely important because, you know, the lawmakers and the governments have clearly stated that even under these crisis conditions, they will not relax the privacy laws. You know, compliance with privacy laws has to stay as high as outside of a pandemic. And I think there's good reason for that, because if you lower the bar now, why shouldn't you lower the bar at other times as well? And I think that was a wise decision. Yet you will see in the medical field how difficult it is to discuss, you know, how do we share the data fast enough? I think swarm learning is really an amazing solution to that, because this discussion is gone, basically. Now we can discuss how we do learning together, rather than discussing what would be a lengthy procedure to go towards sharing, which is very difficult under the current privacy law. So I think that's why I was so excited when I learned about it in the first place. We're faster, we can do things that otherwise are either not possible or would take forever. And for a crisis, that's key. That's absolutely key. And the byproduct of this is also the fact that all the data stay where they are, at the different hospitals, with no movement. Learn locally, but only share the learnings. Right, very important in the EU. And of course, even in the United States, people are debating about contact tracing and using technology and cell phones and smartphones to do that.
Prasad, I don't know what the situation is like in India, but nonetheless, Dr. Goh's point about just sharing the learnings, bubbling it up, trickling just kind of metadata, if you will, back down, protects us. But at the same time, it allows us to iterate and improve the models. And so that's a key part of this: the starting point and the conclusions that we draw from the models are going to change, and we've seen this with the pandemic. It changes daily, certainly weekly, but even daily. We continuously improve the conclusions and the models, don't we? Absolutely. As Dr. Goh explained well, we could look at it like this: we have the clinics or the testing centers, which are in remote places or wherever. So we could collect those data at that time, and then run them through the transcriptome sequencing. And then, as and when we learn through these new samples and new cases, all of them could, with that local data, participate in a collective swarm learning, not just within the state or in a country; they could participate in swarm learning globally to share all these new learnings as they come up, and then also implement some kind of continuous learning to pick up the new signals or the new insights which come with new sets of data, and help to immediately deploy them back into the inferencing, into the practice of identification. To do this, I think one of the key things which we have realized is making it very simple. It's making it simple to convert the machine learning models into swarm learning, because we know there are subject matter experts who are going to develop these models on their choice of platforms, and also making it simple to integrate into the complete machine learning workflow: from the time of collecting the data, pre-processing, then doing the model training, and then putting it on to inferencing and looking at performance.
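The simplicity Prasad describes, enabling an existing model with only a few lines of changes, can be pictured as a small sync hook bolted onto an ordinary training loop. The HPE library itself is not shown in this conversation, so everything below, the class, method names, and merge behavior, is a hypothetical sketch of the shape of such an integration, not the actual API.

```python
# Hypothetical stand-in for a swarm-learning sync hook: every few batches,
# the local weights are merged with peers' weights; otherwise training
# proceeds exactly as before.
import numpy as np

class SwarmCallbackSketch:
    """Illustrative callback that periodically merges weights with peers."""
    def __init__(self, merge_fn, sync_interval=4):
        self.merge_fn = merge_fn
        self.sync_interval = sync_interval
        self.step = 0

    def on_batch_end(self, weights, peer_weights):
        self.step += 1
        if self.step % self.sync_interval == 0:
            # Merge step: combine our learnings with the peers' learnings.
            return self.merge_fn([weights] + peer_weights)
        return weights                       # between syncs, keep local weights

# "Converting" an existing loop amounts to constructing the callback and
# calling it at the end of each batch -- a handful of added lines.
cb = SwarmCallbackSketch(merge_fn=lambda ws: np.mean(ws, axis=0))
w = np.ones(3)
for _ in range(4):
    w = cb.on_batch_end(w, peer_weights=[np.zeros(3)])
```

The design point being illustrated is that the data scientist's model code stays untouched; only the weight hand-off at batch (or cycle) boundaries is added.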
So we have kept that in mind from the beginning while developing it. So we developed it as a pluggable, microservices kind of architecture with containers. The whole library can be delivered as a container, with decentralized management command and control, which helps to manage the whole swarm network and to start, initiate, or ensure the enrollment of new hospitals, or new nodes, into the swarm network. At the same time, we also looked at the task of the data scientists and tried to make it very, very easy for them to take their existing models and convert them into the swarm learning framework, so that they can enable their models to participate in decentralized learning. We have done it through a set of callable REST APIs, and I could say that in the examples we are working on with the Professor, whether in the case of leukemia or COVID, with the neural network models, we're using a 10-layer neural network, we could convert that into a swarm model with less than 10 lines of code changes. So that's the kind of simplicity we are looking at, so that it helps to make it quicker, faster, and leverage the benefits. So the exciting thing here, Dr. Goh, is this is not an R&D project. This is something that you're actually implementing in the real world, even though it's a narrow example, but there are so many other examples that I'd love to talk about, but please, you had a comment. Yes, the key thing here is that in addition to allowing privacy to be kept at each hospital, you also have the issue of different hospitals having data that is skewed differently. For example, demographics: it could be that this hospital is seeing a lot more younger patients and another hospital is seeing a lot more older patients, and then if you are doing machine learning in isolation, your machine might be better at recognizing that condition in the younger population but not the older, and vice versa.
But using this approach of swarm learning, we then have the biases removed, so that both hospitals can detect for younger and older populations. So this is an important point, the ability to remove biases here, and you can see biases in the different hospitals because of the type of cases they see and the demographics. Now, the other point that's very important to re-emphasize is what Prasad and Professor Schultze mentioned: how we made it very easy to implement this. This started out being a... So for example, each hospital has their own neural network and they're training on their own. All you do is we come in, as Prasad mentioned, change a few lines of code in the original machine learning model, and now you're part of the collective swarm. This is how we want to make it easy to implement, so that we can get, again, as I like to call it, hospitals of the world to unite, without sharing private patient data. So let's double-click on that, Professor. So tell us about sort of your team, how you're taking advantage of this. Dr. Goh just described the simplicity, but what are the skills that you need to take advantage of this? What does your team look like? Yeah, so we actually have a team that goes from physicians to biologists, medical experts, up to computational scientists. We have invested early on in having these interdisciplinary research teams so that we can actually span the whole spectrum. So people know about the medicine, they know about the biological basics, but they also know how to implement such new technologies. So they are probably a little bit spearheading that, but this is the way to go in the future, and I see that with many institutions; many other groups are going in this direction, because finally medicine understands that without computational sciences, without artificial intelligence and machine learning, we will not answer those questions with these large data that we're using.
So far, fine, but I also realized that when we entered this project, we had basically our model, our machine learning model from the leukemias, and it really took almost no effort to get it into the swarm. So we were really ready to go in a very short time. What I also would like to say, and this goes towards the bias that exists in medicine between different places: Dr. Goh said this very nicely, one aspect is the patients and so on, but there are also the techniques, how we do clinical assays. We're using different robots, we're using different automated systems to do the analysis, and we actually tried to find out what swarm learning does if we provide such a bias on purpose. So we did the following thing. We know that there are different ways of measuring these transcriptomes, and we simulated that two hospitals had an older technology and a third hospital had a much newer technology, which is good for understanding the biology and the diseases, but the new technology is prone to generating data that can no longer be used to learn and then predict on the old technology. So basically, it's deteriorating. If you take the new one and you make a classifier model and you try it on the old data, it doesn't work anymore. So that's a very hard challenge. We knew it didn't work anymore in the old way. So we pushed it into swarm learning, and the swarm recognized that and took care of it. The results were even better, bringing everything together. I was astonished. I mean, it's absolutely amazing that although we knew about these limitations of that one hospital's data, the swarm basically could deal with it, and I think there's more to learn about these advantages. And I'm very excited. It's not only transcriptomes that we will do. I hope we can very soon do it with imaging. The DZNE has 10 sites in Germany connected to 10 university hospitals.
There's a lot of imaging data, CT scans, NMRs, radiograms, and this is the next domain in medicine where we would like to apply swarm learning, absolutely. Well, it's very exciting, being able to bring this to the clinical world and make it sort of an ongoing learning. I mean, you think about, again, coming back to the pandemic: initially, we thought putting people on ventilators was the right thing to do. We learned, okay, maybe not so much. The efficacy of vaccines and other therapeutics, it's going to be really interesting to see how those play out. My understanding is that the vaccines coming out of China are built more for speed, to get to market fast, and the US is maybe trying to build vaccines that are more long-term effective. Let's see if that actually occurs. Some of those other biases and tests that we can do, that is a very exciting, continuous use case, isn't it? Yeah, I think so. Go ahead. Yes, in fact, we have another project ongoing to use transcriptome data and other data, like metabolite and cytokine data, all these biomarkers from the blood, right, of the volunteers during a clinical trial, with the whole idea of looking at all those biomarkers, we're talking tens of thousands of them, same thing again, and then seeing if we can streamline the clinical trials by looking at the data and training with that data. So again, here you go, right? It's very good that we have many vaccine candidates out there right now. The next long pole in the tent is the clinical trial, and we are working on that also, by applying the same concept to clinical trials. Right, and then Prasad, it seems to me that this is an example of a sort of an edge use case. You've got a lot of distributed data, and I know you've spoken in the past about the edge generally, where data lives, moving data back to sort of the centralized model, but of course you don't want to move data if you don't have to; real-time AI inferencing at the edge.
So what are you thinking in terms of other edge use cases where this swarm learning can be applied? Yeah, that's a great point. We could look at this both in the medical field and also in other fields, as we talked about. The Professor just mentioned these radiograms, and then probably using this with medical image data. Think of a scenario in the future: you could have an edge node sitting next to these medical imaging systems, very close to them. And then, as and when the systems produce the medical images, it could be an X-ray or a CT scan or an MRI scan type of thing, the system sitting attached to that, from the model that is already built through the swarm learning, can do the inferencing. And also, with newer sets of data, if it sees some kind of outliers, or sees newer images, or probably new signals, it could use that new data to initiate another round of swarm learning with all the other involved medical imaging systems across the globe. So all this can happen without really sharing any of the raw data outside of those systems, but just doing the inferencing and then trying to make all of these systems come together and build a better model. So the last question, if I may, we've got to wrap, but I mean, I think I first heard about swarm learning, maybe read about it, probably 30 years ago, and then just ignored it and forgot about it. And now here we are today. Blockchain, of course, I first heard about with Bitcoin, and you're seeing all kinds of really interesting examples, but Dr. Goh, I'll start with you. This is really an exciting area and we're just getting started. Where do you see swarm learning by the end of the decade? What are the possibilities? Yeah, you could see this being applied in many other industries, right? We've spoken about the life sciences, the healthcare industry.
You can imagine the scenario of manufacturing, where a decade from now you have intelligent robots that can learn from looking at a craftsman building a product and then replicate it, right? By just looking, listening, learning. And imagine now you have multiple of these robots all sharing their learnings across boundaries, right? Across state boundaries, across country boundaries, provided you allow that, without having to share what they are seeing. They can share what they have learned. See, that's the difference. Without needing to share what they see and hear, they can share what they have learned across all the different robots around the world, or in the community that you allow. Imagine that time, right? That world where even in manufacturing you get intelligent robots learning from each other. And Professor, I wonder if, as a practitioner, you could sort of lay out your vision for where you see something like this going in the future? I'll stay with the medical field for the time being, although I agree it will be in many other areas. You know, medicine has two traditions, for sure. One is learning from each other. So that's an old tradition in medicine, for thousands of years. But what's interesting, and that's even more so in modern times, is that we have no tradition of sharing data. It's just not really inherent to medicine. And that's the mindset: yes, learning from each other is fine, but sharing data is not so fine. But swarm learning deals with that. We can still learn from each other. We can help each other by learning, and this time by machine learning. We don't have to actually deal with the data sharing anymore, because the data is kept with us. So for me, it's a really perfect situation, in that medicine could benefit dramatically from this, because it goes along with the traditions, and that's very often very important for getting something adopted.
And on top of that, you know, what is also not seen very well in medicine is that there's a hierarchy, in the sense that certain institutions rule others. And swarm learning is exactly helping us there, because it democratizes, onboarding everybody. Even if you're a small entity, or a small institution, or a small hospital, you can become a member of the swarm, and as a member you become important, and there is no central institution that actually rules everything. And this democratization, I really love, I have to say. Prasad, we'll give you the final word. I mean, your job is really helping to apply these technologies to solve problems. What's your vision for this? I think the Professor mentioned one very key point, the democratization of AI, and I'd like to just expand on that a little bit. So it has a very profound application. Dr. Goh mentioned manufacturing. If you look at any field, it could be health sciences, manufacturing, autonomous vehicles, through this democratization, and also using that blockchain, we are building a framework to incentivize the people who own certain sets of data to bring the insights from that data to the table for doing swarm learning. So we could build some kind of a monetization framework, or an incentivization framework, on top of the existing swarm learning stack which we are working on, to enable the participants to bring their data or insights and then get rewarded accordingly. So if you look at it, eventually we could completely make it a democratized AI, with a complete monetization and incentivization system built into it, to make all the parties work together seamlessly. So I think this is just a fabulous example. We hear a lot in the media about the tech backlash, breaking up big tech, how tech has disrupted our lives, but this is a great example of tech for good, and responsible tech for good.
And if you think about this pandemic, if there's one thing it's taught us, it's that disruptions outside of technology, pandemics or natural disasters or climate change, et cetera, are probably going to be bigger disruptions than technology itself, yet technology is going to help us solve those problems and address those disruptions. So gentlemen, I really appreciate you coming on theCUBE and sharing this great example, and I wish you the best of luck in your endeavors. Thank you. Thank you for having me. And thank you everybody for watching. This is theCUBE's coverage of HPE Discover 2020, the virtual experience. We'll be right back right after this short break.