Hello, everyone. My name is Guillaume Moutier, I'm from Red Hat. A little bit of background just to set the stage: I joined Red Hat two years ago. Prior to that I was at Laval University in Quebec City — I'm from Canada, by the way, which is why this weather is so hard for me right now. As part of an academic institution, I had the chance to work with the affiliated hospitals around Quebec City. So I know a little bit about what people and researchers are doing in healthcare, and today we're going to see how Red Hat is helping them with different use cases. We'll first go through Red Hat and OpenShift Data Science, just to see how Red Hat is helping with the kind of tools that we provide, and then I will go through different healthcare projects I've had the chance to work on, either as fully open source community projects or with our customers, okay? So, Red Hat and open source: for those who don't know anything about Red Hat, we are the world's leading provider of enterprise open source solutions. Meaning we take open source projects, we productize them, and we fully support them afterwards. Think of the Linux kernel, OpenShift — which is our Kubernetes distribution — and tons of different applications that come from the open source world. And of course everything that we do also goes back to open source, even if it comes with a subscription, which greatly enhances the chance that you build a successful data science platform, because of course you have to interact with myriads of different solutions. That's where open source definitely helps. So, perception versus reality: on the left side, that's what we would like machine learning to be. As a data scientist, I can work on cool models, I can train models, and the like.
Unfortunately, that's not exactly what you find in real life, where the machine learning part — the model training and the rest — is a little bit smaller than you would think, and there is tremendous work to be done on data: data acquisition, gathering, cleaning, preparation and the rest, and then putting all of that together to work in a production environment. Because at the end of the day, what we want is something actionable. Especially in the healthcare industry, this can save lives, depending on how accessible it is. If we only have models that we can use after six months of working with the data, maybe it's too late for the patient. So that wouldn't help. What we are trying to do is provide as many tools as possible in our environments to help with everything that is around the AI/ML itself. At Red Hat, we don't train models, we don't provide you with models, but we provide you with all the tools that you need to create those models and put them into production. How do we do that? We take the standard machine learning lifecycle, where you have data preparation, then the training, then the deployment, the monitoring, and the feedback loop where you put your retrained models back into production. Of course, all of this has to run almost everywhere. That's why we also have the OpenShift platform, our Kubernetes distribution, to be able to run those things almost anywhere. That's the base infrastructure and the base architecture on which we are building our solutions. And for data science, what we strongly advise is the use of containers. In standard application development, containers were a huge change for everyone, speeding up the lifecycle of your applications. They come with this simpler, lighter way of packaging your applications, which is cool. That's still useful in the data science world, but they also bring an interesting characteristic, which is immutability.
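The lifecycle just described — prepare, train, deploy, monitor, feed back — can be sketched as a simple loop. This is a minimal illustration, not Red Hat tooling; all function names here are hypothetical placeholders you would supply yourself:

```python
# Hedged sketch of the ML lifecycle feedback loop: each round prepares
# data, trains a model (possibly warm-starting from the previous one),
# deploys it, and then monitors production to collect fresh data for
# the next retraining round.
def lifecycle(raw_data, prepare, train, deploy, monitor, rounds=2):
    model = None
    for _ in range(rounds):
        dataset = prepare(raw_data)          # data acquisition/cleaning step
        model = train(dataset, previous=model)  # training (with feedback)
        deploy(model)                        # push to production
        raw_data = monitor()                 # production data feeds next round
    return model
```

The point of writing it as a loop is exactly the talk's argument: the `train` call is one line out of five — the rest is data and operations.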
When you work in data science — and I'm not a data scientist myself, so maybe most of the crowd here knows this better than me — the idea is that whenever you do an experiment, you want to be able to reproduce it, whenever you want to do it. Meaning you can have the same data one year later, and it must run in exactly the same conditions: you have to reuse the same libraries, you have to use the exact same code and everything. With the newest fancy stuff like TensorFlow, PyTorch and the rest, because of the way the math works, you won't get the exact same result whether you use TensorFlow 2.7.1 or 2.7.2 or 2.8. It's the same library, you call it in the same manner, but given the same code and the same data, you don't get the same result. That's why you have to make sure you package your applications and your data in a way that is totally reproducible. The base of science is this reproducibility, and that's what containers bring. So I would say that containers are totally suited to the data science world — for application development, of course, but even more for pure data science, because of this reproducibility. And with their standard characteristics — reusable, portable and everything — anything that you have developed somewhere will be able to run somewhere else, and that's what we are going to see in some examples. Based on that, we have created Open Data Hub. Open Data Hub is our solution at Red Hat, developed at the Office of the CTO, that takes the most widely used data science tools and data engineering tools and packages them in a way that they can run on OpenShift — again, our Kubernetes distribution, which comes with some enterprise-grade capabilities, especially regarding security and the rest. Which means you can't just take any container in any form, put it on OpenShift, and expect it to run.
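Since, as noted above, even a patch-level library bump can change results, a reproducible container image pins exact versions — and you can verify at runtime that the environment matches what the experiment was validated against. A minimal sketch (the pinned packages and versions below are illustrative assumptions, not from the talk):

```python
# Reproducibility guard: compare installed package versions against the
# exact versions the experiment was validated with, and report any drift.
import importlib.metadata

# Illustrative pins -- in practice these come from the container build.
PINNED = {"numpy": "1.24.4", "scikit-learn": "1.3.2"}

def check_environment(pinned=PINNED):
    """Return {package: (wanted, installed)} for every mismatch."""
    drift = {}
    for pkg, want in pinned.items():
        try:
            have = importlib.metadata.version(pkg)
        except importlib.metadata.PackageNotFoundError:
            have = None  # package missing entirely
        if have != want:
            drift[pkg] = (want, have)
    return drift
```

A training entrypoint could call `check_environment()` first and refuse to run if drift is non-empty — the container image itself is what makes the check normally pass.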
Now, there are some things to do to make them run properly, and that's what we do with Jupyter Notebooks, Kubeflow, Argo, Triton from NVIDIA, all those kinds of things. We make sure they run, and we package it all as an operator. That means — and we can do a demo for you at the booth anytime — in under a few minutes, you can fully deploy a data science platform for your data scientists. It takes just a few clicks: go to the marketplace on OpenShift, deploy the data science platform, and bam, you have it. You can provide Jupyter Notebooks as a service, either in the cloud or on-prem — wherever OpenShift runs, which is everywhere. Based on this, we created Red Hat OpenShift Data Science, because remember, the first project, Open Data Hub, is a full upstream, community-driven project, not a product that you can buy; Red Hat OpenShift Data Science is the product. We take this code, we package it differently, and OpenShift Data Science is a managed service, meaning it runs fully in the cloud — you don't have anything to install, neither OpenShift nor the data science platform itself — so it's an easy way for your data scientists to work. It has two different effects. First, you are sure that everyone is using the same tools, at the same version, with the same packages and everything. Most of the time, organizations have small data science teams, and everyone is hacking on their own workstation, installing this or that because they've seen an article and it seems cool. Then they want to give their work to someone else on the team, and it just doesn't work, because the installation is not exactly the same, the packages are not exactly the same, and so on. Here we solve that situation by providing an easy-to-use, self-service environment where everyone is at the same level.
Then the second thing, of course, is security: because it all runs inside this contained environment, the data never leaves it. You're not at risk of losing your laptop, with whatever is on it. And the other interesting thing is the increased capacity: you cannot buy two GPUs and 256 gigabytes of RAM for each and every data scientist on your team just because sometimes they need to do heavy model training. That's just not realistic. Here, by centralizing those resources, you are able to provide a self-service, on-demand environment where they can select: today I want to work with eight CPUs and one GPU, because I know I will do model training — but then I will release those resources so that they are available to others. It's a more convenient and more secure way to work. Again, if you want a demo of that, we can meet at the booth. That's the first release, with different components. So we have the base layer: it runs right now in the cloud on AWS, soon coming to Azure and GCP, and also coming in a few months on-prem. You have your accelerator layer, with NVIDIA GPUs that are fully supported by NVIDIA — you can do GPU slicing and everything to share those GPUs among people, because they are quite costly. Then you have the layer with the managed services from Red Hat: the OpenShift Data Science services, the notebooks and everything, the model training, the model serving part. And of course we partner with other ISVs: Starburst Galaxy for distributed SQL queries — so if you have terabytes of data to query, you can definitely use that kind of tool; Pachyderm for data management, data lineage and all those kinds of things; IBM Watson; and the tools and libraries from Intel — so even if you don't have any GPUs, you can still have accelerated model training and serving using those libraries. So that's what we do at Red Hat, and those are some of the dashboards that you will see.
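The "eight CPUs and one GPU for today" request described above boils down, on Kubernetes, to a resource spec attached to the user's notebook pod. A hedged sketch of what a self-service spawner might build (the function and its defaults are illustrative, not the actual OpenShift Data Science implementation; `nvidia.com/gpu` is the resource name the NVIDIA device plugin exposes):

```python
# Build the resources section a notebook spawner could attach to a pod.
# Requests == limits here keeps the scheduling guaranteed; releasing the
# notebook releases the CPUs/GPU back to the shared pool.
def notebook_resources(cpus, memory_gi, gpus=0):
    res = {
        "requests": {"cpu": str(cpus), "memory": f"{memory_gi}Gi"},
        "limits": {"cpu": str(cpus), "memory": f"{memory_gi}Gi"},
    }
    if gpus:
        # GPUs are only ever set as limits; the device plugin handles them.
        res["limits"]["nvidia.com/gpu"] = str(gpus)
    return res
```

For the scenario in the talk, `notebook_resources(8, 64, gpus=1)` would be the heavy-training profile, and a plain `notebook_resources(2, 8)` the everyday one.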
So, if you subscribe to the service as a data scientist, you have direct access to those environments, where you can select: okay, today I want to work with this flavor of Jupyter notebook, with PyTorch or whatever. You just click and launch, and you are in your environment. So that's all for Red Hat. I had to do this part about Red Hat and our services and offers, but I'm pretty sure you will prefer this part about concrete use cases. These are use cases I've worked on — they are for real, not demos or anything. So let's review them. The first one I selected is How's My Flattening. At the beginning of the pandemic, there was a team of researchers and doctors in Ontario who were quite fed up with the way the government of Ontario — it's Ontario, not Quebec; I'm not saying the government of Ontario is bad — was giving data to data scientists in a way that was just not usable, okay. Quick example: every day they would publish some results about the pandemic, statistics and everything, but each day on their website they would publish the data for that day only, and everything else was lost. So you couldn't look at yesterday, the day before, the week before, and so on. So a researcher began to gather this data, put up some spreadsheets and everything, and share them with friends and so on. A community began to form, and it grew to about 300 people: researchers, practitioners, people from different government agencies and the rest. They set up this platform where they would be able to share their data, work together on it, and publish this website where people would have tons of dashboards and analyses on COVID-19 — but it was also a platform where those people could collaborate and publish their results in a way that was actionable by people in research or in medicine in general.
So, the way we did this is that we used Open Data Hub deployed in the cloud in the U.S. And we did it for free the whole time — that was kind of our effort towards the pandemic. We helped them set up this platform where they could easily collaborate. That means each group of researchers would be separated into different projects and everything. They would have access to the data science tools — Jupyter notebooks, and Argo to create workflows and everything. They would work together on this, and at the end they would publish their results. The interesting part was that, of course, some data was freely available, some other data was more restricted, and some data was heavily restricted. We were able, by separating the access rights, to have each researcher — based on their group, authentication and the rest — access only the datasets that they needed and only the notebooks that they wanted to work on, and so on. So they were able to work. Now that the pandemic is almost over, this project has really scaled down. But it was a good example of how you can bring many people from different organizations to work together in a shared, on-demand data science environment. And because there were data scientists, researchers, MDs and everything, they wouldn't have known how to create this themselves. Originally they were just exchanging spreadsheets and working through MSN Messenger or things like that — just total hell. If they wanted to be able to scale up to those 300 people, that's the kind of platform they needed. Okay. Another example: HCA Healthcare — I know they are also present in the UK; it's the largest healthcare company in the US. With them, we've worked on sepsis detection.
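The tiered access described above — open, restricted, and heavily restricted data, filtered by researcher group — can be illustrated with a small sketch. The group names and tiers here are invented for illustration; on the real platform this was enforced through OpenShift projects, authentication, and access rights rather than application code:

```python
# Illustrative group-based dataset visibility: each group is allowed a
# set of sensitivity tiers, and a user sees only datasets whose tier is
# covered by at least one of their groups.
GROUP_TIERS = {
    "public": {"open"},
    "clinical": {"open", "restricted"},
    "core-team": {"open", "restricted", "confidential"},
}

def visible_datasets(user_groups, datasets):
    """datasets: {name: tier}. Return names the user may access."""
    allowed = set().union(*(GROUP_TIERS.get(g, set()) for g in user_groups))
    return [name for name, tier in datasets.items() if tier in allowed]
```

The design choice mirrored in the sketch is additive permissions: membership in several groups can only widen, never narrow, what a researcher sees.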
We helped them create a model, and a platform to deliver it, that takes all the data from the patients and makes predictions about the risk of sepsis. And that's a quote from a data scientist there, who said that they now know up to five years in advance that there is a risk of sepsis for specific patients. They're not waiting for the sepsis to happen, but acting on the data. It's a correlation of many different data points from the patients, the patient history and everything, but they are able to do it. So that's quite cool. We've also worked a lot with Veterans Affairs in the US. They are the ones taking care of healthcare for all the veterans in the US. If you don't know this community, I have a few figures here: that's 9.2 million people. Those are huge cohorts — any data scientist would die to have access to the kind of data that they have. That's totally crazy. Tons of practitioners, of medical centers and the rest. But the disturbing thing here, at the bottom, is the 17 years it takes for research to be translated into practice. So they want, of course, to speed that up — the time from new knowledge being acquired to something that is really actionable in their environments. They are working on different projects, and everything is based exactly on what I described, the OpenShift data science platform. In their case, it's with Open Data Hub, the open source version, because everything is heavily secured and on-prem, but they are still leveraging OpenShift and this kind of data science platform. Their first project, which is the Agile MDA Card, is a model that follows a patient all along their journey, just to make sure that the medication they are given follows exactly what it should — because, of course, the MDs cannot keep everything in their heads.
So, here we are helping, with this model, to make sure that the treatment they receive is adapted to their current medication and to the ones they've had in the past. It's based on the historical patient data, and on their day-to-day data and metrics and the rest, so that they receive the right medication at the right time. It's like an overseeing control mechanism over what's happening with those patients. Another project they are working on is this conversion from long phone calls to video consultations. Because the US is huge, they do a lot of consultations by phone, but they have detected this trend: the longer the phone call, the better the chance that you should actually see someone. So here, by analyzing the conversations in the phone calls, analyzing their length, paired with the data of the patients, they are trying to detect early which patients should go directly to a video interview instead of calling and spending long hours on those phone calls. That's what they are trying to achieve. Natural language processing: here it's to reduce the administrative burden. They have tons of affiliated hospitals and clinics, and those are still sending faxes. I don't know if it's the same in the UK; I can tell you it's the same in Canada. My wife is a researcher in cardiology, and whenever she wants to do something, the doctor always says: oh, send me a fax. That's just not workable data. It's such a pity that in 2020 people are still exchanging data like that, with faxes. It doesn't make sense. So here they want to use natural language processing to be able to receive those faxes directly and process them. Of course, if it's only a prescription, that's pretty easy.
But when the doctor writes a full summary of the interaction with the patient and everything, that's what they want to analyze, to be able to directly categorize the patient, add some metadata to their file, and so on. That's what they are doing with NLP. Another project is VSmart. That's an application they have for veterans. Here it's much more about taking information about the patient and then giving suggestions — nudges — to change their life habits. It can be anything from diet recommendations to sports and everything. Exactly as Facebook does with your data, pushing you ads or whatever — here it has a more ethical goal, I would say. They are gathering this data, the interactions, what the veterans are actually doing in their lives, and then nudging them into better habits. That's the idea here. I have another example, and here, because of time, I won't do the full demo live, but there is a recording at the end. I guess the slides will be shared, so you will have the links and everything. This is a demo that I created before the pandemic — I didn't know that it would become so popular afterwards — and it's about a data pipeline for X-ray diagnosis. Training a model to recognize the risk of pneumonia from a chest X-ray is pretty straightforward nowadays: there are tons of libraries and frameworks that will do it almost automatically for you. So what I wanted to demonstrate in this demo is how you can put this into real production at scale. The idea is that we have this model, but we want to accelerate the processing, and Veterans Affairs is a good example of that: they have about one million images to process every month. That's the kind of scale they are operating at. Of course, they won't put a radiologist or a doctor in front of each and every one of these images, so they have to devise a new, innovative way to do it. So the workflow is this one.
At each medical facility, you have a container image with a machine learning model that automatically processes the incoming images and infers the risk of pneumonia. It classifies all those images between high risk, low risk, and also low confidence in the prediction — because, of course, no model is 100% accurate. It just doesn't work that way. Sometimes you have images that the model doesn't know what to do with. So, in the pipeline, I am also anonymizing the images. It's a demo, so the personal information you will see on these images is fake, but it's there to illustrate that you can modify anything in the metadata associated with the images. In the case where the model is not sure about the inference, it anonymizes those new images and sends them to a central location, where you can retrain the model and then push it back to each and every location where it's used. So that's the pipeline. This is what you have when you want to train the model with OpenShift Data Science. As you can see, I'm launching a specific notebook server, so as a data scientist, I'm working in my notebook environment and creating the model and the code that will do this risk inference. And at the end of the demo, you end up with a full dashboard that shows you, in real time, the images coming in, being processed, eventually being anonymized. As part of the demo, I also showcase how we can change the model midway. Let's say you have a new version of the model, model version two, that you want to deploy. It also showcases how easily it's done on OpenShift: you just push your container to your repository, and then it's automatically deployed wherever it's used, without any downtime or anything. So that's an interesting approach. We have a demo on this.
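The classification-and-anonymization step of the pipeline just described can be sketched as follows. The thresholds and the metadata field names below are illustrative assumptions, not the values from the actual demo:

```python
# Sketch of the per-image pipeline step: route each prediction into one
# of three buckets, and blank PII before a low-confidence image is sent
# to the central location for retraining.
def triage(pneumonia_prob, low=0.25, high=0.75):
    """Route a model probability into the three buckets from the demo."""
    if pneumonia_prob >= high:
        return "high-risk"
    if pneumonia_prob <= low:
        return "low-risk"
    # Model unsure: this image gets anonymized and shipped for retraining.
    return "low-confidence"

def anonymize(metadata, pii_fields=("PatientName", "PatientID", "PatientBirthDate")):
    """Blank out personal fields in image metadata, keep clinical tags."""
    return {k: ("ANONYMIZED" if k in pii_fields else v)
            for k, v in metadata.items()}
```

In a real deployment the metadata would come from DICOM tags and the anonymization would follow an established de-identification profile; the sketch only shows the shape of the decision.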
If you want to do the workshop for yourself, get in touch with Red Hat people. This is something that we offer: full workshops on how to create those pipelines. Or if you simply want to discuss these kinds of things, I'm here until tomorrow, even Friday morning, and I'd be happy to discuss your use cases and how we can help with them. I will leave you with some links to all this information. First, OpenShift Data Science itself: if you want to try it, there is a 60-day free trial, so you can register and have full access to OpenShift Data Science. And then all the case studies that we have for HCA and How's My Flattening, and the demo of the X-ray pipeline. And I'm on time — we even have two or three minutes for questions. Thank you.

Hi, Guillaume. Thank you. Barry Liddy, I'm the UK lead for AI assurance with Deloitte. I guess my question, given my background, is linked more to regulation. The proposed EU AI Act suggests that healthcare AI is high risk and should be regulated. I just wanted to get your view on whether you agree with that, and if so, what the regulation should be focused on.

That's a tricky question, especially in Europe. There is legislation being passed right now about the explainability of models and the rest. There are tons of things that you can still do under that kind of legislation — with models like XGBoost and that kind of thing, that's okay; it's when you begin to use neural networks that the hard part starts. As it is now — and you see it in this example with the X-ray pipeline — the goal is not to replace the radiologist; the goal is just to speed up the process. So ultimately, a practitioner, as it is right now and as it should be if you ask my opinion, should review the final result.
But if, from the start, we can tag the patients that you should see right now rather than in two hours — that's what we are looking at right now: a way to more easily process that kind of data, okay. I have another example, if I have two minutes. A children's hospital in Toronto is working on something to speed up times at the emergency room, because you know how it goes: you come in, you wait two hours, you see a nurse, you wait two hours, and ultimately you see a doctor. But what they found is that normally, given your symptoms, your history and everything, there's a good chance that what the doctor will say anyway is that you should have these additional exams. So what they have done is train a model, given all that data, that predicts from the start: okay, there's a 95% chance that the doctor will ask for these exams anyway. The nurse is now able to directly request those exams, so that when the doctor is there, everything is ready and he or she can make the diagnosis. So it's about speeding things up, not replacing anyone. We're not there. Yeah, that's okay — you know, he didn't jump out of his chair, so it's okay. Thank you.

So, you talked a lot about the platform, the Red Hat platform for data science. This is also deployed now together with AWS, right? That was one of the points. What are the main advantages of this over, say, SageMaker, which is already fully integrated on AWS?

Okay. We are definitely not competing with SageMaker or Azure ML or whatever. It's just that our clients were asking for data science solutions on OpenShift, because that's their platform of choice for containers and container orchestration and the rest. Of course, you will find the exact same tools — Jupyter Notebooks, TensorFlow and the like — but here it's our way to port them to OpenShift, in a way that's more integrated with OpenShift.
If you come and see our workshops, for example, we guide you from training a model all the way to directly deploying that model on OpenShift, whereas SageMaker will tell you: okay, the model is trained — now figure out how to containerize it, how to deploy it, and how to handle the lifecycle. Because OpenShift was a platform for DevOps, it's totally suited for MLOps and the rest: it's the same tools and the same way of thinking about deploying things. So that's where we differentiate a little bit. Plus, there's the managed service that we have for now, which will come to all cloud platforms and also on-prem. But right now, on-prem, people are heavily using Open Data Hub, which is the upstream version. So it's not supported yet, but they're still using it. Yeah. You're welcome. Thank you.