Hi everybody, and welcome. I'm Doug Davis. Theodore and I are going to be talking today about mining large metabolomics data inside of a Kubernetes cluster. And with that, I'll hand it over to Theodore so he can introduce himself and then kick it off.

Cool. Thanks, Doug, for the introduction. I'm very excited to be here and to present the work that we're doing at the European Molecular Biology Laboratory, or EMBL. EMBL is an international research organization focused on the life sciences. It has 27 member states, mainly in Europe, with its headquarters in Heidelberg and sites in several other cities throughout Europe. I'm presenting here the work of the team that I'm leading at EMBL Heidelberg, which focuses on developing new methods for the life sciences. Here you see our faces, you see who we are: we're a team of scientists, but also software engineers and computer scientists. What we're working on is developing methods to really improve the use of this new technology called spatial metabolomics for the life sciences, with applications in biology and medicine. I will not go into much detail about what spatial metabolomics is, but here you can see our scientific officer, Prasad, taking a very thin section of a tissue, putting it on a glass slide, and putting it into this instrument, which is called an imaging mass spectrometer. There is some magic happening there, producing very unique data. Why is it important? Because this technology is currently becoming essential for biology, for medicine, and for drug development, across a variety of biological questions, but also for diseases such as cancer, diabetes, and NASH, and for developing novel types of therapies like immunotherapy.
The technology, and also the tools and the software that I'll show you next, are used by scientists from across the globe: from universities, university hospitals, governmental organizations, pharma, contract research organizations, startups, you name it. So let me give you one glimpse into what this spatial metabolomics data can represent and what the challenges behind it are. First of all, it's very special data, because you can think of it as an image, but not with just three RGB channels: it has over 10,000 channels. And you can start thinking, oh yeah, then this data is probably pretty big. And yes, it is. From just one tissue section, or one sample, we can easily generate more than 10 gigabytes, and often we generate data that is close to one terabyte from one experiment, which we generate in a few hours. Why is it so big? Because every channel corresponds to a particular molecule. Here I'm showing you just one example from a public dataset of a mouse section, which is used by Genentech, a pharma company in the USA, for drug development. What you are looking at here is the localization of four different molecules across this tissue section. Why is it important? Because by getting this information, we understand: where does the drug go? What does it do? Does it have any side effects? And if there are side effects, where do they happen, and which organ are they associated with? And this is just one application of this technology. So over the past years, our team at EMBL has developed METASPACE, and METASPACE is at the very core of this presentation, so let me take one slide to explain it. It is a cloud engine and also a knowledge base, where many users from across the globe put their data in. There is some computation going on there on high-performance computing, they get images, they look through the images, they exchange them, they publish them, and so on and so forth.
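To make the data sizes Theodore mentions concrete, here is a small back-of-the-envelope sketch. The pixel grid and bytes-per-value are illustrative assumptions for this example, not the instrument's actual parameters; the point is only that ~10,000 channels per pixel quickly reaches the 10 GB per section he cites:

```python
# Back-of-the-envelope size of one spatial metabolomics dataset.
# Assumed (illustrative) parameters: a 500 x 500 pixel raster of the
# tissue section, ~10,000 molecular channels, 4-byte float intensities.
PIXELS_X = 500
PIXELS_Y = 500
CHANNELS = 10_000       # one channel per detected molecular mass
BYTES_PER_VALUE = 4     # 32-bit float intensity

size_bytes = PIXELS_X * PIXELS_Y * CHANNELS * BYTES_PER_VALUE
size_gb = size_bytes / 1e9

print(f"One section: {size_gb:.0f} GB")  # → "One section: 10 GB"
```

A higher-resolution raster or a longer acquisition run scales this linearly, which is how a multi-section experiment approaches the one-terabyte figure mentioned in the talk.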
And they also allow others to use and reuse it as public data. We have had more than 10,000 submissions over the past three years, many users, many labs; most of the labs in our field use this essential resource. They publish results, they get new biology, they get new medicine. So it's a pretty important platform in our field, which enables data analysis, data reuse, data sharing, and overall open science in biology and medicine. Let me show you how it looks, just to give you a feeling for what's really behind it. You can go to metaspace2020.eu and select one of the projects, or folders, that we have there. Here it's exactly that project with the public data from pharma companies. You can pick one dataset, and once you pick it, you will see a lot of molecules that were found by our engine hidden in this data. Now you can go through this data molecule by molecule; it will visualize each one, you can share it, you can explore it in depth, and so on and so forth. It's a pretty user-friendly technology and platform. Recently we developed a need, and an appetite, for new computing technologies to be used in METASPACE. Why? Because the platform is growing now. There is pretty good uptake in the field, there is growth in the number of submissions, and in particular, over the past half a year we experienced superlinear growth in the number of submissions. What's also very special about our processing is that the submissions come irregularly: sometimes there are hundreds of them, sometimes there is nothing. So we experienced a few pain points in our software system. One of them is deployment delay: scientists want to see results immediately, and if we need to wait for the deployment of some system (we used Apache Spark in the past), it's not very convenient. The second one is infrastructure management and queue management.
If some dataset gets stuck in the queue, we need to go fix it, and that's a kind of painstaking process. And also resource planning, because we don't really know how many datasets will be coming; again, it's used by the community, and sometimes there is demand, sometimes not. All of this motivated us to search for a new solution, in particular because our computing problem is embarrassingly parallel. So it was a very interesting and happy coincidence for us that we were invited into CloudButton, an academic European project funded by the European Commission, to work together with IBM in particular to co-develop and apply a user-friendly serverless framework. Here you see the leading people working on this project: Gil Vernik from IBM and Lachlan Stuart from our team. What they've been working on is applying the technology developed by Gil and others, called Lithops, which is a serverless computing framework, and a key result of this project is applying it to our system, METASPACE. In what follows, you will see how these three pieces are connected: our cloud platform METASPACE, Lithops as the serverless framework, and Code Engine from IBM. Over to you, Doug.

Thank you, Theo. Okay, before we actually start with a bit of a deep dive into Code Engine itself, I think it's important to talk a little bit about what led up to the creation of Code Engine as a platform on which to run different types of workloads. You actually saw a lot of this already in what Theo was saying, and I'd like to look at it as lessons learned, or the state of the community. We realized that users, or developers, really want to focus on producing value, either for their customers or to get their job done, whatever that actually is. They don't really want to work on managing and setting up infrastructure.
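The "embarrassingly parallel" shape Theodore describes is exactly what makes a serverless map a good fit: each chunk of a dataset can be processed independently, with no communication between workers. Lithops exposes essentially this interface through `FunctionExecutor.map`. The sketch below imitates the pattern using only Python's standard library, so the worker function and chunking are illustrative stand-ins, not METASPACE's actual annotation code:

```python
from concurrent.futures import ThreadPoolExecutor

def annotate_chunk(chunk):
    """Stand-in for per-chunk molecular annotation: each chunk is
    processed independently, with no communication between workers."""
    return sum(chunk)  # placeholder computation

# Split the "dataset" into independent chunks, then map over them.
dataset = list(range(100))
chunks = [dataset[i:i + 25] for i in range(0, 100, 25)]

# Serverless frameworks like Lithops provide the same map-style call,
# but fan each chunk out to a cloud function instead of a local thread.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(annotate_chunk, chunks))

print(results)       # four partial results, one per chunk
print(sum(results))  # → 4950
```

With Lithops the equivalent call is roughly `lithops.FunctionExecutor().map(annotate_chunk, chunks)` followed by `get_result()`; the framework handles serialization, dispatch to the serverless backend (such as Code Engine), and result collection.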
To them, infrastructure is just a means to an end. It's just a tool that they need to use to get their job done. For the most part, they don't even want to think about it; they're there to get something finished and completed. So the entire idea of being asked to manage, or learn, or whatever you want to call it, the infrastructure is really, like I said here, an undesirable burden. It's a nuisance to them, for the most part, because it's a distraction. Now, from the Knative community, I'm sorry, not Knative, from the cloud native community's perspective, the community itself is also starting to realize this as well. Managing infrastructure is actually hard, regardless of what that infrastructure is, whether it's Spark computing, like a lot of AI or machine learning systems use, or Kubernetes itself. Those are wonderful systems, but they're actually kind of complex, especially for people who aren't necessarily developers: people who have other jobs to get done, like the research scientists we're seeing here. So it's difficult for them to come on board and actually use these things. And of course, there's this other problem where there are many different platforms out there to get your job done. You have generic Kubernetes, you have things like a serverless environment, and so on. Each platform presents a different set of features, and with those different features come different sets of constraints. So oftentimes, people feel the need to conform the design of their workload to what the platform boxes them into. And that's a bit of a problem, because it means they don't necessarily pick the right solution for their job. So in our point of view, what it really means, if you get down to the brass tacks of the thing, is that developers should code, not manage infrastructure.
From IBM's side, we think we have a better solution to start to solve this, and that's what Code Engine is all about. We looked at this problem and said: what if, at a high level, the developer could basically focus on just their workload, meaning their function, their application, their batch job, whatever the workload is? As long as they can containerize it in some fashion, that's all they should really focus on. Now, whether they give that to us as source code or as a pre-built container image doesn't really matter; they choose what they want to give us. But along with that, they give us the runtime semantics that they want: something like, does it scale or not, what is the minimum number of instances, things like that. Those are the runtime semantics, and that's the only set of inputs into the platform. Then, under the covers, if they give us a container image, we'll just use that directly when we deploy the workload. If they give us source code, we'll build it for them, and that's what you see today with some platforms, like Platform-as-a-Service and functions offerings. Then, of course, we take those runtime behaviors, and all three of those variants feed into the actual cloud native workload engine, or hosting environment, itself. And for the most part, the user's job is done. It's now the platform's job to properly host and manage the workload, the infrastructure that goes around it, and everything else. Now, obviously, that's not quite all you need. Oftentimes workloads need to connect up to things like managed services, so we need to make sure you can do that. And a lot of these things are going to respond to incoming messages and things like that.
So obviously, you have to have networking that is set up automatically for you: scaling, traffic splitting if you're going to do blue-green deployments or upgrades, that kind of stuff. If you're connecting up to event producers, you're going to need some sort of event orchestration engine in there. Even delivering an event just once to one particular destination is the simple case; but what if you want that one event fanned out to lots of different services you have running in there? You need some sort of orchestrator at some point. And then, of course, as the backbone of all this, you need security and compliance built into the system, so that the end user or the developer doesn't need to think about it. It's just there. So this is sort of our dream platform, from an abstract perspective. The main point here is that the stuff in this pinkish box on the inside is all a developer needs to worry about. They just worry about their workload from a coding perspective, or a container image perspective, and what runtime behavior they want. Everything else on the right-hand side should be basically hidden from them and just managed under the covers. And that's exactly what Code Engine is doing for them, and that's what you saw Theo talking about here. So let me quickly mention, in a little more detail, exactly what Code Engine does for you. Code Engine is a managed, hosted environment in the cloud. With Code Engine, you give us your source code, and what do you get? For applications, meaning things that actually respond to HTTP requests, you get an internet-exposed, securely hosted workload. Now, if you don't want it to be internet-exposed, it doesn't have to be. You can make it private, which means it's only available to other things inside your workspace.
That workload will automatically scale up and down based on the load coming in, even down to zero, which means if it's not being used at that exact moment, it scales down to zero and you don't pay for it. You have zero downtime for upgrades: we'll slowly shift the traffic over from your old version to the new one, so the users of your application never see the shift or any downtime in between. If you happen to have a workload that doesn't respond to HTTP requests, that's what we typically call a batch job. These are jobs where your container comes up, does a particular thing, and then goes away when it's done. Data processing type stuff, exactly what Theo is doing in his workloads; those are batch jobs. You can scale those up as well. Just tell us how big you want it, what the size of your workload is; we'll scale it, bring up the resources, and when it's done, scale it back down. You don't need to worry about it, just tell us what you want. Obviously, you can then connect to a host of services, but one of the key things here is that you only pay when your code is actually running. Your batch job scales down to zero because it's all done? You stop paying. Your internet-facing application that's receiving requests or events? You only pay for when it's actually running and processing those events. That's very important for people. Of course, there's a simplified user experience around all this, which we obviously really like, and that's part of hiding the infrastructure from you. But you can still do some customizations, and these are those runtime semantics that I talked about: how do you want your application to scale, from a min/max instance perspective? When do you want it to scale? What is your resource usage: memory, CPU, that kind of stuff? How do you want the traffic splitting handled? All those kinds of things are still available to you through our simplified user experience.
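As a concrete illustration of the batch-job model Doug describes: the platform runs N instances of your container and tells each instance which slice of the work is its own via an environment variable (Code Engine array jobs expose `JOB_INDEX` for this). A minimal worker in that style might look like the sketch below; the file-naming scheme is a made-up example, not METASPACE's actual layout:

```python
import os

# Code Engine sets JOB_INDEX (0..N-1) on each instance of an array job;
# default to 0 so the same script also runs locally for testing.
job_index = int(os.environ.get("JOB_INDEX", "0"))

# Hypothetical work list: each instance claims one input file by index.
input_files = [f"dataset-part-{i}.imzML" for i in range(10)]
my_file = input_files[job_index]

print(f"Instance {job_index} processing {my_file}")
```

Because each instance picks its work purely from its index, the job needs no coordination between instances; the platform scales the instances up, each processes its slice and exits, and the job scales back to zero.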
Of course, because it's all on one platform, regardless of the type of workload you have, applications, functions, source code, and batch jobs all live together, which means you get built-in security from a networking perspective. They can all talk to one another securely, and you don't have to figure out how to make that happen, because they're not on separate platforms anymore. Just to wrap this up, because I really like this slide, I think it presents everything in a nice, simple fashion: with Code Engine, as a developer, you can deploy any type of application, whether it's a container, a batch job, your source code, or a function, on a single unified platform, without provisioning, managing, securing, or configuring anything in the infrastructure, meaning clusters, networks, VMs, certificates, you name it. It's all hidden from you; you don't need to touch it. And best of all, you only pay when your application or workload is actually active. And with that, I believe I'm done. Let me go ahead and hand it back to Theo to wrap this up.

Thanks, Doug. So let me wrap up this talk. I would like to share how both stories, us developing this serverless technology for our needs, and Code Engine, which was recently released, converged into this trio. What we got, and what we learned out of this, is that right now we have implemented technology which provides a scalable solution for science and for our users from science, from academia, and from industry. For us it was very nice, and it will also be very useful in the future, because right now we have less infrastructure overhead: overhead in terms of our time, in terms of cost, and in terms of delays as well. And we no longer need to plan for resources ahead of time. This is very important for us, because we can actually provide the system to our users as they go.
And I think it was very nice that we got this opportunity to work with a very talented team at IBM who set up Code Engine, which is great for us because it gave us access to native support for serverless, and the on-demand scaling and the pay-as-you-go model are definitely very handy features. It's also easier for us to code, because there are minimal artificial constraints on memory and CPU. And this was enabled by the technology, or framework, Lithops. I'm pretty sure there are many other technologies which allow you to use the power of Code Engine, but for us, Lithops was instrumental, because it removed all these bottlenecks, simplified the code, and allowed us to really hide the complexity and focus on our business logic, the actual scientific processing and scientific algorithms, while hiding the complexity of the runtime platform and all the overhead connected to it. With this, I would like to thank everyone on the IBM team who contributed to this work, and particularly Gil Vernik. It was also a pleasure to work with the rest of the IBM team, and with Doug, on this work. If you are interested, you can check out our joint blog post, "Decoding Dark Molecular Matter in Spatial Metabolomics with IBM Cloud Functions", which was for the previous generation, and there will now be a follow-up on decoding dark molecular matter with IBM Code Engine. Thank you for your attention.

All right, so as Theo said, we'll jump into the Q&A now. Just one final note: if you are interested in Code Engine, you can stop by the IBM booth here at KubeCon, and we'll answer any of your questions. And of course, there's a link here on the screen; you can go play with it yourself at cloud.ibm.com/codeengine. And with that, we'll now jump into the live Q&A portion. Thank you, everybody.