Good morning. I don't know if I can top the opening remarks, but thanks for joining us for our first keynote: machine learning on Kubernetes, and how Kubeflow made this process easy for us. I'm extremely excited to be here and share the game-changing experience I have had using Kubeflow in my day-to-day job as a lead data scientist at Shell. Kubeflow makes the machine learning process very easy for data scientists and machine learning engineers. It creates very efficient platforms for data scientists and machine learning engineers to collaborate, share ideas, and learn from their own projects and experiences. And finally, it reduces the cost of model building by managing computational and storage resources efficiently. But before getting to that, I'm going to pass it to Jimmy so he can give a brief history of Kubeflow and its relationship to machine learning and data science. Over to you.

Hi, my name is Jimmy, and I run developer relations at Arrikto. For those of you who may not be familiar with Arrikto, Arrikto was a key contributor to the Kubeflow 1.3 release, as well as the recently cut 1.4 release. Besides participating in a couple of the Kubeflow working groups, they're also the primary maintainers of the MiniKF and Kale projects, plus EKF, the Enterprise Kubeflow MLOps platform.

Now, I know that we're at KubeCon, so this is a conference all about cloud-native architectures, not necessarily data science or machine learning. Which is to say that the majority of the audience, and of the viewers joining us virtually, are likely cloud-native developers or architects, not necessarily MLOps practitioners or data scientists. So it won't hurt to spend just a minute talking about why the combination of Kubernetes and machine learning is actually a match made in heaven, and here's why.

First, containers allow us to create, test, and experiment with machine learning models on our laptops, and we know very well that we can take those same models to production using containers. The idea here is not new: we want to write once, reproduce, and run everywhere.

Second, a machine learning workflow on our laptop may be written entirely in one language, let's say Python. But when we take those models to production, we're probably going to want to interact with a variety of different services and applications, things like data management, security, maybe front-end visualizations, et cetera. Here we're probably going to want a microservices-based container architecture, and here again, Kubernetes is going to be a slam dunk for us.

Finally, machine learning loves GPUs, but GPUs are expensive. So it's not always about how quickly we can spin up an environment and get access to all the resources we need; it can be just as important how quickly we can spin that environment back down to zero. So here again, containers are a perfect fit.

Unfortunately, there's an open secret in the industry that a lot of machine learning models are not successful in making it to production. The question is, why? Well, there's a combination of factors going on here that involve skills, software methodology, and the ability to collaborate efficiently in an organization, big organizations being what they are.
So, skills in the sense that we're often asking data scientists to be Kubernetes experts, and we're asking Kubernetes experts to be data scientists. Finding the right methodology, the right software, and perhaps the little bit of empathy that's needed to collaborate across these teams and be successful can prove a little elusive. So what are we to do? Enter Kubeflow.

Kubeflow is the open source project sitting smack dab in the middle of a big convergence in IT, and here I'm talking specifically about the combined ubiquity of cloud-native architectures and the needs of machine learning workflows. As we know, Kubeflow was originally launched by Google back in 2017 and has since become the most robust open source ML platform that is cloud-native by design, not as an afterthought, for data scientists as well as operations folks. It's a complete toolkit of components that allows both data scientists and operators to manage, train, tune, and even monitor their models and workflows. Now that I've set a little bit of context, I'm going to hand it back over to Masoud, who's going to walk us through part of Shell's data science and machine learning journey so we can understand how Kubeflow and its ecosystem of integrations helped solve many of the challenges they were facing. Masoud, over to you.

Thank you. Most of you might know Shell as the oil giant. However, in recent years Shell has expanded its focus to other sources of energy, green and renewable. Toward that effort, it spent roughly $2 billion annually through 2020 on these kinds of new resources, and it expects to expand that spending even more in the years to come. Stepping into this very large-scale environment, where you need to get your energy from different sources and distribute and transmit it to users who are increasing day by day and have drastically different consumption patterns, you need a smart, very fast, agile control system. And without artificial intelligence at that scale, this is not achievable.

But having AI at Shell's scale can create some challenges, and our team at Shell faced some of them. The first challenge was creating proper development environments for these kinds of problems. As a data scientist, I used to work in local environments: build some simple machine learning models using local data. Now we are going to use large-scale data sets from the grids, from different countries all around the US and Europe, and we want to build a model. It's going to be extremely hard to create a local environment capable of that kind of modeling.

The second challenge: working in these environments requires specialized skills. Working on your local machine with a simple model is really easy. But when you want to go and grab these data sets, let's say you want to forecast prices, you want to forecast load consumption, you want to figure out the generation and the consistency of the grids, you need a graph of your network and you need to combine and ingest the data. It's going to be extremely hard. And now you want to run all of this on Kubernetes. That's great, but before that you need to know about containers, and you need to know how to scale.
You need to know about GPUs before even getting to the modeling. This can take a very, very long time; it's very challenging. And of course, the last part: we don't want to bankrupt our IT budget. I'm a data scientist, and I'm a very selfish person. I wish I could have all the GPUs in the world dedicated to me so I could work with them. But is it possible? I wish. We cannot give a couple of GPUs to every data scientist. On top of that, machine learning is a very spiky process. When your code is ready and it's in production, you just need a couple of CPUs to keep it running. But when you are in the modeling phase, as I mentioned, the problem is very large: you have a huge search space, you need to tune many parameters, and you need huge computational power. If you are in the production environment with huge computational power, you lose money, because you don't need it. On the other hand, when you are in the modeling phase, you want huge computational power, because if you don't have it, you lose money by wasting the expensive time of your data scientists.

Now let's see how Kubeflow actually helped us address all of these challenges. The first thing is that Kubeflow creates a self-service model for us. Data scientists can go and grab computational power and storage, with pre-configured ML toolkits, in a secure cloud environment. How cool is that? Now we can bring all of those things together and do machine learning projects easily from minute zero. Doing that the old-fashioned way could take weeks or even months; now we can do it in just a couple of minutes. The second one: with Kale, the Kubeflow Automated Pipelines Engine, we fill the gap between data science, software engineering, and MLOps. Now our data scientists can write simple code and pass it on to MLOps, which makes it much faster for us to put things into production. And finally, since we are using Kubernetes, we can smartly manage our computational and storage resources. For example, we monitor how our notebook servers are using computational power, and if they don't need it, we release those resources and put them back in the pool so others can use them. As I mentioned, if a notebook server sits idle for more than 24 hours, we create a snapshot and release the resources. If the data scientist needs that old server again, it starts from the snapshot, and he or she can pick up working right where they stopped.

Now I'm going to give you a demo of how easy it is to run a notebook server in the Kubeflow UI. First, we need to create a new server. We just need to give it a name, let's say KubeCon. You can see we have the Jupyter notebook environment, and we also have Visual Studio Code and RStudio. And if you remember, I mentioned we have different pre-configured ML toolkits: images for deep learning, different versions of TensorFlow and PyTorch, something for Spark, and GPU versions of those. If there's something that doesn't exist here, it's easy to bring it in for other applications. After that, with some simple configuration, we are ready to go. We just need to say how many CPUs and how much memory I need for my server.
And whether I need a GPU or not; for example, here, for simplicity, I don't need a GPU. After that, I just say how much storage I need for my notebook server. For simplicity I'm going to skip some of the other configuration, and we are ready to go. Just click on this beautiful Launch button, and you'll see my notebook server start in a couple of seconds, something that could have taken me a couple of months without these tools. Now we are ready to connect, and as you can see, all I need is a web browser and a secure internet connection. Now I'm in my server. I have Visual Studio Code and I have JupyterLab. If you go into JupyterLab, you'll see it's very similar to our lovely Jupyter notebook, but there's something more to it: we are securely connected to AWS, where all of my data is located. So I can bring everything into my Jupyter notebook and start doing data science and cool stuff from minute zero.

I'm going to dwell a little bit on this graph and share the beauty I see in it. To you, it might just be a very simple flow graph, but this graph was very, very lovely to me. It gave me one of those aha moments when I saw it for the first time; I was super excited. When I joined Shell as a data scientist, my first assignment was to build a predictive model. I needed to grab data from different sources, I needed to sub-sample them, and I needed to try different model configurations. But I couldn't have a huge search space, because I was working locally. Long story short, it took me a month, two months actually, to come up with the model as a proof of concept in Jupyter format. I passed it to my coworker and said, can you productionize that? It had taken me a month to come up with a model, and the performance was not that great. I was lucky at that time to become acquainted with Arrikto and Kubeflow. With the help of my coworker, we built the machine learning discipline and repeated the same experiment in just 35 days, from data processing to deployment, and exponentially reduced the time in our second effort to a couple of days. Now, in our team, we have team members with basic programming skills who can apply cutting-edge machine learning and deep learning in just a couple of hours.

And the story doesn't end here; it's getting even simpler and better for data scientists. We data scientists love Jupyter notebooks. Now we just take a Jupyter notebook, tag some of its cells, as imports, pipeline steps, skip, and so on, and we can push a button and create a pipeline from it. Kale takes over that code for us, creates a valid pipeline, takes care of all the data dependencies, and manages the lifecycle of this Kubeflow pipeline. And of course, finally, snapshot policies allow us to release idle resources without losing any work.

And this, ladies and gentlemen, was the game-changing experience I wanted to share with you as a data scientist at Shell: how Kubeflow helped me focus on my work and avoid all of those distractions I was always hesitant to touch, so I could focus on the challenges and the huge projects we have at Shell, be productive, and deliver in a timely manner. Thanks everyone for attending, and enjoy the rest of the show.
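For readers who want to see what the notebook-server demo above corresponds to on the Kubernetes side, here is a minimal sketch of creating a Kubeflow Notebook resource through the Kubernetes Python client instead of the UI. The namespace, server name, container image tag, and CPU/memory values are illustrative placeholders, not the ones used in the talk.

```python
# Hypothetical sketch: create a Kubeflow notebook server via the Notebook CRD.
# Assumes kubectl access to a Kubeflow cluster and an existing profile namespace.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

notebook = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "Notebook",
    "metadata": {"name": "kubecon", "namespace": "my-profile"},  # placeholders
    "spec": {
        "template": {
            "spec": {
                "containers": [{
                    "name": "kubecon",
                    # One of the pre-configured toolkit images (TensorFlow,
                    # PyTorch, Spark, ...); this tag is only an example.
                    "image": "kubeflownotebookswg/jupyter-scipy:latest",
                    "resources": {
                        # The CPU/memory choices from the demo; no GPU requested.
                        "requests": {"cpu": "2", "memory": "4Gi"},
                    },
                }]
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubeflow.org", version="v1", namespace="my-profile",
    plural="notebooks", body=notebook,
)
```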
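And to make the Kale part of the story concrete: Kale turns tagged notebook cells into a Kubeflow pipeline automatically, but the result is conceptually similar to a pipeline written by hand with the kfp SDK. The sketch below is illustrative only; the step names, step contents, and base image are assumptions, not Shell's actual code or Kale's generated output.

```python
# A minimal, hand-written pipeline of the kind Kale generates from a tagged notebook.
import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def preprocess() -> str:
    """Placeholder for the notebook cells tagged as the data-processing step."""
    return "cleaned-dataset"


def train(dataset: str) -> str:
    """Placeholder for the notebook cells tagged as the training step."""
    return f"model trained on {dataset}"


# Wrap the plain Python functions as pipeline components.
preprocess_op = create_component_from_func(preprocess, base_image="python:3.9")
train_op = create_component_from_func(train, base_image="python:3.9")


@dsl.pipeline(
    name="notebook-to-pipeline-sketch",
    description="Illustration of a Kale-style notebook-derived pipeline.",
)
def pipeline():
    prep_task = preprocess_op()
    # Kale infers this data dependency from the notebook; here it is explicit.
    train_op(prep_task.output)


if __name__ == "__main__":
    # Compile to a package that Kubeflow Pipelines can run.
    kfp.compiler.Compiler().compile(pipeline, "pipeline.yaml")
```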