From around the globe, it's theCUBE, with coverage of KubeCon and CloudNativeCon Europe 2021 Virtual, brought to you by Red Hat, the Cloud Native Computing Foundation and ecosystem partners.

Hello, welcome back to theCUBE's coverage of KubeCon + CloudNativeCon 2021, part of the CNCF's continuing partnership with theCUBE, CUBE Virtual here because we're not in person. Soon we'll be out of the pandemic and hopefully in person for the next event. I'm John Furrier, your host of theCUBE. We're here with Ricardo Rocha, computing engineer at CERN and a CUBE alumni. Great to see you, Ricardo. Thanks for remoting in from all the way across the world. Thanks for coming on.

Hello, it's a pleasure. Happy to be here.

I saw your talk with Priyanka on LinkedIn and all around the web. Great stuff. As always, you guys do great work over there at CERN. Talk about what's going on with you and the two speaking sessions you have at KubeCon. Pretty exciting news and exciting sessions happening here. So take us through the sessions.

Yeah, so actually the two sessions show the two types of things we do with Kubernetes. We have a lot of services moving to Kubernetes, but the first one is more about the services we run in-house. CERN is known for having a lot of data and requiring a lot of computing capacity to analyze it, but we also have a very large community, with a lot of users and people interested in the work we do. So the first session will show how we've been migrating our web infrastructure onto Kubernetes, in this case actually OpenShift. The challenge there is running a very large number of websites on Kubernetes. We run more than 1,000 websites, and there will be a demonstration of how we manage the whole website lifecycle, including upgrading and deploying new websites, with an operator that was developed for this purpose.

And then, more on the other side, I will give a talk with a colleague about machine learning. Machine learning has been a big topic for us. A lot of our workloads are migrating to accelerators and can benefit a lot from machine learning. So we're giving a talk about a new service we've deployed on top of Kubernetes, where we try to manage the whole lifecycle of machine learning workloads, from data preparation all the way to serving the models, exploring Kubernetes features and integrating a lot of accelerators.

So one session is the large-scale deployment, with Kubernetes key there, and the other is essentially a machine learning service for other people to use, is that right? Okay, take me through the first one, the large-scale deployment. What's the key innovation there, in your opinion?

Yeah, compared to the infrastructure we had before, I think it's this notion that we can develop an operator that will manage a resource, in this case a website. This is something that is not always obvious when people start with Kubernetes: it's not just an orchestrator, it's really the API and the capability of managing a huge amount of resources, including custom resources. So the key is the possibility to develop this operator and then manage the lifecycle of something that was defined in-house and fits our needs. There are challenges there, because we have a large number of websites and they can be pretty active. We also had some scaling issues with the storage that serves these websites, and we'll give some details during the talk as well.
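To make the operator pattern Ricardo describes concrete, here is a minimal sketch of a controller for a hypothetical "Website" custom resource, written in Python with the kopf framework. The API group, spec fields, and the one-Deployment-per-site mapping are illustrative assumptions, not CERN's actual OpenShift operator:

```python
# Minimal sketch of the operator pattern: a custom "Website" resource whose
# lifecycle is reconciled into ordinary Kubernetes objects. The group, kind
# and spec fields below are hypothetical.
import kopf
from kubernetes import client, config

config.load_incluster_config()  # assumes the operator runs inside the cluster

@kopf.on.create('webservices.example.ch', 'v1', 'websites')
def create_website(spec, name, namespace, logger, **kwargs):
    # Turn the declarative Website spec into a Deployment serving the site.
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name=name, namespace=namespace),
        spec=client.V1DeploymentSpec(
            replicas=spec.get('replicas', 1),
            selector=client.V1LabelSelector(match_labels={'website': name}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={'website': name}),
                spec=client.V1PodSpec(containers=[client.V1Container(
                    name='web',
                    image=spec.get('image', 'nginx:stable'),
                    ports=[client.V1ContainerPort(container_port=80)])]),
            ),
        ),
    )
    client.AppsV1Api().create_namespaced_deployment(namespace, deployment)
    logger.info(f"website {name} deployed")
    return {'deployment': name}

@kopf.on.delete('webservices.example.ch', 'v1', 'websites')
def delete_website(name, namespace, **kwargs):
    # Clean up the Deployment when the Website object is removed.
    client.AppsV1Api().delete_namespaced_deployment(name, namespace)
```

The point matches what Ricardo highlights: Kubernetes is not just an orchestrator; once a website is a custom resource in the API, upgrades, deployment and deletion all become reconciliation against that object.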
So Kubernetes storage is all kind of under the covers making this easier, and the machine learning plays nicely into that. Take us through the machine learning use case. What's going on there? What was the discovery? How did you guys put that together? What are the key elements there?

Right, so the main challenge there has been that machine learning is quite popular, but it's quite spread out as well. We have multiple groups focusing on this, but there's no obvious way to centralize not only the resource usage, to make it more efficient, but also the knowledge of how these procedures are done. So what we're trying to do is offer a service to all our users where we help them with the infrastructure, so that they don't have to focus on that and can focus just on their workloads. We do everything from exposing the storage systems we have in-house, so they can access the data and do data preparation, to iterating with notebooks, doing distributed training with a potentially large number of GPUs, and then storing and serving the models. All of this is managed with a Kubernetes cluster underneath. We had a lot of knowledge of how to handle Kubernetes, and all the features everyone likes, scalability, reliability, auto-scaling, are very important for this type of workload. This is key.

Yeah, it's interesting to see how Kubernetes is maturing. Congratulations on the projects; they're probably going to continue to scale. This reminds me of when I was coming into the business in the late '80s and early '90s with TCP/IP and the OSI model. You saw the standards evolve and get settled in, and then boom, innovation everywhere, and that took a year or two to gestate and scale up. It's happening much faster now with Kubernetes. I have to ask you about the question people are looking to get answered, which is: as Kubernetes takes the next step, the next generation, people want to integrate. How is Kubernetes exposing APIs as integration points for tools and other things? Can you share your experience on where this is going, what's happening now and where it goes? Because there's no debate: people like the Kubernetes aspect of it, but now integration is the conversation. Can you share your thoughts on that?

I can try. I would say it's a moving target, but the fact that there's such a rich ecosystem around Kubernetes, with all the cloud-native projects, is real proof of the popularity of the API. And this is also something that we saw: after we had the first step of deploying and understanding Kubernetes, we started seeing the potential. It's not reaching only the infrastructure itself, it's reaching all the layers of the stack we support in-house on-premises. And it's also opening up doors to easily scale into external resources. So what we've been trying to tell our users is to rely on these integrations as much as possible. That means the application lifecycle being managed with things like Helm and GitOps, but also the monitoring being managed with Prometheus. And once you're happy with your deployment in-house, we have ways to scale out to external resources, including public clouds. This is really proof that all these APIs are not only popular but incredibly useful, because there's such a rich ecosystem around them.
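For the machine-learning lifecycle described earlier, the distributed-training step is typically driven through custom resources as well. As an illustration only, assuming the Kubeflow training operator's PyTorchJob CRD is installed (the interview does not name the exact stack CERN uses), submitting a multi-GPU training job could look like this:

```python
# Illustrative only: submit a distributed training job as a custom resource.
# Assumes the Kubeflow training operator (PyTorchJob CRD) is installed;
# the image, namespace and job layout are hypothetical.
from kubernetes import client, config

config.load_kube_config()

worker_pod = {"spec": {"containers": [{
    "name": "pytorch",  # the training operator expects this container name
    "image": "registry.example/train:latest",
    "resources": {"limits": {"nvidia.com/gpu": 1}},
}]}}

job = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "PyTorchJob",
    "metadata": {"name": "dist-train-demo", "namespace": "ml"},
    "spec": {
        "pytorchReplicaSpecs": {
            "Master": {"replicas": 1, "template": worker_pod},
            "Worker": {"replicas": 4, "template": worker_pod},  # scale out
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubeflow.org", version="v1", namespace="ml",
    plural="pytorchjobs", body=job)
```

Because the job is just another Kubernetes object, the auto-scaling Ricardo mentions can react to it, for example by adding GPU nodes while the workers are pending.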
So talk about the role of data in this. Obviously the machine learning piece is something everyone's interested in, as you get infrastructure as code, DevOps and DevSecOps, and everything's shifting left. I love that narrative: day-two operations. All of this is proving maturation. Data is critical, right? Now you've got real-time information, real-time data, and the expectation is that apps integrate the data. What's your view on how this is progressing from your standpoint? Because in machine learning, as you mentioned, with acceleration being part of another system, caching has always done that, say with databases, right? So as databases get slower and caches get faster, the lines between them are blurring. It's all changing. What are your thoughts on this next-level data equation in Kubernetes? Because stateless is cool, but now you've got state issues.

Yeah, so we've always had huge needs for data. I think we have over half an exabyte of data available on-premises, but we have our own storage systems for that, which are external: the physics data, the raw data. One particularity of our workloads, until recently, is that we call them embarrassingly parallel, in the sense that they don't really need very tight connectivity between the different workloads. If we deploy, say, tens of thousands of jobs to do some analysis, they're actually quite independent. They will produce a lot more data, but we can store it independently. Machine learning is posing a challenge in the sense that training tends to be a lot more interconnected, so it can benefit from systems that we are not so familiar with. For us, it's maybe not so much the caching layers themselves; it's really understanding how our infrastructure needs to evolve on-premises to support this kind of workload. We had some smallish, more high-performance computing clusters with things like InfiniBand for low latency, but this is not the bulk of our workloads, and it's not what we are experts in these days. This is the transition we are making towards supporting these machine learning workloads.

Just as a reference for the folks watching, you mentioned "embarrassingly parallel," and that's a quote I read on the CERN tech blog. So if you go to techblog.web.cern.ch, or just search "CERN tech blog," you'll see the post there, and good stuff there. In it, you lay out a bunch of other things too, where you start to see deployment services and custom resource definitions being part of this. Is it going to get to the point where automation is a bigger part of cluster management, setting stuff up quicker? I look at some of the innovations you're doing with virtual machines and KubeVirt and databases and thousands of other point things you're working on there. I know you've got a lot going on; it's in the post. But we don't want the problem of it being so hard to stand up and manage. This is what people want to make simpler. How do you answer that, when people say they want to make it easier?

Yeah, so for us, it's really automate everything. Up to now, it has been automating the deployments in the Kubernetes clusters. Right now we are looking at automating the Kubernetes clusters themselves, and there are some really interesting projects there. People are used to using things like Terraform to manage the deployment of the clusters, but there are some projects, like Crossplane for example, that allow us to have the clusters themselves be resources within Kubernetes.
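To make "clusters as resources within Kubernetes" concrete: with Crossplane, a management cluster holds objects that represent the clusters themselves, and a provider controller reconciles them against the cloud APIs. A sketch, with a simplified and partly hypothetical group, kind, and spec:

```python
# Sketch of "clusters as Kubernetes resources": instead of Terraform runs,
# a management cluster holds objects that *are* the clusters, and a
# Crossplane provider reconciles them against the cloud APIs.
# The group, kind, fields and context name below are simplified/hypothetical.
from kubernetes import client, config

config.load_kube_config(context="management-cluster")

cluster = {
    "apiVersion": "container.gcp.crossplane.io/v1beta1",
    "kind": "GKECluster",
    "metadata": {"name": "batch-burst-eu"},
    "spec": {
        "forProvider": {"location": "europe-west1"},
        "providerConfigRef": {"name": "gcp-default"},
    },
}

# Creating this cluster-scoped object asks the provider to build a real
# cluster; deleting it tears the cluster down again.
client.CustomObjectsApi().create_cluster_custom_object(
    group="container.gcp.crossplane.io", version="v1beta1",
    plural="gkeclusters", body=cluster)
```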
And this is something we are exploring quite a bit. It allows us to abstract the Kubernetes clusters themselves as Kubernetes resources, this idea of having a central cluster that manages a much larger infrastructure. So this is something we are exploring. The GitOps part is really key for us too; it's something that eases the transition for people who are already used to managing large-scale systems but are not necessarily experts in Kubernetes. They see there's an easier path if they can be introduced slowly through centralized configuration.

You mentioned Crossplane. I saw Bassam on earlier; he's an awesome dude, great guy. And I was smiling because I still have flashbacks and trigger episodes from the Hadoop world, when that technology was so promising but was just so hard to stand up and manage; you had to be a real expert to do it. And I think what you mentioned with Crossplane comes back to the whole operator notion of operating the clusters. This comes back down to provisioning and managing the infrastructure, which we all know is key, right? But when you start getting into multi-cloud and multiple environments, that's where it becomes challenging, and I like what they're doing. Is that something that's on your mind too, around hybrid and multi-cloud? Can you share your thoughts on that whole trajectory?

Absolutely. I actually gave an internal seminar just last week describing what we've been playing with in this area, and I showed a demo of using Crossplane to manage clusters on-premises but also clusters running on public clouds: AWS, Google Cloud and Azure. There are many reasons we want to explore external resources. We are kind of used to this, because we have a lot of sites around the world that collaborate with us, but specifically for public clouds there are some motivations. The first one is this idea that we have periodic load spikes. When we have international conferences, the number of analyses and job requests goes up quite a bit, so we need to be able to scale on demand for short periods instead of over-provisioning in-house. The second one, coming back to machine learning, is this idea of accelerators. We have a lot of CPUs; we have a lot fewer GPUs. So it would be nice to go and fish for those in the public clouds. And then there are also other accelerators that are quite interesting, like TPUs and IPUs, that will definitely play a role. We may never have them on-premises; we'll only be able to use them externally.

So in that respect, actually coming back to your previous question, this idea of storage becomes quite important. What we've been playing with is not only managing these external clusters centrally, but managing the whole infrastructure from a central place. That means making all the clusters, wherever they are, look very much the same, including the monitoring and the central aggregation of the monitoring. And then, as we talked about with storage, this idea of having local storage that allows us to do really quick software distribution but also access to the data.
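A small sketch of what that central management enables: if every cluster, on-premises or public cloud, is reachable from one place, you can "go fish" for accelerator capacity across all of them. The kubeconfig context names here are hypothetical examples:

```python
# Illustrative: walk every kubeconfig context (on-prem plus public clouds)
# and total up the allocatable NVIDIA GPUs per cluster.
from kubernetes import client, config

for ctx in ["cern-onprem", "aws-eu-west-1", "gcp-europe-west1", "azure-westeurope"]:
    api = client.CoreV1Api(api_client=config.new_client_from_config(context=ctx))
    gpus = sum(
        int(node.status.allocatable.get("nvidia.com/gpu", 0))
        for node in api.list_node().items
    )
    print(f"{ctx}: {gpus} allocatable GPUs")
```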
What you guys are doing is, as we say, cool and relevant projects. You've got the large-scale deployments and the machine learning to really accelerate things, which will drive a lot of adoption in terms of automation, and as that kicks in, you've got to get the foundational work done. I see you're clearly on the right trajectory.

You know, it reminds me, Ricardo, not to do a little history lesson here, but back when network protocols were moving from proprietary, SNA for IBM, DECnet for Digital, back in the old days, the OSI open systems interconnection stack was evolving, and when TCP/IP came around, it really opened up interoperability, right? Bassam and I were talking about this kind of cross-cloud connection, or inter-clouding, and Lew Tucker and I talked at OpenStack in 2013 about inter-networking and inter-connections. It's about integration and interoperability. This is the next-gen conversation Kubernetes is having. As you scale up, which is happening very fast, and as you get machine learning, which can handle data and enable modern applications, really it's about connecting networks and systems together. This is a huge architectural innovation direction. Could you share your reaction to that?

Yeah, so actually we are starting the easy way, I would say, with the workloads that are loosely coupled, that don't necessarily need tight inter-connectivity between the different deployments. This is already giving us a lot, because the bulk of our workloads are this kind of batch, embarrassingly parallel processing. And we are also doing co-location: when we have large workloads that need close inter-connectivity, we co-locate them in the same deployment, the same cloud region. I think what you describe, having cross-cloud inter-connectivity, will be a huge topic; it already is, I would say. We've started investigating a lot of service mesh options to try to learn what we can gain from them. There's clearly a benefit for managing services, but there will definitely also be potential to more easily scale out across regions. Using the public cloud, one thing we found is, for example, this idea of infinite capacity. It sometimes feels like that, even at the scale we have, for CPUs; but when you start using accelerators, you start negotiating: maybe use multiple regions, because there's not enough capacity in a single region, and you start having to talk to the cloud providers to negotiate this. That makes the deployments more complicated, of course, so this inter-connectivity between regions and clouds will be a big thing.

Yeah, and again, the low-hanging fruit is the kind of existing market, but I was throwing the vision out there mainly to talk about what we're seeing, which is that the world's a distributed computer, and if you have the standards, good things happen. Open systems, innovating in the open, really could make a big difference. It's going to be the difference between real value for global society, or getting stuck in these siloed worlds. So I think the choice is the industry's, and I think CERN, the CNCF, the Linux Foundation and all the companies investing in open are really at a key inflection point right now. So congratulations, and thanks for coming on theCUBE. Appreciate it.

Thank you.

Okay, Ricardo Rocha, computing engineer at CERN, here in theCUBE's coverage of the CNCF's KubeCon + CloudNativeCon Europe. I'm John Furrier, your host of theCUBE. Thanks for watching.