From around the globe, it's theCUBE, with coverage of KubeCon and CloudNativeCon North America 2020 Virtual, brought to you by Red Hat, the Cloud Native Computing Foundation, and ecosystem partners.

Hey, welcome back everybody. Jeff Frick here with theCUBE, coming to you from our Palo Alto studios for the continuing coverage of KubeCon CloudNativeCon 2020 North America. There was the European version earlier in the summer. It's all virtual, so the good news is we don't have to get on planes and we can get guests from all over the world. We're excited to welcome back, for his return to theCUBE, Ricardo Rocha. He is a staff member and computing engineer at CERN. Ricardo, great to see you.

Hello, thanks for having me.

Absolutely. And you're coming in from Geneva, so you've already had a good Thursday, I bet.

Yeah, we're just finishing right now, yeah.

Right. So in getting ready for this interview, I was looking at the interview that you did, I think two KubeCons ago, in May of 2019. And it strikes me that a lot of people have heard of CERN, but a lot of people don't know what CERN actually does. So I wonder if you can just give the 101 of what CERN's mission is and some of the work that you do there.

Yeah, sure. So CERN is the European Organization for Nuclear Research. We are the largest particle physics laboratory in the world, and our main mission is fundamental research. We try to answer big questions: why don't we see antimatter? What is dark matter, or dark energy? Other questions about the origin of the universe. To answer these questions, we build very large machines, particle accelerators, where we try to recreate some of the moments just after the universe was created, the Big Bang, to better understand what the state of matter was at that time. The result of all of this is very often a lot of data that has to be analyzed, and that's why, since the start of CERN, we have traditionally had huge requirements for computing resources.

Right. And so you have these large particle accelerators, as you said, large machines. The one that you've got now, the latest one, how long has that one been operational?

Yeah, so it started maybe around 10 years ago; the first launch was a bit before that. It's the largest one ever built: 27 kilometers in perimeter. We inject protons in two different directions and then make them collide at points where we have built these huge detectors that can see what's happening in the collisions. This is the main particle accelerator, but we do have other experiments as well. We have an antimatter factory that is just down from my office, and other types of experiments going on.

27 kilometers, that's a big number. And then again, just so people get some sense of scale: you speed up the particles, you smash them together, you see what happens, and you collect all the data. What types of data sets are generated off just one event? And I don't even know if that's a valid measure. How do you measure quantities of data around an event, just for orders of magnitude?

Right. So the way it works is, as you said, we accelerate the particles to very close to the speed of light, and we increase the energy while keeping the beams well controlled. Then, at specific points, we make them collide. We have these gigantic detectors underground; all of this is 100 meters underground.
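To make that scale a little more concrete, here is a back-of-envelope sketch in Python. Only the 27 km circumference comes from the conversation; treating the beam as moving at essentially the speed of light is an assumption, though it is the standard approximation at these energies:

```python
# Back-of-envelope: how often does a proton circle a 27 km ring?
# Assumes the beam travels at essentially the speed of light,
# which is a good approximation at these beam energies.

RING_CIRCUMFERENCE_M = 27_000        # ~27 km, as mentioned in the interview
SPEED_OF_LIGHT_M_S = 299_792_458     # m/s

revolutions_per_second = SPEED_OF_LIGHT_M_S / RING_CIRCUMFERENCE_M
print(f"~{revolutions_per_second:,.0f} revolutions per second")
# ~11,103: a proton laps the full 27 km ring more than
# eleven thousand times every second.
```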
And these detectors are pretty much a very large camera that takes something like 40 million pictures a second. The result is a huge amount of data: each of these detectors can generate up to one petabyte a second. This is not something we can record, so we have hardware filters that bring it down to something we can manage, which is on the order of a few tens of gigabytes per second.

Wow. So you've got a very serious computing challenge, because you're the one on the hook for grabbing the data, recording the data, and making the data available for people to use in their experiments. So we're here at KubeCon, CloudNativeCon. Where did containers come into the story, and Kubernetes specifically? What was the real challenge you were trying to overcome?

Yeah, so this is a long story of using distributed computing and other types of computing at CERN. As I mentioned, we generate a lot of data, something like 70 petabytes every year, and we have accumulated over half an exabyte of data by now. Traditionally, we've had to build this software ourselves, because there were not many people around with these kinds of needs. But this revolution with containers and the clouds appearing allowed us to join other communities and benefit from their work, instead of having to do everything ourselves. This was the main driver for us to start doing this. The other point is containerization itself. We have a lot of need to share information, but also to share resources, between physicists and engineers. So this idea of containerizing the work, including all the code and all the data, and then sharing it with our colleagues, is very appealing. The fact that we can also take this unit of work, deploy it on any infrastructure that has a standardized API like Kubernetes, scale it, and monitor it the same way, is also very appealing. All of these things connect with our natural way of working, I would say.

Right. So you've talked about this upgrade that's coming to the particle accelerator in four or five years; that timeline is relatively soon. This, as you've said before, is a huge step function in the data that's going to come off these experiments. How are you keeping up on the compute side with the fundamental shift on the physics side and the data that's going to be generated? I think you said in a prior interview that you don't want to be the bottleneck: there's all this great work being done, but if the data isn't captured and made available for people to work with, it's not the greatest experiment. So how are you keeping up, and what's the relative scale of what you have to do on the compute side to keep up with the folks on the physics side?

Yeah, so what we will have to deal with is an increase of 10 times more data than we have today. We already have a lot, and very soon we'll have a lot more. But this is not the first time this kind of step happens in our computing; we have always found a new technology or a new way of doing things that would improve the situation. So what we do is what we always do, which is to look for all sorts of new technologies and all sorts of new resources that we could make use of.
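It's worth pausing on the detector numbers from a moment ago: 1 PB/s filtered down to "a few tens of gigabytes per second" already implies a reduction of four orders of magnitude, before any 10x upgrade. A rough sketch of that arithmetic, where the 50 GB/s value is an assumption standing in for "a few tens":

```python
# Rough arithmetic on the trigger chain, using the figures from the
# interview. 50 GB/s is an assumed stand-in for "a few tens of GB/s".

PB = 1e15  # bytes
GB = 1e9   # bytes

raw_rate = 1 * PB          # up to ~1 PB/s out of a single detector
recorded_rate = 50 * GB    # what survives the hardware filters (assumed)

reduction_factor = raw_rate / recorded_rate
print(f"trigger reduction: ~1 byte kept in {reduction_factor:,.0f}")
# ~1 in 20,000: the filters discard the overwhelming majority of data.

SECONDS_PER_YEAR = 365 * 24 * 3600
unfiltered_eb_per_year = raw_rate * SECONDS_PER_YEAR / 1e18
print(f"unfiltered, one detector would produce ~{unfiltered_eb_per_year:,.0f} EB/year")
# versus the ~70 PB/year actually recorded, per the interview.
```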
In this case, a lot of it involves improving our own software: replacing the hardware triggers we currently use with software-based ones that use accelerators, GPUs and other types of accelerators. This will play a big role, along with making our software more efficient in general. The second thing we are doing is trying to make our infrastructure more agile, and this is where cloud native and Kubernetes play a huge role, so that we can benefit from external resources. We can always think about expanding our on-premises resources, but it's also very good to be able to just go and fish around for what's available externally. Kubernetes plays a very big role in that respect as well.

Yeah, I'd love to dig into that a little deeper, because the Cloud Native Computing Foundation is a super active foundation, with obviously a ton of activity around Kubernetes. What does it mean to you, as the infrastructure provider for your own organization, to have an open source community supporting you indirectly via ongoing development and ongoing projects, and having, as you said, this broader pool of brainpower to pull from to help move your own infrastructure along?

Yeah, I think this is great, and we've had really good experiences in the past. We've been heavy users of Linux from very early on. We've used OpenStack for our private cloud and have been heavily involved in that community as well: we not only contribute as end users, but we also offer some manpower for development and for helping the community. We are doing the same with Kubernetes. And really, we end up getting a lot more out than we put in. We are quite involved, but the community is so large, with such big players that have very similar needs to ours, that we get much more back than we put in. We try to help as much as possible, but we have limited resources as well.

Open source is just an amazing innovation machine, and it has proved its value across a lot of things, from Linux to Kubernetes as one of the most recent. I want to shift gears a little bit and ask your take on public cloud. One of the huge benefits of public cloud is the flexibility to add and shrink capacity as you need it. You mentioned in a prior interview that you definitely have spikes in demand, whether from a high frequency of experiments, and I don't know how often you run those, or from something like a conference, where people want access to the data to run analyses beforehand. Where does public cloud fit in your thinking, whether you're there today or not? How do you think about public cloud generically, and more specifically about that ability to add a little more flex in your compute horsepower? Or is your demand just going up and to the right, without really flexing down very much?

Yeah, so this is something we've been working on for a few years now. I would say it's ongoing work, a situation that will not be fully settled for the next few years. But again, what we try to do is explore as much as possible all kinds of resources that can help us. What we did at KubeCon last year was a demonstration that we can actually scale out and burst for these spiky workloads we have. We can burst to the public cloud quite easily using the kind of cloud native technologies we have today.
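Ricardo doesn't walk through CERN's actual setup here, but the general pattern he describes is easy to sketch: with the same Kubernetes API in front of on-premises and cloud capacity, a batch job can be steered to burst nodes with nothing more than a node selector. A minimal illustration with the Python Kubernetes client; the image name, node-pool label, and namespace are all hypothetical, not CERN's configuration:

```python
# Minimal sketch: submit a batch job that is only allowed to run on
# a (hypothetical) autoscaled "cloud-burst" node pool.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a cluster

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="reco-burst-001"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                # Steer the pod to nodes provisioned in the public cloud;
                # the label is whatever the burst node pool advertises.
                node_selector={"pool": "cloud-burst"},
                containers=[
                    client.V1Container(
                        name="reco",
                        image="registry.example.org/physics/reco:v1",  # hypothetical
                        args=["--input", "/data/run-001", "--output", "/data/out"],
                    )
                ],
            )
        )
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="batch", body=job)
```

Because the job is just a container image plus a declarative spec, the same submission works against an on-premises cluster or a cloud-backed one; only where the scheduler places it changes.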
And this is extremely important because it kind of changes our mindset instead of having to think only on investing on premises. We can think that maybe we can cover for the majority of use cases but then explore and burst to the public clouds. This has to be easy in terms of infrastructure and that we are at that point right now with Kubernetes. We also have kind of workload that is maybe easier to do these things than the traditional IT where services are very interconnected. In our case, we are more thinking of batch workloads where we can just submit jobs and then fetch the data back. This also has a few challenges but I would say it's easier than the traditional IT service deployments. The other aspect where the public cloud is also very interesting is for resources that we don't have in large quantities. So we have a very large farm with CPUs. We have some GPUs and it's very good to be able to explore this new accelerator technologies and maybe expand our available pool of accelerators by going to the public cloud, maybe to use them but also to validate to see which ones are best for use cases and explore that option as well. It's not only general capacity, it's really like dedicated hardware that we might not even have ever. Like we think of GPUs or IPUs. It's something that is very interesting that we can scale and just go use them in the public cloud. Yeah, that's a really interesting point because the cloud providers are big enough now, right, that they're building all kinds of specialized servers, specialized CPUs, specialized GPUs. DPUs is a new one. I've heard a data processing unit and as you said, there's FPGAs and all kinds of accelerators. So it is a really rich environment for as you said, to do your experiments and find what the optimal solution is for whatever that particular workload is. But Ricardo, I want to shift gears a little bit as we come to the end of 2020, thankfully for a whole bunch of reasons. As you look forward to 2021, I mean, clearly anticipating and starting to plan to get ready for your upgrade as a priority. I'm just curious, what are your other priorities and how does kind of the compute infrastructure in terms of an investment within CERN, kind of rank with the investment around the physical things that you're building, the big machines, because without the compute, those other things really don't provide much data. And I know those are always talked about how expensive the particle accelerators is, it's an interesting number and it's big, but you guys are a big piece of that as well. So what are your priorities looking forward to 2021? Yeah, from the compute side, I think we are keeping the priorities in similar to what we've been doing in the last few years, which is to make sure that we improve all our automation to improve efficiency as well to prepare for these upgrades we have. But also there's a lot of activity in this new area with machine learning popping up. We have ton of services appearing where people want to start doing machine learning in many, many use cases. In some cases they want to do the filtering in the detectors, in other cases they want to generate simulation data a lot faster using machine learning as well. 
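As a concrete illustration of the "try an accelerator in the cloud" point: requesting scarce hardware through Kubernetes is a small, declarative change to the same job pattern sketched earlier. In this sketch, `nvidia.com/gpu` is the standard extended-resource name published by NVIDIA's device plugin; the image and other names remain hypothetical:

```python
# Sketch: the same Job pattern as before, but asking for one GPU.
# Clusters expose accelerators as "extended resources", so validating
# a cloud GPU (or, via other device plugins, other accelerators) is
# a small change to the container spec. Names are illustrative.
from kubernetes import client, config

config.load_kube_config()

gpu_job = client.V1Job(
    metadata=client.V1ObjectMeta(name="ml-train-001"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="train",
                        image="registry.example.org/ml/train:v1",  # hypothetical
                        resources=client.V1ResourceRequirements(
                            # nvidia.com/gpu is the resource name exposed by
                            # NVIDIA's device plugin; other vendors publish
                            # their own resource names.
                            limits={"nvidia.com/gpu": "1"},
                        ),
                    )
                ],
            )
        )
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="batch", body=gpu_job)
```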
So I think this will be a huge topic for next year, even for the next couple of years: to see how we can offer our users and physicists the best service, so that they don't have to care about the infrastructure, and don't have to know the details of how they scale their model training, the serving of their models, all of this. I think this will be a very big topic. It's becoming a really big part of the worldwide computing for high energy physics, and for CERN as well.

That's great. We see that a lot, machine learning applied to very specific problems. As you talked about, you still can't even record all the information that comes off those detectors; you have to use filtering and compression technologies. So there are real opportunities; we've barely scratched the surface of machine learning and AI, but I'm sure you're going to be using it a ton. Well, Ricardo, I'll give you the last word. We're at CNCF's KubeCon CloudNativeCon. What do you get out of these types of shows, and why are they such an important piece of the way you get your job done?

Yeah, honestly, with the situation right now, I really miss this kind of conference in person. It's a huge opportunity to connect with other end users, but also with the community, to talk to the developers and discuss things over a coffee or a beer. It's really useful to have these kinds of meetings every year. What I always try to say is that this whole infrastructure is truly making a big impact in the way we do things, so we can only thank the community. It allows us to shift our focus to a higher level, to focus more on our use cases instead of having to focus so much on the infrastructure. We start taking it as a given that the infrastructure scales, so we can just use it and focus on optimizing our own software. That's a huge contribution, and we can only thank the CNCF projects and everyone involved.

Great, well, that's a terrific summary. So Ricardo, thank you so much for all your hard work helping answer really big questions, and for joining us today and sharing your insight.

Thank you very much.

All right, he's Ricardo, I'm Jeff. You're watching theCUBE, from our Palo Alto studios, with continuing coverage of KubeCon CloudNativeCon 2020. Thanks for watching, see you next time.