Live from Barcelona, Spain, it's theCUBE, covering KubeCon CloudNativeCon Europe 2019. Brought to you by Red Hat, the Cloud Native Computing Foundation and ecosystem partners.

Welcome back to theCUBE here at KubeCon CloudNativeCon 2019 in Barcelona, Spain. I'm Stu Miniman. My co-host is Corey Quinn, and we're thrilled to welcome to the program two gentlemen from CERN. Of course, CERN needs no introduction. We're going to talk some science, going to talk some tech. To my right here is Ricardo Rocha, who's a computer engineer, and Lucas Heinrich, who's a physicist. So Lucas, let's start with you. If you were a traditional enterprise, we'd talk about your business, but talk about your projects, your applications. What piece of fantastic science is your team working on?

Right, so I work on an experiment that is situated at the Large Hadron Collider. It's a particle accelerator experiment where we accelerate protons, which are hydrogen nuclei, to a very high energy, so that they travel at almost the speed of light. We have a large tunnel, 100 meters underground in Geneva, straddling the border of France and Switzerland, and there we accelerate two beams. One is going clockwise, the other one is going counterclockwise, and we collide them. I work on an experiment that looks at these collisions and then analyzes the data.

Yeah, Lucas, if I can, you know, when you talk to most companies, you talk about scale, you talk about latency, you talk about performance. Those have real-world implications for your world. Maybe do you have anything you can share there?
Yeah, so one of the main things we need to do, so we collide these protons 40 million times a second, and we need to analyze them in real time because we cannot write out all the collision data to disk. We don't have enough disk space. So we essentially run a 10,000-core real-time application to analyze this data in real time and see which collisions are actually most interesting, and then only those get written out to disk. This is a system I work on called the trigger, and yeah, it's pretty dependent on latency.

All right, Ricardo, luckily your job's easy. To most people we say you need to respond to what the business needs from you, and don't worry, you can't go against the laws of physics. Well, you're working on physics here, and boy, those are some hefty requirements. Talk a little bit about that dynamic and how your team has to deal with some pretty tough challenges.

Right, so as Lucas was saying, we have this large amount of data. The machines can generate something on the order of a petabyte a second, and then, thanks to the hardware- and software-level triggers, they reduce this to something like 10 gigabytes a second, and that's what my side has to handle. So it's still a lot of data. We are collecting something like 70 petabytes a year, and we keep adding. Right now the amount of storage we have available is on the order of 400 petabytes. We're starting to get to a pretty large scale, and then we have to analyze all of this. We have one big data center at CERN, which is 300,000 cores or something around that, but that's not enough. So over the last 15, 20 years we created this large distributed computing environment around the world. We link many different institutes and research labs together, and this doubles our capacity.
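The trigger Lucas describes is, at its core, a streaming filter: look at every collision as it happens, keep only the interesting ones, and let the rest go, since there is no disk fast or big enough to record everything. A minimal sketch of that idea, with a hypothetical event shape and selection threshold (the real ATLAS trigger logic is far more sophisticated):

```python
# Illustrative sketch of a software trigger: a streaming filter that keeps
# only "interesting" collision events. The event fields and the energy
# threshold here are hypothetical, not the actual ATLAS selection logic.

def trigger(events, energy_threshold=100.0):
    """Yield only events whose total energy exceeds the threshold."""
    for event in events:
        if event["total_energy_gev"] > energy_threshold:
            yield event

# With 40 million collisions per second and only a tiny fraction kept,
# a filter like this is what turns petabytes per second of raw collision
# data into gigabytes per second of recorded data.
raw = [{"total_energy_gev": e} for e in (12.0, 250.0, 90.0, 512.0)]
selected = list(trigger(raw))
# selected now holds just the two high-energy events
```

Because `trigger` is a generator, events stream through one at a time rather than being buffered, which is the property a latency-sensitive real-time filter depends on.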
So our challenge is to make sure that, after all the effort the physicists put into building this large machine, it's not the computing that breaks the whole system. We have to keep up, yep.

One thing that I always find fascinating is people who are dealing with real problems that push our conception of what scale starts to look like. When you're talking about things like a petabyte a second, that's beyond the comprehension of what most of us can wind up talking about. One problem I've seen historically with a number of different infrastructure approaches is that they require a fair level of complexity, you have to wind up working through a bunch of layers of abstraction, and the end result of all of this is that we can run our blog that gets eight visits a day, and that just doesn't seem to make sense. Whereas at your scale, that level of complexity is more than justified. So my question for you is, as you see these things evolve, and look at best practices and guidance from folks who are doing far less data-intensive applications, are you seeing a lot of those best practices start to fall down as you push the theoretical boundaries of scale?

Right, that's actually a good point. The physicists are very good at getting things done, and they don't worry that much about the process, as long as in the end it works. But there's always this kind of split between the physicists and the computing engineers, where we want to establish practices, but at the end of the day we have a large machine that has to work. So sometimes we skip a couple of steps, but there's still quite a lot of control on things like data quality and software validation and all of this. But yeah, it's a non-traditional environment in terms of IT, I would say. It's much more fast-paced than most traditional companies.
Now, you mentioned you had how many cores working on these problems on site?

So in-house we have 300,000.

If you were to do a full migration to the public cloud, you'd almost have to repurpose that many cores just for calculating the bill at that point, because all the different dimensions everything winds up working on at that scale become almost completely non-trivial. I don't often say that I'm not sure the public cloud can scale to the level that someone would need, but in your case that becomes a very real concern.

Yeah, so that's one debate we are having now. It has a lot of advantages to have the computing in-house, also because we pretty much use it 24/7. It's a very different type of workload, we need a lot of resources 24/7, so even the pricing is calculated differently. But the issue we have now is that the accelerator will go through a major upgrade in about five years' time, where we'll increase the amount of data by 100 times. So now we're talking about 70 petabytes a year, and very soon we'll be talking about exabytes. The amount of computing we'll need there is just going to explode. So we need all the options. We're looking into GPUs and machine learning to change how we do computing, and we are looking at any kind of additional resources we might get, and there the public cloud will probably play a role.

Yeah, can you speak to the dynamic of how you work together on something like an upgrade of that scale? I can't imagine that you just say, well, we built whatever we needed, and throw it over the wall and make sure it works.

Right, I mean, I work a lot on this boundary between computing and physics, and internally I think we go through the same processes as a lot of companies, in that we're trying to educate people on the physics side about best practices, because it's also important.
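The jump Ricardo describes is easy to understate, so a quick back-of-envelope calculation (using only the round numbers from the conversation, not official projections) shows why "explode" is the right word:

```python
# Back-of-envelope arithmetic for the accelerator upgrade Ricardo describes:
# roughly 100x today's data volume. Numbers are the round figures from the
# conversation, not official CERN projections.
current_pb_per_year = 70      # ~70 petabytes recorded per year today
upgrade_factor = 100          # "increase the amount of data by 100 times"

future_pb_per_year = current_pb_per_year * upgrade_factor
future_eb_per_year = future_pb_per_year / 1000   # 1 exabyte = 1000 petabytes

print(future_pb_per_year)  # 7000 petabytes per year
print(future_eb_per_year)  # 7.0 exabytes per year
```

Seven exabytes a year is more than an order of magnitude beyond the roughly 400 petabytes of total storage CERN has accumulated to date, which is why every option, GPUs, machine learning, public cloud, is on the table.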
So one thing I stressed in the keynote is this idea that reproducibility and reusability of scientific software is pretty important. So we teach people to containerize their applications and make them reusable, and stuff like that.

Anything about that relationship you can expand on?

Yeah, so this keynote we had yesterday is a perfect example of how this is improving a lot at CERN. We were actually using data from CMS, which is one of the experiments. Lucas is a physicist on Atlas, which is kind of a competing experiment. I'm in IT. And all this containerized infrastructure is bringing us together, because it's getting much easier to share pieces of software and even infrastructure, and this helps us a lot internally also.

Yeah, so what in particular about Kubernetes helps your environment? You talked about 15 years of this distributed systems build-out, so it sounds like you were the hipsters when it came to some of the solutions we're working on today.

So that has been a major change. Lucas mentioned the container part for software reproducibility. I've been working on the infrastructure since I joined CERN as a student, and I've been working on the distributed infrastructure for many years, and we basically had to write our own tools, the storage systems, all the batch systems, over the years. Suddenly, with this public cloud explosion and open source usage, we can just go and join communities that sometimes have requirements higher than ours, and we can focus really on the application development. If we start writing software using Kubernetes, then not only do we get this flexibility of choosing different public clouds or different infrastructures, but we also don't have to care so much about the core infrastructure: all the monitoring, log collection, restarting. Kubernetes is very important for us in this respect.
We could remove a lot of the software we'd been depending on for many years.

So these days, as you look at this build-out, and not just what you're doing today but what you're looking to build in the upcoming years, are you viewing containers as the fundamental primitive that empowers this? Are you looking at virtual machines as that primitive? Are you looking at functions? Where exactly do you draw the abstraction layer as you start building this architecture?

So, yeah, traditionally we've been using virtual machines, for maybe the last 10 years, or eight years at least. And we see containerization happening very quickly. Maybe Lucas can say a bit more about how this is important on the physics side.

Yeah, as was mentioned, currently I think we're looking at containers as the main abstraction, even as we go to things like functions as a service. What's kind of special about scientific applications is that we don't usually have our entire code base on one software stack, right? It's not like we deploy a Node.js application or a Python stack and that's it. Sometimes you have a complete mix of C++, Python, Fortran and all that stuff. So this idea that we can build the entire software stack as we want it is pretty important. Even for functions as a service, where traditionally you had just a limited choice of runtimes, this becomes important.

And from our side, the virtual machine still required a very complex setup to be able to support all this diversity of software, while with containerization all people have to give us is a building block to run, and it's kind of a standard interface. So we only have to build infrastructure that can handle these pieces.

Well, I don't think anyone can dispute that you folks are experts at taking larger things and breaking them down into their constituent components.
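The "standard interface" point is worth making concrete: the infrastructure side never inspects what's inside an image, whether that's C++, Python, or Fortran. All it handles is an image name plus a command. A minimal sketch of that contract, with a hypothetical image name and command:

```python
# Sketch of the "standard interface" idea: the infrastructure only needs
# an image name and a command; what's inside (C++, Python, Fortran...)
# is opaque to it. The image name and command below are hypothetical.
import subprocess

def container_argv(image, command):
    """Build the argv for running one containerized building block."""
    return ["docker", "run", "--rm", image, *command]

def run_building_block(image, command):
    # The scheduler needs to know nothing about the software stack inside.
    return subprocess.run(container_argv(image, command),
                          capture_output=True, text=True)

# Hypothetical invocation, just to show the shape of the interface:
print(container_argv("cern/analysis:latest", ["python", "run.py"]))
```

This is exactly what replaces the "very complex setup" of virtual machines: one uniform way to launch any workload, which is also the contract Kubernetes builds on.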
I mean, you are quite obviously the world experts on that. But was there a challenge as you went through that process of, I don't necessarily even want to say modernizing, but changing your viewpoint on those primitives? As you've evolved, have you seen challenges in gaining buy-in throughout the organization? Was there pushback? Was it culturally painful to wind up moving away from the virtual machine approach into a containerized world?

Right, so yeah, a bit, of course. But traditionally, physicists really focus on their end goal. We often say that we don't count how many cores or whatever; we care about events per second, how many events we can process per second. So it's maybe a more open-minded community than traditional IT. We don't care so much about which technology we use at some point, as long as the job gets done. So yeah, there's a bit of friction sometimes, but there's also a push when you can demonstrate a clear benefit. Then it's easier to push it through.

Yeah, well, it's maybe a little bit special for particle physics that it's not only CERN doing the research. We are an international collaboration of many, many institutes all around the world that work on the same project, which is just hosted at CERN. So it's a very flat hierarchy, and people do have the freedom to try things out. It's not like we have a top-down mandate on what technology we use. Somebody tries something out, and if it works and people see value in it, it gets adopted.

The collaboration, with the data volumes you're talking about, has got to be intense. I think you're a little bit beyond the, okay, we ran the experiment, we put the data in Dropbox, go ahead and download it, you'll get that in only 18 short years. It seems like there's absolutely a challenge there.
That was one of the key points in the keynote, actually. A lot of the experiments at CERN have an open data policy where we release our data, and that's great because we think it's important for open science, but it was always a bit of an issue who can actually, practically, analyze this data if they don't have a data center. So one part of the keynote was demonstrating that, using Kubernetes and public cloud infrastructure, it actually becomes possible for people who don't work at CERN to analyze this large-scale scientific data.

Maybe just for our audience, the punchline is rediscovering the Higgs boson in the public cloud. Maybe give our audience a little taste of that.

Right, so basically what we did: the Higgs boson was discovered in 2012 by both Atlas and CMS, and part of that data has now been released publicly. We used open data from CMS. Basically this was a 70-terabyte data set which, thanks to our Google Cloud partners, we could put onto public cloud infrastructure, and then we analyzed it on a large-scale Kubernetes cluster in just a few minutes. When we publish the data, we say you probably need a month to process it, but we had like 20 minutes on the keynote, so we needed a bit larger infrastructure than usual to get it down to five minutes or less. In the end it all worked out, but it was a bit of a challenge.

How are you approaching, I guess, making this more accessible to more people? By which I mean not just other research institutions scattered around the world, but students, individual students, sometimes in emerging economies, where they don't have access to the kinds of resources that many of us take for granted, particularly those of us who work for prestigious research institutions. What are you doing to make this more accessible to high school kids, for example?
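Going from "you probably need a month" to "five minutes on stage" is pure fan-out: split the 70-terabyte data set into partitions and process them in parallel across a large Kubernetes cluster. A rough scaling calculation, assuming perfectly linear speedup (real jobs lose some of that to I/O and coordination), shows the order of magnitude involved:

```python
# Rough scaling arithmetic behind the keynote demo: work that takes about
# a month on a single node, compressed into minutes by fanning the data
# set out across many parallel workers. Assumes perfectly linear scaling,
# which real distributed jobs only approximate.
month_in_minutes = 30 * 24 * 60   # ~43,200 minutes of single-node work
target_minutes = 5                # the on-stage time budget

workers_needed = month_in_minutes / target_minutes
print(workers_needed)  # 8640.0 -> on the order of ten thousand parallel workers
```

That worker count is a derived round number, not the actual size of the cluster used in the keynote, but it conveys why "a bit larger infrastructure than usual" was needed.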
Folks who are just dipping their toes into a world they find fascinating.

We have entire outreach programs that go to high schools. I did this when I was a student in Germany: we would go to high schools and host workshops, and people would analyze some of this data themselves on their own computers. We would come with USB sticks that had data on them, and they could analyze it. Part of the open data strategy from Atlas is to use that open data for educational purposes, and then there are also programs in emerging countries.

Lucas and Ricardo, really appreciate you sharing the open data, open science mission that you have with our audience. Thank you so much for joining us.

Thank you. Thank you.

For Corey Quinn, I'm Stu Miniman. We're in day two of two days of live coverage here at KubeCon CloudNativeCon 2019. Thank you for watching theCUBE.