Live from Orlando, Florida, it's theCUBE. Covering Pentaho World 2017. Brought to you by Hitachi Vantara. Welcome back to theCUBE's live coverage of Pentaho World, brought to you by Hitachi Vantara. I'm your host, Rebecca Knight, along with my co-host, Dave Vellante. We are joined by Derek Mathieson. He is the group leader at CERN. Welcome, Derek. Glad to have you on the show. Well, I'm glad to be here. Thank you very much. So CERN, which is of course the European Organization for Nuclear Research: we think of it as this place of physicists and engineers working together to solve problems and probe the mysteries of the universe. But in fact, CERN is a technology organization. Absolutely. CERN has this reputation of being exclusively about physics. I mean, it is the world's leading particle physics research laboratory. But in the end, we're an infrastructure organization that provides all the technology; the scientists and engineers come to CERN to do their work, and CERN itself provides the facilities. So our main focus, in fact, is technology, computer science, civil engineering, construction. We build cathedral-sized concrete structures 450 feet underground and 17-mile-long tunnels. This is civil engineering on the grand scale. And one of the difficulties we have as an organization, although we're a physics organization, is explaining to people, when we're recruiting or contacting other universities, what we're actually looking for: we're not looking for physicists, we're looking for engineers and technology specialists to come and work at CERN. So talk to us about some of the new, exciting projects that you're working on there. Oh, there's a lot going on. Obviously, the reason I'm here today is the work that we're doing with Pentaho. We're building a new data warehouse.
And my group is actually responsible for the administrative computing of CERN, so basically running CERN as a business. There's a budget of around one billion US dollars going into CERN every year to do all this physics research, so obviously we have a responsibility to treat these tax dollars carefully and spend them wisely. A lot of my work is making sure that we have the appropriate infrastructure, controls, and technology in place so that the money is used effectively and wisely. So paint a picture of that infrastructure for us, if you would. What does it look like if we took a peek under the tent? Well, what's quite nice about it is the technology infrastructure that we have. We have a huge computer center, with 100,000 CPUs in it. That's mainly used for doing physics, but because we have all this infrastructure there, we can use part of it to also run the administration, which gives us the ability to run a real world-class technology stack to actually run the organization. So we have a huge data warehouse which gives very rapid responses to the physicists and engineers who just want to get on and do their work. And my job is to make sure that the administration of CERN doesn't get in their way. We want to provide them with facilities so they can just get on with their job, and everything to do with actually running the organization is my problem and that of the team that works for me. A good example is that CERN literally sits on the border between France and Switzerland, so we care about things like the 80 different customs forms that we have to worry about on a daily basis just to move materials around the site. It's such an unusual organization; it's unique in the world. And that's what attracts people to work there: all these new challenges that we get. It's really a fantastic place. And the view is pleasant, I bet.
Oh yeah. Okay, so tell us more about the infrastructure. You talked about this really fast data warehouse, 100,000 CPUs. Is it all on-prem? Is it a mix of on-prem and cloud? What's the data warehouse? Give us a sense of that infrastructure, because when people hear data warehouse, they think of the old, clunky data warehouse. You're talking about something super high performance. Exactly. In fact, that's one of the challenges that we face: we've got scientists who are used to dealing with high volumes of data with high filtration. Our particle detectors produce around two petabytes of data per second, so they're used to dealing with large amounts of data. So immediately, when they start looking at the administration of the organization, they have the same high expectations: they want it to be fast, they want it to process large quantities of data very quickly indeed and give the answers in a split second. To do that, we obviously have to put quite a lot of hardware behind it and use a good technical stack as well. We're quite big users of Oracle at CERN, so we have a big Oracle database, which is principally where we keep most of our data. Then we put Pentaho on top of that to do all the reporting, the analytics, building the cubes, all this kind of thing. And our user base is very transient. There are around 15,000 people working at CERN at any one time. Half of the world's particle physicists work at CERN, so they're coming and going all the time. They don't want to worry about how to get the data. It has to be there, and it has to be there right away. It has to be easy to use and easy to understand. These people live, work, and breathe particle physics. They don't want to worry about the budget and the details of how to do all this stuff.
This is something where the accountants have to set it up in such a way that it's easy for them to do the right thing, so that we stay compliant with the various regulations and the organization continues to function as a business while still getting on with our primary mission of particle physics research. And that infrastructure is primarily on-prem, is that correct? The vast majority of it is on-prem. In fact, we have two main data centers: one physically located at CERN in Geneva, and another one over at the Wigner Institute in, oh... The other place. Okay, yeah. And that, I presume, is because you've got such massive volumes of data that you can't just be moving that stuff around up into the cloud. Right. In fact, we have a lot of high-speed data links between the different data centers, because we keep copies of quite a lot of the data. The principal physics data is held not only at CERN, which is what's called the Tier Zero site, where we have all the data to start with; we also copy it to, I think, around seven different institutes around the world, so they have a first-line copy as well. Altogether, we have a network of around a hundred computer centers working for CERN in some way or other, as part of what we call the LHC Computing Grid, which is effectively a planetary data center and computing infrastructure for processing all the LHC data. I want to go back to the organizational structure. You describe this organization situated on the border of France and Switzerland, where half the world's particle physicists work. And, as you said, the administration's job is really to get out of their way so they can do their thing. So what is the culture like there? How do people work together? How do people collaborate? What do you do when there is disagreement?
I mean, this is one of the unique aspects of CERN: bringing people together. There are around 90 different countries represented at CERN, around a hundred different nationalities, all working on site. It's very much like a university environment. We have a canteen where people come together, and people always say that most of the physics and science discoveries actually happen in the canteen, as people from all over the world meet. India and Pakistan have just joined as associate members. We've got 22 member states, mainly around Europe, but now we have a policy of enlargement, so we're actually trying to make the organization even larger, reaching more countries around the world. The United States is now an observer within the organization, so it participates in the CERN Council, and it's also a major player in some of the large LHC experiments. But on a day-to-day basis, I'll be sitting in the restaurant and there will be Nobel Prize winners. Our director general will be there as well, having lunch with everyone else. So it's very much a leveling organization, where everyone feels free to speak to each other and discuss the matters of the day in particle physics. So what do you guys talk about? What's the canteen conversation? Usually utter geek-speak. That's the main problem at CERN: people are passionate about what they do. They come to CERN, they love what they do, and they talk about it all the time. People will be talking about the latest generation of CPU architecture, GPU programming, how to do simulations with petabytes of data. This is lunchtime conversation. And evening and everything else. So you're not talking about the soccer game, the football game, right? You're mostly talking shop, right?
There is a football team, and there's a rugby team as well. There's real life at CERN too, but most people are there because they're passionate about what they do. And obviously you're listening to those conversations, so you must pick up a lot of it. Yeah, if you work at CERN and you're at a dinner party, someone will ask, "So you work at CERN, tell me all about physics." So you pick up a bit of it. Everyone can speak a little bit about what we're doing at CERN, and I think that's imperative: because we work there, of course you hear about what's going on and understand a little bit about it. But I would never claim to be a physicist. You can fake it, though. I have lunch with physicists. I'm not one myself. There you go. How about Pentaho? You painted the picture of the infrastructure earlier. Where does Pentaho fit in that, and how is it adding value? Yeah, we've been using Pentaho for the last few years. What really attracted us was this combination of open source plus proprietary software. We like the open source core, which very much fits with the values of CERN as an open lab that shares everything it does. So, as I say, we started with Pentaho a few years ago, and now it's a core strategic component of the administration. It's also used in other areas: in some of the more technical infrastructure areas, in terms of how we actually run the lab, monitoring different parts of the accelerator complex, and even the maintenance of the buildings. So it's really a core component for us across the organization. So CERN is an organization that is, I'll use the word insistent, if you will, on open source as a component.
So that puts pressure on companies like Pentaho to pay attention to the next project, maybe contribute, maybe not, but certainly integrate. So, scorecard: how have they done on that? What would you like to see them do better in that regard? And, you may not be able to answer this, but what open source projects might your organization see on the horizon that you want Pentaho to capture? Obviously with 8.0 you heard about Spark and bringing in Kafka and the like, but maybe you could comment. Absolutely. I think this is one of the areas that really attracted us: the open source nature, and certainly Pentaho's movement in that direction, particularly with the integration with Hitachi, where we're seeing many other products now being integrated into the Pentaho world. This was interesting to us, of course, because of their cloud-based infrastructure, the idea of scaling up and scaling out, and they're going with open source projects, specifically around Apache projects, which are really interesting to us as well. That's something we've been working on a bit ourselves, so to hear that Pentaho is doing that as well was quite good news for me. Something we've been struggling with is basically spreading out: we've got 15,000 users, and we want a dynamic infrastructure where we can provision more servers where necessary to take the load when we need to, but at the same time we don't want to waste resources when users are off doing something else. Over the course of the last decade, let's say, has there ever been a tendency, because you've got so many alpha geeks running around, to say, "Hey, I could take these open source components and do it myself. I don't need the Pentaho load balancer, I've got YARN to negotiate my resources," and start to go, "Look what I built"? How do you manage that? No, absolutely right.
It's a problem. There's always the risk of not-invented-here syndrome, where someone thinks "I could do it better," and we have to push back against that. I think the important thing is to take the bigger picture: if it's already done well, we don't need to do it again. Build on top of it; make something better on top of something that already exists. That's the message we give to any of the engineers working at CERN: you can do so much more if you use infrastructure that's already solid. Reusing open source software allows us to build on things that are already solid. We don't need to make another one of them; we'll make something on top of it. That's a primary message that we try to give. So here we are at Pentaho World, and you're with a bunch of other practitioners, sharing best practices, talking about how you use the product, learning from them too. What are some of the takeaways, and how much are you actually talking to them versus talking to the Pentaho product people? Oh, we did a presentation yesterday, and the focus of our presentation was managing Pentaho.
One of the things we've learned over a number of years is that you have to have an infrastructure to take care of all the different artifacts, all the different reports. We have many, many different users who want to be able to use Pentaho and, at the same time, create their own artifacts, and we have to have some way of actually managing this whole landscape. Although Pentaho has some tools in this area, that was one of the areas where we felt we could add some value, so we've been building on top of the existing Pentaho APIs, building an infrastructure to make it easier to support for other people. What was quite nice is that we were speaking to some of the other attendees, and that's exactly the kind of thing they've been worrying about as well. There were even some presentations of people taking a similar approach in their own organizations, trying to build some kind of architecture on top of Pentaho just to manage the whole thing. When you have hundreds of reports, hundreds of artifacts, and very complicated data warehouse cubes, you need something on top of that to manage the whole thing. That's something we've been focused on, and I see other people doing the same kind of thing, so I can imagine that Pentaho will be taking note of this and probably incorporating some of the ideas. Yeah, it's sending a loud and clear message to Pentaho. Yes, absolutely. How about the event? You've been to at least two that I know of; I don't know if you were at the original. I've been to three altogether. Okay, so you've been to, I think, all of them, right? It could have been all of them, yeah. I think the first one was in 2014, I'm pretty sure. Things you've taken away, interesting conversations? Yeah, I think that's the main reason we come. It's a long way for us to come all the way from Geneva.
It's really important for us to touch base with other people using the product. It is an open community. People do like to talk to each other about the new things happening within the Pentaho community, and I think face-to-face contact is, in the end, very hard to beat. That's where coming to an event like this helps: you get the opportunity to speak to people over lunch or at the evening events and really find out what it's like to use Pentaho. Great. Well, thank you so much, Derek, for coming on theCUBE. Thank you very much. I'm Rebecca Knight, for Dave Vellante. We will have more from Pentaho World just after this.