So, Lauren, do you remember who the first Superuser Award winner was? Well, of course, it was the CERN team. It's hard to believe it's been two years since they won it in Paris. Wow, okay, so I think that's Tim Bell. I've been talking about him a lot lately, but you could say that OpenStack is synonymous with science, and that science and OpenStack go together. OpenStack was founded by Rackspace and NASA, so you could say that scientific research is really part of the DNA of the community. Amazing. Well, without further ado, I will introduce you to Tim Bell from CERN to tell us what they're up to these days.

Hello, and thank you for the chance to come along and give you an update on where we are with OpenStack at CERN. If you were driving between Geneva and the Jura Mountains, you might go past this strange globe. This is a CERN conference center, but behind it are the ATLAS experiment control buildings. These are the surface buildings, a hundred meters above the largest machine on earth: the Large Hadron Collider, 27 kilometers around, with four experiments. We fire beams of protons around in opposite directions and then collide them at the experiments. Of the four experiments, this one is CMS, which stands for Compact Muon Solenoid; it's a bit of a strange term for something that weighs 14,000 tons. So when we fire these beams of protons round, what do we get?
We get around one billion collisions every second. Each beam has bunches of around a hundred billion protons; they pass through each other at the experiments, and out of that we get simultaneous collisions occurring inside the experiments. This is one of the things driving the computing needs: we have to be able to handle all those collisions and then separate them out into distinct, individual collisions.

But CERN isn't just the Large Hadron Collider. I have the honor of having an antimatter factory just down the road from my office. There, we take antiprotons and positrons (anti-electrons), slow them down, put them into orbit around each other, and create anti-hydrogen. This allows us to study questions like: does antimatter go up or down under gravity? We also host at CERN the control center for the AMS experiment, which is on the outside of the International Space Station, looking at the solar wind and particles from space without the problem of them having to come through the atmosphere.

2016 has been a great year for the LHC. We've had extremely good performance; the beam has been very successful in staying in for extended periods of time, which leads to more collisions. We've got about half a petabyte a day coming in at the moment, and with this we're accumulating more: the data store is currently about 160 petabytes in total. But looking out, when we look at how we're going to be disentangling these collisions from each other, we're looking at about 60 times larger compute capacity required by 2023. Moore's law will only get us a factor of five of that, even if we manage to keep it going.

So how are we looking to address this need for scalability? We started production with OpenStack in 2013. In 2014, in Paris, we had 70,000 cores; we're now at 190,000, which is roughly 90% of the compute capacity at CERN, running on top of OpenStack. We do migration of long-running service VMs.
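As a quick aside, the scaling gap Tim quotes can be sanity-checked with back-of-the-envelope arithmetic. Here is a minimal sketch in Python; the roughly three-year doubling period for Moore's law is my assumption, not a figure from the talk:

```python
# Back-of-the-envelope check of the CERN compute gap described above.
# Assumption: "Moore's law" here means capacity doubling every ~3 years.

def growth_factor(years: float, doubling_period: float) -> float:
    """Capacity multiplier after `years` if capacity doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)

needed = 60                        # ~60x more compute needed by 2023 (per the talk)
years = 2023 - 2016                # time horizon from the talk
moore = growth_factor(years, 3)    # hardware improvement alone

print(f"Moore's law alone: ~{moore:.1f}x")        # ~5.0x, matching the "factor of five"
print(f"Remaining gap: ~{needed / moore:.0f}x")   # ~12x to close by other means
```

With that assumed doubling period, the result matches the factor of five from the talk, leaving roughly an order of magnitude to be closed through software efficiency and scale-out rather than hardware alone.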
We're doing around 5,000 of these this year, and we're currently putting in place the process to add around another hundred thousand cores in the next six months. With this, we have to have a platform that is scalable and that allows us to grow.

But at the same time, the users are looking for more functionality, not just more capacity, so we've been looking at containers. The users have been very enthusiastic about reworking their applications for microservices. We've also had a number of collaborations, with Rackspace and with the European Union's INDIGO-DataCloud project, to try and work out how best to apply containers to science. We've used the OpenStack Magnum project. This is attractive for us because we can use the existing OpenStack infrastructure, our security arrangements, our capacity planning, our accounting; we just add Magnum as additional functionality, rather than having to do the same integration separately for Mesos, Kubernetes, and other technologies.

At the same time, we have to look at how we can grow, and we've been looking at public clouds. For a couple of years we've been running Large Hadron Collider workloads on public clouds; we've tried around 10 in total. The vast majority of these are OpenStack-based, and this allows us to take the in-house tooling that we've been using for the on-premise cloud and use the same tooling for running on the public clouds.

So thank you very much for all of your help. With communities like this, and working groups like the Scientific Working Group and the Large Deployment Team, we're going to be able to take on the computing challenges of CERN's experiments going forward.
Thank you. Thanks, Tim. A few weeks ago I had the opportunity to visit the University of Cambridge, and I not only met with their infrastructure team, who are working on OpenStack, but got to meet some of the researchers who are using the OpenStack clouds. Tim and CERN are working on one of the largest research projects in history right now, but we're about to hear from someone who might top that, from the University of Cambridge. Please welcome Dr. Rosie Bolton.

Hi everyone. It's really good to be here to talk with you today. I think it's fair to say that the LHC that Tim was talking about is one of the most exciting things that's happened in physics recently. I'm going to tell you about the most exciting thing in physics that hasn't happened yet, but it is going to happen. The Square Kilometre Array (SKA) project is a billion-dollar-plus project to build a radio observatory with a 50-year lifetime. I'm going to talk about the first phase of that. The first phase of SKA will be finished in 2023 and will consist of two separate instruments: two instruments, one observatory. These instruments will be in very remote desert sites. The first will be an array of antennas spread out in clusters in the Western Australian desert, on baselines up to 80 kilometers; each of the little white specks is an antenna, about the size of me. That's in Australia, and all those receivers work together as an interferometer. The second instrument in the first phase will be an array of 197 dishes collecting radio signals, spread out across the South African Karoo Desert: again, another interferometer. Now, SKA will be much more sensitive than any radio telescope that we have today, and that gives us access to lots of different science. I want to whet your appetite with two examples of the science we can do with SKA. Today I have very little time to tell you more, but do find me afterwards, or look me up. So, if you have sensitivity, you can
look deep into space. We can use the fact that if we can collect photons from very far away, we can see far back towards the beginning of the universe. This is a graphic showing how the universe is expanding: towards the left-hand side we're looking at the very early universe, and then it expands along. We can see lots of structure in the universe here. SKA will be able to probe right back to the time when the first stars were switching on, and we will use the fact that SKA is being designed with 65,000 frequency channels to distinguish emission from different parts of the universe very clearly. An overall goal is to build up a survey of around a billion objects in three dimensions. We can then look at how the structure in the universe has evolved over time, compare it to our cosmological models to see if it's behaving as we would have expected, and if not, amend our models.

So that's looking deep. But if you have sensitivity, you can also look quickly, and a nice tool for doing that is to look at pulsars. I hope many of you will have heard of pulsars. Essentially, a pulsar is a dead star spinning very, very rapidly: something with the same mass as the Sun, spinning at a very high angular rate, often with a magnetic field misaligned to the spin axis, which means it can send out a beam of radio emission rather like a lighthouse beam. If it happens to line up with us on Earth, we see it rotating very clearly, like a clock ticking along. Now, we already know about pulsars.
We've found several thousand of them. The yellow points on this graphic show the pulsars that we've currently found in our own galaxy; you can see that they're clustered locally around our own location, which is the red circle at the top. With SKA, we predict we will find every single pulsar in our galaxy that is pointing towards us. That gives us access to a whole array of surveys we can do using the pulsars themselves as tools, and there are many different types of science. I'm going to talk about just my favorite one, which is represented by this graphic here.

If we have a nice survey of pulsars, we can choose a few tens of the best ones, chosen so they're spread out across the galaxy, and look at how their timing pips come in regularly. We can keep monitoring these timing pips. If a gravitational wave works its way across the fabric of our galaxy, for some of the pulsars on one side the metric of spacetime will be squashed between us and them, while on the other side the metric of spacetime will be stretched. That means there will be an offset in the arrival times of the pips from one side relative to the other: we'll see one half of the sky coming in early whilst the other half is coming in late, and we'll be able to infer a gravitational wave rippling through the galaxy. I think that's pretty nice science.

So that's all the science I have time for. Let's talk about why it's difficult to make these things happen. Luckily, as I said, the two sites are very remote; they're in the desert, in South Africa and in Australia. But the processing centers are not in a desert.
They are down in Cape Town and in Perth, so we don't have to deal with running HPC out in the desert. The science data processor for SKA, which I'm working on, is the thinking brain of the SKA. It takes the signals that have been combined and then analyzes full data sets to ask: what is the model of the sky, and of the telescope, that best fits these data? After the science data processor, we then have to pass the data to SKA regional centres, which will be globally distributed and which allow scientists access to the data products, to ask: was my experiment successful? How does it compare with theory? To make an analogy, the SDP, the science data processor, is like the conscious mind of the telescope, and we then pass the products on to the regional centres, where the kind of cultural aspects and community comparisons can be made.

Okay, so let me give you some numbers; I'm going to skip this slide as being a bit too wordy. We have many challenges for the SDP, the science data processor. The first is complexity: we have multi-axis data sets, we have iterative, convergent pipelines that need to run, and we have to be able to predict how much time they will take to run. We need about half an exaflop of compute to do this. That's quite big, and we have to orchestrate the ingest, the processing, the control, and the preservation and delivery of these data products, and we have to keep up with the incoming data. We are, of course, cost-constrained; this is a publicly funded science project. The first phase has a budget of 650 million euros, and the science data processor will be around 10% of that in total, so we don't have lots of money to throw around. We're also going to be power-constrained for the same reasons: we can't afford to switch on all of the compute that we might need, so we need to make things much more efficient than our current assumption of 25% efficiency.
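That 25% figure has a direct hardware implication, which a one-line calculation makes concrete. A minimal sketch, using the numbers quoted in the talk; reading "efficiency" as the sustained-over-peak ratio is my interpretation:

```python
# What a 25% compute efficiency means for SKA's science data processor:
# half an exaflop of peak hardware delivers only a fraction of that as useful work.

PEAK_EXAFLOPS = 0.5   # peak compute requirement quoted in the talk
EFFICIENCY = 0.25     # current working assumption from the talk

sustained = PEAK_EXAFLOPS * EFFICIENCY
print(f"Useful sustained compute: {sustained:.3f} exaflops")   # 0.125

# Doubling the efficiency would double the science per watt and per euro:
print(f"At 50% efficiency: {PEAK_EXAFLOPS * 0.5:.2f} exaflops useful")
```

This is why efficiency improvements matter as much as raw capacity for a power- and cost-constrained project: every point of efficiency recovered is compute that does not have to be bought or powered.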
We need to find ways of making things scale better. But we also have to design a system that allows for software and hardware refreshes over the 50-year lifetime. And when we think about the regional centres and the delivery of the products to the scientists, we have to consider which facilities might be available in national infrastructure projects as well, and how we build a federated system for that.

This is my summary slide. We have 400 gigabytes per second of data to ingest into the science data processor. Each graph of tasks for a six-hour observation will have around 400 million tasks in it, and we require around half an exaflop of peak compute in total. And this is my favorite: we need 1.3 zettabytes of intermediate data products for each six-hour data set; these are data that get created and then destroyed every six hours. In terms of final products, we need to deliver a petabyte a day of science data products to the rest of the world. So hopefully that sounds interesting. Thank you.

I'm now going to hand over to my colleague, Paul Calleja. He's going to talk about medical informatics. Thanks.

Hi there. It's really exciting to be here today talking about research computing at the OpenStack Summit, because I feel OpenStack technologies are now poised to make a significant impact in the research and innovation domains. We started our OpenStack journey in Cambridge about 18 months ago, when we began building a new research computing platform called the Cambridge Biomedical Cloud. The task of this platform is to take clinical data and applications from the university hospital environment and move them across to the research computing environment, to drive medical analytics and computationally intensive biomedical research. So why would we use OpenStack in the research computing environment?
Well, from my perspective as a provider, OpenStack technologies make computing, data, and applications more accessible, more flexible, and more secure. From the researcher's perspective, they make research computing and data easier to use and easier to share for collaboration, and this decreases the time to science and increases innovation. Within our university hospital environment, we produce huge amounts of data, which is all now held in electronic record systems. This provides us with an ideal opportunity to apply big data technologies for improved health outcomes. But to do this we need an IT platform which is secure, flexible, elastic, and lends itself to a sandboxed research computing environment, and OpenStack really meets all those requirements.

If we take a look at the Biomedical Cloud, we can see that it's a heterogeneous architecture with three main elements. There's a 2,000-core OpenStack element using 50-gigabit Ethernet that's RDMA-enabled for performance; there's a traditional HPC cluster, a static image-based system with a thousand cores using 56-gigabit InfiniBand; and there's a currently quite small Hadoop cluster, which will be growing at the beginning of next year. There's also quite a complex ethical sign-off and data sharing platform that takes data from the hospital network and moves it across to the research network under the correct regulatory compliance regimes. Once we have that data in the research network, we can do interesting things with it. So how do we use that platform to develop new predictive medical analytics techniques?
Firstly, we take various data warehouse products from Epic, on the upper right-hand part of the diagram. These are patient test results, medical records, and live telemetry feeds from the operating theater, and we run them through predictive modeling techniques to produce predictions. We can then test those predictions and enter into a device trial loop, where we assess and refine the model until, in the end, we have something that we think can stand up as a new clinical treatment.

A really good example of this is work done by Dr. John Cromwell at Iowa University Hospital. John has developed a statistical model of surgical site infections that they run live while a patient is undergoing a procedure, and by using this model they can cut surgical site infections by 58%. That's a really good example of how quite low-hanging fruit can deliver large benefits in the healthcare environment.

Another use case we're working on in Cambridge is population-scale genomics analysis. We've developed a new genomics analysis platform called OpenCB, using Hadoop infrastructure. This has been developed with Genomics England to undertake one of the largest population studies in the world, the UK 100,000 Genomes Project, where we will be looking at the genomes of a hundred thousand people. This OpenCB technology is already deployed on the Biomedical Cloud, and we're running it over a UK 100k precursor project called the BRIDGE project, where we're looking at ten thousand genomes of rare-disease patients, and we're seeing a two-orders-of-magnitude performance increase over previous platforms.

The last use case is again a very interesting medical use case from the hospital. They've deployed a new brain imaging machine; that's the machine being installed on the left-hand side. This produces vast amounts of data.
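To make the idea of a live intraoperative risk model a little more concrete, here is a minimal, entirely hypothetical sketch of a logistic risk scorer fed by streaming telemetry. The feature names, weights, and values are all invented for illustration; they are not from Dr. Cromwell's actual model, which is not described in detail here:

```python
import math

# Hypothetical intraoperative infection-risk scorer, in the spirit of the
# surgical-site-infection model described above. All weights and feature
# names below are invented for illustration only.

WEIGHTS = {"operative_minutes": 0.012, "glucose_mg_dl": 0.008, "core_temp_drop_c": 0.9}
BIAS = -4.0

def infection_risk(features: dict) -> float:
    """Logistic risk score in (0, 1) from live telemetry features."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# The score updates as telemetry streams in during the procedure.
early = infection_risk({"operative_minutes": 60, "glucose_mg_dl": 110, "core_temp_drop_c": 0.2})
late = infection_risk({"operative_minutes": 240, "glucose_mg_dl": 180, "core_temp_drop_c": 1.5})
print(f"risk early: {early:.2f}, late: {late:.2f}")  # risk early: 0.10, late: 0.84
```

The point of running such a model live, rather than after the fact, is that a rising score during the procedure can trigger an intervention while it can still change the outcome.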
So the center needed a step-function increase in its compute and data capability. OpenStack medical imaging VMs provide that step change, and that's now going into production, I think in about a month's time. When we started our journey, the scientific computing community around OpenStack was really quite nascent; that community has now been developed through the Scientific Working Group. I'd like to present Stig Telfer, who'll tell us a little bit about that now.

Thank you. The science we've seen today ranges from the subatomic to the breathtakingly cosmic, from the beginning of time to the future of health care. Yet the computational challenges that these projects face have more in common than differences: at an infrastructure level, they all face pretty much the same problems. These use cases are pretty typical in research computing, but they are not the default OpenStack use case. When you deploy OpenStack out of the box, this is not what you get; the scientists have to work a little bit harder at their configurations than the rest of us. This is the driving force behind the OpenStack Scientific Working Group. Our open membership is drawn from institutions around the globe who use OpenStack to support science and research of this kind. We share knowledge with each other; we help each other out. We know what works and what doesn't, we fix a few things, and we share the results. Together as a working group, we advocate our use case for research computing among the wider OpenStack and research computing communities. I cannot believe that it has been just a short year and all this has happened.
I cannot believe how much it has already helped us at Cambridge, and our other members. With our working group now established, I know that we want to make an even bigger impact in the year to come. Our quest to understand the universe, and to improve our little corner of it, is going to be moved forward by ambitious scientific projects such as those we've just heard about. To help them achieve this, compute hardware is going to be deployed on a massive scale, and upon that hardware a platform must be built that meets the scientists' needs. In our group, we believe that platform should be OpenStack, and if you're interested in the problems and the solutions, I hope you'll join us in the working group, and together we'll make it happen. Thank you.

Well, thank you so much, Paul and Stig, for all of your contributions; we really appreciate it. I just wanted to highlight one of the collaborative efforts we did recently with the Scientific Working Group, including the Foundation staff: we put together a book about HPC and OpenStack. Maybe you can tell us a little about it. Yeah, we started out thinking we'd do a survey of what was out there to get a baseline of knowledge, and we thought we'd write a few papers and get a few articles together. We found so much that we actually put together a book on the subject, and here it is. This is a book drawn from the expertise of a lot of the subject matter experts who are members of the working group; we got together and contributed to create this publication. Excellent. Well, thank you so much for your work. We're actually going to have this available online and at the Supercomputing conference in just a couple of weeks, so we're excited to participate there as well. And if you're interested, a lot of the authors will be at the Scientific Computing BoF, which is Wednesday afternoon at 2:15. Great. Thank you so much. Thank you.
All right. Well, I hope you all enjoyed hearing from all these amazing members of the community, and from the users. We've got a very unique opportunity this week here in Barcelona to come together and do a lot of work that matters for these incredible users. I mean, think about it: Rosie said she needs half an exaflop of compute in a few years, and a zettabyte of data every six hours. So we've got a lot of work left to do, but it really matters; it's going to make a huge impact on the science and research community. But while we're hard at work, let's also never forget to have a good time with OpenStack. Thank you very much.