So good morning, everybody. My name is Ian Gable. I'm from the University of Victoria in Victoria, BC, which is just 60 kilometers across the Strait of Georgia. Victoria is a beautiful city and I highly recommend it; you can get there on the float planes you see in the conference logo, just underneath the conference center. It's a quick trip. Today I'm going to be telling you about federating clouds for high-energy physics. This was work done by many people, not just me, and supported by many different organizations; they're listed on this title slide.

So the outline for today: I'm going to give you a quick overview of experimental high-energy physics in terms of the technology, which is what really grabbed me personally. I'm going to talk about what our workloads look like, then about the components of our distributed cloud (those are Cloud Scheduler, Glint, and Shoal), and then I'm going to show you some results.

This is the very picture that got me hooked on high-energy physics. If you look at the top of that red ring you see a major international airport: that's Geneva International Airport, with Lake Geneva next to it. That red ring is the ring of the Large Hadron Collider, 27 kilometers in circumference. I saw a poster of this hanging in a professor's office, and then I shifted my career to focus on this for many years; this was really the picture that got it started for me.

If you look down in that ring you'll see this device: the beam line of the Large Hadron Collider. Here's a guy doing some work; this was during repairs done around 2008. If you look into the far distance you can see a slight turn to the left. It always blew me away that the tunnel is nearly straight, with just this slight curve; it really gives you an idea of the scale. From a more technical perspective, what you have underneath are two counter-rotating beams of protons in that beam pipe, one going around clockwise, the other counterclockwise, and we intersect them at different points along the collider ring. I'm sure lots of people have heard about this from Tim Bell and the other CERN folks, but what I'm going to tell you about next is the ATLAS detector. It's sitting right here, conveniently close to the CERN cafeterias; if you're on one of the experiments on the other side of the ring, it's not quite so convenient.

Let's take a quick look inside that detector. This is a picture of it in 2005; there's a big hole in the middle, and you'll see in a minute why I'm showing you a picture from 2005. To set the scale, there's a human sitting at the bottom. By 2014, when you get down there, the cavern is so packed full of equipment that you can't get a picture like that anymore; you get a picture of yourself standing in front of a wall of technology. The detector itself is 7,000 metric tons and five stories tall, sitting on that beam line about a hundred meters underground. Different parts of this detector were actually built in Canada; this piece here, the hadronic end-cap calorimeter as it's called, was built five kilometers from where we're sitting today.
So this is a truly international collaboration: there are 171 different institutions around the world working on this particular detector.

At this point you're probably wondering why you're hearing about this at an OpenStack conference. One of the major challenges with these high-energy physics experiments is in fact storing, analyzing, and moving all the data associated with them, collecting the information about these collisions. This is a cross-section of the detector I just showed you, a bit of a 3D cross-section, and you can see what happens at a collision point: you have this shower of particles flying out. This is in fact a candidate Higgs boson, the particle we'd been looking for for a number of years. We do these collisions 40 million times per second (at 40 megahertz we're crossing bunches of protons) and then we have to read out the detector.

I want to make it clear that ATLAS is not the only high-energy physics detector in the world; there are many. There are in fact four large experiments at CERN. But here's another one where we're moving the computing towards clouds: it's called Belle II, at the KEK laboratory in Japan. It has vastly different physics goals, but the common theme is that the format of the data is very similar between these types of experiments even when the physics goals are quite different, so we can share a lot of the technology across them.

As for the scale of this: ATLAS has roughly 170 petabytes of disk today, and that's increasing. And that's only the ATLAS experiment; there are many other detectors with large amounts of data. We can really expect that high-energy physics is going to cross into exascale in the coming years; this is a virtual certainty.

So now I want to talk a little bit about the details of why I'm talking to you today: the type of computing we have to do for these experiments. Most of the jobs are so-called embarrassingly parallel. Some people don't like that term and prefer "pleasingly parallel", but the message is that these are not tightly coupled computing jobs. We don't need things like MPI; we don't need low-latency interconnects. The events we collect at 40 million times per second are parallelizable across individual events. Most of the computing tasks we run are one to twenty-four hours in length, and the jobs are either Monte Carlo simulations, where we're simulating the particle interactions, or we're actually taking the data from the detector and analyzing it.

Today, most of the workload for these experiments is done on a collection of Linux clusters, typically in the 500-core to 10,000-core range; there are very large sites like CERN and others in the US, but that's about the average. There are on the order of 200 of these sites around the world working on these experiments, and on any given day you can see something like 300,000 cores running at one time across these Linux clusters.
I want to emphasize that this is not all on clouds today. However, one of the things we had with this classical grid computing model is that all these sites were federated: we had federated storage and single sign-on for all the services, so a physicist could run across all 200 sites without needing separate credentials for any of them. You'll see why that's important in a minute.

So now on to the infrastructure-as-a-service timeline. I want to tell you a bit of a story about why we're here, because I think it might be different from the typical industrial use cases. We weren't looking to sell anything or use our resources in a different way; what we wanted was to run our particular types of jobs, with less work, on other resources which we didn't control. Typically in the past that meant packaging your code to run on someone else's Linux cluster, which is pretty difficult to do. So in 2005, when Xen came along, we started looking at it as a way to package our stuff up, like you would do today with a container or a VM, and we discovered we could do that without much of a performance penalty. So we started looking at virtualization very early. The first naive thing we did was write Perl scripts for our Linux cluster to do all these sorts of things. Then we discovered a project called Nimbus that was providing a very early infrastructure-as-a-service API, before Amazon and way before OpenStack, and we saw that there was a future in this.

Amazon came along, but that really didn't have a lot of impact for us: we couldn't afford it, and we didn't have high-bandwidth networks to Amazon, so it really didn't change anything. Then we started seeing multiple of these Nimbus clouds emerge, and eventually OpenStack arrived, and that really started to change things for us, because this thing we'd been hoping for finally had enough industrial momentum behind it; when you started talking about virtualizing research and education clusters, people didn't think you were crazy anymore. Then we started seeing multiple OpenStack clouds. And the big thing that really enabled us to get traction was when CERN had the vision to move to OpenStack and start virtualizing their entire infrastructure. That is what really enabled us to start pushing hard on this.

So here's today's problem, and it's also an opportunity. These yellow dots scattered across Australia, Europe, and North America are the OpenStack clouds to which we have access today, primarily at universities and research institutions around the world. We're also using EC2 and Google Compute Engine, but we're primarily using OpenStack. The problem is that we want our jobs to run across these many different clouds, and there wasn't a particularly easy way to do that. We also have some constraints that are different from other situations. We're often getting contributions in kind, often from institutions that have nothing to do with high-energy physics but have spare capacity within their OpenStack clouds. So we can't go to that cloud provider and say, please implement this federation solution: they've given us an account and the resources, and in many cases they're not there to provide support for us. In many cases they are, but it's a mix.

I just wanted to point out a few of the clouds involved.
There's CANARIE, a research network organization in Canada; there's Chameleon in the US, CERN, Imperial College, and NeCTAR, and I see people in the audience from many of these clouds that have supported us.

So now I'm going to talk to you about the components of this distributed system. What we've done is combine many different already-available components and then added a few small things where we needed to close the gaps. To manage jobs we use HTCondor, which is an extremely scalable batch computing system that's quite well known; I'll talk about that in a second. Then we have this product called Cloud Scheduler, which monitors the job queues of that batch system and creates VMs to match their needs; I'm going to go over that. Then we have Shoal for web cache discovery, and a thing called Glint, which I think will be quite interesting to this community, for pushing VM images out to many different OpenStack clouds. And of course we have the virtual machine itself: we use CernVM and a special file system called CVMFS, which I'll tell you about.

Probably many people in the room know quite a bit about HTCondor. It uses a collection of daemons which you can run across multiple nodes to scale out the capacity to run jobs. It has a daemon called, creatively, the startd: when a machine starts up, it registers with a collector, and then a mechanism called matchmaking occurs, where the requirements of the computing job are matched with the resource that has advertised itself. You make that match and then you execute the job on that resource. There's tons of information about this, and it's a very scalable product; it's been around since 1984, believe it or not.

So this is really the key architecture diagram for our system. Down at the bottom you have HTCondor, the batch scheduling system, and at the top you have Cloud Scheduler, the piece of software that monitors HTCondor's job queues. When it sees a job waiting in Condor's queue, it makes an API call to OpenStack, EC2, or Google Compute Engine to boot a VM, and we pass contextualization information into that VM so that it boots and registers with the Condor head node, as you saw before. While there are jobs remaining that require that type of worker node, as we call them, Cloud Scheduler will keep that VM running; when there are no jobs that need that type of VM, it will shut it down. So what happens is that a cluster bursts into existence to run your batch jobs while they're sitting in the queue, and when the queue is drained it gets shut down.

So I wanted to show a few details of this. It looks pretty impenetrable, but there are only really a few important things. This is the way many scientists around the world do their batch computing, not just in high-energy physics: if you have batch jobs you need to run, you compose, effectively, a text file that tells it what binary you want to execute and a few properties of the VM or machine that you need to run on.
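To make that concrete, here's a minimal sketch of what such a submit description file might look like. The plain HTCondor directives are standard; the cloud-related custom attributes at the bottom (the +VM... and +TargetClouds names) are illustrative stand-ins for the attributes Cloud Scheduler reads, not necessarily its exact syntax, and the image and cloud names are made up.

```
# Sketch of an HTCondor submit description file with cloud attributes.
# The +VM* / +TargetClouds attribute names below are illustrative.
universe       = vanilla
executable     = run_analysis.sh          # the binary/script to execute
log            = job.log
output         = job.$(Cluster).out
error          = job.$(Cluster).err
request_memory = 2000                     # MB of memory the job needs
request_disk   = 10000000                 # KB of disk the job needs

+VMAMI          = "cernvm-batch-worker"   # image this job should run on
+VMInstanceType = "m1.xlarge"             # instance type / flavor to boot
+TargetClouds   = "cern, uvic, chameleon" # clouds this job may run on

queue
```

When Cloud Scheduler sees a job like this waiting, it boots a matching VM on one of the listed clouds and lets HTCondor's matchmaking do the rest.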
So we're talking about memory and disk; we're talking about which executable you're going to run and what you're going to do with the jobs when they die or when they finish. Then there's this extra set of attributes you're seeing in the bottom part here. One of them is the AMI: this is the image you want this particular job to execute on. Then you say you want to use an instance type of a particular kind. Then there's this other thing, target clouds, which is a list of the clouds that this job is allowed to run on; I'll show you that in a second.

On the Cloud Scheduler side, you configure it to describe the resources you have available to run jobs. You basically describe what type of API access you have, the endpoint URL, and some things like network properties. What you're seeing here is two separate clouds out of a list of dozens, each with an arbitrarily assigned name that you give your cloud for easy reference. So you create this list of clouds on which Cloud Scheduler can execute the VMs for your Condor jobs.
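As a sketch, and with section and field names that are illustrative rather than Cloud Scheduler's exact configuration keys, such a cloud definition file might look roughly like this (the endpoints are made up):

```
# Sketch of a Cloud Scheduler resource list; key names are illustrative.
[cern-openstack]                # arbitrary name you assign for reference
cloud_type: OpenStack           # which API to speak: OpenStack, EC2, GCE
auth_url:   https://openstack.example.cern.ch:5000/v2.0
networks:   private             # network the booted VMs should attach to
vm_slots:   800                 # ceiling on VMs we may run on this cloud

[uvic-openstack]
cloud_type: OpenStack
auth_url:   https://cloud.example.uvic.ca:5000/v2.0
networks:   default
vm_slots:   200
```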
So I want to give you an example operational task, which is having an easy way to deal with clouds coming in and out of your infrastructure. You saw we had these yellow dots all over the map earlier, and that's by no means a static list: we'll have some one week, and some will drop out the following week. It's an ever-evolving list of who is making cloud resources available to us over time. So we need good operational ways to bring those clouds in and out, and we need to do that without affecting users.

I'm showing an example email here from one of our admins. We'll get a typical request like: we want to take a cloud down for maintenance for two days, sometime next week. The state of the cloud at that point might be that we have, say, a thousand cores' worth of VMs running on it, but also a whole bunch of jobs waiting in our queue that need that type of VM. The first step is to use a command like this to prevent any more VMs from booting on that cloud. Then the really nice thing we can do is start draining that cloud. We can set each one of the VMs such that it waits until every job running on it is finished (keep in mind there might be eight different jobs running on an eight-core VM) and then shuts itself down. This effectively drains all the jobs out of those VMs and kills the VMs when they're done. So we can maintain operations without users feeling the hit of different clouds going up and down.

The next problem we had to address is that we have too many clouds to manage VM images on manually. We used to do this when we had four or five of them: we'd have a new version of the image and we'd go push it out by hand. That was fine for expert users, but as other users came in it became an incredibly error-prone process. It was workable at the five-cloud level, but when you get up to the twenty-cloud level it's just not feasible anymore to do this in any kind of useful way. So we developed a service called Glint; notice that it sounds a bit like Glance. I'm going to tell you a bit about it, and about some of our goals for integrating these features into the OpenStack ecosystem.

Here's what it does in very simple terms. Say cloud one is your home cloud: the one you log into on a daily basis to develop your VM images and test your jobs before you want to run at scale. You have an image sitting there on your home cloud, cloud one. A user then uses the Glint service to say: okay, now I want this image propagated to all the clouds for which I have credentials. Glint will authenticate with Keystone, pull that image out of Glance on your home cloud, and push it out to all your remote clouds.

Let me show you where that fits within the architecture. Glint is another component in OpenStack, although we will likely want to move some of these features into other existing components; our goal was not to add something new, it was just a neat way to demo the features we wanted. Although when I say demo, we're using this in production today. We also add pages to Horizon; I'll show you a screenshot of that in a second. On the left here you have a screenshot of adding the various other clouds to your home cloud, the ones you want this propagated out to. I want to emphasize that you don't need to do anything to those other clouds: you can have this service installed just on your home cloud and have your images pushed out to your remote clouds, so you don't add a requirement to the clouds you're running on. After you've added clouds to this list of available clouds, you can just go in and select which clouds you want your image propagated out to, as I showed you in that previous architecture diagram. There's also an API for this, and a CLI tool, which I'm not going to show you.

As for the goals for Glint: we have learned a lot this week. We've seen that big progress is being made on federation, and we want to leverage that as much as possible. In particular we want to take advantage of Keystone federation; we saw that there are people from CERN working heavily on that, and we're going to track it. We also saw that there are now Glance tasks available, so we think we can do a lot of what we've done with Glint using Glance tasks. But our ultimate goal is to have this functionality integrated into either Keystone or Glance in some way that's suitable to the OpenStack developer community; that's something we've learned a lot about this week, and we're going to continue to work on it. All the code for this is available: it's on PyPI, it's on Launchpad, and you can get it from GitHub as well.
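Before moving on, to give a feel for the mechanics of what Glint does under the hood, here's a minimal sketch in Python of the core operation: authenticate to each cloud through Keystone, pull the image out of the home cloud's Glance, and push it into a remote cloud's Glance. This is not Glint's actual code; the auth URLs and credentials are placeholders, and the real service adds multi-cloud fan-out, error handling, and the Horizon integration on top of this.

```python
# Sketch: copy one Glance image from a "home" cloud to a remote cloud.
# Not Glint's actual code; auth URLs and credentials are placeholders.
from keystoneauth1.identity import v3
from keystoneauth1.session import Session
from glanceclient import Client as GlanceClient

def glance_for(auth_url, username, password, project):
    """Build a Glance client from Keystone credentials for one cloud."""
    auth = v3.Password(auth_url=auth_url, username=username,
                       password=password, project_name=project,
                       user_domain_id='default', project_domain_id='default')
    return GlanceClient('2', session=Session(auth=auth))

home = glance_for('https://home.example.org:5000/v3', 'me', 'secret', 'atlas')
remote = glance_for('https://remote.example.org:5000/v3', 'me', 'secret', 'atlas')

src = home.images.get('YOUR-IMAGE-ID')              # metadata on the home cloud
dst = remote.images.create(name=src.name,           # empty shell on the remote
                           disk_format=src.disk_format,
                           container_format=src.container_format)
remote.images.upload(dst.id, home.images.data(src.id))  # stream the image bytes
```

The nice property, as with Glint itself, is that the remote cloud needs nothing beyond ordinary Glance and Keystone access.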
So now I want to talk about the virtual machine image we use. We use a RHEL-compatible appliance made by CERN. It's a tiny image, about 20 megabytes when you download it; you can think of it a bit like CoreOS. However, it uses this thing called CVMFS, the CernVM File System. It's a file system designed around millions of small files, and it's really all about heavy caching at multiple levels. In fact, when you start using this file system you get a CDN that was built by CERN, with resources provided by other research institutions, for distributing the files in that file system. It works extremely well for the large software stacks we have in high-energy physics. To give you an example, one release of the ALICE experiment's software is seven gigabytes' worth, and you have many different users requiring many different versions. Using CVMFS you can basically push this out nightly to all these VMs without releasing new versions of the virtual machine image. I would guess there are many industrial applications for this file system in particular, and you can get all sorts of information about it at this URL here.

Once you're using this file system, one of the things you require, if you're going to boot large numbers of these virtual machines, is a very fast HTTP cache, because this is an HTTP-based caching file system. What we use is a Squid cache. So what we want is for a running virtual machine to be able to locate the nearest Squid cache. What did happen before is that we'd boot up a virtual machine with some baked-in configuration, and that configuration pointed at an HTTP cache on the other side of the world, which is totally unworkable. So we developed a quite simple discovery service that uses AMQP, with some very simple agents on the Squid caches. Squid is an HTTP cache, and the reason this thing is called Shoal is that a shoal is the collective noun for a group of squid; that's where the name comes from. We have agents that run on the Squid caches; the agent is about 170 lines of Python, and it sends AMQP messages to the Shoal server, effectively heartbeat messages with some load information. Then we use a GeoIP library within the Shoal server such that when a VM booted on one of these arbitrary clouds contacts a REST interface, it ends up with the closest Squid cache. So we can have these web caches disappearing and coming live regularly; they advertise every 30 seconds. We really depend on the scalability of AMQP to do this; we didn't want to write a lot of code. There are probably other services out there, or coming online, that may do the same thing; this one was really focused on simplicity.
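To illustrate how small an agent like that can be, here's a minimal sketch of the heartbeat side in Python using pika; the broker address, exchange name, and message fields are placeholders rather than Shoal's actual wire format.

```python
# Sketch of a Shoal-style agent for a squid cache: publish a heartbeat
# with load information over AMQP every 30 seconds. The broker address,
# exchange name, and message fields are placeholders, not Shoal's actual
# wire format.
import json
import os
import socket
import time

import pika

connection = pika.BlockingConnection(
    pika.ConnectionParameters('shoal-server.example.org'))
channel = connection.channel()
channel.exchange_declare(exchange='shoal', exchange_type='topic')

while True:
    heartbeat = {
        'hostname': socket.gethostname(),  # which squid this is
        'load': os.getloadavg()[0],        # crude one-minute load metric
        'timestamp': time.time(),
    }
    channel.basic_publish(exchange='shoal',
                          routing_key='squid.heartbeat',
                          body=json.dumps(heartbeat))
    time.sleep(30)  # agents advertise roughly every 30 seconds
```

On the VM side, a booting worker would then ask the Shoal server's REST interface for the nearest cache and point its CVMFS client at it, for example via the CVMFS_HTTP_PROXY client setting.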
So now I want to show you some evidence that this is all working smoothly. Here's a snapshot of activity from just the CERN instance of HTCondor and Cloud Scheduler. The vertical axis is the number of VMs; these are eight-core VMs, and you can see the number of booted VMs changing. This is happening basically based on load at the time for the ATLAS experiment. The bottom axis is days of the month, ending on the 17th of May, which was the Sunday before I flew here.

The UVic instance is showing a little more interesting activity. We had another group request a bunch of resources, so we had to handle exactly the kind of operational task I was talking about earlier, where we needed to give back a whole bunch of resources to other users. We moved this large blue allocation on a particular cloud down to give that allocation to other users. Their usage isn't plotted here, because we're plotting our own usage, but you can see that the system is flexible; then you can see more jobs coming into the queue and the system ramping back up.

Here's a similar plot for the Belle II experiment; that was the second detector I showed you. What you can see is that the system is portable between different types of experiments. Really the key thing is that your problem has to be formulated as a batch job: if you're using a cloud and you have a problem formulated as an embarrassingly parallel batch job, the system will work for you, usually, depending on what the experiment is; it'll work for basically any generalizable batch load.

Here's the cumulative work on the ATLAS experiment for this system, starting January 1st, 2014; in fact the history goes back quite a bit further. The vertical axis here tops out at three million jobs, so the system has executed three million jobs since January of last year. It's working pretty well. This is chunked out into different instances that are roughly continental, where we have queues on different continents with different groupings of clouds within them. On the bottom, which I neglected to point out on the earlier slide, is the list of clouds.

Now, this is the cumulative load for the Belle II experiment. This is a much newer experiment, so the computing isn't quite as established and it's looking at new technologies faster. This is from around week 11, so just the last little while, and you can see that cloud is making up the second biggest fraction of the total contribution to the computing for this experiment right now. These are all the institutions around the world doing computing for Belle II, and this is the number of jobs, so you can see we've got around 500,000 jobs executed on this cloud system. The very big slice here is a big cluster in Germany, and in fact the cluster at the institutional lab where the detector is actually located is this slice right here. So you can see that on newer experiments we're making a larger impact.

In summary, this combination of Cloud Scheduler and HTCondor is quite a flexible way to do batch computing on clouds, and there are some key enabling technologies: for us there's CernVM plus CVMFS, there's the Squid cache discovery, and there's this Glint system, which I think will be of the most interest to the OpenStack community; we're going to try to put as much of that back upstream as possible. The current users of this system are ATLAS; Belle II; CANFAR, which is an observational astronomy project that has in fact completed four million jobs (I didn't show their plots in the results here, but they're also a big user); and also a large computing consortium called Compute Canada, which has sites across the country. So I think that's it.
We've got a few minutes for questions; I've been asked to direct people to the microphone. While people are lining up, I also want to make sure I acknowledge the many different groups that participated in this. Without their contributions none of this would be possible, because we're taking great advantage of the support and help of all these different organizations.

Q: If you had had admin on all the other clouds, if you had owned the other clouds, have you thought about what you might have done differently?

A: In fact, we would have done virtually the same thing. There wouldn't be much difference if we had admin on the other clouds. It probably would have created a worse operational situation for us, because we'd have more clouds to support; right now we put in tickets to other people's malfunctioning clouds. So I'm not sure it would have helped us to actually have admin access on these.

Q: While you're scheduling jobs on different clouds, you also have to get the generated data back from those clouds to the place where you launched the job. How do you achieve that?

A: One of the advantages we have with these OpenStack clouds located at research institutions is that they're usually connected by very high-speed networks. For example, at UVic we have a hundred-gig connection, and in fact we have a hundred gig all the way across the Atlantic to Geneva, used mostly for research and education. Many of these institutes have quite high-speed networks. So as a job finishes, we stage out the data. However, I want to mention that most of the jobs you saw running there are Monte Carlo simulation jobs, so the amount of output is much smaller than you would get from an analysis job. This is one advantage we get with OpenStack running at research institutions: we can leverage that network to push the data back.

Q: A couple of questions; I think you've answered one a little bit already. On the networking side, because you are running VMs in different locations, have you done any optimization to access the data remotely, like caching?

A: One of the reasons we have this segmented on a continental basis is that we don't want to push our data back across continents in general. There is a lot of work left to be done on optimizing data placement, and this is one of the big challenges coming up; the term that gets used a lot for this in physics is data federations. A lot of this in the past has been statically configured, because we knew where the sites were and where the data should go. Now that we're more dynamic, there's a lot of work left to be done to make sure the data hits the optimal cloud; mostly it's far from optimal at the moment.

Q: The second question: have you looked at, or is there a need for, moving workloads in between clouds?
A: We don't move the work. One of the nice things you get from Condor is that if a job dies or is killed, it will get rescheduled somewhere else. So if we have some kind of catastrophe on a cloud and all the jobs and VMs die, which happens, then Condor will reschedule them somewhere else. But we haven't done any migration of jobs. One of the difficulties is that there's no way to suspend a high-energy physics job: they're making connections to databases all over the place, and they might be in the middle of doing something. It's different from a normal high-performance computing job that you could snapshot and save the state of; nobody knows how to do that at the moment.

Q: What tools do you use to manage your hybrid cloud? Are you equally happy with AWS and GCE for your high-throughput work? And is it just Cloud Scheduler, or do you have other decision points?

A: It's Cloud Scheduler. I'm sorry it's called Cloud Scheduler, by the way; we named it before everything was called cloud-something. But yes, this is the main thing making the decision about where to run the VMs. In terms of experiences on Amazon and GCE in general, we've had very good experiences. Where we've run into mixed issues is getting the data back out of Amazon. Depending on where you're sitting in the world, you can get very high-speed peering out of Amazon back to your data center, so we've had somewhat mixed results with that. The Energy Sciences Network in the US now has a direct hundred-gig peering with Amazon, so if you've got to stage your data out of Amazon to a site that's on the Energy Sciences Network (that's the Department of Energy labs), you can do that very quickly. Many of our challenges have been associated with data in and out. I don't have a lot of experience with GCE, so I don't think I can comment on that; it's other people in the group who have.

Q: A couple of slides back you had a total number of jobs by country, with Canada, the UK, and so on. I was surprised not to see the US on that list.

A: Yes, okay. There are in fact cloud groups within the US that are running ATLAS jobs on clouds in the US. We didn't show them here because they're using systems that are not Cloud Scheduler. If I were giving this as a collaboration talk together with US collaborators, I would also show all their work.

Q: That makes sense. So they don't use Condor? They use something different?

A: They've taken various different approaches; it usually involves Condor in some fashion, but rather than trying to dynamically bring up VMs as jobs come in, it can also involve bringing up a whole cluster and having it join at the same time. So there are different approaches being taken within different countries.

Q: Okay, thanks.

Okay, I think we're at almost exactly 40 minutes, so I'm pretty happy about that. Thank you very much.