Okay, so thanks a lot for coming, everybody. My name's Tim Bell. I'm responsible for the infrastructure at the CERN Computer Center. Along with that, I've also had the honor of being elected as a member of the OpenStack management board, and recently I'm also responsible for getting the user committee together. As Jonathan was so kind to post my email address up in the middle of the keynote, I've also added it here just for reference, but if there is any one of the 1,300 people who hasn't sent me a mail, then please do so, because it's been really great to hear from you all. So what is CERN? CERN is the Conseil Européen pour la Recherche Nucléaire; in English, it's the European Laboratory for Particle Physics. We were founded in 1954, following the Second World War, as a place where scientists from around the world could get together and work on fundamental research. The laboratory itself is situated on the border between France and Switzerland, straddled between the Lake of Geneva and the Jura Mountains. We have a very simple job, which is to work out how the universe works and what it's made of. So what do people come in and worry about? The first thing they worry about is why we have mass. This is a pretty important question, because without mass everything would just be moving around at the speed of light. This is one area where we've made significant progress in the past 18 months, and in July we were proud to announce that there is something there, almost certainly the particle called the Higgs boson, which is something that's been predicted for the past 50 years. If this turns out to be correct, it will probably be the most significant scientific discovery since landing on the moon. Other things we think about: we've lost 96% of the universe. We can account for 4% when we look out there by adding up the planets and stars, but when we look at how they're moving, we know that there is 96% of the mass of the universe that we can't account for at the moment. In addition to that, we're trying to work out why we're not half antimatter and half matter. It's actually very good that we're not, because the two would instantly just decompose into a huge puff of energy, but there's no good explanation as to why, following the Big Bang, we aren't half matter and half antimatter. And then finally, we're trying to work out what the universe was like just after the Big Bang, before there were atoms, before there were protons. What did matter look like? So how do we do that? We've got together about 10,000 to 11,000 scientists from 100 different countries, and they're collaborating to solve these problems. The organization itself has a budget of around $1 billion a year, funded by the 20 member states. These are largely European countries. It's not exactly the same as the European Union, but a lot of them are in common. And with that, roughly $1 to $2 per person per year is paid by the citizens of those countries towards the cost of the laboratory. So the current flagship experiment, there are lots of experiments going on at CERN, but the flagship one is called the Large Hadron Collider. This is a 27-kilometer ring, 100 meters underground, across the border between France and Switzerland. It takes about 20 minutes to drive from one side to the other. The particles themselves in the experiment go around 11,000 times a second, so they're just below the speed of light. As you can see from the airport up the top there, it gives you a rough feeling of the kind of scale. So when you go 100 meters underground, what is there?
There is basically a tunnel like this. In that tunnel there is a large pipe, and within that large pipe there are two small pipes, about 1 centimeter across. They have a vacuum which is lower than the atmosphere on the moon, and they are chilled down to about 2 degrees above absolute zero; that's minus 271 degrees centigrade. That's required so that the magnets can be superconducting. This means that, using liquid helium, these magnets are able to sustain extremely high magnetic fields and bend the particles that we send round in the beams. With this sort of technology, there's only one of this kind of machine built in the world. It cost about $6 billion to build, including doing all of the engineering work, so you don't go building many of these. And it was conceived in the 1980s, before the technology for superconducting magnets of this kind had even been invented. They basically guessed that over the next 20 years someone would come up with a solution to these problems, and designed the experiments accordingly. At four places around this 27-kilometer ring, there are experiments. These detectors are roughly 1,200 tons; they're the size of large cathedrals. And they can basically be modeled as 100-megapixel cameras, so they're huge numbers of sensors. The difference between these and the standard camera that you would have at home is that these take 40 million pictures every second. When you add all that up, 100 megapixels, 40 million pictures a second, you get data rates that are approaching one petabyte a second. This means we have a data problem to work through. So the kind of things we get out of the accelerator, after we've worked on the data, is that we have beams of particles and they collide. We start off with protons, and then from that we try and work out what went on. It's a bit like coming along to a car accident and trying to work out what the makes of the cars were as they collided. To make things more interesting, we actually send around hundreds of protons at the same time. This means we get hundreds of collisions inside each of the detectors, and we then need to separate them out and work out what happened. This takes serious computing power. And because we've just about mastered that one, what we then do is send around lead ions. These are the nuclei of lead atoms, hundreds of neutrons and protons, and we collide those together. And this is the thing that creates the conditions that there were just after the Big Bang. With one petabyte a second coming out of the detectors, that clearly cannot be recorded in any form today. So we have farms of around 2,000 to 3,000 standard servers that filter down each of these events and produce a lower data rate that we can then cope with. The four experiments send varying data rates towards the central computer center. At the central computer center, we're normally receiving somewhere between 5 to 6 gigabytes a second. In peak times, especially with those lead collisions, we get to 25 gigabytes a second. It's pushing a lot of the technologies of networking and equally of storage. In order to analyze all of that data, what we can do at the CERN computer center is to record it and then have a first look at it. But we do not have the computing power at CERN to analyze all of that and work out the necessary information about what the collisions were and what the particles were that were produced. So as part of this, we formed the LHC Computing Grid.
The work was done during the 1990s and it has run successfully for the past three years while we've been collecting data. It consists of the center at CERN, called Tier Zero. Around that there are sites where, for each byte of data stored at CERN, there is a byte stored at one of the other 11 centers. This covers us in the event of a disaster at CERN and ensures the data is protected. And then there are 200 sites around that, connected to the Tier One centers, that analyze the data. These are often universities or small labs. With all that, we run about 2 million jobs every day. They're basically running through this data as it comes off, trying to work out what is in there that's interesting, hopefully then writing up papers, and in the ideal case going on to collect the Nobel Prize in the event of major discoveries. The CERN computer center itself has about 10,000 servers and 64,000 cores. So it's a reasonable size, but nothing like the size of some of the large installations, the Googles and the Yahoos. However, we're a very open center. This means that we have no commercial reasons to protect what we are doing, and so we are very open in terms of the technologies we use and we have regular discussions with people. In particular, we've had a lot of discussions with Google and Yahoo regarding disk drive reliability. We see something like 10 times the failure rate of disks that the manufacturers expect, and this correlates very closely with what Google and Yahoo are seeing as well. So there are clearly factors other than the raw reliability of disk drives that affect how often they fail. We have around 2,000 drives failing a year, so that's a lot of trips for people to come in and swap things in and out. The CERN computer center itself is actually a tourist attraction: 80,000 people a year come through on tours. You can come along to Geneva anytime, book yourself in and be shown around. Recently we had the Google Street View car come around, so that should be uploaded in the next few months, along with the Street View of the actual accelerator and the experiments. As you may notice, we've got very wide aisles. This is to make sure that when we're showing children around, they can't push the power buttons on the back of the machines. You may also notice that the racks aren't very full. This is because the computer center was built in the 1970s for a mainframe and a Cray, and this is causing a lot of difficulties in terms of trying to make it efficient for today's servers. The center itself is using about 3 megawatts. We get electricity pretty cheap because the accelerator uses 120 megawatts, which is about the amount of electricity that you would see in a small town. Our data problem. So to summarize the data problem, we're recording 25 to 35 petabytes every year from the experiments. The scientists want to keep this data for at least 20 years. This means we're facing a fairly substantial data recording problem. We're already at 73, 74 petabytes of data, and we're certainly facing hitting half an exabyte, even if we stay at the current data rates. In 18 months, we're going to double the collision rate and the energy in the experiments, so we would expect at that point to be doubling our data rate. How do we do it? We use tape. Unfortunately, with the power constraints, and equally with the economic situation, since we're paid for by the taxpayers, we can't afford to have this data all on disk.
However, we have a 10 petabyte disk cache where we store all the recently accessed data, and the rest of the data goes off to tape. We keep the tape robots pretty busy: 60,000 to 70,000 mounts a week. So if you can imagine doing that with your VHS tape recorder, pulling tapes in and out at that kind of rate, that's what the robots are doing. We're then faced with a limit on power at the CERN Computer Center, and therefore we're limited in our ability to allow the physicists to solve some of these fundamental problems. So we've been working for a number of years to try and establish an extension of the computer center. This was approved 12 months ago, and we will get a new data center in Hungary. One of the reasons behind distributing the data centers around CERN is to share activities around the member states. So having a remote data center in Hungary, even though it'll be a hands-off facility where there'll just be people swapping disks for us, allows us to spread the skills and the usage of the equipment outside of the central facility at CERN. The new facility will be about 2.7 megawatts, so that will mean we'll be able to roughly double the equipment that we have installed. And we're currently at a 200 gigabit connection between the two data centers, and we're looking to move that to 1 terabit in the next year or 18 months. So the good news is that we've got a new data center. The bad news is that we get no more people. We're being asked to run twice as many machines as we're running currently with the same number of guys. This requires that we do a rethink. So what we've had to do is to look at the areas that are currently using resource and try and work out how to optimize those. One of the things that we found is that we did a lot of work around 2000 to 2002 on tools to manage the data center. At the time we were leading edge in compute; the large centers were growing, but we were still ahead. So we developed a set of tools to manage that. Unfortunately those tools are now getting very brittle. IPv6 comes along, we need to do all the work. A new operating system level comes along, we need to do all the work. So instead we chose to pause and say, okay, it's now time to realign ourselves with the tools that are used by everyone else. And this means having a good look around the open source community, finding the tools that are useful for us, going through an evaluation cycle, and in particular trying not to write anything new. We're actively discouraging people from writing code, and it's really tough. Because in the end what we have to say is that any time there is a requirement that is special for CERN, it's almost certainly an incorrect requirement, because there are other people who have to solve this problem too. Occasionally we'll find there is something genuinely special; we will do the development work and then submit it back upstream, and that way everyone benefits. So we've ended up with a classic tool chain structure. I won't go into the details, but we built in the space of 12 months a tool chain that previously took us 8 years to build to the same level of quality. Key elements for us are OpenStack for control and orchestration and Puppet for configuration management. We've actually noticed over the past 12 months, as we've been doing this, that there are some interesting side effects.
The first one of which is that it used to be that when someone new joined the organization, they would have to sit down next door to the guru and spend a month learning the magic. Now what they do is they come along and we give them a copy of the book. They go and follow the mailing lists, and if they need any help they can always ask the community as well as asking the internal staff. So that means we're no longer alone in terms of having to maintain all of this. Equally, they're more than proud to be contributing to the community, because they get their names in newsletters and get a certain amount of publicity, which is a lot better than just having their manager at CERN saying thanks a lot. And the CERN employment conditions work on the basis of short-term contracts. We have contracts of two years and five years; it's very rare that someone gets a long-term contract. Only 10% or so of people end up with a longer-term contract. The aim behind this is that the countries that contribute to CERN send people, they get trained up, and then they go back to their home countries rather than remaining at CERN. So we have a positive approach to people ending their contracts. I think we're probably the only organization here that is actually able to produce experts and be willing to give them to all those people who want to do hiring, rather than wanting to retain them ourselves. So when we look at the job figures for OpenStack and Puppet, we see staggering rates of job opportunities there, compared to when we had an internal programmer who'd been writing and contributing to one million lines of Perl. So this means that these guys become active on the market and immediately get taken up by organizations such as yourselves. At the same time as we were doing this rethink, there was also a clear move that said that the mode we were working in was an out-of-date mode. The grid structure has served us extremely well, and given what we had to solve and the data rates we had to handle, it was the right thing to be doing at the time. However, when we look now at where industry is going, it's very much going towards a cloud model rather than a structured grid: a lot more random association, a lot more dynamic in nature. So we had to work out what to do in order to get out of a mode where we're doing static machine allocation and static usage of machines, and more into a cloud structure. Around the same time, OpenStack was just starting up. We'd been following it from the start, and with that we saw the possibility of running at the scale that we needed, and in addition doing that with a number of other people who had similar sorts of problems. Along with the need to make ourselves more dynamic, we also need to make ourselves more efficient. When you wait for a tape to be mounted, it can be up to five minutes before the robot gets there, pulls the tape out of the right place, puts it into the drive and finds the right place on the tape. So we're trying to find ways in which we can take the current usage of the CPUs and improve that. Virtualization and the ability to suspend and resume activities gives us exactly that framework. On top of that, when we weren't able to give our physicists capacity just coming up to a conference, when they want to get their paper ready, these guys were going off, getting their credit cards out and buying resources from Amazon, and what they were looking for was a coffee-time response.
So the ability to basically say give me a hundred virtual machines, walk off, grab a coffee, come back and have them already spinning up and doing useful work. We did some calculations of how long it would take if they asked for a physical machine. Now, we're subject to European public funding rules, and this means that there is a formal process to go through. Based on a summary of this process, on a good day, or a good year to be more precise, we would end up with 280 days between you expressing your requirement and you getting your physical hardware. Now, in reality what we normally do is buy machines in bulk and then you get one of those, but if you want something which we haven't already foreseen, you wait 280 days. That's the good case. The bad case is where you've waited all that time, we then do a final step, and at that point we find firmware issues. So we've had cases that have taken over 300 days to solve, and we don't pay for the machines until they've passed their burn-in test. We've had other cases where we had a disk firmware problem, and this involved replacing 7,000 drives in the machines. We stopped counting the drives in the end and just started weighing the pallets, because you can't handle this sort of volume by counting the individual units; we just put the pallets on a weighing machine and followed the weight as it went down. It was 7 pallets of disk drives that we had to change. At the same time as we've been doing all of this in the infrastructure, we've also been working through with the experiments and application guys. This is a concept that Cloudscaling have been talking about, in fact Randy talked about it this morning as well, and it's separating our workload into pets and cattle. Pets are things you give loving names to; you look after them really carefully, and in particular, when they get ill and go wrong, you nurse them back to health. Cattle you give numbers to, and when they get ill you shoot them and replace them. So what we're looking to do here is to try and find a good model where we encourage our users towards cattle, which is the natural fit for the cloud. However, we have thousands of pets in the computer center, and we're not willing to shoot them all at the moment; it would make us very unpopular and probably stop a lot of the migration. So there we're trying to encourage people to use increasingly standard configuration management techniques, so that when the pets do get ill we're able to recover them and recreate them relatively quickly. So in order to tolerate those pets, which don't fit completely naturally into the cloud environment, we've been working through a number of scenarios. The first is that we have a very old legacy structure to our networking. It isn't structured along the lines of the sort of things you can do now in Quantum, and therefore we've had to do a fair amount of work to make it so that when you get your virtual machine, you get your reverse DNS lookup and your Kerberos identity, so you have a valid security ID on the network. We need to make it so that you can configure that machine relatively easily from a list of standard configurations, and in particular, when there does need to be an intervention, such as the changing of a memory chip or something like that, those migrations can be done so that they're transparent to the person who's asked for the machine.
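(As an aside on that last point, a minimal sketch of how such a transparent intervention can be driven from the command line; the instance and host names are hypothetical, and this is not necessarily the exact workflow used at CERN:)

```
# Move a VM off a hypervisor that needs a hardware intervention.
# Assumes shared storage is available to both the source and destination hosts.
nova live-migration vm-0042 compute-017    # hypothetical instance and target host names

# Afterwards, confirm the instance is ACTIVE again and note which host it landed on.
nova show vm-0042 | grep -iE "status|host"
```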
So with the combination of things available in OpenStack, we've been able to configure that; in particular, KVM and Hyper-V live migration has worked very effectively for us, with Gluster as the backend storage. So where are we at the moment? We're running Essex. We're a Red Hat, or rather a Red Hat rebuild, based environment, so we use a distribution called Scientific Linux, which is basically just a standard rebuild of Red Hat, and we use the upstream EPEL packages. We've had extremely good relations with the Fedora Cloud SIG team that's been doing a lot of the packaging, and we've helped participate in a lot of that testing. This has allowed us to take advantage of a lot of the work that's been done to port tools like cloud-init and the image management tool Oz into our environment. So it's definitely the case that this environment under a Red Hat setup works well. In addition to that, we're currently focusing primarily on Nova, Glance, Keystone and Horizon. Swift is interesting for us. We're keeping a very careful eye on it and doing work on testing, but because it requires application code changes on the experiment side, we're holding off a little bit; we want to get Nova, Glance and Keystone in place first. Walking around with 75 petabytes of data sort of in your pocket means that you don't rapidly move between different storage technologies, so we're a bit cautious about what we do. In the current pre-production environment we've got around 150 hypervisors with 2,000 virtual machines on them. At the moment they're running a set of programs called LHC@Home. This is a bit like SETI@home, rather than having to go down to a basement and build your own particle accelerator. And with this we're able to basically go through a set of build and test environments to validate the environment before we go into production. This is our typical environment: we're running LHC@Home under the BOINC setup, and we're using classic, out-of-the-box Horizon. It gives a fairly friendly visual interface for the administrators to use. However, we wanted to give something that was easier for the average service manager, the people that are just doing the standard configurations. For that we found that the combination of OpenStack with Puppet gave us a lot of power, and we then combined that with a tool called The Foreman, and these three together allowed us to spin up a thousand virtual machines in three hours by simply pushing a few buttons. So with The Foreman, what you get is the ability to ask for a virtual machine and immediately associate it with a Puppet configuration, and then through cloud-init that gets configured and delivered to you as a completely configured virtual machine (there's a small sketch of that flow below). This avoids having to do a lot of work maintaining images and allows you to have a lot of flexibility in the configurations you get. Another thing that we found was missing: we are very heavy Active Directory users. We have 44,000 user accounts, we've got around 29,000 groups, and we have 200 people arriving and leaving every month, so we don't want to be doing this stuff manually inside of OpenStack. So we sat down and worked through with the community how we could get the LDAP functionality of Active Directory available to Keystone. And those patches have now just gone in; they didn't make it into Folsom, but they're coming along soon after.
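(Going back to the provisioning flow just mentioned: underneath The Foreman's button-pushing, each instance boots from a stock image and is handed a small piece of user data that cloud-init runs at first boot, typically to hook the node into Puppet. A minimal sketch, with hypothetical image, flavor, key and server names, rather than the exact commands used at CERN:)

```
# User-data script that cloud-init executes on first boot (hypothetical content).
cat > puppet-bootstrap.sh <<'EOF'
#!/bin/sh
# Point the new node at the Puppet master and trigger its first run.
puppet agent --server puppet.example.org --waitforcert 60 --test
EOF

# Boot one instance from a standard image; a batch is just a loop over this.
nova boot --flavor m1.small --image slc6-base \
          --key-name ops-key --user-data puppet-bootstrap.sh batch-worker-001
```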
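(On the Active Directory side, the general shape of it is that Keystone's identity backend is pointed at the LDAP interface of AD in keystone.conf, along the following lines; the endpoint and DNs here are hypothetical, and the exact option names, in particular the AD attribute mappings, depend on the release and on the patches just mentioned:)

```
[identity]
driver = keystone.identity.backends.ldap.Identity

[ldap]
url      = ldap://ad.example.org            # hypothetical AD domain controller
user     = CN=svc-keystone,OU=Services,DC=example,DC=org
password = secret
suffix   = DC=example,DC=org
user_tree_dn     = OU=Users,DC=example,DC=org
user_objectclass = person                   # AD-style user objects; mapping options vary by release
```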
What we're then able to do is basically spin up an OpenStack instance against it. There is one small tweak that you have to do to Active Directory at the moment, but this is not a schema change, and so it's something that most Active Directory administrators would be happy to do. The details are all on the wiki if you search around for Active Directory and OpenStack. In addition to that, we currently have a virtualization environment which is based on Hyper-V and SCVMM, System Center Virtual Machine Manager. We're actually very happy with this as far as the server consolidation environment goes. The problem with it is that we need to scale, and when you look at the limits of what that system can go to, we're a factor short of the level that we need to get to. So what we want to be able to do is to bring in Hyper-V, and this firstly gives us a choice of hypervisors; we like KVM, but we also want to be able to compare and contrast the environments. Maintaining para-virtualized drivers is hard work, and so having a stack where, for example, we can try out combinations of KVM with Linux, KVM with Windows, and equally Linux on Hyper-V and Windows on Hyper-V, and compare and contrast them, is very attractive for us. Since Microsoft has been putting in some effort via various people, we've been working closely with those guys in order to get this up and running. So within our Essex environment we've actually got a number of hypervisors running Hyper-V, and it runs very, very smoothly. What's nice is we've also got Puppet running on Windows, and with that we're able to configure the hypervisors in the same automated fashion. We are doing a little bit of work; there is some functionality still missing, in particular access to the console of the virtual machine and linking into the metering, so that work will be going on along with the people that are working in this area, but the aim is to get Hyper-V up to being a first-class citizen within the framework.
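(For reference, enabling the Hyper-V driver comes down to running nova-compute on the Windows host with a few nova.conf settings along these lines; the exact driver path and available options differ between Essex and Folsom, and the paths and endpoint shown are hypothetical:)

```
[DEFAULT]
# Run nova-compute on the Windows/Hyper-V host with the Hyper-V virt driver.
compute_driver     = hyperv.driver.HyperVDriver   # exact module path depends on the release
instances_path     = C:\OpenStack\Instances       # hypothetical local path for instance files
glance_api_servers = glance.example.org:9292      # hypothetical Glance endpoint
```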
So at the same time as we were doing this, we went along to, I think it was the Essex summit, and we were asking a few questions, and it turned out that the guy sitting next door to us was also from CERN. He was working on a different project but hadn't got around to talking to us, and the project he's doing is those farms of 3,000 machines that are doing the filtering from one petabyte down to 6 or 7 gigabytes; there are about 2,000 to 3,000 boxes. Now, the accelerator isn't always running, and this isn't the kind of thing you buy from Walmart, this is special stuff: when the moon is close to the earth you actually have to tune the accelerator to take that into account, and when trains go over the top it disturbs the beam and you have to be careful to keep it aligned. So the accelerator is up and running about 80% of the time, but 20% of the time it's being worked on and being tuned, and during that time these machines are sitting doing nothing. In addition, next year we'll be shutting down the accelerator for 18 months in order to upgrade it to twice the energy, and again in that time these machines will be doing nothing. So what they're looking to do is start up an opportunistic cloud. This means the machines will sit there, and when they're free they will start up virtual machines and accept work; then when the accelerator gets back online again, they will kill off the work, migrate it out, and turn the trigger farm back to doing the job of filtering this data down. There are two kinds of jobs they're looking at. One is simulation, which is basically trying to work out what the universe ought to be doing; the other is analysis, taking the data that's coming from the experiment and working out if our theories are correct. They have very different profiles: simulation is very CPU bound, analysis is largely I/O and network bound. So from the CERN point of view we're springing clouds up in various different places, and at the same time lots of other centers amongst those 200 centers around the grid are looking at doing exactly the same thing. So we're finding a need to federate, and this would allow a situation where, just as with the grid you can submit generic workload and find a good place in which that job should be run, you'll actually be able to submit work, identify a good infrastructure-as-a-service endpoint and then run the work there. There are two projects we're doing. One is internal, within the high-energy physics community, to get the sites federated together; we're currently running a test with about 15 sites to see if we can get that working. At the same time, the other thing we're doing is with the cloud industry in Europe. Europe hasn't got a very highly developed cloud industry; all the big companies are US or Asia based. So, working with the European Union, we've identified a number of companies, and we're working with them, the European Space Agency and the European Molecular Biology Laboratory to try and find a way of using those resources for the purposes of science, but also to allow those resources to be used for other purposes. Some of these are OpenStack based; some of these are using OpenNebula, which is a very popular cloud solution within the academic community in Europe, although you don't often hear about it in the States. And with this we're seeing a series of things that are in common. The first is that we need to federate identity: you need to have it so that when you're registered as a person in one location, you have access to the resources in another with a reasonable level of trust. We need to find common security policies
that allow you, when you want to ban a user because they've been abusing the system, to make that ban a universal one across the whole of the clouds. We need to have ways of sharing images; you don't want to be uploading to all of the different endpoints as you're going along. So with these combinations of things, and working on how we can federate them together, we're then able to bring these clouds together and provide them as a single resource for the researchers. As for the combination of technologies, probably at the moment the majority of the ones we're looking at are OpenNebula, with OpenStack coming up pretty fast behind that, and there are also a number of proprietary cloud vendors. This is causing us API issues; one of the most frustrating things is the lack of a formal standard in this area. OCCI has been a well-established academic standard but has not received widespread adoption, although we do have an interface for it, so it's OpenStack and OpenNebula that are currently the basis under which a lot of the work is being done. So where do we go from here? We're now at a point where we're seeing the data center in Hungary coming online at the start of 2013. With Folsom being here now, we'll do the Essex to Folsom migration and then start to scale out. The aim would be to start the deliveries; we're probably looking at about 2,000 additional machines being delivered to Hungary in 2013, and then we'll start to convert the main CERN computer center, the 10,000 machines there, towards an OpenStack-based environment. We're also, at the same time, looking at a number of new things that are coming along. It's actually very difficult for us to test all the stuff that's being done by the development streams, so the majority of our resources are actually being taken up with taking what's coming upstream, validating that solution and getting it working in our environment, which is a very productive situation with long-term maintenance. So Ceilometer is one of the things we're looking at, load balancing as a service, getting X.509 certificate management into Keystone so that we're able to use our existing security infrastructure, and bare metal for the cases where we're not using virtualization. There are certainly still cases, for the database servers and for the servers which are really heavy I/O, where we're holding off on virtualization; we're aiming for about 90% virtualized. The end target is that by 2015 we'll be running around 15,000 hypervisors, and the chances are about 150,000 virtual machines. What we hope is that by that point we won't be leading edge, because there are a number of other people that are also doing this work, and while we're happy to contribute towards the scaling out, we don't want to do it alone; part of the reason for getting involved in the community is to be able to do it with others. We're missing at the moment a number of things around the documentation and procedures area. We're basically looking for best practices; rather than having to learn things ourselves, what we'd much rather be doing is to have good places to go to find the best quality recommendations. At the moment what we tend to find is that we've got blogs in various different places, but it's really quite difficult to work out which are the high quality blogs and equally to even find them. So just finding out the best way of doing monitoring, the best way of doing key performance indicators, disaster recovery, these are things which can take a lot of searching to identify. At the same time we have a number of projects which don't correspond to the classic cloud model, where
you have a project admin and then members. We actually want people to be able to create virtual machines for things like personal desktops and then be able to manage them themselves, but not have it that their colleague in the same project can then stop and start them at will. We'll certainly be putting a little bit of work in here, because it's probably something which is outside the standard scope of the cloud environments, and it's the kind of thing where, when we see something which is a reasonable requirement, we will contribute and write up a blueprint for it. And then finally, we don't have a credit card. CERN gets the money from the governments of Europe and spends that money, but once that resource is used up we can't throw more money at it; the pot is empty. With that, we're having to be a lot more aggressive in terms of quota management. We need to make it so that important research gets the resources, and the clever guy who's worked out ways around the system to get bulk machines isn't able to get more than the amount of resources that we're willing to give him. So this means a hierarchical structure of quota management, where we're able to set high-level policies of quota and then pass those down through the infrastructure as a service. This again is something where we expect to be able to work with the community as well. So in conclusion, we've had a production build-up that's gone very smoothly. Once we get the Folsom integration in place, and the few legacy changes that we need patched in and in particular made clean (we want to make it so it's not patching the code, but rather there is a user exit and we code up to that user exit rather than having to modify the core code), with this we'll go to production early in 2013 on the scale of about a thousand hypervisors. Moving on from that, we're looking to expand out into Hungary and then start to really ramp up the production. For us here, community is key. We've found it so powerful to be working with other people, sharing problems, and identifying and working together on a common solution. At times it's been very, very informative for us to have other people's opinions and to have them challenge the sort of ideas that we've had. In particular, I'd like to say a special thanks to all the people that are doing packaging, documentation and testing. These are things which aren't very high profile, but they're absolutely vital to a site like us that wants to be able to rely on the community to deliver a high quality product. So when you're there contributing to OpenStack in all of these different ways, you're basically going to be helping CERN to find out what the universe is made of. So finally, on the subject of collaboration, this is actually a slightly strange one, but it is the result of a collaboration that we've got going on with Rovio, who are the authors of the Angry Birds software, and they've been doing an education program. One of CERN's objectives is to work through education, especially amongst the schools and universities. They'll produce a book based around the work being done at CERN that will show the effects of firing Angry Birds one way around the tunnel and pigs around the other. We don't do this at CERN, this is just a simulation. Thanks. Are there any questions? Can you repeat the question? Sorry, yes. So we're in touch with a number of OpenStack research organizations; we're always willing to hear more, and in particular, with it being a very open environment, we're more than happy to be discussing with people how they're solving their problems. One thing to bear in
mind is that we're not a high performance computing site; we don't use GPUs or InfiniBand. We're actually high throughput, because what we do when we get collisions is just give the results of one collision to virtual machine number one, and the next time we give it to virtual machine number two. It depends on the profile of the research organization as to just how much they can share from their experiences and ours, but certainly it's a completely open site; we're more than willing to share what we're doing and hopefully to benefit from the work of other people. So, obviously in the grid community they've spent a lot of time dealing with these federated resource management issues, and at the beginning of your talk I started to jot down a bunch of questions about what the top three pain points might be for this kind of enterprise in moving to something like an on-demand or virtualized kind of environment, and I think you actually mentioned most of those. But the question is, out of all of the federated identity management and federated resource management machinery that's been built so far and that you're using operationally right now, and I'm familiar with the LHC dashboard, I use shots of that in a lot of my talks, how much of that stuff would actually be either reusable or something that you could graft into any kind of OpenStack implementation? Because I know that we're talking about federated identity management tomorrow, but that's just one piece of the story in terms of concepts like virtual organizations as a way of managing these security contexts that cross administrative boundaries. So the question is how much of the existing grid infrastructure can we be using within this environment, what things would we not use, what things would we use. I think that as we convert, in the initial stage what we will be doing is to run the grid on top of the cloud, so we will just be providing that as virtual batch resources. As we go further on, we'll then be looking to exploit some of the infrastructure of the grid in an environment of a federated cloud. So there are things like the certificate environment that we spent a long time getting ready, along with the security policies and associated things, where we would expect a large amount of that to be convertible to this environment; it took years to get going, and it's not something that I'd see any better solution for at the moment. There are other areas where we'd be saying, do we need this anymore, and this is in areas like the compute scheduling, where clearly with a cloud it's a very different kind of question. My opinion, the soapbox that I get on, is that a lot of this stuff is orthogonal to on-demand provisioning of resources, and I just don't want this community to reinvent a lot of wheels. But equally, we want to benefit to the maximum from the inventiveness of this community, to avoid us having to maintain a large amount of specialized code, and that's the sensitivity that we have in terms of the amount of manpower that is used maintaining this. I very much enjoyed your talk, as in the previous OpenStack summits. When you showed the pictures of racks with underpopulated servers, have you looked at adding more servers, over capacity, and then using Node Manager in those servers to decide which ones to run at what power, always staying within the limit but having more capacity available which you can orchestrate? We have in some cases done over-provisioning. The problem that we have is that we actually run at a fairly high utilization
in that we've always got the simulation workload to keep the machines busy. So this means that we don't really have the option of simply powering down the set of boxes that we're not using, because we're basically spending a lot of time making sure that we use what we've got to the maximum. We've tried water cooling; the problem there is that we've still got an electricity limit even when we solve the cooling problem, and that means we'd basically have the choice of saying to the accelerator, I'm sorry, you can't run because we need another megawatt for the computer center, and in that case we wouldn't need to be there, if the accelerator isn't running. So that's why we've been looking at this external data center as the solution to these problems. You didn't mention whether or not you're adopting Quantum; I was curious if you were going that route or another. We have a very conservative network team, for very understandable reasons; as you can imagine, in an environment with 11,000 PhDs moving around and a lot of this equipment being fairly unique, they are cautious in adopting a lot of the newer network stuff. They are doing their tests, and at such time as they are happy with some of the underlying technologies and the impact that would have on the campus, then at that point we will look at Quantum. So does that mean you're using nova-network right now, or something else? So at the moment this is an area where we've had to do a large amount of legacy plumbing. We've had to write our own network manager; we've basically had to take the flat DHCP manager and adapt it to produce something that talks to the legacy infrastructure (there's a small sketch of what that looks like below). The aim is to stop doing that in the future, but we don't want to be holding the project back from deployment while we're doing this. As a result of that, we need to do things like wait 15 minutes for a DNS update to propagate around the site, and this is the kind of thing which, when we look at a cloud, I want to get rid of; but for the moment, to go forward, we need to trade off a certain amount of effort by writing some specialized network drivers. If anyone wants to get in touch, then you're welcome; my email address has been widely published.
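(On that network manager point, a minimal sketch of what plugging a site-specific manager into nova.conf looks like; the stock flat DHCP manager is the commented-out default, and the CERN-specific module and class name here are purely hypothetical:)

```
[DEFAULT]
# Stock flat DHCP networking:
#   network_manager = nova.network.manager.FlatDHCPManager
# Replaced by a site-specific subclass (hypothetical module/class name) that also
# registers each instance in the legacy DNS and Kerberos infrastructure.
network_manager = cern_network.manager.LegacyFlatDHCPManager
```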