Hello, everyone. Thank you very much for having us here. This is my colleague Janne Schulz from the University of Mannheim; he is the central project manager of the project we are presenting here. Me, I'm Dirk von Suchodoletz from Freiburg University, the local manager of the bwCloud site in Freiburg. Please be warned: this is not a technical talk. It's more about what modern technologies like OpenStack are doing to your well-established, long-existing institutional framework, and we want to tell you a little bit about the challenges we are facing in governing a federated science cloud for 20,000 users.

Just a little bit of context on the setting we are in. Tertiary education in Baden-Württemberg, the state we come from, which geographically is in the southwest of Germany, comprises nine universities and 43 other institutions such as arts colleges, teacher training colleges and so on, all overseen by the state ministry for science, research and the arts. Each year that means pretty much 360,000 students, plus a significant number of people working in those institutions, which in the end comes down to a significant demand for computational and cloud resources that should be provided at some point.

The initial situation in 2014, when we started the project, was that there were no established infrastructures for providing such large-scale resources. Another problem was that the traditional services offered no self-service functionality. Often, for instance, VMware ESX or Hyper-V virtualization infrastructures were already established, but you had to go through certain procedures to get such a resource, which was quite tedious for the staff who were entitled to it and usually impossible for students. So the idea was to create and operate a federated cloud infrastructure, which is usually not something a single university computer center can handle on its own.

There were a couple of points we needed to consider when designing the project and the setup. In research and science, commercial providers are often not the primary option: either the pricing models don't fit (especially if you are on grants and funds, it is often difficult to dedicate money to such external resources), or there are privacy-related issues because of intellectual property, research data and so on.

So what happened was that the ministry overseeing those universities sat together with the heads of the computer centers in the state and decided to create a federated cloud, meaning that neither does one single university provide this service for everyone, nor does every university get funds to set up a cloud of its own. The mission was a federated infrastructure which could then be used by all users in tertiary education in the state. The start of the project was to decide what kind of infrastructure should be set up so that we could offer infrastructure as a service to the huge variety of users we have. It was decided, however such decisions are made, that there should be four different operating sites. Those operating sites should coordinate to a certain degree but keep certain freedoms on the other hand.
All those sites are connected by BelWü, the dedicated network for educational institutions in the state, which provides bandwidth of up to 100 gigabits per second. That means users from non-hosting sites shouldn't feel much difference in performance when connecting to the cloud. The idea was that it should run on standard commodity hardware, provide self-service functionality and different storage backends, and incur little or no license fees, because many of the computer centers had had problematic experiences with the typical VMware licensing models preventing them from properly scaling out. The main objective was to be able to massively scale up if demand rises. So we came up with OpenStack as the most prominent cloud project available in those days.

Among our users we have quite a bit of variety, so we identified certain user groups which map to different use cases: primarily research and scientific staff; then the computer centers, which should be able to host at least parts of their standard services within the cloud infrastructure; standard employees, who have certain needs to test out services, software and so on; and finally students, who get a generic resource for their studies in various fields. At that point, I will hand over to my colleague. Thank you very much.

Hello, my name is Janne and I'm the bwCloud project manager. Thank you very much for the insights into the history and the development of the bwCloud project. Let's have a closer look at the current and future infrastructure we actually use. The bwCloud is, as Dirk already said, powered by OpenStack, and we built a multi-region setup which includes four different regions at four different locations, or cities. Each region acts mostly on its own, but we share some key components. For example, we share the authentication component, Keystone, which is located in Freiburg. Keystone is connected to the statewide identity management system called bwIDM. It works similarly to eduroam, so users can use their institutional credentials to register for our service. We also use a sort of shared Glance repository to store and provide the list of templates for the virtual machines, ensuring that users see the same list in every region. Because of the multi-region setup, each region uses its own public IP range, but users access the bwCloud through the central Horizon dashboard. So there is only one point of entry into our system, and once they are logged in they can choose their desired region.

In terms of resources, we offer standard virtual machines in different flavors; I guess we actually have seven different flavors. Users get a virtual machine with some storage attached and a public IP, and that's it. For most of them, that's enough.

If we take a look at the current hardware, we see a very heterogeneous picture. You can see that different sites have access to different numbers of nodes. That's because each region was encouraged to buy hardware using its own procurement procedures, which led to the situation that we have to deal with a heterogeneous hardware environment. But that's okay, we wanted that, because we wanted to evaluate whether we could build a software setup which can handle heterogeneous hardware. And well, it can; it works very well. In this current setup we realize a hyperconverged architecture with OpenStack and Ceph.
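To make that multi-region setup a bit more concrete: because Keystone and the image catalogue are shared, one set of credentials works everywhere and only the region name changes. A minimal sketch with openstacksdk, where the clouds.yaml entry "bwcloud" and the region names are assumptions for illustration, not our actual configuration:

```python
# Minimal sketch: one shared Keystone, four regions, the same image catalogue.
# The cloud name "bwcloud" and the region names are hypothetical placeholders.
import openstack

REGIONS = ["Freiburg", "Karlsruhe", "Mannheim", "Ulm"]  # assumed names

for region in REGIONS:
    # Same credentials for every region: authentication always goes through
    # the one shared Keystone.
    conn = openstack.connect(cloud="bwcloud", region_name=region)

    # With a shared Glance, the template list should look identical everywhere.
    images = [image.name for image in conn.image.images()]
    print(f"{region}: {len(images)} images available")
```

In practice, of course, users don't start at the API; they enter through the one central Horizon dashboard and pick their region there.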
On the hardware side: we bought no dedicated storage systems; instead, the hard drives are included in the compute nodes. The whole hardware setup works very well, but we are going to change it a little bit in the future. For example, we are going to buy separate storage and compute nodes; we want to divide this a little bit and offer two different paths of scaling up. So when a user comes along with demands for huge storage resources, we can say, okay, let's increase the storage system by buying new hard drives, or the other way around for compute.

If you take a closer look at the currently available hardware, you will also notice that the number of nodes is not that huge. 32 nodes distributed over four regions means eight nodes per region. That's not very much, of course, but we are going to scale up this year. We want to make an investment of almost 800,000 euros, which means we will spend 200,000 euros on hardware per region. This is not much, but it's more than we have, actually. Our goal is to be able to host at least 1,000 virtual machines per region for a start.

So let's do a first sum-up of what we've heard. The bwCloud project is a cooperation between different universities and was initially funded by the state ministry. The funding changed over time: the first funding round covered 100% of personnel and hardware, but in the second stage, during the project, this already changed. Only expenditures for personnel got 100% funding; for hardware, there had to be an own share from the operating sites of at least 50%. The good news is our technical infrastructure is up and running. We opened the bwCloud to a greater user community at the end of 2016, and since then we have seen a steady increase in the number of registered users.

That is the good news, but new challenges arise, especially in the field of governance and steering the bwCloud. We are very happy to see that this service is obviously something the users want, and we definitely want to increase the number of users, but we also have to be prepared for what will come next. There are some very important obstacles we have to deal with now and in the future. For example, how do we organize the distribution of the limited available resources? What is fair in this context, and what does that mean? Do we follow a first-come-first-served policy? Or are there VIPs who get more resources than others, and what are the metrics to decide who is a VIP and who is not? And, of course, there is the very important question of costs: who is going to pay the bill? We saw that the ministry paid in the past, but it won't pay in the future, at least not for everything. And what about the needs and demands of our user community? That community is very heterogeneous. We are a multi-purpose cloud infrastructure, so we have the whole range, starting with students with probably not such huge demands for resources, and ending with scientists who want to build their own Hadoop cluster and analyze petabytes of data.

All these questions mean that we have to know our stakeholders and think especially about their agendas: what do they really want? So let's go through the open and hidden agendas, starting with the users. The open points are, of course: they want to get reliable resources as fast as they can, and they want to scale up.
If there are new scenarios they want to pursue, they want to scale up and get more resources. What they don't want to do is pay for the resources, or at least they want to save as much money as they can. They want reliable, trustworthy resources, of course, and it should be cheaper than Amazon or Google, or maybe cost nothing. That's some kind of hidden agenda, maybe, but it's obvious.

The next group is the organizations, the universities and colleges. They want to provide a decent, modern research infrastructure. They want to offer infrastructure to their students and scientists to increase their attractiveness, of course; they want to increase the number of students. But they also don't want to spend money on various separate, individual infrastructures, and they are a little bit ambivalent about cooperation. On the one hand, they know that building and running complex infrastructures often means cooperating with others; on the other hand, they look very closely at their own users and ask, okay, what's in the deal for my people, for my users, and how can we maximize that without doing too much work for students from other institutions? So there is a kind of conflict.

The next group of stakeholders is, of course, the operators, the computer centers. They have a vital interest in building and running a cloud environment, because with a modern infrastructure they can fill the gap in science support and strengthen their role within their own university. A cloud environment in particular can break up some monopolies in the field of virtualization, which is typically dominated by VMware, for example, and everybody knows about their licenses and fees; that's very expensive. And last but not least, such cloud infrastructures allow them to consolidate and restructure old hardware systems: they can gather up all those old systems running across the campus and say, well, you don't have to run this old hardware anymore, we have these cloud resources you can use.

The hidden agenda we face is a little bit more difficult. Cloud computing is not only a technology; it is also a possibility to restructure internal processes, and things become more responsive and faster. DevOps teams, for example, no longer have to wait until the needed hardware is delivered; they can speed up their processes. But computer centers tend to be conservative in some ways. The phrase "we always did it that way, so why change it now" is quite common and typical, and not everybody is unhappy with that phrase. So, yeah.

And last but not least, we want to mention the group of funding agencies. They play, of course, a major role in science and research. They want to ensure that the taxpayers' money is not wasted but used in a good and efficient manner. They want to support decent research infrastructures to stay ahead of the competition politically. What they often intend but don't say is that they want to encourage the computer centers to start those restructuring processes I talked about earlier: restructuring internal methods and processes to become more agile and to focus more on the demands and needs of the users. Offering significant funding, for example for a cloud infrastructure, helps a lot to convince the computer centers to go in a new direction.
And, of course, the funding agencies want to avoid funding dozens of small grant applications, so centralization is also vital and in their best interest here. Yeah. And now I am stuck, sorry. Okay.

All those different areas I just introduced you to make up the area of governance. So what is this about? Governance of the bwCloud means that we have to handle and manage the expectations and demands of the different users and stakeholders. What makes it a little bit hard for us is that we are building a new form of cooperation, and there is no blueprint, at least not in Baden-Württemberg or in Germany, for this kind of cloud infrastructure that we could consult or take as a role model. So we have to figure out our own way of doing it and find out what is best, at least for Baden-Württemberg. The technical solution we chose early on is very important because it forms the basis for all the subsequent actions and processes, and here the OpenStack portfolio fits our needs perfectly well. So this was a very good choice, of course.

Money flow and compensation is also a very important subject we have to deal with, especially in the public sector; we are facing some obstacles and problems with money flow. If we talk to the users and say that maybe the resources will cost a little bit in the future, they are not very surprised. They say, okay, I can understand: I get some resources from you, it's a trustworthy environment, and if I go to Amazon or Google I have to pay too. So even if I have to pay you a little bit more than Amazon, it's okay. The problem is that we can't actually take this money. It's very simple: we have no structure to collect the money. So it is a political process to organize all those money flows and to ensure, for example, that the money is used for scaling up or something else. Even if we want to take money, we are currently not able to; we are working on it.

Another point is how to handle external money coming in from third parties. I mentioned earlier that there may be scientists who want to increase our storage system and bring in their own money. Well, we are suited for that, but we have to think about what share of this hardware is exclusively used by the contributor and what share is used by the public, for example, as compensation. And, as Dirk already mentioned very briefly, researchers now have new possibilities to buy resources, but the funding agencies are not always able to handle those new resources. They often act in a very traditional way and want the grantees to buy their own hardware. So when the users who got the grants come up and say, well, there is this system and we can buy virtual resources, the funding agencies are a little bit... yeah, we have to convince them that it is equivalent, so that the users don't have to buy their own hardware.

The next point, of course: how do we motivate users to free unused resources, and what about the daily operations we have to face? Even the hint that there will be billing in the future, that we will be able to bill them, is a good way to encourage users to think about what kind of resources they really need. Actually, our bwCloud resources are free for all users; nobody has to pay anything. So users tend to collect virtual machines and resources beyond what they need.
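Just to illustrate what a reclamation mechanism could look like, here is a small sketch, not our production tooling, that flags instances which have been shut off for a long time; the cloud name, region, and 90-day threshold are made-up values:

```python
# Sketch: list servers that have been SHUTOFF for more than 90 days as
# candidates for reclamation. Cloud name, region and threshold are assumptions.
from datetime import datetime, timedelta, timezone

import openstack

THRESHOLD = timedelta(days=90)
conn = openstack.connect(cloud="bwcloud", region_name="Freiburg")
now = datetime.now(timezone.utc)

# all_projects=True needs admin rights: it scans every project, not just ours.
for server in conn.compute.servers(all_projects=True):
    last_change = datetime.strptime(
        server.updated_at, "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    if server.status == "SHUTOFF" and now - last_change > THRESHOLD:
        # In practice one would notify the owner first, never delete right away.
        print(f"reclamation candidate: {server.name} "
              f"(project {server.project_id})")
```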
This is the point: we have limited resources and we will have limited resources, so we have to think about mechanisms to ensure that we remove unused virtual machines and free the resources, for example.

Earlier this week, I guess on Monday or Tuesday, I listened to a talk about flavor management, and flavor management is also a very important question. We see more and more questions coming up about our flavors, with users asking us: can we have this flavor with 15 gigs of RAM instead of 16, can we have this combination, and so on. We try to use what we call tight flavor management. We offer at least seven different flavors and we don't want to change that much, but we have mechanisms for power users so that we can react to individual demands. And last but not least, it is of course very important to keep in touch with the users. We have to keep them updated on a regular basis and hold meetings, and this becomes an issue if you think about the possible number of users once we deploy this statewide service. At some point it will no longer be manageable to do, say, one meeting per year, so we have to think about other mechanisms to inform them.

Well, what are the takeaway messages? We thought about what we could present to you and what might be interesting for you. Unlike many companies, we have a very open-minded user community, so we don't have to convince them to use cloud resources; if they see the bwCloud is free, they come to us very willingly. And as I stated before, even if we want to collect some money for the resources, we have problems taking that money and spending it for a good reason. Once our operating team was in place, we thought about our technical solution, and, you know, there were many technical solutions on the market, so the process of choosing the platform is crucial. But we also saw during the last two and a half years, I guess, that technology is often not the problem, because most functionality is already in place or invented, or people and projects are working on it. So technology is mostly not the point; managing cooperation and federation, that's the point. And that means lots of politics: talking, compromises, discussions, lots of them. If you want to build a structure like the bwCloud, you have to consider hiring people not only for the technical stuff but also for doing politics: being diplomatic, talking to the heads of the computer centers and the heads of the ministry, being polite, but without losing sight of the mission. It's a hard process, because those institutions tend to be very conservative and don't move that fast. On the one hand we have these cloud resources which enable the users to adopt new, fast processes; on the other hand we have those old structures like universities who sit there and say, well, we did it this way for the last 100 years, why change it now?

When building and running a science cloud in particular, do proper planning before the service is launched. This is obvious, but it's crucial; it is one of the reasons why the bwCloud is steadily growing at the moment and we are not facing problems during that growth. We did lots of planning in the beginning, and of course we also allowed some time for evaluation and testing, and we played around.
We made some mistakes and followed some paths which didn't lead to the outcome we intended, but that is very important for gaining experience. If you want to build a structure like this, keep in mind that there should be at least a little bit of room for error. And of course, talking to other cloud-operating infrastructures, companies, and projects is also very important; do trainings with the DevOps team, a very good thing; and visit the OpenStack Summits to get new insights and impressions. Well, that was it from me for the moment. This is our team. Thank you for listening. I know it's late, and if there are any questions, we are happy to answer.

The question is how we do the resource management: what is fair, do we use quotas? Indeed, we do use quotas, and those quotas are very limited at the beginning. When a new user registers for the bwCloud, he or she gets access to all four regions, but only in one region is the assigned project equipped with a quota of vCores, RAM, storage and so on. In the other three regions the same project is created, but with zero quota for the moment. We are currently building an interface, some sort of add-on for the Horizon dashboard, so that users can transfer unused quota from one region to another (a small sketch of this quota scheme follows at the end of this transcript). But the quota is very limited at the beginning; we want to steer that process a little bit. The first question users ask once they spawn their first virtual machine is: how do I get more quota, and how much can I get? At the moment we mostly just say, it's very easy, how much do you want? But in the future there will be some sort of mechanism to sort that out. We don't want to give students too many resources, for example. They get enough to spawn, let's say, two smaller or one slightly bigger virtual machine, but we want to ensure that the scientists, the projects, and the computer centers which also use the bwCloud get the resources they need.

Additionally, what we are setting up now is a kind of try-and-buy scheme: researchers get resources to a certain degree in the beginning, and if they are happy with that, the idea is that they can pool money, which is then spent directly on hardware added to the resource pool, and they get a share which roughly equals the money spent on the actual hardware. So that is another model we are trying. And what you do need in the beginning is a certain base infrastructure, so that you can show you are actually able to handle all this; then people start to trust you and entrust you with their money. That's the idea behind it.

Thenewstack.io. It was my understanding, and forgive me if that comes from American news sources that may have heard it from German news sources, that Chancellor Angela Merkel had a program in mind for a special kind of high-speed internet for Germany, a kind of state internet that was specifically to be invested in by German companies, with the intention of providing higher-bandwidth internet service for universities and other public services. Did anything ever happen with that, or do I have my facts a little skewed?

Well, that's a very good question. Let me try to give you some sort of... I need a picture to show it; I guess this one is very good. You see lots of states, and in Germany we tend to divide everything between the states, and the states try to keep authority over many subjects.
So, for example, the question whether there is a federal, nationwide high-speed internet for research and academia is a little bit complicated, because in our state, Baden-Württemberg, we have our own network, BelWü. But it's part of a bigger net, the DFN, the Deutsche Forschungsnetz, for example. I don't have in mind exactly what you're talking about, but I think those initiatives are often announced, and then, when it comes to realization, lots of stakeholders pop up and say, well, we want to play in that field, and some things tend to dissipate into... whatever. I hope this doesn't get recorded. Otherwise... it was him, it was him.

So the typical problem is that the federal government cannot directly fund research infrastructures in the individual states. There are some centralized institutions like the German Science Foundation, which is publicly sponsored, where individual researchers apply for grants and can then spend them on research infrastructures. And the federal level can try to push money into the system, for instance through the so-called Excellence Initiative, where universities compete against each other and can bring in huge research clusters to compete for that money. But to spend it on infrastructures is much more difficult because of that federal system. Higher education is governed by the states, so it's really difficult to directly fund infrastructure without involving all the states with all their particular interests.

Let's turn it in a positive way: if Chancellor Merkel announces something like that, she has to convince at least 16 ministries to do it; otherwise nothing happens. Maybe it's an ongoing process. The positive point is, and we stated it here, that the responsibility for educational supervision lies with the ministries of the states, so once you convince the ministry of your state, it is very easy and very fast to get money and build infrastructure. That's the good point. The bad point is that if you want to scale up beyond the state level to a federal level, it becomes really messy.

So if the federal government is funding a technology initiative, it's more difficult for universities that depend on state funding to acquire or participate in that federal research, because by design there's a disconnect there?

There are only those indirect ways. Of course, the federal ministry of research, for instance, can put out some funds, and universities and other research institutions can apply for those grants. That is possible, but usually you can just create a certain framework, set up some goals, and open a challenge for which the different institutions can apply. Then you can hope that you have a proper mechanism installed so that something actually happens in the direction you originally desired. But you cannot directly control the whole process from end to end. That is not really possible in the system design we have at the moment. You're welcome.

Well, any other questions? Disaster recovery, okay, what do you want to hear about that? Yeah, for this level of service, at the moment we do not have real disaster recovery in place. We have certain precautions so that if parts of the system fail, it would, for instance, be possible to a certain degree to move a system to another site and start it there. But, for instance, the management of the different IP networks and all of that is not consistently handled yet.
So lots of manual processes are involved. At the moment most universities have a kind of two-tier model: there are the commercial-grade VMware, Hyper-V or similar virtualization infrastructures, which usually have several levels of disaster recovery and redundancy built in, and there is the cloud infrastructure, which is more meant for research purposes, where certain service outages are not as crucial as they are for the core services of the computer centers. That is at least the state we are in at the moment.

Can't you live-migrate the machines between the sites?

Yeah, okay. No, we don't do that sort of disaster recovery. We use Ceph with a replication level, I guess, of three, and we put our root disks and the attached storage into the Ceph system, so once a compute node fails, there is a chance to restart on the same site. But we do not keep any copies of the data in sync with other regions. We did try to build one region spanning two different cities with a distance of, I don't know, at least 100 kilometers. And it worked, but we saw some very weird things: once you start, for example, a virtual machine at site A and then move it to compute nodes physically located at site B, the whole network traffic might bounce back and forth between the two cities, because the breakout point is still located at site A. We also saw that if the conditions of the OpenStack environment changed even a little, the whole region broke down, because services ran into timeouts or couldn't find each other. That was the point where we decided to do a multi-region setup with four individual regions instead of one huge region. And in terms of disaster recovery, we encourage the users to try to solve it at the application level, for example, and we will help them, of course.

No more questions? Thank you again for attending, and have a nice flight back, or trip back, whatever. Thank you very much.
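As mentioned during the Q&A, here is a minimal sketch of the per-region quota scheme described above: usable quota in the home region, zero everywhere else. It is an illustration only, not the project's actual code; the cloud name, region names, and quota values are hypothetical:

```python
# Sketch of the quota scheme from the Q&A: a new project gets a small starter
# quota in its home region and zero quota in the other three regions.
# Cloud name, regions and numbers are assumptions, not bwCloud's real values.
import openstack

REGIONS = ["Freiburg", "Karlsruhe", "Mannheim", "Ulm"]
STARTER = {"instances": 2, "cores": 4, "ram": 8192}  # enough for two small VMs
PARKED = {"instances": 0, "cores": 0, "ram": 0}      # until quota is transferred

def set_initial_quotas(project_id: str, home_region: str) -> None:
    """Equip a freshly registered project with its initial compute quotas."""
    for region in REGIONS:
        # One connection per region; authentication still goes through the
        # shared Keystone, so the same credentials work everywhere.
        conn = openstack.connect(cloud="bwcloud", region_name=region)
        values = STARTER if region == home_region else PARKED
        conn.set_compute_quotas(project_id, **values)

# Example usage for a new user whose home region is Freiburg:
# set_initial_quotas("<project-uuid>", home_region="Freiburg")
```

The quota-transfer add-on for Horizon mentioned in the Q&A would then, in effect, lower the values in one region and raise them in another on the user's behalf.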