Okay, thank you very much. Hello everyone, thanks for having me. My name is Manuel Holtgrewe. I'm a trained computer scientist, did a PhD in bioinformatics, and now I'm working in the Core Unit Bioinformatics at the Berlin Institute of Health. I will talk about our use of OpenStack and OpenStack Ironic for the HPC system that we are maintaining. So quickly, the agenda: I will go over a bit of background on the institute, so who we are and what our systems are used for; our use case; a little bit of history, what happened until we got here; then our current setup, which we built with the help of Ironic and the Ironic community, of course; and our plans for the future. And then I was asked to present some challenges that we had. Maybe some of them can be resolved with one sentence, or maybe people will just nod and say, okay, everybody knows this; it will probably be somewhere between these extremes. So first of all, I hope this isn't too boring for you. I'm working in the Core Unit Bioinformatics and we are about 12 people, mostly people doing data analysis, building data pipelines and tools, and we are working in scientific collaborations with people at Charité and other academic institutions in the life science and health space. So we're analyzing a lot of molecular data of various forms. CUBI, the Core Unit Bioinformatics, is part of the Berlin Institute of Health (BIH), which is itself part of the Charité University Hospital in Berlin, and the aim of BIH explicitly is translation: bring problems from the bedside to the bench, get some insights, and eventually improve diagnostics and therapies at the bedside. Well, you could argue that Charité is doing this anyway, but this is the particular focus of BIH, which is a federal research institute. So Charité is a huge place.
There are about 25,000 people working there: doctors, students, janitors, everything you have in a hospital, researchers. The faculty has a size of about 2,000 people, and we are a very small group; we cater to ourselves and then to other groups, such as virology. For example, if you're from the German-speaking part of the world, you might have seen Christian Drosten, who is the head of Charité virology, and his group is working on our system. So Ironic is helping fight COVID, yay. One part that is also important: we are independent of Charité central IT. We are of course working together with them and interfacing with them, but I'm not showing the main IT deployment using Ironic at Charité; it's really our group of 12 people, and then we cater to maybe 200 or more people. So it's sizable, but it's not all of Charité. Yes, that should be enough; otherwise, feel free to ask questions later on. So what is our use case? We have an HPC system of about 250 CPU nodes of various ages; some are five years old, some are from last year. We have seven GPU nodes with four GPUs each. So quite some punch compute-wise, but it's not the largest deployment, of course. We have an old two-petabyte GPFS appliance that will go end of life this year, and we're currently introducing another fast storage, tier-one storage as we call it, all NVMe, which will be based on CephFS. And we have some tier-two storage, just for archiving, also based on CephFS but on spinning disks. By now, well, not all of these hosts have been set up with Ironic yet, but most of them have. And then there are about 20 virtual machines to support the HPC operations, including the Slurm server, a database server, some virtual machines for logging in, all of these things.
And then our group also maintains a couple of server installations for data exploration and visualization tools, some data management software that we also develop ourselves, and a couple of, say, non-HPC virtual machines that just sit there and serve some web services, as things go these days. Historically, we started out with the open source version of Grid Engine and deployed everything with our own DHCP server, our own TFTP server and all of these kinds of things. At some point we switched over to xCAT, which worked pretty well, but there was, say, an impedance mismatch: virtual machines were managed by a separate piece of software, Proxmox. Migrating this to OpenStack really gave us a coherent view of the system, and suddenly managing bare metal and virtual machines is very similar, which is really good and helpful, saving us time. The other thing here: I copied some of the logos on the lower left from another presentation; we're also moving away from commercial software to a more open-source-based system. On the lower right, you can see a rendering of what we have in our data center, or data centers, I have to say. I think I saw this mentioned in an Etherpad from the Ironic meeting: we're using NetBox. If you know it, great; if you don't know it, have a look, it's good software and it can give you these renderings. So it's more or less ten racks full of hardware that we have, and we're managing it mostly with Ironic, and where we don't, we're planning to. Okay, so roughly the network layout; this can be quick. For the HPC network on the left-hand side, there is what I would call an internal public network, so private IPv4 addresses, and then there's an internal private network, also IPv4 addresses.
And the head nodes, so to say, where people log in or transmit data through (we also have a graphical user interface portal), are connected to both networks. All of the GPU and CPU machines only connect to the private network. This is one OpenStack installation overall. On the right-hand side, we also now manage the four physical servers in the DMZ with Ironic, more or less: we installed them with Kayobe, which uses Bifrost, which in turn is Ironic. There are a couple of virtual machines there and also two networks, though not so much an internal and an external one, but rather one for virtual machines that are connected directly to the internet, or to the DMZ. And then we have something like a gateway node with an HTTP reverse proxy for cases where we need to proxy through to internal services, such as S3 from our Ceph system, for example. So what do we use? We use the Kolla and Kayobe projects: Kayobe for bootstrapping OpenStack and then Kolla for deploying OpenStack. Our Ceph storage servers are also deployed directly with the Bifrost inside Kayobe. From then on, you have an OpenStack setup, as you know better than me. Then we bootstrapped everything else with Nova: I think it's the libvirt driver for the virtual machines, and Ironic for the bare metal nodes. So all of our HPC compute nodes are managed now, along with all of the supporting virtual machines that we have. And we have a handful of what I call here user virtual machines, some things that are not HPC but CUBI's: these data exploration services that I mentioned earlier, and also one virtual machine that we host for another group, where they run a web server for internal use. Yeah, we had some... let me go back one slide. It's a bit confusing, but on the public network... wait, let me put it differently.
As long as you deploy things on the internal network, that works very well with the standard Kayobe setup. But if you want to deploy bare metal on a second network, then you cannot use flat networking anymore. I hope I expressed that correctly; it's my understanding, and I asked in the Kayobe or Kolla chat and they confirmed it. So I would actually need to switch to Neutron networking for my bare metal deployment, and I haven't figured out how to do that yet. So I have two bare metal servers on the internal public network that I set up by hand. We've deployed everything as Rocky Linux, except, of course, more or less, the Kolla Docker images that we're using, which are just the stock Kolla images, and those are, I think, CentOS Stream. What we do with Ironic is install more or less stock Rocky images. So all of our bare metal servers, and actually also the virtual machines, are installed with just the bare operating system, and then we use Ansible to give them their roles, to be either compute nodes or login nodes or Slurm controllers and all of these things. So we only have one image. You could criticize this as not being cloudy, because you don't have one image per role where you just deploy the image and there goes your application, and we're still using yum update to update the packages. But it works quite well for us. We were able to replace our previous deployment, hand-rolled TFTP, DHCP and PXE boot servers that we set up ourselves, which we later replaced with xCAT, and from that thinking we have now moved to OpenStack. Certainly there are steps we could take to be more of a bare metal cloud, but right now it works quite well for us, and let's see how we can simplify our workflows in the future, becoming more cloudy in a way.
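To make that flow concrete, deploying a node and then giving it its role might look roughly like the following sketch. All names here (image, flavor, network, key, playbook, hostname) are made up for illustration; only the overall shape of the commands is the point.

```shell
# Hypothetical names throughout; only the overall shape is the point.
# 1. Nova, via the Ironic driver, installs the stock Rocky image onto a
#    bare metal node:
openstack server create \
    --image rocky-linux-9 \
    --flavor bm.hpc-compute \
    --network hpc-private \
    --key-name admin-key \
    hpc-cpu-042

# 2. Ansible then turns the freshly installed OS into a compute node,
#    login node, Slurm controller, and so on:
ansible-playbook -i inventory/hpc site.yml --limit hpc-cpu-042
```

The same pattern covers the virtual machines; only the flavor and network differ, which is the "coherent view" mentioned above.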
Also worth mentioning: we don't give our users access to OpenStack, so they don't deploy their own bare metal servers, although that could certainly be done. We have a classic HPC system with 250-plus nodes; people log in to login nodes and then access the other nodes through the Slurm scheduler, or workload manager. So this is our use case: people don't have OpenStack workflows, they have classic HPC Slurm workflows and workloads. Future plans: as I said, we want to look into deploying our bare metal instances with more specific images, also to reduce the time we spend rolling things out with Ansible, and then there are of course the advantages of immutable machines that you just redeploy; but for now we're happy with where we are. One challenge that I see there, though I might not understand it correctly enough: we're currently using GPFS for our storage, and to the best of my understanding, with GPFS you have some storage servers which form a cluster, and then each of the clients also has to join that cluster. So you don't just mount something as a client; you have to join the cluster, and this is a manual step, and you also cannot do it from 100 nodes at the same time because there's locking taking place. I wouldn't know how to solve that, but it may not be that important because we're sunsetting our GPFS instance by the end of the year anyway. Another thing that I think will improve the deployment is using OpenStack Barbican for secrets management, but we'll cross that bridge when we get there and move towards a more image-based deployment rather than an Ansible-based one. As I mentioned earlier, we really want to have Neutron for the HPC networks, and we are very use-case driven.
So if we need something, we invest time learning how to do it, and maybe if there's a third, fourth and fifth server on the public network that we need, we will address this, because right now it works quite well and I don't expect to touch those two servers for quite some time. Then there are a couple of small patches that I want to contribute to Kayobe. For example, spanning tree protocol on the virtual bridges: that would be good to have, and it is currently not supported out of the box, but it would probably be a simple patch; let's see when I find the time to do that, hopefully soon. Also, we deployed with Kayobe Xena, and there one had to patch the Ansible playbooks inside Kayobe a bit so you could use a custom base image for your OpenStack hosts. This is not for the Ironic/Nova-deployed hosts, but for the OpenStack hosts that you deploy with Kayobe via Bifrost. Then there's Designate, which might be useful for DNS, but I have to look into the features and limitations; right now we're maintaining a DNS server manually and rolling that out with Ansible. And then I need to teach OpenStack to more of my colleagues, and we found it a bit challenging to find suitable training and books for OpenStack and Ironic, because everything that I found so far is "how to get started with OpenStack", and then when it starts to get interesting, the tutorial or the book stops, or they are very high-level, not so much hands-on. My experience with OpenStack and also Ironic is that you need working experience, and probably the best way would be to work in an organization that already uses it and learn it there. But I had to learn by myself; the OpenStack community was very helpful, and there is a lot of documentation, and the manuals are great. It's just that I sometimes felt something was missing for me to connect the dots; the dots were all there, but some connections
were missing. But I might just have been missing the right things, and it was a really good experience using OpenStack; it was just hard to learn, I found. So, now coming to the technical challenges that we had. First of all, we had all of these servers already bought. I don't want to make advertisements for them, but it's all Dell hardware, so at least we had only one kind of out-of-band management, and it's quite okay; from Dell, different versions still. We somehow had to make the BIOS and UEFI settings homogeneous to simplify deployment, and there were some things we had to adjust for use with Ironic that we never had to touch before, but we figured all of that out and cleaned it up, which is good in itself. Then there is what I call here the impedance mismatch of HPC versus cloud. A lot of that, I think, is about mindset for people coming from classic HPC: if you deploy, say, 20 Ironic machines, machines one to twenty randomly go to your bare metal hosts, and you just have to accept that this is the way cloud works. One thing that I faced with cloud-ready images is that they have predictable network interface names disabled by default. We are using eight OpenStack hosts where the virtual machines run, and they come from different generations, and it turned out that the eth0 to ethN naming was not as stable across reboots as I would have expected. So I had to work around this a bit, but again, this is something where you can use different images; you just have to build those images. Then we have link aggregation everywhere, so all of our network is either dual 10-gigabit or dual 25-gigabit, but we don't have dedicated in-band management. This really comes from the HPC part of the world, where you have out-of-band management and then the compute network; in the cloud world, I now understand, it's more common to have in-band and out-of-band management on the same port, which is also fine,
and then you might have just one non-redundant network connection. Where this became problematic was that you have to get the switch settings for link aggregation just right. For Dell, it's documented in the Ironic documentation; the problem is figuring out that the issue is on your end, but then it is well documented in Ironic. You have to configure the link aggregation port channels on the Dell switches to passive and not active, so that the server initiates the creation of the bond and not the switch. We figured that out and resolved it, and now things are working nicely. Another category of technical challenges is that building images is not trivial. Thanks a lot, Anna, for helping me there, and also Julia, with using the diskimage-builder of OpenStack. It's just not trivial; I tripped over a couple of places, just things that are hard. One thing: I was building the images on my Ubuntu workstation, but diskimage-builder assumes, I think, that you build on the same distribution you are building for. Well, I got over all of these technical challenges; after all, it's great, it's open source, and you can introspect everything, look at the source code and see what's happening, so that was really good. Some things are really hard: Kayobe sits on top of Kolla, and both sit on top of Ansible, and now you need to figure out how to configure a particular thing, and some things maybe you cannot configure at all. That was also quite challenging, but the Kolla and Kayobe communities really helped me. On the other hand, if somebody is seeing this and wants to do their bare metal deployment with OpenStack Ironic, you really have to go to IRC. Of course I always try to resolve issues by myself, but I found it hard to do everything on my own; I got lost on the way. And then, of course, everybody who's working with hardware knows that rebooting these enterprise
servers takes ages, and it's really frustrating. Getting something to PXE boot for the first time is also really frustrating, but then of course you have a hundred of the same machines, so once you solve it for one, you can PXE boot a hundred. It's really hard to debug things until you get a prompt and can introspect, but again, you can introspect everything, and once I figured out how to get a root prompt in one of these Ironic Python Agent boots, I could figure out almost all of my problems very quickly. So everything went quite well. And then the non-technical challenges. You know those "for the impatient" guides? I'm the impatient user, and this is really my problem when doing these IT things, because my job description says I'm a scientist, and while I know a lot of the general things and now have quite some experience with the technical things, well, in software engineering they say weeks of coding can save hours of planning, and here it's the same way: sometimes it would have been useful to just take the time and read the manual, and I didn't do that, so that was my failure. Also, it's quite easy to have some quick success and then build up some technical debt (not depth, but debt), so that was my bad. And, you know, I had set up the system with our hand-rolled PXE boot infrastructure and also with xCAT, and that was relatively straightforward; OpenStack is a huge piece of software, you don't know where things fit, and I had no prior experience with it. I figured it out in the end, and the documentation is there and it's great, but it's just like going from "okay, I think I know what I'm doing" to "really, I have no idea what I'm doing", and okay, let's do first things first and figure out how things go. Actually, the first thing that I found a bit hard was: what's the preferred way of bootstrapping things?
I understood it as a choice of, at least, Kayobe and Kolla versus the TripleO project, and hopefully I picked the right approach with Kayobe and Kolla, because that was what people were referring to as good deployments for HPC. OpenStack is a big project, so it's tough in the beginning to know what goes where, what you need and where to look; it just takes time to learn, I guess. And as I said, documentation: I found some books, I think self-published or from Packt, and some books are quite good and some are more or less just three blog posts; you sometimes get that feeling. So it's easy to get started, but it's hard to become an expert, I would say, with OpenStack. There's this "A Universe from Nothing" tutorial for Kayobe from the company StackHPC, and without that I would have been completely lost; it was extremely helpful. But then making the step from having everything in a virtual machine to real hardware, that I found challenging. Again, the community was very helpful, so thanks a lot for that, and in the end I succeeded, so yay. Yes, as I said, what is good?
The IRC chats are really great. I love that everything is open source: you can look into everything, you see that people had similar problems to mine, and everything is introspectable. You can get prompts everywhere, there are a lot of log files, and once you've figured out how to increase the log verbosity in the Kayobe and Kolla deployment, you can see everything. It's really, really great. I can deploy everything with Docker, so you don't have some lost installation files lying around; that's really great. The integration with Ceph, both for block storage and file system, is really great. The Horizon user interface I found very useful, and also instructive for discovering things, and of course the command line interface, which is also really helpful for scripting, and I almost forgot the Ansible modules. So yeah, overall a tough time for me, a lot to be learned, but in the end a big success, and I'm confident that we can use this for years now and improve stability. Let me close with that. Thanks for having me, thanks for your attention, and I'm happy to take any questions that you might have. Thanks a lot, Manuel, for this great presentation. Are there any questions? I think I have one; I hope it's not too loud, I'm sitting outside. Thanks for the presentation, it was very nice. I was wondering, how do you deploy the Slurm cluster? Do you use Heat, or Ansible, or do you script it through the CLI, or Terraform? So we only have one HPC system and then there's... So, what makes up the Slurm cluster? We deploy virtual machines with a...
So the virtual machines are provisioned with the Ansible playbooks, and we deploy, say, two Slurm controllers, the database, and the slurmdbd servers. We provision the machines with Ansible, then we install all of the software packages with Ansible, and we then start everything up via systemd with Ansible, and the same for the clients: we install the packages with Ansible and then start the slurmd daemons. So we're not using any of the advanced things in OpenStack; all of these things right now are deployed as stock Rocky images, and then we install the software. As I said, if we had purpose-built images, it would probably have been possible to use Heat to say: okay, I want to have two Slurm controllers, one MySQL database, two slurmdbd servers. And then I still would... probably there's an easy way to bring the configuration to the servers, but we're not there yet. Okay, thanks. Because I think for the compute nodes you also use Ansible to deploy them through Ironic, right? Yes, precisely. We deploy them with the Ansible modules that are there, yes, precisely. Okay, because we are on a similar journey. We are still using virtualized compute, but it is possible, I think, to deploy everything with a Heat stack where you specify different flavors for bare metal and the VMs, and not only theoretically but actually OpenStack will do the right thing and deploy the compute nodes on bare metal and the Slurm controller and slurmdbd on the VMs, and then you have one Heat stack. At least on our side, that is the plan we want to try out. Yes, so... one caveat that I would have: with Slurm you can make minor version upgrades by... okay, this is very specific to Slurm now. The recommended way to install Slurm is not by building RPM packages but by building from source, installing it to /opt under some versioned directory, and then having a symlink to the active version.
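The versioned-install-plus-symlink scheme described here can be sketched in shell. The version numbers and paths below are illustrative, and a scratch directory stands in for the real /opt:

```shell
# Sketch of the /opt-plus-symlink scheme; version numbers and paths are
# illustrative. A scratch directory stands in for the real /opt.
root=$(mktemp -d)
mkdir -p "$root/slurm-23.02.5/bin" "$root/slurm-23.11.1/bin"

# Point the "current" symlink at the active version:
ln -s "$root/slurm-23.02.5" "$root/current"

# A minor upgrade then means: install the new version alongside the old
# one and flip the symlink (the real upgrade would also restart the
# Slurm daemons afterwards). -n keeps ln from descending into the old
# target directory:
ln -sfn "$root/slurm-23.11.1" "$root/current"
readlink "$root/current"   # now points at the new version's directory
```

Because both versions stay installed side by side, rolling back is just flipping the symlink again.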
So this is the way we use to deploy it. And if you do this, you can install two Slurm versions at the same time and change the symlink: just restart the Slurm daemon, and then you have a Slurm cluster running the next version. If you were to use Heat, probably there's a feature for rolling re-installations and these kinds of things. I don't know, but I'm fairly certain there is, because that's how you do things in a cloud. So I don't know how much you gain by using Heat, but it should be possible. Maybe just to clarify: we wouldn't use Heat to configure the software, just to get the basic operating system installed, like you actually do, and then use Ansible to install packages and so on. The advantage of using Heat is that you can easily scale up and scale down, and you have one stack to manage. That was just our experience. Yes, that's of course nice, but then you have to be careful that you don't scale down compute nodes with jobs running. That's true, yes. Probably, so, we only have one HPC cluster; if I had to cater to two or more, that would be a pretty nice way to have elastic, growing and shrinking clusters. Yes. Thanks. Thanks. Are there more questions? Yeah, I have a question. My name is Dinesh; I'm actually joining from Chennai, India. I have a question here. For the host server machines, the firmware needs to be upgraded from time to time; the various vendors may release firmware updates. So instead of taking the servers, the host machines, down, does Ironic provide any methods to do it on live systems? Yeah, maybe I can repeat the question, but I don't know whether I can answer it; I'm not an Ironic expert. So you were asking whether you can have some kind of migration of the Ironic-deployed machine while the... No, no, not migration.
What I mean is: periodically, the manufacturers of these servers release firmware upgrades for the host machines, the base servers. Whether you go for HP or Dell or any other manufacturer, they release firmware upgrades from time to time to support the latest features or whatever. So in those cases, our client is asking: what is the mechanism to apply firmware upgrades to the servers without taking them down? I can say how we are doing it. For the hosts where the virtual machines run, you can live-migrate the virtual machines to other hosts and then do a rolling reboot: move all virtual machines off to another host and then reboot the host. Our Dell iDRAC is configured to download all patches but apply them only on the next reboot. So this is how we're doing it for the hosts, and for the bare metal machines we're doing the same. It's just that Slurm can reboot a machine once all jobs are done. Or rather, let me put it like this: the machine will accept no new jobs, and once all jobs are done, Slurm, or the operating system, reboots it, and when it comes back (if it comes back), Slurm will bring the machine back into the Slurm cluster. Okay, so a downtime of the base machine is essentially required? In our deployment it is, but by migrating off all virtual machines, or having the Slurm cluster manager reboot the machine, it's not visible to users. Right, agreed, agreed. And a second question I'm having: there are various tools you have specified here, Ansible, Kolla and Kayobe. What was the selection criterion among these? It may be a funny question, but what was the selection criterion in your case? So I was looking for bare metal deployment tooling; there were some issues we were having with xCAT and we wanted to homogenize everything.
And then I don't know how I got the idea to do this with OpenStack, but at some point I figured out, okay, OpenStack can do these things. And there are other bare metal management systems out there; I don't know the names. At some point I thought, okay, OpenStack also looks to have the biggest community, which then drove our decision. And then I found a video from some Samsung engineers, from one of the OpenStack conferences about three years back. It's titled something like "What options do we have to install OpenStack on bare metal?", and I think they cited Kayobe as one of the up-and-coming, Ansible-based ways; there are others as well. I had a look, and it turns out that Kayobe is mainly developed by people from a company called StackHPC, and I thought, oh, they have HPC in their name. They have this wonderful blog, and it turns out they're friendly people and it's great software, and I know a bit of Ansible, and that worked quite well for us. So this was the selection criterion: there was no feature matrix; it was a bit of research, a bit subjective and a bit of gut feeling. Thank you, Manuel. Thank you very much for explaining it to us in detail. I would also like to thank the organizers for giving me the opportunity to join this session. Thank you. Thanks a lot for the question, Dinesh. Any more questions? No more questions from my side. Anyone else? If not, then thanks a lot again, Manuel. You're very welcome.