Hi, good morning. Thanks for coming out early in the morning on the last day. That's showing commitment; I like it. My name is Andy. I'm a technical lead working on the Nectar Research Cloud for the Australian Research Data Commons (ARDC). I want to demonstrate a project we've been working on called Bumblebee. Bumblebees, Nectar Research Cloud: we've sort of got a theme with our names, so that's where Bumblebee comes from. It's a virtual desktop service that we built on OpenStack.

First of all, I want to give you a bit of an intro to ARDC, the organization I work for. It's nationally funded through the Australian government. I don't want to talk through all of this slide; I just want to highlight that ARDC has a really broad focus in how it supports research. The research cloud sits in this purple box at the end here, with the research computing cloud, so that's the space we're playing in. I also wanted to highlight that the strategy for ARDC is to provide Australian researchers with a competitive advantage through data.

The virtual desktop is a tool for researchers to get really easy access to workflows and tools in an environment that they're comfortable with, or that they're used to. This is a screenshot from Fedora Scientific with Jupyter Notebook running. These are the kinds of tools that researchers really like to use, particularly Jupyter notebooks, which are used across a whole range of disciplines. They're really useful, but quite often researchers struggle to install these tools themselves, right?
So part of the reason why we're doing virtual desktops is so that we can avoid the complexity and overhead of infrastructure-as-a-service tools. We've been operating a research cloud for 12 years or so. We've been doing infrastructure as a service for that time, we've been doing it quite well, and we've been offering lots of services. But one of the things that often came back in feedback was that Horizon's kind of complex, right? There are a lot of things in there, and it can be quite daunting. A lot of users don't need that complexity or all that power. They just need something they can use. They know the tools they want; they just don't always have the means to get to them easily.

So we're able to package tools and workflows in a really convenient way. We're able to support users in a standard operating environment, which makes support easier and helps them get started faster. And a big one for research data: data sets can be huge, and in many cases it's impractical to move that data. So if we can bring the environments that users need closer to the data, that's a real win, and they can be productive much quicker.

We've also found that virtual desktops are really fantastic for training courses. If you've got a whole room of people participating in a training course, you don't want to lose half an hour or an hour just trying to set up their local environments, because that's wasted time. Especially for remote training courses, a virtual desktop is a huge advantage for us.

For our service, there are a few design decisions. Obviously we wanted to make it as simple as possible. We wanted the desktop to be available in the browser, so users didn't have to install or configure anything. We use Apache Guacamole for that.
I think we've heard about Guacamole here quite a lot. It's quite a useful tool, and I'll demo it later.

With researchers, it's really important for us to have a process that makes sure users don't just sit on resources and keep them for themselves. That can happen when they're not directly paying for them. So we built in an expiry process, which means those resources are reclaimed and made available for the next person.

We've done boot from volume, which gives us a little more flexibility in the environment that we provide to users. The instance and volume are loosely coupled. That'll become clear later in the demonstration, but because they're loosely coupled we're able to manage them differently, and there are some scenarios where that makes a lot of sense for us.

Multi-availability-zone support was essential because we're a federation of sites all over Australia, so we needed to support users whether they're in Brisbane or Melbourne.

No incoming connections was a security thing for us. We wanted to make sure that users couldn't shoot themselves in the foot. For users who weren't necessarily admins, we wanted to make sure that they had a good time but were safe doing it. So the only access is through the Guacamole proxy.

We've got a few different desktop types. Some are generic: CentOS, Ubuntu, and Rocky Linux, which is a new one for us. They're really for users who know what they want to do and have custom things they want to install. The desktops all have root access, so users are able to sudo and install whatever they need. But we also support Fedora Scientific, Geo Desktop, and Neuro Desktop, which are desktops that have some tools built in. Geo Desktop has some geospatial tools. Neuro Desktop is a really interesting one, where
applications are streamed in via CVMFS. It's a really powerful and flexible way to support users in a dynamic environment. Fedora Scientific is an official spin of Fedora that includes a lot of scientific tools, and we've found that's quite popular with researchers too.

When users come to our service and create a desktop, we allocate them four CPUs and eight gig of RAM. We're quite conservative here, and there's a good reason for that: we allow any Australian researcher to just log straight in, basically no questions asked. Well, we ask them a little bit about their research, but there's no approval process. Any Australian researcher can get in and get access. We've tried to make the barrier to entry really low, so that users who need this can get in straight away without having to wait for us to approve or gatekeep it.

So we give them quite a modest amount, but we do support a boost function. If you decide that you need more resources, you can hit the boost button and it will double your CPU and RAM. That's also time limited.

There are a few things that we allow users to do with a desktop. When they first log in and launch the desktop, they're presented with these buttons. Shelving is like a shutdown; I'll talk about that later. Extend increases the time: if your time is running out, you can extend it at any point and keep going. Boost and delete make sense.

A little bit about the life cycle. From the start, when users create a desktop, they're able to hit that boost button straight away. We allow them to boost for seven days, and at the end of that period, if they haven't extended it, it'll just resize back down to the standard size.
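
As a rough illustration of the boost rule just described, the Python sketch below doubles a desktop's CPU and RAM and works out when the boost lapses. Only the standard size (4 CPUs, 8 GB), the doubling, and the seven-day limit come from the talk. The names `Size`, `boost`, and `size_at` are hypothetical and are not taken from the Bumblebee code base.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple

BOOST_PERIOD = timedelta(days=7)  # boost length mentioned in the talk


@dataclass(frozen=True)
class Size:
    """Hypothetical stand-in for an OpenStack flavor."""
    vcpus: int
    ram_gb: int

    def doubled(self) -> "Size":
        # Boost doubles both CPU and RAM.
        return Size(self.vcpus * 2, self.ram_gb * 2)


STANDARD = Size(vcpus=4, ram_gb=8)  # standard desktop size from the talk


def boost(now: datetime) -> Tuple[Size, datetime]:
    """Return the boosted size and the time at which the boost lapses."""
    return STANDARD.doubled(), now + BOOST_PERIOD


def size_at(boost_expires: datetime, now: datetime) -> Size:
    """Size the desktop should have at `now`, given when its boost lapses."""
    return STANDARD.doubled() if now < boost_expires else STANDARD
```

In a real deployment the change of size would be carried out by a resize of the instance; this sketch only covers the bookkeeping around it.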
That resize back down is just a Nova resize function, so it's really flexible for us.

At any time they can go and extend the desktop too, so they don't have to wait for the full time period. Anywhere within that 14 days, they're able to come and hit extend if they know they want to use it a little bit longer. But if they don't, and it hits that 14-day time period, then it'll go into a shelved state. For us, the shelved state means that their instance is shut down and deleted, but the volume stays around. That's the advantage of decoupling the instance from the volume: it allows us to manage those resources independently.

If it stays in the shelved state for 30 days without being restarted, then after that time period we delete the volume and clean up. That is what we tell users: we tell them that we've deleted it. But actually we archive the volume, because we know that researchers like to come back and say, "Hey, I really needed that research thing I did three months ago." So we have a little fallback there. But if users opt to delete their own desktop, then it does actually get deleted.

For access in Australia, we've got the Australian Access Federation, which is an identity federation.
I think nearly all of the Australian universities and many of the research organizations are members of it. That makes it really easy for us to support the whole Australian research community, because they're all part of this federation. We're able to support them through OpenID Connect, and we run Keycloak as our identity broker. It makes it really easy for them to get in.

So how it works: we build the virtual desktop images with Packer, and we use Ansible for provisioning them. We convert them to volume snapshots, and the snapshots are stored in each of the availability zones that we run in. Part of the reason is that cloning from snapshots is the fastest way we can get that storage set up for users.

Users manage their desktops via the Bumblebee web service, which I'll demonstrate very soon. When users create, resize, or delete a desktop, the Bumblebee web service creates an asynchronous job. It goes via a Redis database to a worker, and that worker then calls the OpenStack APIs and does all that processing. It all happens asynchronously, and the users get a nice little bar that shows them the progress as it goes.

The volumes and instances, as I said, are managed by the asynchronous worker process. It does all the hard work for us, basically. The bootable volumes are cloned from the snapshots, as I mentioned already. When the desktops come up, they're fed a cloud-init config that provisions the user account and a few other things that are specific to the user. At the end of that process, there's a phone-home system that calls a webhook within Bumblebee so that it knows the instance is ready to go.

From there, users are given a big green button that says "go to my desktop", and that link sends them directly to a Guacamole server based in their availability zone. We try to push as much of that
stuff to the local site as we can to improve latency. Guacamole then connects to the desktops, and it uses RDP on the back end. We found that RDP, provided by XRDP on the instance, gives really good performance and allows us to do some really neat things like audio support, which I think is not all that common in a lot of these virtual desktop services.

A little bit about the architecture. The user on the left-hand side there connects through a load balancer to the main web service. It uses MariaDB for its back-end storage, and you can see it links to Redis, where it connects to the Bumblebee worker that does the asynchronous job stuff. The Bumblebee web service and worker both talk to the OpenStack API for provisioning, querying status, and that sort of thing. We run that component in Kubernetes, and we've got a Helm chart for the Bumblebee stuff. It works quite well for us. This core component runs in our main data center.

Then we've got the site component. This is the part that we push out to each of the availability zones. When users have their desktop and they're given a link to Guacamole, they come in through the load balancer to the Apache Guacamole cluster there. That's linked up to MariaDB, which is actually a shared database between Guacamole and the Bumblebee web service, so that's how they communicate: through that database. It then connects to the desktops there at the bottom. You can see that red part there.
That's running in OpenStack, and we provision it with Heat. We've got a couple of different templates that build this environment, and we've got an A set and a B set. That gives us the ability to fail over, or to do upgrades and things like that, without any outages for the users.

So, demo time, hopefully. I pre-recorded this to make it easier, but I'd prefer an online demo. How's the Wi-Fi? Moderate. Yeah, I just had it running before too. Yeah, that's a great idea. Oh, here we go. The lights will be all right. Oh yeah, here we go. Excellent.

All right, so I've just hit the sign-in button here. I'd already been signed in, so you didn't see the full OpenID Connect process, but I'm straight in. I'm going to look at this Ubuntu Jammy desktop here. We've got a little bit of information that gives users some idea of what's provided in the image. And then I can choose my availability zone from here.
I'm going to choose Monash in this case and hit the create button. At this point a job has been sent to the worker, and the worker is starting to create the resources.

I hope you can see this, but what I'm demonstrating here is that you can see the bar with the status of the desktop there, and in here we're looking at the Nova and Cinder lists for this desktop. The volume is created first, cloned from our golden snapshot. Once that's finished, the Nova instance comes up, attaches the volume, and goes to an active state. At this point the instance is booting and we're waiting for cloud-init to do its thing.

We're just looking at the logs here. We've got Kibana set up, and all the logs stream into it so we can keep track of what's happening. You can see the last status up there was that the instance was active. Cloud-init is going to do its thing, and at the end of that cloud-init process we'll get that phone home back to the web service, and then it'll know that the desktop is ready to go.

It's a pretty quick process considering it's spinning up a whole VM. It's not too bad: I think it's roughly 60 seconds depending on the availability zone, sometimes a little more, sometimes a little less. Our web designer thought it'd be great to have this little spinning bee to distract you from the wait. It's really nice.

All right, so the logs show that it's come up. Yep, you can see the phone home was successful, so it's ready to go.
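
The order of events in that creation flow (volume cloned, instance active, phone home received) can be sketched as a small Python function that maps what the worker observes to a progress step. This is a minimal sketch under stated assumptions: the `Step` names and the percentages are invented for illustration, and only the ordering and the Cinder and Nova status strings (`available`, `in-use`, `ACTIVE`) reflect real values.

```python
from enum import Enum
from typing import Optional


class Step(Enum):
    """Hypothetical progress steps for creating a desktop."""
    VOLUME_CLONING = 1
    INSTANCE_BUILDING = 2
    WAITING_FOR_PHONE_HOME = 3
    READY = 4


def current_step(volume_status: str,
                 instance_status: Optional[str],
                 phoned_home: bool) -> Step:
    """Work out how far provisioning has got from the observed statuses."""
    # The volume is cloned from the snapshot before anything else happens.
    if volume_status not in ("available", "in-use"):
        return Step.VOLUME_CLONING
    # The instance then boots from the volume and must reach ACTIVE.
    if instance_status != "ACTIVE":
        return Step.INSTANCE_BUILDING
    # An active instance is not usable until cloud-init has phoned home.
    if not phoned_home:
        return Step.WAITING_FOR_PHONE_HOME
    return Step.READY


def progress_percent(step: Step) -> int:
    """Illustrative progress-bar value; the 25% increments are made up."""
    return 25 * step.value
```

The point of the last check is that an `ACTIVE` instance is still booting, which is why the service waits for the phone home before showing the desktop as ready.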
With the desktop ready, you can see a little bit about its current size and things like that. I hit the green button, it opens up in a new tab, and then Guacamole launches. We've loosely coupled the two components so that users don't necessarily have to go to the Bumblebee web service if they want to log in; they can go directly to their Guacamole server if they want to. Both of them use OpenID Connect, and because it's single sign-on, if you sign in to one you can go straight in to the other. That works quite well.

You can see the desktop is quite responsive. The latency between here and Australia is a bit rough, so it doesn't look quite as nice, but it's better than I expected. We've got Firefox and some standard tools in there, and I did a little demo to show how many cores and how much RAM it's got. I ran nproc there. The output is a little small to see, but it says four, and you've got eight gig of RAM.

The next thing I'll demonstrate is the boost functionality. I'll close the tab; we don't need to log out of this or anything, I just close the tab. We go back to the management page, and when I hit the boost button you get this nice little dialog. I love the animations; they're really nice. Then we go through this sort of process again, so we'll get the bar there.
It's just doing a Nova resize. The volume makes it really easy because we can operate on the instance and volume independently. It's just going to do a resize, which doesn't take very long, and then it automatically does the verify resize, reattaches the volume, and it's ready to go. It just has to wait for the phone-home process to happen again. We have a phone home as the last stage of the cloud-init process, but that only really runs when you first provision the instance. So cloud-init also sets up a systemd service that always runs at the end of boot, to cover reboots and any other operations.

We can see there that it has 8 vCPUs now and 16 gig of RAM, and then we can go straight in and we're back to where we were before. Unfortunately, we can't do a live resize. I think it would be really nice if we could dynamically add CPU and RAM, but I haven't been able to find a way to support that in a practical way. I don't think our researchers mind too much at this stage. So yeah, 8 CPUs, 16 gig of RAM.

Shelving is the last thing I'm going to demonstrate. As part of the shelving process, the Nova instance basically just gets shut down and destroyed, and the volume stays around. As I said with the workflow, it can stay in that state for up to 30 days before it gets cleaned up. At any time during that period, users can come back and hit the unshelve button, and they'll get their desktop back again.

We're just looking at the logs there. We can see it instructed Nova to delete the instance. So what happened in the terminal? Yep, the Nova instance is gone, the volume's there sitting in an available state, and it just takes a few seconds for the interface to catch up. There we go, it's shelved. So, yeah, at this point
I hit unshelve, so this goes in as another job on the worker queue. It'll just create a Nova instance now. We should see that come up in a second. Yeah, there it is. We're just waiting for the instance to boot. Now it's active, so we're just waiting for the phone home. All right, it's ready. That's the basic workflow of how the virtual desktop works. We've deliberately tried to make it as easy as possible, and I think it works quite well.

I'll finish the demonstration by hitting delete, and then that's it. At this point users could create a different desktop. At this stage we only allow them to have one. In the future that could change, but we find that one's probably enough for now, and it makes things a whole lot simpler too. The worker is now just going through and cleaning up the resources. It powers off, and if we go back and look at the logs, we'll see that it instructed Nova to delete the instance. Once it's satisfied that part is done, it'll go and clean up the volume as well. There you go, it's deleting. So that's the whole common workflow, and log out. All right, so that's the end of the demo.

On to future plans. We intend to support Windows later in the year. Windows has always been tricky for us because we're a federation of multiple organizations, and Windows licensing is really tricky in these scenarios. But we think we have a path forward, and users are always asking for Windows support, so we're looking forward to being able to provide that for them. The code base did have some Windows support, and we ripped it out thinking we were never going to be able to resolve the licensing issue, but it looks like we're going to add it back in. The whole platform is set up to support it, so it's going to be quite easy for us to add back.

I'm looking at creating a Docker Compose environment.
That's so this project can be evaluated more easily, because at the moment there are quite a few moving parts, and it's quite complex for anyone else to take it and try it out. Making that easier would be a real bonus for anyone who wants to try it.

We're looking at doing some Selenium integration testing. There are a lot of moving parts for sure when you start talking about the OpenStack APIs and things like that, so we're hoping Selenium will help us get better confidence in the code and our error handling. You do occasionally get OpenStack failures. It's been surprising to me how few failures we've had with the OpenStack cloud, but we can have the occasional transient failure where one particular site might have some sort of back-end issue. How we handle those errors could always be better, so as we come across these strange scenarios, we're building exception handling into the code to deal with them better.

The code is under an Apache 2 license, so if anybody wants to try it, you're more than welcome. We've got a repository there for the main web service, and there's one specifically for building the images. The images don't need a lot, but they do need RDP to be available, and they do need to support cloud-init so that we can create the user account and things like that. And we've got a Helm chart there for deployment on Kubernetes. So that's it. Any questions?

Did you want to use my mic? It's better for the video, right?

Two things. Is it all in one project, all of those instances and volumes? How do you handle that? Yeah? And obviously you manage that on the back end. And what's the actual tech for archiving your volumes?

So, your first question: yeah, on our OpenStack cloud we do have a dedicated project for this. It houses all of the instances, volumes, snapshots, and all that sort of stuff. There are private networks and all sorts of stuff, all in that project.
So it's nice and self-contained. We deliberately didn't want users to have to have a Keystone account to use this service. We wanted to make it separate from that, so it's easier for users to get started. That's why we have a separate user system, and we manage all of those OpenStack resources ourselves and don't give users access to them.

What was your other question?

Archiving the volumes: you said after the 30 days you archive volumes, you don't actually delete them.

Yeah, it's just a standard image backup to Swift, basically. Yeah, Chris.

Hello. Copy and paste in and out of the Guacamole session: this is a common pain point for our users. I'm wondering how you guide people through it.

It just works.

Okay.

Yeah, I know that there are some scenarios where it's a bit funny. I know that the guys who developed the Neurodesk desktop had issues. I don't know if it was around Firefox specifically, but there was something really strange, and they published some stuff in their help documentation. But generally I find that pasting commands in and out of the terminal and things like that just works for me.

Okay, maybe I need to try a newer one.

Yeah. I should also say that Guacamole supports drag and drop of files as well, which is really nice. It's a bit of a pain point for us, because Guacamole itself is running in a virtual machine. Guacamole is not running on the desktop itself.
And so we've got this directory on the Guacamole servers that we can use for staging the files in and out, but it's still a little bit cumbersome at this point. I'd love to have maybe an NFS-mounted directory for that, so all users' files get put into a shared NFS across all the Guacamole servers, but we haven't done that yet.

Yes, it will be, yeah. Yeah, it can, yeah.

I had kind of two questions as well. Do you have a limit on the number of times you let people extend?

No. We did initially, and we found users found it frustrating.

So my related question is: how do you handle patching of long-lived instances?

We don't at this stage. Well, we do have automatic security upgrades.

Okay, so you have the automated upgrades.

Yeah. So for Ubuntu you've got unattended-upgrades, and CentOS has its own equivalent. So we do that, and we do encourage users to update if they can. But we find that the desktops haven't really persisted super long term yet; the service has only been running for about six months. We haven't had any serious issues, except recently with the XRDP packages that we use. The guy who maintains them for the EPEL repo pushed one that broke it, and so some of our users, through the normal security update process, got a new copy of XRDP and then got locked out of their desktops.

How do you find XRDP? We're using VNC at the moment, and I quite like the idea of switching over, because we do already support Windows.

Yeah, RDP. I've found it's really good, much better than VNC.
Yeah, I would like to try SPICE. I think there was a SPICE and X11 thing, but I just couldn't get it to work. It seemed amazing on paper.

Also two questions. It's funny that we all have two questions. The first one is about multi-user access. In some research, it's not just one person who wants to use a virtual desktop for a project, or for different projects. If I deploy this VM, I will have access, but if we are in a group, how is it supposed to work so that another person can also access this desktop?

Yeah, we don't really support that. What we're doing is encouraging users who have those kinds of scenarios, if they want more CPU or more disk, or they want it to be multi-user, to go down the path of getting on to our cloud. We're going to support all of these desktop images as applications in our application catalog on our cloud. The process is a little more complicated, because then they do have to apply for a project with us and get quota allocated and things like that. But if they've got more complex requirements, then we're happy to help them through that path instead.

Okay. The second question is about some kind of schedule support. One of the biggest pain points at most companies is that, for example, a desktop should only run during working hours. On weekends probably nobody is using it, and the resources are just running and probably increasing the cost of energy and so on.
Is there any plan for scheduling these VMs, so that you can say, for example, on Wednesday I want the server to automatically shut down, and on Monday at 8 a.m. it would be ready before I...

Yeah, there's nothing built into the platform. But there'd be nothing stopping you from implementing that kind of thing at the OpenStack level if you wanted to. I'm sure there are options there. For us, we find researchers just work odd hours, so there was no schedule that we'd be able to apply.

Thank you so much.

Yeah, I think I'm way over time. But if you want to see any more demos or anything like that, just come and grab me in the hallway. I'd be happy to chat. Thanks for coming.
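
To recap the expiry rules from the talk in code form, here is a minimal Python sketch of the desktop lifecycle: active for 14 days, then shelved for up to 30 days, then archived. It assumes that extending a desktop restarts the 14-day window and that the user never shelves, unshelves, or deletes manually; neither detail is confirmed by the talk. The names `State` and `state_at` are hypothetical.

```python
from datetime import datetime, timedelta
from enum import Enum

ACTIVE_PERIOD = timedelta(days=14)   # time before an unextended desktop is shelved
SHELVED_PERIOD = timedelta(days=30)  # time a shelved desktop is kept around


class State(Enum):
    ACTIVE = "active"      # instance and volume both exist
    SHELVED = "shelved"    # instance deleted, volume kept
    ARCHIVED = "archived"  # volume archived, reported to the user as deleted


def state_at(last_extended: datetime, now: datetime) -> State:
    """State of a desktop at `now`, given when it was created or last extended.

    Assumption: an extension restarts the 14-day window.
    """
    shelve_at = last_extended + ACTIVE_PERIOD
    if now < shelve_at:
        return State.ACTIVE
    if now < shelve_at + SHELVED_PERIOD:
        return State.SHELVED
    return State.ARCHIVED
```

A periodic job could call a function like `state_at` for every desktop and queue a shelve or archive job whenever the computed state differs from the stored one.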