Thanks for coming out on the last day, early in the morning. Showing commitment, I like it. My name's Andy. I'm a technical lead working on the Nectar Research Cloud for the Australian Research Data Commons (ARDC). I want to demonstrate this project that we've been working on called Bumblebee. Bumblebee, Nectar Research Cloud: we've sort of got a theme with our names, so that's where Bumblebee comes from. It's a virtual desktop service that we built on OpenStack.

First of all, I just want to give a bit of an intro to ARDC, the organization I work for. It's nationally funded through the Australian government. I don't want to talk through all of this slide; I really just want to highlight that ARDC supports research, and it has a really broad focus in how it supports research. The research cloud sits in this purple box at the end here, with the research computing cloud, so that's the space that we're playing in. But I also wanted to highlight that the strategy for ARDC is really to provide Australian researchers with a competitive advantage through data.

So the virtual desktop is really a tool for researchers to get easy access to workflows and tools in an environment that they're comfortable with, or that they're used to. This is a screenshot here from Fedora Scientific with Jupyter Notebook running. These are the kind of tools that researchers really like to use. Jupyter notebooks in particular are used across a whole range of disciplines and are really useful, but quite often researchers struggle to install these tools themselves, right?
So part of the reason why we're doing virtual desktops is so that we can avoid the complexity and overhead of infrastructure-as-a-service kind of tools. We've been operating a research cloud for 12 years or so. We've been doing infrastructure as a service, and we've been doing it quite well, offering lots of services. But one of the things that often came back from feedback was that Horizon's kind of complex, right? There's a lot of things in there, and it can be quite daunting. A lot of users don't need that complexity. They don't need all that power. They just need something that they can use. They know the tools that they want; they just don't always have the means to get to them easily.

So we're able to package tools and workflows in a really convenient way. We're able to support users in a standard operating environment, which makes support easier and helps them get started faster. And a big one for research data is that data sets can be huge, and it's impractical to move that data in many cases. So if we can bring these environments that users need closer to the data, then that's a real win. Then they can be productive much quicker.

And we found that virtual desktops are really fantastic for training courses. If you've got a whole room of people participating in a training course, you don't want to lose half an hour or an hour just trying to set up their local environments, because that's just wasted time. Especially for remote training courses, virtual desktops really are a huge advantage for us.

So for our service, there are a couple of design decisions. Obviously, we wanted to make it as simple as possible. We wanted the desktop to be available in the browser, so users didn't have to install anything and didn't have to configure anything. We use Apache Guacamole for that.
I think we've heard about Guacamole here quite a lot. It's quite a useful tool, and I'll demo that later.

With researchers, it's really important for us to have a process to make sure that users don't just sit on the resources and keep them for themselves. That can happen when they're not directly paying for it. So we built in an expiry process that means those resources are reclaimed and they're available for the next person.

We've done boot from volume. Boot from volume allows us to have a little bit more flexibility in the environment that we provide to users. The instance and volume are loosely coupled. That'll become clear later in the demonstration, but if they're loosely coupled we're able to manage them differently, and there are some scenarios where that makes a lot of sense for us.

Multi-availability-zone support was essential because we're a federation of sites all over Australia, so we needed to support users whether they're in Brisbane or Melbourne.

And no incoming connections was a security thing for us. We wanted to make sure that users couldn't shoot themselves in the foot. For users who weren't necessarily admins, we wanted to make sure that they had a good time but were safe doing it. So the only access is through the Guacamole proxy.

We've got a few different desktop types. We've got some that are generic: CentOS, Ubuntu, and Rocky Linux, which is a new one for us. They're really for users who know what they want to do and have custom things that they want to install. The desktops all have root access, so users are able to sudo and install whatever they need. But we also support Fedora Scientific, Geo Desktop and Neurodesktop, and they are desktops that have some tools built in. Geo Desktop's got some geospatial-type tools. Neurodesktop is a really interesting one, where
applications are streamed in via CVMFS. So it's a really powerful and flexible way to support users in a dynamic environment. Fedora Scientific is an official spin of Fedora that includes a lot of scientific tools, and we found that's quite popular with researchers too.

So when users come to our service and create a desktop, we allocate them four CPUs and 8 gig of RAM. We're quite conservative here, and there's a good reason for that: we allow any Australian researcher to just log straight in, basically no questions asked. Well, we ask them a little bit about their research, but basically there's no approval process. So any Australian researcher can get in and get access. We've tried to make the barrier to entry really low, so if users need this they can get in straight away without having to wait for us to approve or gatekeep it. So we give them quite a modest amount, but we do support a boost functionality. If you decide that you need more resources, you can hit the boost button and it will double your CPU and RAM. That's also time limited.

There are a few things that we allow users to do with a desktop. When they first log in and launch the desktop, they're presented with these buttons. Shelving is like a shutdown thing; I'll talk about that later. Extend is to increase the time: if your time is running out, you can extend it at any time and keep going. Boost and delete make sense.

A little bit about the life cycle. From the start, when users create their desktop, they're able to hit that boost button straight away. We allow them to boost for seven days, and at the end of that period, if they haven't extended it, it'll just resize back down to the standard size. That's just a Nova resize function, so it's really flexible for us. At any time they can go and extend their desktop too.
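As a rough illustration of the sizing rule just described, here is a minimal Python sketch. It is not taken from the Bumblebee code base: the `Size` class, the `size_at` helper and the constant names are hypothetical. Only the numbers (four CPUs and 8 GB of RAM by default, doubled by boost for seven days) come from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Values mentioned in the talk; the names are illustrative assumptions.
DEFAULT_VCPUS = 4
DEFAULT_RAM_GB = 8
BOOST_PERIOD = timedelta(days=7)


@dataclass(frozen=True)
class Size:
    vcpus: int
    ram_gb: int

    def boosted(self) -> "Size":
        # Boost doubles both CPU and RAM.
        return Size(self.vcpus * 2, self.ram_gb * 2)


DEFAULT_SIZE = Size(DEFAULT_VCPUS, DEFAULT_RAM_GB)


def size_at(now: datetime, boosted_at: Optional[datetime]) -> Size:
    """Return the size a desktop should have at `now`.

    A desktop boosted less than BOOST_PERIOD ago keeps the doubled size;
    otherwise it has (or should be resized back to) the default size.
    """
    if boosted_at is not None and now - boosted_at < BOOST_PERIOD:
        return DEFAULT_SIZE.boosted()
    return DEFAULT_SIZE
```

In a real service, a change in the value returned by something like `size_at` would be what triggers the resize job; here it only computes the target size.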
They don't have to wait for the full time period. Anywhere within that 14 days, they're able to come and hit extend if they know they want to use it a little bit longer. But if they don't, and it hits that 14-day time period, then it'll go into a shelved state. For us the shelved state means that their instance is shut down and deleted, but the volume stays around. That's the advantage of decoupling the instance from the volume: it allows us to manage those resources independently. If it's in the shelved state for 30 days, after that time period we actually delete the volume and clean up. That's what we tell users. We tell them that we've deleted it, but actually we archive the volume, because we know that researchers like to come back and say, "Hey, I really needed that research thing I did like three months ago." So we have a little thing there. But if users opt to delete their own, then it does actually get deleted.

For access in Australia, we've got the Australian Access Federation, which is an identity federation. I think nearly all of the Australian universities and many of the research organizations are members of this, so it makes it really easy for us to support the whole Australian research community. We're able to support them through OpenID Connect, and we run Keycloak as our identity broker. It makes it really easy for them to get in.

So how it works is: we build the virtual desktop images with Packer, and we use Ansible for provisioning those. We convert them to volume snapshots, and the snapshots are stored in each of the availability zones that we run in. Part of that is because cloning from snapshot is the fastest way that we can get that storage set up for them. Users manage their desktops via the Bumblebee web service, which I'll demonstrate very soon. And so
as part of that process, when users create a desktop or resize it or delete it, the Bumblebee web service will create an asynchronous job, and it goes via a Redis database to a worker. That worker will then call the OpenStack APIs and do all that kind of processing. So it all happens asynchronously, and the users get a nice little bar that shows them the progress as it goes. So the volumes and instances are managed by the asynchronous worker process. It does a lot of hard work for us, basically.

The bootable volumes are cloned from the snapshots, as I mentioned already. Then when the desktops come up, they're fed a cloud-init config that provisions the user account and a few other things that are specific to them. At the end of that process there's a phone-home system that calls a webhook within Bumblebee so that it knows that the instance is ready to go. From there, users are given a big green button that says "Go to my desktop", and that link will send them directly to a Guacamole server that's based in their availability zone. We try and push as much of that stuff to the local site as we can to improve latency.

Then Guacamole connects to the desktops, and it uses RDP on the back end. We found RDP, provided by xrdp on the instance, provides really good performance and allows us to do some really neat things like audio support as well, which is not all that common,
I think, in a lot of these virtual desktop services.

So a little bit about the architecture. The user on the left-hand side there will connect through a load balancer to the main web service. It's using MariaDB for its back-end storage, and you can see it links to Redis, where it connects to the Bumblebee worker that does the asynchronous job stuff. The Bumblebee web service and worker both talk to the OpenStack API for provisioning, querying status and that sort of thing. That component there we run in Kubernetes, and we've got a Helm chart for the Bumblebee stuff. It works quite well for us. This core component runs in our main data center.

And then we've got the site component. This is the part that we push out to each of the availability zones. When users have their desktop and they're given the link to Guacamole, they'll come in through the load balancer to the Apache Guacamole cluster there. That's linked up there to MariaDB. That's actually a shared database between Guacamole and the Bumblebee web service, so that's how they communicate, through that database. And then it connects to the desktops there at the bottom. You can see that red part there: that's running in OpenStack, and we provision that with Heat. We've got a couple of different templates that build this environment, and we've got an A and a B set. That gives us the ability to fail over, or if we need to do upgrades and things like that, so that there are no outages or anything like that for the users.

So, demo time. Now, hopefully... I pre-recorded this to make it easier, but I mean, I'd prefer an online demo. How's the Wi-Fi? Moderate.
Yeah, I just had it running before, too. Yeah, that's a great idea. Oh, here we go. Once it loads we'll be all right. Oh yeah, here we go. Excellent.

All right, so I've just hit the sign-in button here. I'd already been signed in, so you didn't see the full OpenID Connect process, but I'm straight in. I'm going to look at this Ubuntu Jammy desktop here. We've just got a little bit of information that gives users some idea about what's provided in the image. Then I can choose my availability zone from here. I'm going to choose Monash in this case and hit the create button.

All right, so at this point a job has been sent to the worker, and the worker is starting to create the resources. I hope you can see this, but basically you can see the bar showing the status of the desktop there, and in here we're looking at the nova and cinder lists for this desktop. The volume is created first, cloned from our golden snapshot, and then once that's finished the Nova instance comes up, attaches the volume, and then goes to an active state. At this point the instance is booting, and we're waiting for cloud-init to do its thing.

We're just looking at the logs here. We've got Kibana set up, and all the logs stream into here so we can keep track of what's happening. You can see the last status update there was that the instance was active. So cloud-init is going to do its thing, and then at the end of that cloud-init process we'll get that phone home back to the web service, and then it'll know that the desktop's ready to go. It's a pretty quick process considering it's spinning up a whole VM. It's not too bad.
I think it's roughly around 60 seconds depending on the availability zone. Sometimes it'll be a little bit more, sometimes a little bit less. Our web designer thought it'd be great to have this little spinning bee, right, to distract you from the wait. It's really nice.

All right, so the logs show that it's come up. Yep, you can see the phone home was successful, so it's ready to go. At this point you can see a little bit about the current size and things like that. Hit the green button, it opens up a new tab, and then Guacamole launches.

We've loosely coupled the two components so that users don't necessarily have to go to the Bumblebee web service first if they want to log in. They can go directly to their Guacamole server if they want to. Both of them have OpenID Connect, and because it's single sign-on, if you sign in to one you can go straight in with the other. So that works quite well.

You can see the desktop is quite responsive. I mean, the latency between here and Australia is a bit rough, so it doesn't look quite as nice, but it's better than I expected. We've got Firefox and some standard tools in there. And I think I just did a little demo to show how many cores and how much RAM are in there. So I type nproc there. It's a bit hard to see, isn't it? It's a little small, but that says four, and you've got eight gig of RAM.

All right, so the next thing I'll demonstrate here is the boost functionality. I'll close the tab. We don't need to log out of this or anything; I just close the tab. We can go back to the management page here, and then I'll hit the boost button. You just get this nice little dialog. I love the animations. It's really nice. And then we go through this sort of process again, so we'll get the bar there.
It's just doing a Nova resize. It's really easy with the volume because we can operate on them independently. We're just going to do a resize, which doesn't take very long, and then it will automatically do the verify resize, reattach the volume, and then it's ready to go. It's just got to wait for the phone-home process to happen again.

We have a phone home as part of the last stage of the cloud-init process, but cloud-init is only really going to run when you first provision the instance. So it also sets up a systemd service that always runs at the end of boot, for any time the instance is rebooted or you do any other processes. So we can see it's now 8 vCPUs and 16 gig of RAM, and then we can go straight in, and we're back to where we were before.

Unfortunately, we can't do live resize. I think it would be really nice if we could dynamically add the CPU and RAM, but I haven't been able to find a way that we can support that in a practical way. But I don't think our researchers mind too much at this stage. So yeah, 8 CPUs, 16 gig of RAM.

Shelving is the last thing I'm going to demonstrate. As part of the shelving process, the Nova instance will basically just get shut down and destroyed, and then the volume will just stay around. As I said with the workflow, we can stay in that state for up to 30 days before that gets cleaned up. At any time during that period users can come back and hit the unshelve button, and they'll get their desktop back again. So we're just looking at the logs there.
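The expiry and shelving rules described in the talk can be sketched roughly as follows. This is a minimal Python illustration, not Bumblebee's actual code: the `Action` enum, the `next_action` function and the constant names are hypothetical, and it assumes that hitting extend restarts the 14-day period, which the talk does not spell out.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

# Periods mentioned in the talk; the names are illustrative assumptions.
EXPIRY_PERIOD = timedelta(days=14)   # a running desktop is shelved after this
SHELVED_PERIOD = timedelta(days=30)  # a shelved volume is archived after this


class Action(Enum):
    NONE = "none"
    SHELVE = "shelve"            # delete the instance, keep the volume
    ARCHIVE_VOLUME = "archive"   # archive the volume, then clean up


def next_action(
    now: datetime,
    last_extended: datetime,
    shelved_at: Optional[datetime] = None,
) -> Action:
    """Decide what a periodic lifecycle check should do with one desktop."""
    if shelved_at is not None:
        # Already shelved: only the volume exists, and it is kept for 30 days.
        if now - shelved_at >= SHELVED_PERIOD:
            return Action.ARCHIVE_VOLUME
        return Action.NONE
    # Still running: shelve once 14 days pass without an extend (assumption:
    # `last_extended` is the creation time until the user first extends).
    if now - last_extended >= EXPIRY_PERIOD:
        return Action.SHELVE
    return Action.NONE
```

Unshelving would simply clear `shelved_at` and start a new expiry period; a user-initiated delete bypasses this logic entirely, since the talk says that path really does delete the volume.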
We can see it's instructed Nova to delete the instance. So what happened in the terminal? Yep, the Nova instance is gone, and the volume's there, sitting in an available state. Then it just takes a few seconds for the interface to catch up. There we go, it's shelved.

So at this point in the demo, I think, yeah, I hit unshelve. This is going to put another job on the worker queue, and it'll just create a Nova instance now. We should see that come up in a second. Yeah, there it is. All right, so we're just waiting for the instance to boot now. It's active, so we're just waiting for the phone home. All right, it's ready.

So that's the basic workflow of how the virtual desktop works. We've deliberately tried to make it as easy as possible, and I think it works quite well. I just finished the demonstration by hitting delete, and then that's it. At this point users could create a different desktop. At this stage we only allow them to have one. In the future that could change, but we find that one's probably enough for now, and it makes it a whole lot simpler too. So the worker is now just going through and cleaning up the resources. It powers off, and if we go back to the logs, we'll see that it's, yeah, instructed to delete the instance. Then once it's satisfied that part is done, it'll go and clean up the volume as well. There you go, it's deleting. So that's the whole workflow. And log out. All right, so that's the end of the demo.

So, future plans. We intend to support Windows later in the year. Windows has always been tricky for us because we're a federation of multiple organizations, and Windows licensing is really tricky in these scenarios. But we think we have a path forward, and users are always asking for Windows support.
So we're looking forward to being able to provide that for them. The actual code base did have some Windows support, and we ripped it out thinking we were never going to be able to resolve this licensing thing, but it looks like we're going to add it back in. The whole platform is set up to support that, so it's going to be quite easy for us to add it back in.

I'm looking at creating a Docker Compose environment so this project can be evaluated more easily, because at the moment there are quite a few moving parts. It's quite complex for anyone else to take it and try it out, so making that easier would be a real bonus for anyone who wants to try it.

We're looking at doing some Selenium integration testing. There are a lot of moving parts, especially when you start talking about the OpenStack APIs and things like that, so we're hoping Selenium will help us get better confidence in the code and our error handling. You do occasionally get OpenStack failures. It's been surprising to me how few failures we've had with this on the OpenStack cloud. We can have the occasional transient failure, where one particular site might have some sort of back-end issue or something like that. How we handle those errors could always be better, so as we come across these strange scenarios we're building exceptions into the code to handle them better.

The code is under an Apache 2 license, so if anybody wants to try it, you're more than welcome. We've got a repository there for the main web service. There's one specifically for building the images. The images don't need a lot, but they do need RDP to be available, and they do need to support cloud-init so that we can create the user account and things like that. And we've got a Helm chart there for deploying on Kubernetes.

So that's it. Any questions?

Did you want to use my mic? It's better for the video, right?
Two things. Is it all in one project, all of those instances and volumes? How do you handle that? Yeah, and obviously you manage that on the back end. And what's the actual tech for archiving your volumes?

So, your first question. Yeah, on our OpenStack cloud we do have a dedicated project for this. It houses all of the instances and volumes and snapshots and all that sort of stuff. There are private networks and all sorts of stuff, all in that project, so it's nice and self-contained. We deliberately didn't want users to have to have a Keystone account to use this service. We wanted to make it separate from that, so it's easier for users to get started. That's why we have a separate user system, and we manage all of those OpenStack resources ourselves and don't give users access to them. What was your other question?

The archiving of the volumes. You said after the 30 days you archive volumes; you don't actually delete them.

Yeah, it's just a standard image backup to Swift, basically. Yeah, Chris.

Hello. Copy and paste in and out of the Guacamole session: this is a common pain point for our users. I'm wondering how you guide people through it.

It just works.

Okay.

Yeah, I know there are some scenarios where it's a bit funny. I know that the guys who developed the Neurodesk desktop, they had issues.
I don't know if it was around Firefox specifically, and there was something really strange, and they published some stuff in their help documentation. But generally I find pasting commands in and out of the terminal and things like that just works for me.

Okay, maybe I need to try a newer one.

Yeah. I should also say that Guacamole supports drag and drop of files as well, which is really neat. It's a bit of a pain point for us, because Guacamole itself is running in a virtual machine. Guacamole is not running on the desktop itself. So we've got this directory on the Guacamole servers that we can use for staging the files in and out, but it's still a little bit cumbersome at this point. I'd love to have maybe an NFS-mounted directory for that, so all users' files get put into a shared NFS across all the Guacamole servers. But we haven't done that yet.

Yes, it will be, yeah. Yeah, it can't, yeah.

I had kind of two questions as well. Do you have a limit on the number of times you let people extend?

No. We did initially, and we found users found it frustrating.

So my related question to that is: how do you handle patching of long-lived instances?

We don't at this stage. Well, we do have automatic security upgrades.

Okay, so you have the automated upgrades.

Yeah. So for Ubuntu you've got unattended upgrades, and CentOS has its own, or whatever. So we do do that, and we do encourage users to update if they can. But we find that the desktops haven't really persisted super long term yet. I mean, the service has only been running probably about six months, so we haven't had any serious issues, except recently with the xrdp packages that we use. The guy who's maintaining it for the EPEL repo pushed one that broke it.
And so some of our users, through the normal security update process, got a new copy of xrdp and then got locked out of their desktops.

How do you find xrdp? Because we're using VNC at the moment. I quite like the idea of switching over, because we do already support Windows using RDP.

Yeah, I found it's really good, much better than VNC. I would like to try SPICE. I think there was a SPICE and X11 thing, but I just couldn't get it to work.

Yeah, I like it. It seemed amazing on paper.

Yeah, also two questions. It's funny that we all have two questions. The first one is about multi-user access. In some research it's not just me, one person, who wants to use one virtual desktop for my current project, or for example for different projects. If I just deploy this VM, then I will have access, but if we are in a group, how is this supposed to work so that another person can also access this desktop?

Yeah, we don't really support that. What we're doing is encouraging users who have those kinds of scenarios, like if they want more CPU or more disk, or they want it to be multi-user and things like that, to go down the path of getting onto our cloud. We're going to support all of these desktop images as applications in the application catalog on our cloud. The process is a little more complicated, because then they do have to apply for a project with us and get quota allocated and things like that. But if they've got more complex requirements, then we're happy to help them through that path instead.

Okay, and the second question is about some kind of schedule support. One of the biggest pain points at most companies is that access happens, for example, during working hours. So this desktop would only be used for work during that time. And probably on weekends
nobody is using it, and the resources are just left running, which could increase the cost of energy and so on.

Yeah.

Is there any plan for scheduling these VMs, so that you can say, for example, on Wednesday I want the server to automatically shut down, and on Monday, for example at 8 a.m., it will be ready before I...

Yeah, there's nothing built into the platform. But I mean, there'd be nothing stopping you from implementing that kind of stuff at the OpenStack level if you wanted to.

Oh, okay.

Yeah, so I'm sure there are options there. For us, we find researchers just work odd hours and everything, right? So there was no schedule that we'd be able to do.

But thank you so much.

Yeah. I think I'm way over time, but if you want any more demos or anything like that, just come and grab me in the hallway. I'd be happy to chat. Thanks for coming.