So, our first speaker today is Alexander, and he's going to talk to us about Bioconductor Helm charts. Go for it.

Hello, I'm going to talk a little bit about the Bioconductor Helm chart, which is basically a package for deploying Bioconductor RStudio in Kubernetes. And it's quite agnostic, so it works, in theory, on any Kubernetes cluster, and it's been tested on a bunch of the providers that I'll be listing later.

First, I just want to give a little bit of background on containerization and Kubernetes and what that is. Containerization has become more and more popular in the past five or so years, especially in the scientific community. Basically, what containerization is, is getting a small container that has its own operating system, generally a Linux operating system, and building the entire stack that an application or software package needs to run into the container itself. So a container comes with all the dependencies that the software needs to run, and it is built to be immutable: once it is packaged and put out there, it always comes with the same stack in it. One of the big things about it is that it shifts a lot more of the responsibility for making sure that everything works to the developer. If the container was built properly and worked when it was packaged, in theory it should continue working forever, because all the dependencies are there; even if there are upstream changes, those changes don't affect the container itself.

One of the big notable benefits is that, for the user, they don't have to understand a lot of the peculiarities of installing software, especially peculiarities that have to do with their own operating system. Whether you're on Windows, Linux, Mac, or anything else, you can just run a Docker or Singularity container, and in theory it should run the same way, because it comes with its own Linux environment inside. That is especially helpful for non-computational or less computational scientists who don't want to deal with errors in compiling software or pulling in dependencies. And for the developers themselves, it gives a little more leeway in how often they have to update the software. In general it is still good to update dependencies, which generally get updated because of security flaws or other things, but it gives developers more leeway: if somebody changed something upstream in one of the dependencies, the software they're trying to publish doesn't immediately break, because the built-in dependency can stay the same.

In terms of Bioconductor, there's the Bioconductor RStudio Docker container, mostly worked on by Nitesh from the Bioconductor core team. It is built on the Rocker RStudio Docker image, and notably, on top of Rocker RStudio, it has all the system dependencies needed for all Bioconductor packages. Whenever a package comes into Bioconductor, it is checked and built on this Docker image as well, and any system-level dependency that needs to be added is added, to keep the contract that any Bioconductor package can be readily installed and run on the Docker image without having to deal with any C or Linux or system dependencies. And the Docker container is perfect for a single-node deployment, or a local one on your laptop.
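To make that concrete, a minimal local run looks something like this; the password is a placeholder, and the image tag may differ from what you want:

    # Run the Bioconductor RStudio image locally; password and tag are placeholders.
    # RStudio then becomes available at http://localhost:8787 (default user: rstudio).
    docker run -d -e PASSWORD=changeme -p 8787:8787 \
      bioconductor/bioconductor_docker:devel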
So if you just want a very easy, simple RStudio image that you can run any Bioconductor package in, it's a perfect solution: a one-command deployment that just gives you a running RStudio instance.

Now, for more complicated setups, if you want a multi-node cluster and you want to scale up a little bit, Docker becomes less of a solution, because you need orchestration between the nodes, and orchestration between different microservices if you're running not just RStudio but also talking to, for example, Redis or any other service and application. Kubernetes is an emerging technology that has won what were called the container wars, or the orchestration wars. There are still other solutions — I think Mesos is one of them, OpenShift — but Kubernetes has become the de facto container orchestration technology. It is fully open source; it started as an internal Google product, but it got open-sourced and has been taken over by the Linux Foundation, specifically the Cloud Native Computing Foundation under the Linux Foundation. What Kubernetes does is orchestrate a bunch of containers — Docker containers, or any container runtime engine, essentially — so you can have hundreds of small containers running on different nodes that talk to each other through Kubernetes.

A little less technically, what Kubernetes does is abstract virtual machines as a cluster. In traditional cloud computing, you would have a virtual machine that is what you'd call a pet: if it goes down, all your services go down, and it's a lot of maintenance. Kubernetes takes that to cattle. Instead of treating each node, each virtual machine, as a separate entity, you talk to the cluster as a whole. If one node goes down, because of hardware failure or whatever, Kubernetes will reschedule its pods on one of the healthy nodes in the cluster. So the language used is that your virtual machines are not pets but cattle — they're more disposable; instead of having to take care of them individually, as long as your cluster is healthy, individual nodes can be unhealthy.

Beyond that, it gives a standardized layer across any cloud provider — and I've named here the four big ones that I know: AWS, Azure, GCE, and OpenStack, which is an open-source cloud provider. Kubernetes gives an abstraction layer, so you can develop things to run on Kubernetes, and once you have the stack running on Kubernetes, you can move it between all the clouds with very minimal changes. It pushes that idea of develop once, have a working stack, deploy it wherever you want, move it between all of the clouds.

A little more concretely: we have a cloud allocation that uses the OpenStack interface. Here you can see there are three individual nodes. Instead of talking to them as individual nodes, we put them in a Kubernetes cluster, and you see the entire cluster together: there are 64 cores, which is the total of the cores of the three machines together, and the 241 GB of memory is the total of all the machines together. So when you want to deploy a pod — a container — in the cluster, you just tell Kubernetes, deploy this pod; the Kubernetes scheduler looks at the nodes and deploys it to the best one that has the capacity you asked for. You don't have to look at the nodes individually anymore; you can just submit the pod to the cluster and let the scheduler handle it.
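As an illustrative sketch of that interaction — the pod name and resource sizes here are made up — you describe the pod and what it needs, and the scheduler picks a node:

    # Ask the cluster, not a specific node, to run a container (illustrative)
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: demo-pod                 # hypothetical name
    spec:
      containers:
      - name: rstudio
        image: bioconductor/bioconductor_docker:devel
        resources:
          requests:
            cpu: "4"                 # scheduler finds a node with 4 free cores
            memory: 16Gi
    EOF
    kubectl get pod demo-pod -o wide   # shows which node the scheduler chose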
And of course, you can assign to a specific node if you want — if you want to schedule very specifically, to have nodes for specific things, you can give a pod affinities or restrictions to actually target a node — but in general you don't have to.

So then, talking a little bit about Helm. Helm is like a package manager for Kubernetes applications. If you are familiar with the Python world, it's kind of similar to pip. Once you package an application, you can put it out there, and then you can say helm install this application, and it has all the dependencies it needs — all the resources and how they talk to each other — packaged in one single command. For some examples: if you want a MongoDB instance, Bitnami has a MongoDB chart; you say helm install bitnami/mongodb, and you have a MongoDB instance — the service, ingress, storage, everything packaged within it. Same for JupyterHub. If you want to set up a network file system, the Kubernetes group has an NFS server provisioner. And there's the chart that I'm going to talk about, the Bioconductor Helm chart for RStudio, as well.

So now, going specifically to the Bioconductor Helm chart. A little bit of history on how this chart came to be. It was originally created as the RStudio Helm chart. RStudio themselves do have Helm charts, but only for the paid products; they don't have a public Helm chart for the community edition. So this was originally developed as part of the Genomics Virtual Lab and CloudMan projects within the Galaxy group, and when the public GVL instance got deprecated, it got adopted by Bioconductor. Now it's mostly maintained by me and Nuwan Goonasekera from Galaxy Australia. We mostly maintain it for Bioconductor now, but also for internal things that are still using it within institutions we work with.

Theoretically, this Helm chart can be deployed on any Kubernetes cluster. If you go to the source code on GitHub, you will see instructions and examples to deploy on a local Minikube cluster — Minikube is a way to get a Kubernetes cluster on your local computer — on Azure AKS, which is the managed Kubernetes service, on Google Kubernetes Engine, or on the Elastic Kubernetes Service on AWS. All of these have been tested, and it's essentially a single-command deployment.

Talking a little bit about what the chart includes. The main component of the chart is the RStudio Deployment resource — that is where the Bioconductor Docker image is; that's the pod that is actually the RStudio pod. Then there's a ConfigMap, which is basically a text file that has all the configuration for RStudio, attached to that Deployment. There is the PersistentVolumeClaim: in the Kubernetes world, you make a claim for how much storage you want, and you ask a StorageClass, which is generally provided by the Kubernetes engine, to fulfill that claim for you. You say, I want a 50 GB volume attached at this path, and the StorageClass within that Kubernetes engine fulfills that for you. That's what the PersistentVolumeClaim does. And then there's a Service, which is the middle layer between the pod itself — the RStudio Deployment — and the Kubernetes network. Instead of referring to the pod and its port directly, the Service abstracts that: in the Kubernetes internal network there's just a local DNS name, the RStudio service, and that itself points at the pod.
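As a rough sketch — the names and ports here are illustrative, not the chart's exact manifests — a Service sitting in front of the RStudio pod looks something like this:

    # Illustrative Service in front of the RStudio pod
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Service
    metadata:
      name: rstudio-service      # becomes a DNS name inside the cluster
    spec:
      selector:
        app: rstudio             # matches the label on the RStudio pod(s)
      ports:
      - port: 80                 # port the Service exposes internally
        targetPort: 8787         # port RStudio listens on in the pod
    EOF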
This is specifically relevant, for example, if you have an application that can scale. If you want to load balance between multiple pods, the Service is the single entry point, and it points to multiple pods and balances the traffic between them. Then there's the Ingress, which exposes the Service, usually to the outside world — this can be NGINX or Apache, whatever ingress controller you want to use. And finally the ServiceAccount, which deals with authentication and identity management within Kubernetes. So these are all the resources, the Kubernetes manifests, that are part of the chart. When you do a helm install of the Bioconductor Helm chart, these are the resources that are actually deployed within your Kubernetes cluster.

As a very quick example — and you can see this for all the clouds in the GitHub repo — this is the example for AKS, showing how easy it is to deploy. You just launch the cluster with az aks create, or, if you already have a cluster, you use the get-credentials command to point your configuration at it, and then you just helm install the Bioconductor Helm chart. There is an example in the repository with working example values for each cloud. And just to show what that is — it's not complicated at all, it's just a few peculiarities for each Kubernetes engine. For example, the biggest thing for AKS is that they call the storage class "default", whereas Amazon calls the storage class "gp2", I believe. These are the small things you have to change between the different clouds, but once you point to the storage class that comes with the Kubernetes engine, it just fulfills the persistent volume claim for you. The service type LoadBalancer means you don't have to set up an ingress yourself — that's why you can disable the ingress — and when you say LoadBalancer, AKS or any of the other clouds will give you an IP address automatically. Port 80 is just so that you don't actually have to type a port; that's the default HTTP port. And then you can add the environment variable for the RStudio password to set a password for your server.

(You're running a little long.) Okay, yep. You can do a lot more parameterization. Technically, this is compatible with any Rocker RStudio Docker image: you can change the tag for the Bioconductor Docker image, but you can also use, for example, the machine learning Rocker RStudio image. You can change what storage you want; you can use a network file system — Azure Files, Amazon EFS, Filestore on Google. We added a feature for persisting the libraries, so between sessions you can keep a volume with your installed libraries, tear it all down, then bring it back up and keep what you have installed in your environment.

Going forward, there are a few ideas of things we want to add. Specifically, if you went to Nitesh's workshop, we're going to try to add a lot of the Redis and parallel computing dependencies to this chart, so that it can be a full stack together. There are some more ideas for later that would take a little longer, which we're not going to go through. And if anybody has any ideas, particular features of interest, or use cases they want to pursue, feel free to contact me by email, Slack, or just in GitHub issues, and propose whatever you think would be useful. This is relatively new, and we're looking for early users and trying to be useful for specific use cases. That is it. Okay.
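Pieced together, the AKS flow described above looks roughly like this; the resource-group, cluster, and release names, the chart reference, and the values keys are illustrative stand-ins, not the chart's exact schema:

    # 1. Launch a cluster, or point kubectl at an existing one (names are hypothetical)
    az aks create --resource-group bioc-rg --name bioc-cluster --node-count 3
    az aks get-credentials --resource-group bioc-rg --name bioc-cluster

    # 2. Install the chart with cloud-specific values. Per the talk, a
    #    values-aks.yaml would set roughly:
    #      - the storage class bundled with the engine ("default" on AKS, "gp2" on AWS)
    #      - service type LoadBalancer on port 80, with the ingress disabled
    #      - an environment variable holding the RStudio password
    helm install bioc <bioconductor-helm-chart> -f values-aks.yaml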
So there's a question for you in the chat: is there an R package to help deploy Kubernetes, or is there a plan to enable BiocParallel to work with Kubernetes?

Yes. Part of this — one of the ideas — is to make a BiocParallel backend package to talk to the Kubernetes cluster. A prerequisite for that is to make an actual R Kubernetes client, to be able to talk to the Kubernetes API more generally from within R, and that's a plan for the project that I picked up.

That's a cool plan. Okay. I don't see anything else in the chat. Does anyone have a question in the room? Yeah.

Yeah, so the volume is parameterizable. You can use a network file system if you want — for example, I usually use the Kubernetes NFS chart to have an NFS server when it's multi-node. You can use a local host volume if you're running a single node: you just tell it, use the volume on the node. And if you're running in any of the clouds, for example Azure — here, when I said storage class "default", that's the default storage class within the Azure service, and it will create a volume for you. Once the claim comes in, it tells the storage class, I want a 10 GB disk or a 100 GB disk; it will automatically attach it to the node that is running the pod and mount it for you. And if that node goes down and Kubernetes moves the pod to another node in the cluster, it will automatically unmount the volume from the node that went down and re-attach and mount it on the new node. So it's pretty automatic — the Kubernetes scheduler does a lot of the work for you. All you have to do is know what storage class you have within your engine.

And do all the nodes have access to the same storage? If you use a network file system — whether it's one that you deploy yourself, or Azure Files, Filestore on Google, or EFS on AWS, which are the managed services those providers offer — then all the nodes have access to it. Otherwise, if you use the default storage class, the volume just attaches to the one node that's running the pod. It does automatically re-attach to the new node if the pod is moved, but at any time it's only on that one node. So you do need a network file system, or a separate path, if you want shared access.

I'll get you after. We're going to run into a real time crunch. Yes, we are. Thank you, Alex.