Okay, welcome to Galaxy Installation with Ansible. Now we'll actually get to the fun part of today, where we get to set up and install a Galaxy server. We'll start by going through how all the playbooks work, and then we'll move on to the actual installation portion, where we set up Galaxy on a fresh Ubuntu server. Please note that all of the roles are regularly used on both CentOS and Ubuntu machines, so if you have different versions of these, it should in theory work. If it doesn't, it's a bug, please let us know.

So I'm going to start a little bit by talking about how the playbooks work, how they're structured, that sort of thing. We'll be using the official Ansible role for Galaxy to do this. This role can be found on Ansible Galaxy (which, despite the name, is a separate piece of software) as galaxyproject.galaxy. The official role is very configurable: anything you might want to change in a Galaxy server, the official role can usually do it. It's widely used in a lot of different scenarios. It forms the basis of a lot of the Docker container setup, a lot of the individual private servers, and all of the public usegalaxy.* servers are deployed through Ansible. If you're looking for a rock-solid, production-ready way to go, this is it.

So let's talk a little bit about the Ansible role. You've learned a little bit about Ansible roles in the first part of today, how they look, how they're structured. The Galaxy role is just like that, except more complicated. Of course we have the entry point of tasks, and in the tasks, the very first thing we'll access is main.yml. You don't need to be doing any of this; I'm just going to go through all of the different parts of it and how it works. There are a few important variables that you'll need to keep in mind: galaxy_root, which is where Galaxy will be set up; galaxy_commit_id, which is what version of Galaxy we want to deploy; and the galaxy_config variable, which controls all of the configuration options that are set in your galaxy.yml file.

So let's go back to the tasks quickly. As with every role, there is an entry point, tasks/main.yml, and this will include a few key steps: cloning Galaxy, managing configuration, fetching dependencies, managing the mutable setup, and finally managing the database. We can see all of that reflected in the playbook: there's a little bit for privilege separation, setting up the directories and the user, managing some paths, cloning Galaxy, managing existing Galaxies, static setup, dependencies, mutable setup, managing the database, building the client, and error docs, which is not used so often.

So the very first task is cloning Galaxy. The clone task is the one we'll be using today; there are multiple ways to set up Galaxy, but we're interested in cloning. The git module in Ansible will be run, and it will fetch whatever commit ID we specify from the Galaxy repo. By default, this is set to the official galaxyproject/galaxy repository. It will report if the version changes; so when you update your Galaxy, the version might change and things might need to happen as a result. It will set up the virtual environment and also remove any compiled files: when you run Galaxy, Python compiles the Python code into bytecode to make it a little bit faster to load next time, and there's a task that just removes those files.
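To make those three variables concrete, here is a minimal sketch of how they'd appear in your group variables (the values are illustrative; we'll set the real ones later in the tutorial):

    galaxy_root: /srv/galaxy          # where Galaxy will be set up
    galaxy_commit_id: release_20.09   # branch, tag, or commit of Galaxy to deploy
    galaxy_config:
      galaxy:
        brand: My Galaxy              # any option from galaxy.yml can go under here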
So with the compiled-file cleanup, any old code that's left around gets cleaned out, and you can be sure that every time this playbook runs, you're getting exactly the version of Galaxy that you want.

The next task is managing configuration. This just sets up all of the static files. So we copy in any additional Galaxy config files, we install more config files, we install any local tools (we'll talk on the third day about how to deploy some tools, and a bit on the second day as well), we configure dynamic job rules (those will be on the third day), and we copy out the Galaxy configuration file. This is a very important file that we'll talk about a lot. When you're deploying your Galaxy with Ansible, you sometimes have different additional files you want to configure, or different additional Galaxy configuration files like the datatypes configuration or the genome builds or the email block list. All of these can be set differently on each Galaxy, and there's a variable, galaxy_config_files, which lets you deploy arbitrary files from your Ansible setup onto the server, somewhere that Galaxy can access them; there's a sketch of it below.

Next up are the dependencies. So once Galaxy is finally on disk (the git task has cloned the code, the static tasks have set up all of the configuration files), then we're ready to load the dependencies. This will install all of the base Galaxy dependencies; Galaxy hosts a mirror of all of the dependencies to make this fast. It'll also collect any additional conditional dependencies: if you configure Galaxy such that you are using, say, LDAP for authentication, this requires an additional Python module. If you're managing Galaxy by hand, this won't be installed, but if you are managing it through Ansible, the role knows to look for all of the different configuration options that can have additional dependencies, and then install those as well. One of the nice things about using the Ansible roles is that all of this knowledge and experience of administrators gets encoded into the Galaxy playbooks, and then you get to use that for free.

Okay, next, the mutable setup. Galaxy initializes a directory for mutable data and mutable configuration files. Galaxy, when it's running, manages some of the configuration files by itself. Usually you as an administrator will not need to touch these files; however, Galaxy does need access to them and needs to update them. Some examples of this are the files tracking tools that are installed from the Galaxy Tool Shed: these need to be available, and Galaxy will need to write to them.

Okay, once the mutable setup is done, Galaxy is almost complete. Next up, we'll be managing the database. Galaxy has database management tasks, where every time you run the playbook, it checks whether the current version of the Galaxy database schema is the same as the maximum version. If it's not, the Ansible role will run all of the migrations that are necessary to get you updated to the latest version. This is another common problem that administrators experience when they're not managing Galaxy through Ansible. This was something that I personally forgot a lot when I was managing a large public Galaxy server: to actually check for and run the migrations every time. Galaxy would start up and it would crash. And when we switched to Ansible, life was so much better, because we didn't have to remember this manually anymore. It was just automatic.
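Going back to the configuration step for a moment, a galaxy_config_files entry might look roughly like this (the src path is a hypothetical file in your own playbook repository; galaxy_config_dir is a variable provided by the role):

    galaxy_config_files:
      - src: files/galaxy/datatypes_conf.xml   # hypothetical local file
        dest: "{{ galaxy_config_dir }}/datatypes_conf.xml"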
With Ansible, all of these things were taken care of for us.

So, handlers. Like every other Ansible role, there are handlers. One of the important handlers here will restart Galaxy: whenever we make configuration changes, whenever we change how Galaxy behaves, the Ansible role knows to restart Galaxy when it's done. As with other roles, a number of default values are provided, things like whether it should manage users and paths, and by default the role is very conservative. We'll set a lot of variables to override this and make sure it works how we want it to. By default it pulls from the official galaxyproject/galaxy repository, that sort of thing; if you have a fork of Galaxy you want to run, you can set that here as well.

So, a quick summary: Galaxy is cloned, or updated if it was already cloned, through git. A virtual environment is created if it doesn't exist. The configuration files are installed: galaxy.yml, the job configuration, anything else like that. Next, any missing dependencies are installed. And lastly, the database is updated. When all of that's done, if Galaxy needs to be restarted, it is. And next we'll start with the actual fun part, installing Galaxy.

Okay, let's get started installing Galaxy, the actual fun part of today. There are a couple of requirements you'll need before we get started. Some of these we've taken care of for you; some of these you'll have to take care of yourself when you go through the tutorial. The first one is that you have Ansible installed; of course, you need Ansible to run the playbooks. We've done that for all of the virtual machines you'll be using today, so you don't have to worry about that. Second, that the Ansible version is a recent one; Ansible changes a lot over time and needs to be regularly updated. We've taken care of this as well. Number three, that you have an inventory file. We covered this a little bit in the original Ansible tutorial. You'll need to set up an inventory file with a group of hosts called galaxyservers; there's a step for this in the tutorial, we'll get to it. Number four, that your virtual machine has a public DNS entry. We'll do SSL certificates during this tutorial, and having a proper working DNS entry is part of this. We've taken care of that for you as well, but if you're setting up your own server, this may work differently for you. Fifth, that the VM has Python 3 installed, so all of the Ansible code we'll be using will run; it has all been updated for Python 3. Galaxy runs Python 3; there's no more Python 2. Lastly, that in your inventory file, you use the full hostname that has been provided, and not just localhost as you might have done in the Ansible tutorial. This is very important, because we'll be using that full name in a couple of different places in our playbook. And again, whether you're using Ubuntu or Debian, CentOS or RHEL, none of this should matter; it should work identically for all of the different scenarios.

So let's get started. The first thing we need to do is prepare all of our requirements. We've collected all of the different roles we'll want to use today in a text box that you can copy. And the steps say: create a new directory, galaxy, in your home folder. So I'm going to create a new directory, galaxy, and change into that. It's an empty directory. And now I'm going to set up the requirements.yml file, shown below.
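The requirements file lists roughly these roles (copied from the training materials; the training pins specific versions, which I've omitted here):

    # requirements.yml
    - src: galaxyproject.galaxy          # installs and manages Galaxy itself
    - src: galaxyproject.nginx           # the web server
    - src: galaxyproject.postgresql      # the database engine
    - src: natefoo.postgresql_objects    # PostgreSQL users and databases
    - src: geerlingguy.pip               # pip setup
    - src: uchida.miniconda              # conda, for tool dependencies
    - src: usegalaxy_eu.galaxy_systemd   # systemd units to manage Galaxy
    - src: usegalaxy_eu.certbot          # SSL certificates from Let's Encrypt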
And here I will paste in all of our different requirements. I've just copied these directly out of the training materials, and I'll save that. So now you have the requirements file with all of that content. You might be wondering what each of these different roles does; these are the roles that will be used throughout the tutorial. The first role, galaxyproject.galaxy, sets up the Galaxy server itself. The second role installs nginx, the web server. The third, PostgreSQL, the database engine that we'll be using today. Additionally, we have the role natefoo.postgresql_objects from one of the Galaxy developers, Nate Coraor, for managing database users and databases. We'll have the pip role from geerlingguy to set up Python's pip command, to install some dependencies when we need to do that later. The uchida.miniconda role will set up the conda dependency system, which Galaxy optionally uses to manage tool dependencies. Additionally, we have galaxy_systemd; this will help us set up systemd units in order to manage Galaxy. And lastly, certbot: this will request SSL certificates via the certbot command from Let's Encrypt.

So with that, we have our requirements.yml file written. We're going to use the ansible-galaxy install command to install everything from the requirements file into the roles directory. As you can see, all of these roles are getting downloaded from GitHub and installed into our roles directory, so you'll see them here in roles/. Excellent, everything's working so far.

Next up, we'll need to set up an ansible.cfg. This is an Ansible configuration file; we'll just create it in the same directory as our requirements.yml. It tells Ansible that we want to use Python 3 as the interpreter (Ansible can still use Python 2 or Python 3), that we have an inventory file, which will be called hosts, and that we don't want to enable retry files; these can be a bit annoying and not very useful for our case.

So next up, we'll need to create the hosts file. Your hostname is definitely different than this one. We're going to write this into the hosts file, here, next to our ansible.cfg and the requirements.yml. Inside there, we're going to paste some content, but this name isn't correct, so we can run the hostname command, hostname -f, to find out what our hostname is, and when we're ready, we can paste that back into the hosts file. When you're done, your hosts file should have a group, galaxyservers; in your hosts file, this defines the group name. Our group name is galaxyservers for all of these servers. Then we have our hostname; it needs to be your own one, it cannot be one that's different from your machine. Then we have ansible_connection=local. Ansible has a couple of different ways of connecting to remote machines: it can connect through SSH, or it can connect directly and just run the commands on the same machine. We're going to use local here. We could use SSH equally well; however, it adds some extra overhead that we don't need. We can just tell Ansible: hey, we're running the commands on this machine, just run them directly. And lastly, ansible_user. Ansible has changed over time how it behaves with respect to this variable, and in some cases it can implicitly determine which user it is running as, but we're going to set it explicitly just to be safe. So I'm going to save this file and just review what we've done. We've got our ansible.cfg; this will tell Ansible how to behave.
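A sketch of those two files as described (the hostname below is a placeholder; substitute whatever hostname -f prints on your machine, and the user you log in as):

    # ansible.cfg
    [defaults]
    interpreter_python = /usr/bin/python3
    inventory = hosts
    retry_files_enabled = false

    # hosts
    [galaxyservers]
    gat-NNN.example.org ansible_connection=local ansible_user=ubuntu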
We have our hosts file, which lists our galaxyservers group, the hostname, and the ansible_connection. If you run the command hostname -f, these two should match; and if they don't, you have an issue, and you should correct the hosts file to be whatever hostname -f says. We've got a requirements file and all of our roles. So let's go back to the training quickly.

Okay, let's talk now about the Galaxy database. Galaxy uses a database for storage, and a lot of the different data that's stored within Galaxy, objects like users, histories, information about datasets, workflows, all of this information is stored in the database. However, there is some data that isn't stored in the database: user datasets and any reference data. All of these are stored outside the Galaxy database. So when you do backups, you'll want to back up both your Galaxy database itself and also the separate pools of user data and any other reference data that you might have installed.

By default, Galaxy uses a library called SQLAlchemy for talking to different databases. It can interface with a lot of different databases, namely SQLite, MySQL, and PostgreSQL. By default, Galaxy uses SQLite, but for production we don't want that; for production, we want PostgreSQL. There are three different options; MySQL is supported, but we really strongly recommend against it. It's supported by SQLAlchemy, but it may not be well supported by Galaxy. So use PostgreSQL whenever possible.

Regarding database sizing: Galaxy never deletes anything from the database. Whenever items are deleted, users, histories, etc., they just get marked as deleted within the database, but the row does not get removed. As a result, the database will grow forever. You should allocate around 20 gigabytes of disk space to start with; if expanding later is difficult, start with more. It doesn't require so much memory; the resource usage of the database is not so intensive. But we also recommend running it on a separate machine, just for resource isolation, so that if your Galaxy server goes down, it doesn't corrupt any of the data in the database or anything like this.

So the configuration looks like this: there are a couple of different formats for the database URL you use to connect to the database. There is a different format if you're connecting to a local PostgreSQL or to PostgreSQL on a different machine; we'll talk about this a little bit more in the training, and there's a sketch of both forms below.

So whenever you have a brand new database, on first startup Galaxy will create a schema within that database. When you're upgrading Galaxy, on the other hand, the changes are expressed as migrations. So whenever you upgrade, from Galaxy 20.09 to 21.05, something like this, Galaxy has some migrations that will be run that will upgrade the database, making any changes to the schema that are required. You can upgrade as well as downgrade; there are commands for this, but most of this in the general case is handled for you by the Ansible roles, and you won't have to worry about it.

Tuning is an important consideration. When you have a production Galaxy, you probably want to set a couple of these options. We've shown some default values as well as values set by usegalaxy.org and other usegalaxy.* servers. Databases by default have a limited number of connection slots, a limited number of clients that can connect at once, and Galaxy can connect a lot of times for each different query that it needs to run.
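As an aside, the two database_connection formats mentioned above look like this (the socket form is what we'll use today; the hostname and password in the networked form are placeholders):

    # local PostgreSQL, over the Unix socket:
    database_connection: postgresql:///galaxy?host=/var/run/postgresql
    # networked PostgreSQL on another machine:
    database_connection: postgresql://galaxy:password@db.example.org:5432/galaxy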
One of the options for improving connection performance is to have a pool of open connections, so Galaxy doesn't have to renegotiate the connection every time. Galaxy can take one of the open connections from the pool, run the query, and return the connection to the pool. Additionally, there's an option for server-side cursors. These are useful when your Galaxy gets quite large, when there are large queries that it needs to iterate through; server-side cursors can significantly help performance there. When you have slow queries or slow routes in Galaxy, one of the ways you can begin to debug this is by setting the slow query log threshold. This is used to print out the query that was run, if the query takes too long. So when you're debugging Galaxy, when you're working with the developers, having this information on hand can be helpful to figure out what's going on. Additionally, another optimization that can be made is to track installed tools in a separate database. This is an optimization that can enable a couple of nice scenarios, if you are doing things like bootstrapping a fresh Galaxy instance with pre-installed tools. This can let you share all of the tools that you have installed amongst multiple Galaxies, something like that. You can also do some other nice things around that; usegalaxy.org uses this, though I'm not aware of other Galaxies that use it.

The database can be accessed from Galaxy through Python code using all the models that are available: Galaxy has all these different models that describe how each table looks, what columns are there, and how to do operations like resetting a password, that sort of thing. When you want to talk to the database, one of the ways you can do that is accessing it through the Galaxy model. This isn't something you'll have to do very often, but it's good to know about. There are many other useful database queries you can run; we'll talk about this at a later point. Thank you.

Next we'll be setting up PostgreSQL. Again, Galaxy supports a wide variety of different databases, but it's best tested in production under PostgreSQL, so we recommend using it. PostgreSQL maintains its own users and authentication system, and we'll come to that in a little bit.

So we're going to set up some group variables. You might remember from the Ansible tutorial that we can assign variables in Ansible to an entire group of machines, or just to an individual machine. This is a group named galaxyservers, so we're going to set up some group variables for that. We're going to create a directory called group_vars, and within the group_vars directory, we're going to create the file galaxyservers.yml. This string, galaxyservers, needs to match exactly with the group name in the hosts file; otherwise Ansible won't be able to connect the variables from the group variables file with the correct group. So we're going to paste in all of the content from the training materials. This will do a couple of different things. We'll set up the pip and Python virtual environment variables, just to say we want Python 3 everywhere; the Ansible ecosystem has not yet fully migrated to Python 3 as the default, so sometimes we have to explicitly inform it that everything we're doing is Python 3 compatible. Finally, we'll set up some PostgreSQL objects, users and databases: we're going to set up a galaxy user, and then we're going to create a galaxy database owned by the galaxy user that we just created.
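A sketch of what goes into group_vars/galaxyservers.yml at this step, following the training materials:

    # group_vars/galaxyservers.yml
    # (plus a couple of pip/virtualenv variables forcing Python 3 everywhere;
    #  see the training materials for the exact lines)
    postgresql_objects_users:
      - name: galaxy
    postgresql_objects_databases:
      - name: galaxy
        owner: galaxy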
The natefoo.postgresql_objects role will be responsible for reading those variables and executing them.

Next we actually need to create our Galaxy playbook. This will be the playbook that does everything for today. So we're going to create a galaxy.yml file. It's going to install python3-psycopg2; psycopg2 is the Python library which talks to the PostgreSQL database. We'll have the galaxyproject.postgresql role to install PostgreSQL, and then we'll have the natefoo.postgresql_objects role, which will take care of setting up the users, groups, and databases. So we're going to create the file galaxy.yml; this should be at the same level as your hosts file, your ansible.cfg, and your requirements file. And I'm going to paste in all of that content again. So we see hosts: galaxyservers; this will apply to any host listed under the group galaxyservers in your hosts file. We set become: true; we're going to become root, the admin user. Then we're going to do some pre_tasks before we get started on the actual roles, and in that pre_task we're going to install python3-psycopg2. Then we move on to the meat of the playbook: we'll run this role, galaxyproject.postgresql, which will install the database, and we'll run this one to set up the actual users, groups, and databases. Note that here we set become: true and become_user: postgres. By default in PostgreSQL, the only user that can connect to the database from the start is the postgres user itself; that's why we become that user in order to talk to PostgreSQL.

So at this point, we're almost ready to run it. Let's just look at what we've done; your directory should look approximately like this as well. You should have an ansible.cfg, a galaxy.yml, a group_vars folder with galaxyservers.yml, a hosts file, a requirements file, and all of these roles downloaded into the roles directory. And with that, you're almost ready to go. I'm going to check one last thing: I've got my hosts file, which has this galaxyservers group. I'm going to check that it looks the same in my playbook and in my group_vars folder. So here we see galaxyservers, galaxyservers, galaxyservers; these all need to match if things are to work correctly. And if that all looks good for you, then you're ready to run the playbook.

So let's copy that: ansible-playbook galaxy.yml. It's going to take a second to run, and as we add more tasks, it'll get slower. But this will do everything we need to do to set up a Galaxy server; by the end of the day we'll have this one playbook file that does absolutely everything we need. Okay, that looks good: 17 tasks were ok, and 7 things changed. So we should have a PostgreSQL database running. We can check that with systemctl status postgresql, and we see okay, active, green, perfect. We can't connect to it currently because we don't have permissions, but if we run this command as the postgres user, we can see: oh look, here are all of the databases. So I've run psql -l, and what this does is list all of the databases that are currently known to PostgreSQL, and here you can see the galaxy database that we created, owned by the galaxy user. So perfect, everything looks absolutely good right now. If you see any of these errors, be sure to check the documentation for what possibly went wrong there. We've run this command, and it looks good; we see our galaxy database, owned by galaxy.
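The checks we just ran, as shell commands (the service and admin user names are the PostgreSQL defaults on Ubuntu):

    sudo systemctl status postgresql   # is the database server up?
    sudo -u postgres psql -l           # list databases, as the postgres user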
We can connect to the database itself, and here we can see a couple of different things: there are no tables, and there are some roles. I don't use these commands very often, because you don't usually need them; you set these things in your Ansible playbook, in your group variables, and you forget about them, because you know it's going to execute the right thing. You have a lot of faith in these. So you can type \q to exit, and you'll be back as the ubuntu user. Okay, awesome, fantastic.

If you look in the PostgreSQL configuration directory with this command, you'll see a couple of different configuration files. There is the control file, and the HBA file; HBA stands for host-based authentication, and this controls who can authenticate to PostgreSQL and how. If you're setting up networked access to PostgreSQL, this will be important, but for us it's not, because we're connecting to PostgreSQL on the same server. And there's also some other configuration. And you'll see this strange-looking file with an odd extension: we already have a pg_hba.conf, but there's a second one. What happens is, in Ansible there's the ability to set a property called backup, and when you back up the configuration files, it creates this duplicate file that has all of the previous contents. Every time a change is made, it backs up the file before making the changes, so if you need to revert to a previous version of the configuration, you can do that. If someone's run the playbook, you can see what's changed by comparing these files. So with that, we're ready to start on Galaxy.

Okay, great, let's get Galaxy installed now. We've got all of our database setup done, all of the PostgreSQL setup with the user and everything we need. Let's continue with getting Galaxy running and configuring how it works, everything. We'll be using something called uWSGI mules today to set up Galaxy. There are a couple of different options for doing it; mules are not the only option, however they are the one that's easiest to get started with, and unless you have very strange production needs, they're probably the one you should be using. Best-practice admins don't run Galaxy with root access; they run it as a separate user, and we'll also configure that right now. All of the security best practices are available through this playbook.

So we need to open our galaxy.yml playbook file, and we're going to make a whole bunch of changes. First up, we're going to add some new roles, and we need to install some more dependencies, not just the psycopg2 dependency. So I'm going to open up my galaxy.yml, and I'm going to replace this line with this one. Instead of just python3-psycopg2, we'll also install: acl, which is used for access control and is needed for some of the playbook tasks to run successfully; bzip2, a compression tool; git, to clone Galaxy; make, to compile some things; tar, to extract some archives; and python3-venv, in order to set up the virtual environment that Galaxy will need. And we're going to add some new roles too: we're going to add the pip role to set up pip, we're going to add the Galaxy role itself, the one that does all the fun stuff, and then we're going to add the miniconda role, which will set up conda. We should also become the galaxy user for this, so for the role uchida.miniconda we set become: true and become_user: galaxy. Okay. The updated playbook is sketched below.
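Putting the last few changes together, the playbook now looks roughly like this (an abridged sketch, following the training materials):

    # galaxy.yml (the playbook)
    - hosts: galaxyservers
      become: true
      become_user: root
      pre_tasks:
        - name: Install dependencies
          package:
            name:
              - acl
              - bzip2
              - git
              - make
              - python3-psycopg2
              - python3-venv
              - tar
      roles:
        - galaxyproject.postgresql
        - role: natefoo.postgresql_objects
          become: true
          become_user: postgres
        - geerlingguy.pip
        - galaxyproject.galaxy
        - role: uchida.miniconda
          become: true
          become_user: galaxy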
So that's the first part: we've set up some dependencies that we'll need, and we've set up the roles. But we haven't configured these roles, so right now they're running with all of their defaults, and as we mentioned earlier, the galaxyproject.galaxy role is very conservative in what it will and will not do by default. So now we'll set up some variables.

Specifically, galaxy_create_user: this is going to tell the role that we want it to create the Galaxy user and manage it; if we change the username, if we change the shell, anything like this, the Ansible role will take care of it. Then we want galaxy_separate_privileges. This is a very important variable: it tells the Ansible Galaxy role that a different user should manage the code base than the one running it. So if there's somehow some remote code execution under the user that's running Galaxy, this prevents your Galaxy configuration from being affected; it makes it a little bit harder for any attacker to gain privileges. So we'll do this; it's a very good idea. galaxy_manage_paths, we want to set this to true: this tells the Ansible Galaxy role that it should set up all the directories we need, set permissions, everything like that. We'll set our galaxy_layout to root-dir. This will create a single root directory, in this case /srv/galaxy, and inside that root directory will be all of the Galaxy files that we need: all of the configuration, the code, the mutable configuration, etc. We control the location of that with the galaxy_root variable; here we'll set it to /srv/galaxy, and everything will be under that folder, which makes it really easy to find whenever you want to see what's going on. We're going to define a galaxy user here: it'll be named galaxy, and it'll have bash as its shell. We'll be setting up the latest release of Galaxy; we'll set our commit ID to release_20.09. The variable is called galaxy_commit_id, but we're pointing it at a branch, and what that means is that when we run the playbook again, if there are any changes to that branch, if there are any backports or bug fixes made to the 20.09 release branch, then just by running the playbook again, we'll get those installed for us, for free. And we set galaxy_config_style to yaml: Galaxy previously used an INI format, and now we can use a much nicer YAML format.

Next we'll set galaxy_force_checkout to true. This is an option I've liked to use in the past, because if you're working on this Galaxy server with multiple people, coworkers can sometimes make a change and not write down what they've done, or they'll make an unexpected change that you didn't want. By setting galaxy_force_checkout: true, if any changes have been made to the code, they'll get reverted; everything will be back to exactly what you say in your playbook, exactly the branch, and nothing else. So this is a very nice option just to make sure that no one affects how your Galaxy is running, if you have coworkers who are working with you on this.

Then there's the miniconda prefix. Galaxy has a tool dependency directory; Galaxy manages its own dependencies for all the tools for us, and all those dependencies go in the galaxy_tool_dependency_dir. We're going to set up conda within that directory. All of these variables are collected in the sketch below. Normally, Galaxy will install conda for you, and you don't have to worry about it.
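Collected as they'd appear in group_vars/galaxyservers.yml, the variables just discussed (a sketch following the training materials):

    galaxy_create_user: true          # create and manage the galaxy user
    galaxy_separate_privileges: true  # code managed by a different user than runs it
    galaxy_manage_paths: true         # create directories and set permissions
    galaxy_layout: root-dir           # everything under one root directory
    galaxy_root: /srv/galaxy
    galaxy_user:
      name: galaxy
      shell: /bin/bash
    galaxy_commit_id: release_20.09   # a branch, so re-runs pick up bug fixes
    galaxy_config_style: yaml
    galaxy_force_checkout: true       # revert any manual edits to the code
    miniconda_prefix: "{{ galaxy_tool_dependency_dir }}/_conda"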
In past versions of running this tutorial, we've discovered that, because we set up Galaxy with multiple processes, sometimes those processes can conflict: one of them will start installing conda, and another will too, and they'll conflict, and it'll be bad. So what we do is explicitly install conda ourselves, to solve that issue completely. We're going to install a fairly recent conda, 4.7.12, and we're going to tell the miniconda role that it doesn't need to manage conda's dependencies; we've already got that covered.

We updated this tutorial very recently, so it should work with the latest version of Galaxy. If you read this tutorial or want to do this at home at a later point, there may have been a new Galaxy release by then; you can do it with that newer version of Galaxy, and it should work as well.

So we're going to make some changes to group_vars/galaxyservers.yml. I'm going to copy and paste all of these variables in. So I'm going to edit my group_vars/galaxyservers.yml, paste that in, and remove all of the little pluses from the start. It should look like this: we've got all of our previous Python 3 and PostgreSQL variables, and now we have Galaxy configured in here. We create a separate user for Galaxy, we separate privileges, we have it manage all of this data, we set up our galaxy user and which branch we want to run. All of this looks good; I think we're about ready to go.

In the group variables file, we also need to set up the Galaxy configuration. All of the configuration that goes in galaxy.yml, or in the older galaxy.ini, is stored under this galaxy_config key; and then under that we have galaxy, and then some variables we can set. So now we're going to set up an admin user for the server, we're going to configure the brand, we're going to set the database connection and where we want to store data, check_migrate_tools, which is just for older Galaxy installations, and we're also going to set the tool_data_path. So let's do that now; copy this in. Down here at the bottom, I'm going to set up all of my galaxy_config. The brand you can set to whatever you like; name it after yourself if you'd like. We're going to set up a list of admin users: whenever one of these accounts is registered with Galaxy, it'll immediately be recognized as an administrator. We're just going to use admin@example.org. For our database connection, we're going to connect to the PostgreSQL database, and this connection string looks a little bit different than for a networked PostgreSQL: we're going to connect to /var/run/postgresql, which is the PostgreSQL socket. When PostgreSQL runs, it has a socket open on your current machine, and that is how the psql command works; it talks to that socket in order to talk to PostgreSQL. We will be connecting to the database named galaxy. We're going to set our file_path to /data, so all of the files that are created by Galaxy users will appear in this /data directory. There are some options for that on another day, but the important thing to note is that when you're backing up Galaxy, this is a very important directory to back up. Additionally, there are check_migrate_tools and the tool_data_path; we just need to set these to have everything work like we want. One note about data storage: you currently can only set one pool of data. This will hopefully be different in the future.
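And the Galaxy configuration block we just walked through, sketched (again following the training materials; pick your own brand):

    galaxy_config:
      galaxy:
        brand: "My Galaxy"                # shows up in the masthead
        admin_users: admin@example.org    # recognized as admin on registration
        database_connection: "postgresql:///galaxy?host=/var/run/postgresql"
        file_path: /data                  # user datasets; back this up!
        check_migrate_tools: false        # only relevant to older installations
        tool_data_path: "{{ galaxy_mutable_data_dir }}/tool-data"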
For example, the PostgreSQL connection string, like we discussed: if you want to connect over the network, to something on a different host or a different port, there is a particular way you write the string; but for us today, we're connecting on localhost, just to the socket.

Oh yes, variable templating; we haven't seen that before. In our group_vars/galaxyservers.yml we see a couple of these double braces with a variable name inside: we have galaxy_tool_dependency_dir, and down here at the bottom, if you can see it, galaxy_mutable_data_dir. And we haven't actually defined these; if you look for galaxy_mutable_data_dir, you won't find it here. What's happening is that we're setting a variable to this template string, and whenever Ansible gets to the point where it needs this value, it'll evaluate it and ask: does this variable exist, has it been created since then? And the Galaxy role itself will create a lot of these variables for us. It defines all of these, the mutable data directory, the tool dependency directory, so we can just use the variable as if it exists, knowing that the Galaxy role will construct it for us: it'll take the Galaxy root directory and construct all of the different path pieces up to the tool dependency directory, the mutable data directory, etc. It's a very nice feature of Ansible that makes it very easy to use.

And we also need to set a bit of uWSGI configuration. This configuration again goes into group_vars/galaxyservers.yml, down here in your galaxy_config; I'm going to paste that in and delete all the pluses, very important. You'll notice the uwsgi key is also under galaxy_config: under galaxy_config there are two keys, galaxy and uwsgi. The first controls Galaxy, while the second controls how uWSGI works. We're going to start by having uWSGI listen on any interface it can find, on port 8080, so we'll be able to access that port and access Galaxy. We're going to start one process with four threads. We need to do some static mapping: this will map this route to this folder. This is something we need to do when starting with uWSGI, because it is our web server; it's serving, over the HTTP protocol, all of Galaxy and all the files that are needed. We also want to set the virtualenv, so when uWSGI starts, it will load up all of the Python dependencies that we need, and we need to specify the pythonpath, so it knows where to load the Galaxy code base from. And down here at the bottom, we have mules and a farm. We'll talk about that a little bit, but the key point is, you can set up different worker processes, mules; these will be different processes that handle tasks that Galaxy needs to handle. And we have also declared something called a farm, a job-handlers farm: if we send things to the job-handlers farm, they'll be randomly distributed amongst mule number one and mule number two. Save that. How many mules? That's a common question; start with two and add more as needed. There's a lot of good information in some of these info boxes if you're interested and curious how all of this works; give those a look in the training materials. A trimmed sketch of the uwsgi block follows below.

And with that, we should be ready to go. So let's look at our group_vars/galaxyservers.yml again, just to be sure. We've got all these variables to control the role and how Galaxy gets set up; we've got our Galaxy configuration.
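Here's that trimmed sketch of the uwsgi block (abridged; the full block in the training materials has a few more options):

    galaxy_config:
      galaxy:
        # ... as above ...
      uwsgi:
        http: 0.0.0.0:8080        # listen on any interface, port 8080
        master: true
        processes: 1              # one process ...
        threads: 4                # ... with four threads
        static-map:
          - /static={{ galaxy_server_dir }}/static
        virtualenv: "{{ galaxy_venv_dir }}"        # Python dependencies
        pythonpath: "{{ galaxy_server_dir }}/lib"  # where the Galaxy code lives
        module: galaxy.webapps.galaxy.buildapp:uwsgi_app()
        mule:                     # two job-handling worker processes
          - lib/galaxy/main.py
          - lib/galaxy/main.py
        farm: job-handlers:1,2    # jobs are distributed between mules 1 and 2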
This has the configuration that'll be passed directly to Galaxy, controlling things like how the interface looks, or the administrative users, or where the data is stored; and then we have the uWSGI configuration that controls how the uWSGI server will be handling Galaxy, what it'll be serving, where it'll be serving it, and so on. With that, I think we're ready to run our playbook, and we can watch this go.

Okay, it's starting up all of the Galaxy tasks, setting facts, creating the directories, creating the additional directories that are needed. Now it's updating Galaxy to the specified reference: it's cloning the Galaxy repository and checking out the branch that we requested, in our case release_20.09. Some of these steps can be a bit slow the first time; they'll get faster in the future. So, Galaxy version changed from nothing to some string: as we mentioned before, the role tracks what version was previously checked out and what version is checked out now, and by knowing this, it knows if it needs to upgrade the client or rebuild the interface, all these sorts of different tasks that it has to do. The conditional dependencies step is a slow one; it has to fetch all the different Python modules that Galaxy needs. Once that's done, it'll be much faster the next time, because it'll know: hey, all of these are installed, I don't have to do anything.

As you've noticed, I've just added on to the roles at the end of the playbook: we just added the Galaxy role and the miniconda role. I didn't comment out any of the other roles; we just left everything that we were doing before, setting up the database and setting up any other dependencies that we needed. This is just a personal preference of mine. I personally prefer to run the entire playbook every time. I worry a lot that if I don't run the entire playbook, maybe I've missed some change, or maybe a change in one of my roles affects one of the other roles, something like this. By running the entire playbook every time, I can be fairly certain that everything is exactly how I specified it. This just comes from me having come from the Puppet world, rather than the Ansible or Chef world, and having these preferences; you're free to do it however you want to, of course. I know a lot of our Galaxy contributors use tags: you can give each task a tag, and then select just some of those to run; of course it'll run just those tasks, and it'll run much faster. But I'm human, right, I'm fallible; I don't ever trust that I won't forget that, oh yeah, this one change might affect some other things, so I just run the entire playbook every time. Both invocations are sketched below.

Okay, it's checking out the database versions. It's installing Node.js: Galaxy's client is built out of JavaScript, and it needs to be compiled every time the client changes. This task is quite slow, but it's only run if the client changes. So you can set Galaxy to a specific release, and it won't change very often, or you can pin it to a single commit ID, and then the client will never need to be rebuilt once you've built it; but you lose out on the automatic updates. Node.js is used to build the client, which is now based on Vue.js. That's more on the developer side of Galaxy; not so much of this is covered during this admin training, but there are other events specifically for that purpose, just come discuss with us.
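Both of those invocations, sketched (the tag name is hypothetical; you would have to add matching tags to the play yourself):

    ansible-playbook galaxy.yml                 # run everything (my preference)
    ansible-playbook galaxy.yml --tags galaxy   # run only tasks tagged 'galaxy'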
Normally this is a nice time to chat with students and figure out if anything's going wrong, or if everyone's doing things successfully, but teaching on video is very different. So you can look through the playbook or the tutorial a little bit, to see what's been written here.

Free knowledge is one of the key points of these playbooks. A significant portion of the community uses these playbooks, and by using them you get all of our configuration knowledge for free, which is incredible: all of these years of person-hours going into managing Galaxy and figuring out what the best practices are. All of the knowledge from usegalaxy.org, usegalaxy.eu, usegalaxy.org.au, all of these big Galaxy servers that serve tens of thousands of users, has been encoded in the playbooks. Whenever we say, oh, we need to also be doing this task, or it would be useful if we did this, just so we don't forget, we encode that in the playbook, and then everyone gets it for free.

There's been a comment on the slow deployment; this is just due to the JavaScript client being compiled. Migrations can also be slow sometimes; that's something you'll need to be aware of. It used to be that on the very first Galaxy deployment, we had to run every single migration from the start. Fortunately, that's been fixed: now, on the very first Galaxy deployment, when it's an empty database, it says, oh, it's empty, I'll just create the final schema that I know I need; I don't need to replay the entire history, I can just skip to the end. But when you have an existing database, it has to go migration by migration, to make sure all the data is updated and moved around appropriately.

When this is done, all of our server code will be in /srv/galaxy: the server folder will hold the Galaxy code base, and the config folder will hold all of our configuration, like galaxy.yml. We'll look at the permissions of these, because they'll be different; right, we've set up this privilege separation in order to ensure that the user running Galaxy and the user managing Galaxy are separate. And there's also the virtual environment at the end.

Okay, looks like it's continuing finally. Now it installs miniconda, and that succeeded. Okay, this looks good: everything's ok, changed, or skipped; 21 things changed, and 79 things were ok, already perfectly fine, didn't need to be changed. And then here we see a message that the restart handler is not implemented. Galaxy says: hey, I think I'm supposed to restart, but you haven't told me how I'm managed and how I should be restarted, so I don't know how to do this. We'll set this up later.

And with that, we're ready to start exploring our Galaxy server. If we run the tree command again, we can see all of these directories have been created for us; the expected layout is sketched below. If we look at the permissions on these directories, you'll see that a lot of them are owned by root: the configuration directory is owned by root, the local tools, the server code. So if anyone gets into our Galaxy server, they still can't actually change anything about the code base; they can't make things worse. There's a variable data directory and a jobs directory. Then there's /srv/galaxy/server: this looks like our normal Galaxy server code base, and we can see manage_db.sh, run.sh, all the things we normally expect. And in /srv/galaxy/config we'll see the configuration files that have been created for us; right now, this is just the galaxy.yml file. We'll see a lot of variables in it, actually.
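The layout you should see from tree is approximately this (a sketch; the exact contents will differ):

    /srv/galaxy
    ├── config/   # galaxy.yml and friends, root-owned
    ├── jobs/     # job working directories
    ├── server/   # the Galaxy code base, cloned from git
    ├── var/      # mutable data Galaxy writes at runtime
    └── venv/     # Galaxy's Python virtual environment

Now, back to that galaxy.yml file.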
First we see this nice note reminding us not to make any changes by hand, because they'll be overwritten. Then we see all of our uWSGI configuration, and below it is all of our Galaxy configuration. Look, there are a ton of different variables that we didn't set, right? We only set admin_users, brand, file_path, and a couple of other things. The role has taken care of setting all of these other variables that are useful or interesting for us to set, to make sure that Galaxy gets configured how we want it to. For a lot of the directories where Galaxy needs to make a cache or do some work, all of these are set to be in our /srv/galaxy/var directory, for all of our variable data. Additionally, things like temp paths are set; the job working directory is set to /srv/galaxy/jobs, so Galaxy will write the job files there; we see our tool data; all of these sorts of different things. The Ansible role takes care of all of this for us, so we didn't have to worry about it. If you were configuring this Galaxy by hand, you would of course have to remember that all of these things might be accessed or might be set, and that you need to point them at the correct locations. So that's another one of those times where Ansible does a lot of work for us.

So we've looked at that, and now we're going to do this optional task: launching uWSGI by hand. I'm going to become the galaxy user, and now I'm the galaxy user. I'm going to cd into /srv/galaxy/server; this is where all of the Galaxy code is contained. Then I activate the virtual environment; this will load all the different Python modules that Galaxy needs, all the dependencies. And then I'm going to start Galaxy with uWSGI. So, very simply: uWSGI starts listening, uWSGI does a bunch of checks, and then the Galaxy code starts up. Migrations are checked; it didn't need to create the database from scratch. And now, here, the Galaxy server instance starts up and it's running. So now our Galaxy should be listening on port 8080; up here in the uWSGI configuration, we saw we listen with the HTTP protocol on any device, on port 8080. So I'm going to open my browser to gat-010.training.galaxyproject.eu, and if everything worked, you should see your Galaxy server. Of course, this will be a different address for you than it is for me; this is the address of your server, which can be found with the hostname command or in the spreadsheet. So we've set it up, we've got Galaxy running by hand; this is looking good. We see our tools, we see the brand that we set, customized however we wanted it. Everything looks good.

But this is obviously not optimal: we want Galaxy to start automatically, and to restart if it crashes. So we'll use systemd for that purpose.

Okay, let's talk a little bit about controlling Galaxy with systemd or supervisor. I'm going to talk mainly about the systemd aspect; if you would like to know more about the supervisor portion, please come back and check this at your convenience. We're using systemd. It's the current Linux init system; it's used by all of the popular distributions, so if you know how to manage systemd, then you'll be fine no matter what distribution you're using. It's used to bootstrap all of the userspace programs which run after the system boots. It can manage processes, it can manage children of those processes, and it can restart processes when they crash. So systemd replaces a lot of the existing init systems.
We've had various forms of service definitions; the systemd one is a lot cleaner and easier to understand than any of the previous incarnations.

The systemd layout has two main folders. There's /lib/systemd/system, which has all of the package-provided definitions: when you install nginx, there's an nginx service unit that comes with it, and this gets installed into /lib/systemd/system. And then /etc/systemd/system holds all the user-defined services: whenever you want to manage your own service, or want to write a custom unit file to manage a service, all of those go in /etc, and that's where Galaxy's will be as well.

So this is what a systemd service unit looks like; we're using Galaxy as the example here. It starts off with an overview of the unit, saying: okay, here is the description, it has some name; it needs to start after the network service and also after the time-sync service. Then we get into the service description, which says how this piece of software will actually run. Here we set a umask, the default permissions that are set by the service. It's a simple service; there are a couple of different types of services, and simple just means it starts and runs forever in the foreground. It starts under the user galaxy, in the group galaxy. It has a working directory of /srv/galaxy/server, so when the process starts, it will start from that working directory. It has a 10-second start timeout: if the service doesn't start responding within 10 seconds, or it crashes before then, then systemd will know to either restart the service or do something else. And then we get to the start command; this says, okay, please run this command, and you'll need to specify the full paths to everything in that command. We also set custom environment variables, like the Galaxy home directory and where the Galaxy virtual environment is. We additionally set a memory limit. This is one of the really nice things that systemd can do, since it integrates quite heavily with cgroups: systemd can say, oh, this service should never have more than this amount of CPU time, this amount of disk time, and this amount of memory. It's really a wonderful feature of systemd, and whenever those processes hit those limits, systemd will kill them and potentially restart them. I say potentially because we've set the restart policy here to always, but there are alternative restart policies. Additionally, we enable memory, CPU, and block IO accounting. What these do is say: please keep track of how much memory, CPU, and block (disk) usage this systemd unit and all of its children use; and that enables you to set the limits later. And lastly, it says WantedBy=multi-user.target: this says that under a multi-user system, this service is wanted by the multi-user environment. You don't need to worry about that one too much. A sketch assembled from this description follows below.

systemd services are accessed with the systemctl command. This gives you the ability to see the status of, start, stop, and restart services, as well as enable or disable services; those just say whether the service should be started at boot or not. systemctl also improves upon previous commands: the older service command would only let you check the status of one service unit at a time, while systemctl lets you run these commands across many services at once if you want. So you can see the systemctl status of Galaxy and nginx and Slurm and other things all at once, which is really a nice convenience.
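Assembled from that description, a sketch of such a unit file (the actual unit generated by the role will differ in details; paths assume our /srv/galaxy layout):

    [Unit]
    Description=Galaxy
    After=network.target
    After=time-sync.target

    [Service]
    UMask=022
    Type=simple
    User=galaxy
    Group=galaxy
    WorkingDirectory=/srv/galaxy/server
    TimeoutStartSec=10
    ExecStart=/srv/galaxy/venv/bin/uwsgi --yaml /srv/galaxy/config/galaxy.yml
    Environment=HOME=/srv/galaxy
    Environment=VIRTUAL_ENV=/srv/galaxy/venv
    MemoryLimit=16G
    Restart=always
    MemoryAccounting=yes
    CPUAccounting=yes
    BlockIOAccounting=yes

    [Install]
    WantedBy=multi-user.target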
Speaking of status, this is what the status output looks like. We can see the name of the unit; we can see that it's been loaded, that it's been active since some time; active (running) is good and happy. This is the main PID, the main process ID, if you need to track that down on the system. You can see how many tasks it's running, how many different threads, and memory and CPU as well: you can see how much CPU time it's used and how much memory. Again, all of this can be limited: you can set thresholds, and when the service reaches those thresholds, it will be killed, which is good for preventing individual services from taking over your system. You don't want the Galaxy service to consume all of the memory on the system and slow it down. And you can also see the cgroup here: the cgroup lists the main process, as well as all of its children. Older init systems didn't use cgroups, and thus couldn't track all of the children of processes; once a main process launched some children, it lost track of them. cgroups enable accounting not just on the main process ID, but also on all of its children, which makes for a huge improvement. So that's it for the systemd slides; if you want to know more about supervisor, you can read that here as well. Thank you.

systemd is a process manager; all of the major Linux distributions have since switched to systemd. Originally they used a bunch of different init systems, but now we're mostly all standardized on systemd. To set this up, we'll need to add a role to our playbook. I'll copy that and go back to my terminal. I'm currently logged in as the galaxy user, and I need to log back out, so I'm back to the ubuntu user in the galaxy directory with my playbooks and everything. And now we're going to add the new role, usegalaxy_eu.galaxy_systemd. This is a role that will configure the systemd unit files, as they're called, which control the startup of Galaxy.

We also need to set a couple of variables for this. So we're going to go back into our group_vars and set up some variables that control how systemd works, and delete the pluses again, which look nice and pretty and colorful. So we've got the systemd section. We're going to be using galaxy_systemd_mode: mule; we talked about the uWSGI mules before, and we just want to tell the systemd role that we're going to be using mules, since it knows a couple of different ways to serve Galaxy. And then we're going to set the handler name. Remember, when we ran the playbook the first time after setting up Galaxy, it said: hey, I don't know how to restart Galaxy. This is where we tell it how to restart Galaxy. We also need to define a handler that will actually invoke that, so we tell the Galaxy role that the handler it should call is named Restart Galaxy. Now we need to define that handler, and we're going to paste that in. In most roles, this restarting is done automatically; in the Galaxy role, it isn't, because of how many different ways there are to deploy Galaxy. You can deploy it with zerg mode and zerglings, and each of these has very specific things about how it's restarted. Additionally, on top of that, the administrator of the Galaxy may have different preferences: they may say, oh, I don't want you to restart automatically, because I want to do it by hand, or I want to do it part by part, or something like this. Both the variables and the handler are sketched below.
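Sketched together, the group_vars additions and the handler they point at (the handler goes in the playbook itself, and the names must match):

    # group_vars/galaxyservers.yml
    galaxy_systemd_mode: mule                    # serve Galaxy with uWSGI mules
    galaxy_restart_handler_name: Restart Galaxy  # which handler restarts Galaxy

    # galaxy.yml (the playbook)
    handlers:
      - name: Restart Galaxy
        systemd:
          name: galaxy
          state: restarted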
So we give the administrators full control over restarting Galaxy. And yes, so we define a handler here; the handlers are called at the very end of the playbook, once all the tasks have run. And here's one that just says: it's called Restart Galaxy, which is the same name we told the role we were going to call it; it's going to run the systemd module, and it's going to say the service named galaxy should be restarted. Handlers are not always invoked; they're only invoked when they're notified. So when Galaxy is updated, or the dependencies change, or the configuration changes, then we notify the handler and say: hey, you need to activate, do whatever you do; in this case, restart Galaxy.

And with that, we're ready to go. We're running through all of the configuration, all the steps, just to make sure it's exactly like we want it to be. And then, when it gets finally to the end, the only thing that should change this time is setting up the systemd units. There we go, these three things changed: we deployed the unit, we enabled it so it's started at boot time, and lastly, it got started.

So if we do systemctl status galaxy, we should see our Galaxy service, and it should say active (running). We can see that it's loaded from this file, /etc/systemd/system/galaxy.service (we'll go look at that file in a second), and that it's enabled, so on boot this will automatically be invoked. This is the main process, uwsgi, and then it has a bunch of child processes that are also running. It has some tasks that are running, a memory limit, and so on. This is one of the nice things about systemd: it's highly integrated with cgroups, so if you're running systemd units you can say, hey, please don't use more than this much memory, please don't use more memory than I have, or save me four gigabytes, something like this, to make sure that even if Galaxy is misbehaving, as people are using it really, really heavily, the entire server doesn't become unresponsive, especially if you're running other services on the same server. You can see all of this; it looks fine, the service is running. If you see something else, you should check what's wrong there.

So, let's check that our Galaxy still loads; before, we had run uWSGI by hand. Okay, we can refresh, and it works, and that means it's now running through systemd. There is the command journalctl; I'm going to follow the galaxy unit with journalctl -f -u galaxy, and this will show us all of the logs from Galaxy. So I can open up the logs here and see, every time I refresh the page, new logs show up, which is exactly what we want to see. We can see all of the logs from Galaxy in one place, which is very convenient. I'm going to close out of that with Ctrl-C. Okay, this is looking good. Next, we're going to serve it with a proper web server.

All right, let's talk about gearing towards production. So what is a production Galaxy server? This is a server that is ready to be used by many, many people. It's designed to be resilient, designed to be easy to scale, and designed to be easy to manage. When we say resilient, we mean a server where, if something goes wrong, if some user is using too many resources, the entire server won't be taken offline or compromised. All of this is very easy to do with the playbooks we have. So we'll start by covering the configuration options which are necessary for a production server.
Under the uWSGI configuration, the most important option for you is changing http to socket. The HTTP protocol is less efficient than the uWSGI protocol: over a socket, uWSGI can talk more efficiently to nginx, and nginx can handle a lot of the work for you. Processes and threads are additionally important options; the defaults are usually okay, but you may wish to tune these on your server for performance.

Securing your object IDs is an extremely important option. Internally, Galaxy has given numeric IDs to all of your different datasets, your objects; every history has its own numeric ID, starting from one. These are pretty easily guessable. So what Galaxy does is use the ID secret to generate a reversible hash for every dataset: each ID is, sort of, encrypted on the way out and decrypted on the back end. This is done with the id_secret variable, which needs to be set to a good, long, random secret. And if you have to change it at any point, because it leaked or was exposed, then all of the URLs that have been generated for your histories and workflows and whatnot will unfortunately be invalidated. There's additionally a variable called new_user_dataset_access_role_default_private, a very long option name. It sets the default permissions on a user's datasets to private. In the past, Galaxy relied on the secrecy of the random-looking URLs for each dataset to provide privacy, but setting this means that even if someone can guess the IDs of datasets, they still can't access them. So if you're running a production Galaxy server, it's a very good idea to set this.

There are some important brand customizations you can set, namely the masthead brand, which we've set in this tutorial, and there are also a bunch of URLs you can set that help users find support. A lot of these have decent default values pointing at the Galaxy Project resources, but if you run your own support site you might want to change them. You also have the ability as an admin to add a notice banner. This is a banner that appears at the top of Galaxy and says whatever the message is, something like maybe there's downtime scheduled; any time you want to communicate very important messages to users, you can use the message box to do it. It cannot be dismissed; users see it on every page of Galaxy. Creating a welcome page is a good idea too. It can help communicate important information to users: news, downtime periods, new tools that might be interesting. All of this can go on the welcome page, which people see when they visit Galaxy. This is just part of running a production server: you communicate well with your users, you tell them what's going on, what changes are coming, and anything else they need to be aware of.

For production usage you might also want to start worrying about the security of your datasets and the outputs from tools. There are a couple of well-used tools in Galaxy that produce HTML or SVG outputs. These can, in theory, contain attacks against other users: if you share an HTML page with another user from Galaxy, there's the potential that malicious JavaScript could be included. Tools that are in the public repositories are usually quite good, but you should review them for yourself to make sure this isn't possible.
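As a hedged sketch, here's roughly how the options just mentioned look under galaxy_config in your group_vars. The option names are real galaxy.yaml settings; every value below is a placeholder you'd replace with your own:

```yaml
# group_vars/galaxyservers.yml (excerpt)
galaxy_config:
  galaxy:
    # long random string used to hash object IDs; changing it later
    # invalidates every URL you've ever shared
    id_secret: "change-me-to-a-long-random-secret"
    # make new users' datasets private by default, rather than relying
    # on unguessable URLs
    new_user_dataset_access_role_default_private: true
    # masthead branding, plus an undismissable site-wide notice
    brand: "My Lab's Galaxy"                                    # placeholder
    message_box_visible: true
    message_box_content: "Downtime Saturday 06:00-08:00 UTC"    # placeholder
```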
Back to tool outputs: additionally, on top of that, Galaxy by default sanitizes all of the HTML outputs to prevent this class of attack. When you try to view them in the browser they'll look a bit strange, because they intentionally haven't been rendered properly. You can, though, maintain a whitelist of tools and say: okay, these tools we know produce good, valid, secure output, and we're not worried about them. There's additionally the serve_xss_vulnerable_mimetypes option; this is mostly for SVG file types. If you have tools that produce SVGs, you might want to set this just so your users can actually view the outputs. There are some debugging options which, if you're running a production server, you'll never need; debugging options should not be enabled in production, usually. If you're running a test Galaxy, you can of course enable them there to try things out and debug issues.

Configuring FTP: when you're running a production server, a lot of users will say, "I want to upload my datasets," and while Galaxy can natively handle a lot of these dataset uploads, having an FTP server can be a matter of convenience for them. You can start an FTP upload, let it run overnight, close your browser, and not worry about interrupting the upload, things like this. Users can upload via FTP, and there are a couple of tools that let users export data back to FTP in order to download it, so this can just be a nice user convenience. Libraries: there will be a special tutorial on this, but library management is a common concern for production servers. How do I manage who has access to which libraries, who can import data to the server, that sort of thing? By setting these options you can control who is allowed to do that and where they can import data from. So FTP can be an important part of running a production server. Emails can be sent to validate user email addresses, which is important if you want to be able to actually communicate with your users again after they've registered. Galaxy can talk to an SMTP server, and error emails can be sent as well, to those users and to your group mailing list, things like this. There are a lot more options that you can set in the full galaxy.yaml file; if you're running a production server, we strongly, strongly encourage you to look through all of them, see if any are relevant for you, and set them. Thank you.

So, running Galaxy through uWSGI alone was good: it's a very easy first step to just say, hey, does this work, is everything turning on like we expect? But a more performant option is to run it behind nginx. uWSGI can natively speak the HTTP protocol, but it speaks it a little bit inefficiently. As a result, we can switch to nginx and switch to the uWSGI protocol itself (uWSGI has its own protocol named after itself, which is very confusing), and this is a lot more performant. nginx also gives us a lot of benefits: if you want upstream authentication, like LDAP authentication through nginx, this is an option, or OpenID, things like that. Galaxy supports all of these itself, but maybe you have an existing CAS system at your university or similar. nginx is also really good at serving static files. Galaxy has a lot of static files: the user data, the datasets created by users, as well as all of the CSS and JavaScript and images. All of that content can just be served by nginx, which can read it directly from disk and skip the step of talking to uWSGI.
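Before going deeper into nginx, here's a sketch collecting the production options from this part of the talk: sanitization, FTP, and mail. Again, the option names are real galaxy.yaml settings, and all values are placeholders:

```yaml
galaxy_config:
  galaxy:
    # HTML tool outputs are sanitized by default; individual trusted
    # tools can then be whitelisted through the admin interface
    sanitize_all_html: true
    # allow browsers to render SVG outputs despite their XSS potential
    serve_xss_vulnerable_mimetypes: true
    # FTP upload (the FTP server itself is separate infrastructure)
    ftp_upload_dir: /srv/galaxy/ftp          # placeholder path
    ftp_upload_site: ftp.example.org         # placeholder hostname
    # outgoing mail for address validation and error reports
    smtp_server: smtp.example.org:587        # placeholder
    error_email_to: galaxy-admin@example.org # placeholder
```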
Serving static files that way is a lot more efficient, and then uWSGI can save all of its processing time for tasks that are actually interesting, like processing jobs, rendering pages, etc. We really, really strongly recommend that you use nginx or Apache; either is fine. We use nginx for all of the usegalaxy.* deployments, so we can recommend that, and we also know a lot of users who already had an Apache server and just used that. So I'm putting the nginx role at the very bottom of the playbook. And then we need to make this important change; if you don't make this change, bad things happen. We're going to edit our group_vars/galaxyservers.yml, and up here under the uwsgi section we're going to change http to socket. What this does is change uWSGI so it no longer talks the HTTP protocol; instead it talks the uWSGI protocol, which nginx is able to speak as well. Additionally, we're switching it to listen only on 127.0.0.1. This means that no requests from outside your server will be able to reach Galaxy directly. This starts to be part of our defense in depth: we have our Galaxy server, and we don't want it to talk to anyone other than the nginx server. nginx will talk to it on this port, on the same host, so everything will work well, but it prevents people outside the server from accessing Galaxy directly.

Next, we're going to add a whole bunch of variables for certbot. We're going to go all the way down to the bottom and add our certbot variables; remember to remove the pluses if you're copying and pasting from the training materials. Certbot is what manages the SSL certificates. We set it to automatically renew: at some random hour, some random minute of every day, it'll check whether the certificate needs to be renewed. It will authenticate using the webroot method. There are a couple of different ways that certbot can authenticate to the Let's Encrypt servers and say, "hey, I'm really gat-whatever.training.galaxyproject.eu"; one of these is the webroot method. With the webroot method, certbot writes out a secret file to this webroot, a .well-known directory, and then informs the Let's Encrypt servers: hey, come check for this file at this domain name and this path. If they can access it, and do some cryptography, then they validate it and say: okay, I trust that you have control over this domain name. We're also going to share the key that is created with some users. This is something we added to the role, because we often run it and say: okay, this user needs access, this user needs access; so we grant permissions to individual users to read copies of the key. Whenever the SSL keys are successfully renewed, certbot can execute some steps for us: we have it restart any service that needs those keys, in this case nginx. And certbot_domains, this is a very, very important one. We set the certbot domains to the actual domain name of our server, and we told you earlier that you really, really need to set the real hostname in the hosts file. This is why: we use the variable inventory_hostname, which Ansible calculates from the hosts file while it's running, so it knows what the server is supposed to be called and will template this out appropriately. Lastly, we agree to their terms of service; you should read those for yourself to be sure you agree too. Once we've configured certbot, we will configure nginx as well.
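The certbot variables we pasted look approximately like this. This is a sketch of the usegalaxy_eu.certbot role's variables as used in this training; if your role version differs, check its README rather than trusting this verbatim:

```yaml
# group_vars/galaxyservers.yml (excerpt)
certbot_auto_renew: yes
# renew at a random time of day, seeded per host so it stays stable
certbot_auto_renew_hour: "{{ 23 | random(seed=inventory_hostname) }}"
certbot_auto_renew_minute: "{{ 59 | random(seed=inventory_hostname) }}"
certbot_auth_method: --webroot
certbot_agree_tos: --agree-tos
# issue the certificate for the server's real hostname
certbot_domains:
  - "{{ inventory_hostname }}"
# users granted read access to copies of the key
certbot_share_key_users:
  - nginx
# restart services that hold the certificate after each renewal
certbot_post_renewal: |
  systemctl restart nginx || true
```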
So, on to nginx. There is an option for SELinux users; that's only CentOS and RHEL, so we won't have it here. The role has servers and SSL servers: these are groups of nginx configuration that don't or do have SSL. We have a server set up that always redirects to SSL, and then under the SSL servers we have our Galaxy server configuration; this will be the actual nginx configuration that knows how to talk to uWSGI. We also disable the default server: there's a built-in configuration file that comes with nginx, and we don't want that, we want full control over what we're doing. Then we set some nginx configuration options, like client_max_body_size, which controls how big files uploaded through nginx can be; this is a protective measure. We define an SSL role: the nginx role we're using from the Galaxy Project knows how to use a few different SSL roles to configure the server's certificates. And lastly, we specify the certificate and private key paths that will be used. With that, we've configured almost all of how nginx works. If you're running this tutorial without SSL, because you don't have a proper hostname, you can read the corresponding section to make the necessary changes.

So let's keep going; this time we're going to create some templates for what the actual nginx configuration should look like. We need to create the directory templates/nginx, and then we're going to edit templates/nginx/redirect-ssl.j2 and paste in this content. What this does is define a server that listens on port 80. It has this server name, so it'll only respond to requests coming in for gat-0, or whatever your hostname is. There's a location for /.well-known/; this is what works with the certbot role to handle the authentication with the Let's Encrypt service. But any requests coming in for any other path are automatically redirected to the HTTPS version. So that's our redirection. Next we will edit the Galaxy version, which handles how nginx actually talks to Galaxy; we'll go through this as well. Instead of listening on 80, it listens on port 443 and uses SSL. It has, again, the same server name, pulled from the hosts file, so that really, really has to be correct. We define where our log files will go. And then here is the most important location block: it says basically everything should be forwarded via uwsgi_pass, the uWSGI protocol, and uwsgi_pass sends it on to this address. This address needs to be the same one uWSGI is configured to listen on. Remember, earlier in our group_vars/galaxyservers.yml we set the socket option under the uwsgi section: uWSGI is configured to listen on 127.0.0.1:8080, so we need to pass to 127.0.0.1:8080. We include some additional default parameters and also pass the scheme, the scheme being http or https. The static files are served by nginx directly, which is a whole lot more efficient than having uWSGI waste its computation time serving files; nginx can read these straight from the static folder and respond instantly. nginx also has its own caching, and can keep items in memory. We point the welcome.html at the right place. And there's some advanced configuration that's just necessary for visualizations to work correctly. Okay, that's looking pretty good.
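For reference, the two templates just walked through look roughly like this. This is a minimal sketch: the webroot path and static alias are placeholders (galaxy_server_dir is a variable provided by the Galaxy role), and the actual training materials carry the authoritative versions with logging and the visualization extras:

```nginx
# templates/nginx/redirect-ssl.j2 -- answer certbot challenges on port 80,
# redirect everything else to HTTPS
server {
    listen 80;
    server_name "{{ inventory_hostname }}";

    location /.well-known/ {
        root /srv/nginx;    # placeholder; match your certbot webroot
    }

    location / {
        return 302 https://$host$request_uri;
    }
}

# templates/nginx/galaxy.j2 -- the SSL server that talks to uWSGI
server {
    listen 443 ssl;
    server_name "{{ inventory_hostname }}";

    # forward everything to uWSGI; this address must match the
    # socket we configured in group_vars
    location / {
        uwsgi_pass 127.0.0.1:8080;
        uwsgi_param UWSGI_SCHEME $scheme;
        include uwsgi_params;
    }

    # serve static assets straight from disk, bypassing uWSGI
    location /static {
        alias {{ galaxy_server_dir }}/static;
        expires 24h;
    }
}
```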
And again, there are more notes if you're running this without a proper hostname, but if you're in the GAT training you won't need those; you have SSL and a proper hostname. So with that, we're ready to run the playbook. What changes did we make? We made a change to the Galaxy configuration, how uWSGI listens, so we should see a changed task when that's processed. Right there: "create galaxy configuration file", changed. That looks good, and it's what will tell the handler at the end that it needs to be invoked and Galaxy needs to be restarted. Nothing else is changing, which is as we expect. Okay, now we're getting into the nginx role. The nginx role will install the nginx package (there's a specific build of the nginx package that has a lot of things compiled into it) and set up the configuration. The nginx role works a little bit differently than other roles, simply because of the dance it has to do to get the necessary certificate. It's a complicated thing, but again, you don't have to worry about it because it's all handled for you. Okay, there was a nice message up there saying we were approved: we got a certificate, everything looks good. The tasks report changed, and we can see all of our handlers running. Some of the handlers ran halfway through, and the rest ran at the end; again, that's just a peculiarity of the nginx role and how it works with certbot, and not something that's important for you to understand. So if we run systemctl status nginx, we should see that nginx is running; if we run it with galaxy, we should see that Galaxy is running. Everything looks good, and now we should be able to access our server. This time we're going to drop port 8080 and access just the bare hostname. You'll note that we got redirected to HTTPS, but now we've got this big "potential security risk ahead" warning, and we need to explain a little bit about why we're getting it.

In this training, we do not request production SSL certificates from Let's Encrypt. They have two categories of SSL certificates: staging certificates and production certificates. The production certificate gives you the little green check mark; everything's good. The staging certificate is considered invalid by every browser, but that's fine. There are rate limits on the production certificates: you cannot request too many of them at once for a specific domain, you can't have too many failures, all of these different requirements. If you're just one person working on one domain, it's fine, it's not an issue. But if you're setting up a training like this, then the hundred people in this training over the course of a week would hit those rate limits very quickly, and it would stop working. So what we do is request the staging certificate for everyone, and you get this big warning message. What you can do is click on Advanced, then Accept the Risk and Continue. You can view the certificate if you like; you'll see that it says "Fake LE Intermediate", a fake Let's Encrypt intermediate certificate. That's exactly what we expect, so we know it's good. Accept the risk and continue, and you'll get to your Galaxy. Now this is being served by nginx, and everything looks pretty perfect. Earlier, this wasn't quite formatted correctly, because uWSGI didn't have all the static mappings it needed, but nginx knows how to serve everything properly.
So that looks great. Congratulations to everyone who made it this far and got your Galaxy running. There is a note here on role dependencies I'd like to talk about quickly. We've mostly been adding these roles at the end every time, but there are actually a lot of complex interdependencies between these roles. They're not so bad now, but they get a lot worse as we go on during the week. The PostgreSQL role doesn't have any requirements. The PostgreSQL objects role also doesn't depend on any variables being set; it just depends on the PostgreSQL role being done and completed. Galaxy: no dependencies. The Miniconda role depends on variables set by the Galaxy role, so the Miniconda role strictly has to come after the Galaxy role. Same for the systemd role, and same for nginx: all of these point at variables set by the Galaxy role. We can see that in the templates/nginx/galaxy.j2 we set up: the log directory and the server directory are all set by the Galaxy role, so the Galaxy role needs to run first.

And with that, we're ready to log into Galaxy. If you use an email that is one of the admin user emails we set earlier, you'll be an admin user. I'm going to check which emails we set; here, we set admin@example.org. So I'm going to register an account in our Galaxy with one of these administrator emails. And there: we're logged in, and as a Galaxy admin we can see this admin menu that appears. If you've made it this far, fantastic, pat yourself on the back. This has been the hard part; everything gets easier after this. You have a Galaxy that's working, you know the playbooks work, and now you can just add things on top to make your Galaxy more exciting, more fun.

Okay, let's set up one of these more fun things: the job configuration. By default, Galaxy handles all of its own jobs. We'll discuss the job configuration in detail on Wednesday, when we go through connecting Galaxy to a job cluster, a distributed resource manager; for now we're just going to set up a very basic job configuration file. This will just tell Galaxy, explicitly, how we want things to be run. There are a couple of basic sections: plugins, destinations, and tools. Plugins tell Galaxy: here are the different types of resource managers, the different job systems, we want to talk to. Destinations list all of the different configurations we want to send Galaxy jobs to, such as: maybe I have a big-memory destination for some of my high-memory tools, or a five-CPU destination for tools that need five CPU cores, that sort of thing. And lastly, the tools section says: this tool should go to this destination, a very static mapping. We'll cover that again a bit more on Wednesday. Our configuration, sketched below, has a plugins section, and inside it is the local job runner plugin. The local runner is just Galaxy running the jobs itself: it'll start the command on the shell, wait for it to finish, and so on. This is quite inefficient, not great. The worst part about the local runner is that when you restart Galaxy, it kills all of the jobs. So if you're setting up a local Galaxy, this works for testing, but we'll get to how to do it properly on Wednesday. We also set workers to four; these are just the four threads responsible for handling jobs.
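Here's a minimal sketch of that basic job configuration. The runner's load path is the real local runner class in Galaxy; the ids are simply the ones used in this walkthrough:

```xml
<?xml version="1.0"?>
<!-- templates/galaxy/config/job_conf.xml.j2 -->
<job_conf>
    <plugins workers="4">
        <!-- Galaxy runs jobs itself, on the local shell -->
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>
    </plugins>
    <destinations default="local">
        <!-- anything sent to "local" is run by the local plugin -->
        <destination id="local" runner="local"/>
    </destinations>
</job_conf>
```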
So we give the destination the ID local, and its runner is local; that runner attribute points over to the plugin, so anything sent to the destination local gets run by the local plugin. Now we're going to set up our templates/galaxy/config directory. Inside templates we already have the nginx and galaxy directories; inside galaxy we'll just have the config directory, and in there we'll put the job configuration. So we're going to create the file templates/galaxy/config/job_conf.xml.j2, and inside it we put the really basic job configuration shown above. Very simple, doesn't do anything exciting, but it's a good thing to do explicitly.

Now we need to tell Galaxy: hey, we've added this job configuration file; you will find it inside the configuration directory, and it's called job_conf.xml. So I'm going to open up group_vars/galaxyservers.yml. For those of you who aren't familiar with diff output, by the way: it shows the old version of the file and the new version, indicates that something changed around this area, and gives you a little bit of context before and after the changed lines. That just gives you an idea of where the change belongs in the whole configuration file, because the whole file is quite big by now, right? We've added 96 lines of configuration, which is a lot. So this job_config_file setting needs to go just above the uwsgi section, at the same level as file_path and check_migrate_tools; you can see right there where the line belongs. And below that, we also need to configure galaxy_config_templates (we'll see a sketch of this just below). Within the Ansible Galaxy role, sometimes you need to send files somewhere on the Galaxy server: you want to say, I want this template file, or this static file, to end up somewhere Galaxy can find it. So what we're doing here is saying: this template, source templates/galaxy/config/job_conf.xml.j2, should end up at this destination. The destination doesn't look like a normal destination, right? We've written {{ galaxy_config.galaxy.job_config_file }}. What's going on is that when this variable is read, Ansible looks up galaxy_config.galaxy.job_config_file and resolves it to the configuration directory plus job_conf.xml. The Ansible Galaxy role then templates the file out to that location, and by defining the location as exactly what we tell Galaxy, exactly where we tell Galaxy the job configuration file should be, we're certain the two stay in sync. We know that the job_conf template we're deploying will go exactly where Galaxy is going to look for it afterwards. That's a nice thing we can do in Ansible: refer to other parts of the configuration so that everything stays in sync. There's only one variable that controls where that file belongs, and we reference it any time we need to know where that is. So we run the playbook; the Galaxy configuration file has changed again, and that's the job configuration variable we set being passed along. When it's done, we can go check out our configuration file. Again, this won't change how your Galaxy works; it's the basic default configuration.
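Sketched out, that group_vars change looks something like this. galaxy_config_dir is a variable provided by the Galaxy role, and the key point is that the template's destination is read from the very option Galaxy itself will use:

```yaml
# group_vars/galaxyservers.yml (excerpt)
galaxy_config:
  galaxy:
    # where Galaxy will look for the job configuration
    job_config_file: "{{ galaxy_config_dir }}/job_conf.xml"

# deploy our template to exactly that same path, so the template and
# Galaxy's setting can never drift out of sync
galaxy_config_templates:
  - src: templates/galaxy/config/job_conf.xml.j2
    dest: "{{ galaxy_config.galaxy.job_config_file }}"
```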
We're doing this explicitly because it makes life a little bit nicer. Galaxy has restarted up there and everything's done: 101 tasks were okay, three things changed, 58 things were skipped. That looks good. And as we said, we can check out our job configuration file and see it looks just like our template. Again, this is super boring right now, right? We've set the config file and, oh look, it's exactly what we told it to be. This will get more interesting on other days of the workshop, when we start templating things out in the job configuration file or setting up advanced things in the job configuration.

So, what would happen if disaster were to strike? Well, if disaster strikes, absolutely nothing happens; you're perfectly safe. For this disaster, we'll pretend that your database is on another machine. You don't need to follow along with this unless you feel like it; I'm going to do it just to show you how easy it is to recover from everything going wrong. When we give this workshop in person, one of us will wipe the Galaxy directories of everyone in the workshop, and they'll go: "oh no, what happened, my Galaxy disappeared!" And we'll say: yeah, well, you know, disaster happened. A meteor hit your server room, something like that. And because you did everything right, because you used Ansible, because you followed all the best practices and wrote everything down in this configuration and inventory, all of this configuration is backed up somewhere and safe. You have your user data backed up and your database backed up. We're making some assumptions here: say your Galaxy server disappears. Well, you're going to be absolutely fine. So, carefully, I'm going to remove my entire Galaxy server. There's real data in there now, and I'm just going to wipe out all of it. And again, we're assuming the user data is backed up, and the database is backed up or lives somewhere else, on a different server or something like that. Disaster strikes, and Ansible saves you from all of the bad things that can happen. Because we did everything in Ansible, we know we're going to be fine. We can try accessing our Galaxy and... nothing, it's broken. Oh no, how horrible.

So here is the very labor-intensive process of reverting the apocalypse: we run the playbook, and then we go get a cup of coffee while it runs. We pat ourselves on the back for saving the day, because we did everything in Ansible and everything can just be recreated. Life is easy. All of this is running again; all of the changes are being made. We need to recreate the directories, recreate the jobs directory, recreate the configuration file, and Ansible is doing all of that for us. Galaxy will come back up and everything will be fine, as long as you back up the parts that need backing up, like the user data and the database, and keep your Ansible playbook somewhere safe and hopefully backed up as well. And we have a lot of experience with this: we, the Galaxy administration teachers, have managed a lot of hardware in our lives. We know that things happen; hardware dies for no reason. But because we've written everything down, we know we can just run our playbook, everything is going to come back, and we don't have to worry about it.
We really do recommend strongly that you put everything possible in Ansible, just so you have this one button you can press: you run the playbook, galaxy.yml, and everything gets recreated, and life is so good. We use this a lot at Galaxy Europe especially. We have a lot of our infrastructure as virtual machines, and any time something goes wrong with a virtual machine, we just press delete, we run the playbook, and the virtual machine gets recreated with everything on it. So back up your playbooks, back up everything that needs to be backed up, and when something completely unexpected does happen, you'll be fine.

While that's running, I'm going to run through the rest of the tutorial. Production and maintenance is an important section; lots of people have questions like: okay, how much time do I need to spend maintaining Galaxy with Ansible? With a smallish server, say 25 users, it can be a day or two per month of maintenance: installing new tools, and updating Galaxy whenever a release is made. Compare that with large public servers like usegalaxy.org, usegalaxy.eu, and usegalaxy.org.au; those are full-time jobs for one to two people. The admins do find time to do other things, but it's a very intensive job. For keeping Galaxy updated, one of the important things is setting galaxy_commit_id to a release branch; we set ours to release_20.09 (there's a one-line sketch of this at the end of this section). As a result, if there are any bug fixes made to the 20.09 branch, or any hot patches that we need from the Galaxy team, we'll get them for free whenever we run the playbook again: it'll update to the latest version of that branch and we'll be good to go. This is another case where running the playbook regularly is a good thing, because then you can be sure your Galaxy is up to date. One thing this won't do, however, is update you from one release to the next. That's something you have to do manually, which is not actually a difficult process: all you have to do is change the commit ID. It's a good idea to check all of the release notes, look at the latest galaxy.yaml.sample file to see if there are any new configuration options you might want to use, and compare the other configuration files to see if anything you rely on has changed. Say you've hard-coded some datatypes: you may want to check that there are no new datatypes, that sort of thing, or check whether a configuration option you've been waiting for, a feature you're excited about, has landed. But that's the bulk of what you have to do. When we switched to this, it saved me literally two days' worth of effort. It went from a two-day process of making sure everything was updated on the server, carefully modifying things by hand, to: I change the release variable, and maybe diff some files to make sure everything looks good and we aren't missing any new configuration options. It makes life so easy. Use Ansible.

User support is a question we often get: how can we help you help your users? There are lots of resources for this. help.galaxyproject.org is the primary landing point for user help. So if you have users and they need help with biological or bioinformatics tasks, send them to help.galaxyproject.org. There is a very large user community; they'll probably find help there, and if not, they'll find direction to somewhere they can get it.
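Circling back to keeping Galaxy updated, here's that one-line sketch. The variable comes from the galaxyproject.galaxy role, and the branch shown is just the release used in this training:

```yaml
# group_vars/galaxyservers.yml (excerpt) -- track a release branch so bug
# fixes land on the next playbook run; edit this line to upgrade releases
galaxy_commit_id: release_20.09
```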
User impersonation is a nice option to set if you're the Galaxy admin. It enables all Galaxy admins to impersonate individual users. So when a user comes to you and says, "hey, my job isn't working", and if your users are like my users, they send you a screenshot of a red item in their history, which is, okay, not super helpful. But with allow_user_impersonation you can impersonate them, act as their account, and go view the error yourself to figure out what went wrong. It makes life a lot easier.

Now, the note on running on a cluster; this comes back on Wednesday as well. If you're running Galaxy and you have a cluster, a network of computers that you're using to run jobs and to hold the user data, then some parts of Galaxy need to be accessible on that cluster, because when a tool or job runs, it may need to look at Galaxy's dependencies or configuration, or even code from the Galaxy codebase itself, for things like upload jobs. So there's a set of directories that need to be exported and need to be the same between Galaxy itself and the cluster. Some of these must be literally the same directory, namely the data directory, the user data: that needs to be precisely the same. However, some of them can also run off two different copies, if that makes life easier: for the server directory, you can just re-clone Galaxy on your cluster, if that's a bit faster for you. You'll typically be using something like NFS, and you'll need to have these directories exported over NFS to both your Galaxy server and your compute cluster.

Other software: lots of people ask about deploying other software alongside Galaxy with Ansible, and this is definitely possible. You can write your own Ansible role to deploy that software, just to keep everything in one place so you have this nice playbook that does everything to get your entire server set up. You don't have to do everything in Ansible, though. You can treat it piecewise: take just one part of your infrastructure, in this case Galaxy, and switch that to Ansible. You don't have to switch everything at once.

And with that, let's check back to see if everything works. Okay, look at that: through a little bit of movie magic, the server is completely done running Ansible again. It took a few minutes; we can refresh, and our Galaxy is back. You've saved the day by using Ansible. Thanks for following this tutorial. More importantly, please give us feedback on this content: let us know how it went, what you thought of it, whether anything was too difficult, and any questions you have that weren't answered in the tutorial. Deployment with Ansible is really easy; the complexity can grow, and there are a lot of examples of public playbooks, but you don't have to start with those. You can start with something really simple like this, and it should be really easy. Congratulations, and thank you.