Hi, Helena and Ashley here. I'm one of your Galaxy administrators. Today you're about to follow the Galaxy installation with Ansible tutorial by Nate Coraor. It's a long tutorial, and not much has changed since it was first recorded last year. There is one thing you will need to change when you follow it. Everything Nate says is correct, but at one point he sets the Galaxy release. When you get to that point, you will need to set the Galaxy release to 22.01, that is, the last two digits of the current year and the current month. This will set up the latest Galaxy for you, and it's needed for a couple of tutorials later in the week. Thanks, and enjoy. Hi, welcome to Galaxy Installation with Ansible, the tutorial. My name is Nate Coraor. I work for the Galaxy Project at Penn State and I administer the usegalaxy.org public server. Today we're going to be covering the installation process of Galaxy using Ansible. You should have already watched the Ansible tutorial to learn a little bit about what it is, how it works, and how its configuration and setup are structured. We use Ansible to install Galaxy to ensure consistency, to make it easy, and because the folks who administer the large Galaxy servers, and who have put this course together for you, all use Ansible to deploy their servers. So when you deploy your Galaxy server using Ansible, you're getting a lot of reused, best-practice workflows (not Galaxy workflows, but administration workflows) for assembling your Galaxy server and running it in a best-practice production manner. Looking through the tutorial, we'll start by talking about the structure of the Galaxy playbook, all the different components that go into it, and how they go together to make your Galaxy server. Then we'll actually run through the process of installing Galaxy, including its dependencies and the pieces of software that it works together with, so databases, web servers and so forth.
Then we'll install Galaxy itself, log in, and set up a couple of configuration files that are useful for running Galaxy. At the end we'll talk about the production and maintenance of a Galaxy server: how it stays updated, how to upgrade Galaxy, and so forth. As you can see in this box here, all of the Ansible roles that we use are designed to be compatible with both Enterprise Linux (Red Hat, CentOS, and so forth) and Ubuntu. It says 18.04 here, but we're actually running 20.04, and other Debian variants should work too. So if your system is not exactly like the ones that we use during the trainings, which are Ubuntu 20.04, that's fine. What we do here today should work with any modern Linux system from these popular distributions. And if there are any differences between what you need to configure on a CentOS system versus Ubuntu, we try to list those in the tutorial. So let's move on to the Galaxy playbook. We're going to be using the official Galaxy role for installing Galaxy. This is found on Ansible Galaxy, which is of no relation to our Galaxy; it's the name for their sort of app store, where consumable, reusable pieces like roles can be uploaded and shared, similar to Galaxy's Tool Shed in that regard. We have published the roles that we write, both for other people to use and for our own use, to Ansible Galaxy. The role that does the task of installing and configuring Galaxy is called galaxyproject.galaxy. Most of the roles that we use today will be under the namespace of either galaxyproject or usegalaxy_eu. The namespace tends to denote which organization, the US-based or the Europe-based one, originated the role, but we work closely together, and all of us have contributed to each other's roles at this point. The galaxyproject.galaxy role is very configurable; it can do almost anything, but we try to make it as easy to use as possible without having to configure it very much.
So we're going to start with just a few variables, and we'll talk about more as we go through the training, but in general it can do a lot. The documentation for the role, as well as the role's defaults file, are the best places to look to figure out all of the different things that you can do with it. We'll mention this again, but as we go through this you'll probably find yourself asking: how would you know to set this variable, and how would you know what values to set it to? The answer is always: the role documentation should explain that for you. The important variables that we're going to be covering at the beginning are galaxy_root, galaxy_commit_id, galaxy_config, and galaxy_server_dir. For galaxy_root: there are numerous ways that you can lay out a Galaxy server on the file system, but the one that we typically use the most is this galaxy_root layout, where there is one root directory somewhere on your system and Galaxy gets installed underneath it. There are separate subdirectories for things like the config files, the Galaxy code itself, Galaxy's virtualenv where all of its Python dependencies are installed, the tools that you install from the Galaxy Tool Shed, and so forth. Everything lives under that galaxy_root. galaxy_commit_id controls what version of Galaxy you're installing: you can either set it to a specific commit, where it will never change, or, and in most cases this is what we do and it's what we'll do in the tutorial, you set it to a branch, so that you are running the latest commit on a specific version of Galaxy. galaxy_config is a large dictionary variable, or hash (in Ansible, dictionaries are referred to as hashes, but it's the same thing), where you set up the contents, essentially, of Galaxy's main config file. If you've ever worked with Galaxy before, that's called galaxy.yml, and the contents of that will go into this galaxy_config variable.
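As a sketch, the handful of variables just described might look like this in group variables; the paths and release shown here are illustrative assumptions, not necessarily the exact values used later in the training:

```yaml
# group_vars/galaxyservers.yml -- illustrative sketch only
galaxy_root: /srv/galaxy            # one root directory; everything lives under it
galaxy_commit_id: release_22.01     # a branch, so we track the latest commit on that release
galaxy_config:
  galaxy:
    # contents of galaxy.yml go here, for example:
    brand: My Galaxy
```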
And then galaxy_server_dir is where the Galaxy code lives. This is automatically set for you based on galaxy_root, but you'll see it referred to a fair amount. Ansible gives you a lot of places to store variables, but we recommend that you set these in the galaxyservers group variables file. The reason for this is that we're going to define a hosts file, or inventory file, that specifies which hosts in our inventory are Galaxy servers. This ensures that the variables that we set only apply to the Galaxy servers. In some cases you can have overlapping variables, where it wouldn't make sense for those to be set on other hosts in your inventory. For this training it doesn't make a difference, because we only have one host, but as your Ansible infrastructure grows, you're going to want to think about which files you define your variables in. So, when you begin to execute a role, everything starts in the role's tasks/main.yml file. If you were to go and look at that file in the galaxyproject.galaxy role, you'd see that it does a number of things, mostly including tasks from other files in the tasks directory, but the steps that are run through in that main tasks file are these: clone or download Galaxy, manage its configuration files, fetch the dependencies for Galaxy's Python application, manage the mutable setup (that means dealing with the config files that Galaxy writes to itself; that's what we mean by mutable), manage the database, and then build the Galaxy client application, the JavaScript application. We'll talk about each of these steps in detail. So the first thing that's going to happen is to clone Galaxy. This is done using git, via Ansible's git module, and is the primary way to install Galaxy. There are a couple of other options, but generally, to have an up-to-date running production Galaxy server, you want to use the git clone method.
So the role is going to clone Galaxy if it's not installed yet, or it'll update it if you have set that commit ID variable to a branch and there are new commits on that branch. It then tells you if the git commit ID changed between what it was before and what it is after running. It then creates Galaxy's virtualenv. This is a Python abstraction that creates sort of a virtual Python interpreter, where all of the dependencies that Galaxy has can be installed in a way that won't affect other Python applications on your system. So it creates a virtualenv with nothing in it, and then it updates pip, the package manager for Python, to the newest version. Then it removes any pyc files. These are Python bytecode files, essentially compiled forms of the Python code in the .py files. If they're left behind, let's say an update to Galaxy removed a .py file or moved it somewhere else, the leftover pyc files can cause some weird interactions, so there is a script that runs as part of this cloning process to make sure that no pyc files are left behind, and then it recompiles all of them based on the current .py files. After that, Galaxy has been cloned to the disk and is ready to be configured, which is the next task. So the static configuration then proceeds: we create directories for the Galaxy configuration files, any config files that we've specified are copied over, and any templates that we've specified are copied over. These are two different things, and you'll see below how they are addressed, but essentially config files are files that we copy unmodified from the Ansible playbook, and templates are ones where we need to fill certain values in from Ansible variables. And then finally the galaxy.yml file itself, which is Galaxy's core configuration file, is deployed.
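The clone and virtualenv steps above are roughly what you would write by hand with Ansible's git and pip modules. This is a simplified sketch, not the role's actual tasks; the variable names follow the role's conventions, but the task bodies are illustrative:

```yaml
# Simplified sketch of the clone and virtualenv steps; not the role's real code
- name: Clone or update Galaxy
  git:
    repo: https://github.com/galaxyproject/galaxy.git
    dest: "{{ galaxy_server_dir }}"
    version: "{{ galaxy_commit_id }}"   # a commit hash, or a branch to track

- name: Create Galaxy's virtualenv and upgrade pip to the newest version
  pip:
    name: pip
    state: latest
    virtualenv: "{{ galaxy_venv_dir }}"
```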
Of the additional configuration files that you can install with Galaxy, many are not required, but as you enable more features on your Galaxy server, you may end up adding some of them. So here's an example from usegalaxy.eu. For example, you have this Galaxy datatypes configuration file; this is one that's commonly modified on different servers, if you have your own custom file types (datatypes) that you want to define. You can see here that this galaxy_config_files variable is a list, which is what this little dash here indicates, and each member of that list is a hash with keys src and dest. The src means that the file in your playbook, at the path files/galaxy/config/datatypes_conf.xml, will be copied to a destination that is this template variable; you can see these two curly braces around either side of the variable in the middle, which will be filled in by Ansible with its value. And what is its value? Well, eventually we will define this in the galaxy_config variable. galaxy_config is a hash with key galaxy, and under that key there is a datatypes_config_file key. This corresponds, again, to the galaxy.yml config file, if you're already familiar with Galaxy configuration. So you can see that this structure up here, galaxy_config.galaxy.datatypes_config_file, corresponds to the datatypes_config_file under the galaxy key of galaxy_config down here in the variables. The reason that we do this is pretty important, and it's explained in this box down here: it's because we only want to define the path to this config file once. This tells Galaxy where the file is when we define it here, and this tells Ansible how to install it from our playbook.
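Put together, the pattern being described looks roughly like this; the exact paths are illustrative:

```yaml
galaxy_config_files:
  - src: files/galaxy/config/datatypes_conf.xml
    dest: "{{ galaxy_config.galaxy.datatypes_config_file }}"

galaxy_config:
  galaxy:
    datatypes_config_file: /srv/galaxy/config/datatypes_conf.xml
```

The dest reuses the value that is defined exactly once under galaxy_config, so Ansible and Galaxy always agree on where the file lives.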
We reuse the value here that we've defined down here, because we don't want the two to ever get out of sync; otherwise you're going to have errors when you run your playbook. Let's say I just set dest here to an absolute path instead of this variable; I set it to, you know, /srv/galaxy/config/datatypes_conf.xml. That would work; it would get copied to that path. But if I ever changed where I wanted that file to go, if I changed this in galaxy_config or whatever, then Galaxy would start reading the datatypes config file from a different path than where we were actually copying it to. So this is sort of a best practice: if you have to define a path, or anything in Ansible, define it in one place, and make sure that everything else that needs that value refers to that one place, so that you only have to change it once. Okay, so the next step is to install dependencies. This happens using Galaxy's requirements file. Inside the Galaxy source code, for each version of Galaxy, there's a list of Python packages that have to be installed, and the versions they're installed at. We pin our versions, which means these are the versions that we know this Galaxy release works with, and those get updated regularly with new versions of Galaxy. So we're going to install all of those into the virtualenv, or rather have Ansible do that for us. And then there are a number of conditional dependencies. The first set is all of the dependencies that are required no matter what, and the second set is any that are necessary based on the features of Galaxy that you've enabled in the config file. The most obvious, or most commonly encountered, of these is when you enable Galaxy to use PostgreSQL as a database back end. By default, Galaxy doesn't come with the actual dependency that is needed to use PostgreSQL, but once you enable the option in the config file, it'll then install that additional dependency.
So these are all installed into the virtualenv, and then the role moves on to setting up the mutable config files, which, as I said, are the ones that Galaxy writes to itself. You do have to be aware that these files exist. They get written to when you do things like install tools from the Tool Shed, so you're probably not going to modify them yourself, but the role needs to make sure that they exist the first time that you run Galaxy, and it will do that now, if they don't exist. The next step is to manage the database. Galaxy's database schema is versioned, and with each new Galaxy release, you may find that there are new versions of that schema that need to be applied. So the steps that happen in this section are: obtain the current database schema version, then figure out what the maximum version is, that is, what it should be upgraded to. If there is no current version at all, then the database is created from scratch. If there is a current version, and it's not the same as the maximum version, then the process will be run to upgrade that schema. And finally, the Galaxy client application will be built. Galaxy has two large components: a server back end that's written in Python, and a client application that gets delivered to the browser, written in JavaScript. That client application has to be built, which means fetching its dependencies, bundling its components, uglifying code, and so forth, so that it is delivered to the client as quickly as possible. This process takes a long time, but you will see that it essentially runs every time that the Galaxy code has been updated or changed. If changes were made that require a restart of Galaxy, we use the Ansible notification and handler system to perform that action. For a long time there was no built-in method to do this; you had to define your own handler. Now the role will automatically configure systemd, that is, configure Galaxy to start with systemd, for you.
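For context, Ansible's notify/handler mechanism looks roughly like this generic sketch; the handler body here is an illustrative assumption, not the role's actual restart logic:

```yaml
tasks:
  - name: Deploy galaxy.yml
    template:
      src: templates/galaxy.yml.j2
      dest: /srv/galaxy/config/galaxy.yml
    notify: restart galaxy      # only fires if the file actually changed

handlers:
  - name: restart galaxy
    systemd:
      name: galaxy
      state: restarted
```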
This can be done automatically, but be aware that if you're deploying somewhere where you aren't able to use systemd to run your Galaxy server, you may have to set this galaxy_restart_handler_name variable and define your own Ansible handler that controls how Galaxy restarts. If you want to look through the role and figure out what other variables can be set, you can see them all in the defaults file linked here, along with all of the possible things that can be done. So, in summary, to recap the steps that take place: Galaxy is cloned, or it's updated if it's already been cloned and there are new commits on the branch that you're following. The virtualenv is created if it doesn't exist. Configuration files are installed. Any missing dependencies are installed, which, the first time that you run it, is all of them. The database is created, and if any updates are needed to the schema, those are applied. And then the client application is built and deployed. You could do all this yourself, or you could make your own role that does this, but the galaxyproject.galaxy role that we've written and developed over many years does all of this for you, and we keep it up to date so that it always works with the latest Galaxy changes. So, time to get our hands dirty and start installing Galaxy. Before we install Galaxy itself, we actually need to install some of the prerequisites, and the first of those is going to be the Galaxy database, plus all of the Ansible roles that we use to actually install Galaxy. If you are running this as part of a Galaxy admin training course, then all of the things in this box here should be taken care of for you, but just to show you what needs to be done, we'll walk through them. You need Ansible installed on the machine where you will install Galaxy. We can check that here by running ansible --version; you can see I'm running 2.11.1, which is fine.
In a production setup you would be more likely to run Ansible from another system. I run it from my laptop or desktop at my office, or from other servers; anywhere with an SSH connection to the systems that you're trying to manage works. But for the purposes of the training, we have you run it directly on the VM where Galaxy will be installed, because that allows us to make sure everything is in a nice, controlled, working environment. So: we checked that Ansible is at least version 2.7. We need to create an inventory file, and we have to put our hosts into a group called galaxyservers, which we'll do in the next step. The VM has to have a public DNS name, which is true for all of the VMs set up for the admin training. It has to have Python 3 installed, which we do, 3.8.5 in our case, and we'll put the full DNS name into our inventory file when we set that up. And the ports that we need to access for SSH, the web, and so forth are open on our VMs. Okay. And we are running on Ubuntu 20.04, so all these instructions should work for us. So we're ready to get started. The first steps will be to write our playbook and set up the basic things needed for it to run. We're going to start by creating a directory, galaxy, and cd'ing into it. Then we'll create a requirements.yml file with the contents that you see here. As you can see, everywhere in this training where you have to enter text, it's shown in the form of a diff. In this case, you can see that the old file here was /dev/null and the new file is requirements.yml; that's because this file didn't exist before. If the minus file, the file that you're changing from, is /dev/null, that means that the file you're creating is brand new. So here we've specified a number of Ansible roles, and these are going to be installed from Ansible Galaxy, that Ansible "tool shed" of no relation to the Galaxy Project that we talked about before.
So we've specified all of these roles that we're going to use, and the versions of those roles that we want. In most cases, these are the latest versions of all of these roles as of the time of recording. I'll tell you what each of these roles does. The first one we've talked about quite a bit already: it installs and manages Galaxy. The second one is going to install nginx, the web proxy server that will sit in front of Galaxy and serve web requests out to clients. The third one is for PostgreSQL, the database, which will sit behind Galaxy and store all of its persistent data, such as user accounts, all the metadata about the data in Galaxy, and so forth. The fourth one is postgresql_objects, which is separate from the PostgreSQL role. The PostgreSQL role only does the tasks of installing the database, configuring it (its config file), and dealing with backups of Postgres; we have this separate role, postgresql_objects, to do the creation of databases and users in the database, the managing of database permissions, and so forth. The next one is a pip role. Pip, as I mentioned before, is Python's package manager, and this role from geerlingguy, who is a prolific Ansible role author, will install pip for us. We'll also be using this Miniconda role. Conda is a pre-packaged version of Python plus a package manager that can install all kinds of different packages; they don't have to be Python packages. It's a standalone system that sits on top of your operating system and doesn't depend on apt or yum or one of those other systems. Galaxy uses this to install tool dependencies. The later modules in this training focus on using Singularity for dependencies, but Galaxy is still very closely tied to Conda to provide tool dependencies when containers aren't used. Finally, we have the usegalaxy_eu.certbot role, which will be used to fetch SSL certificates for our web server so that it's secure.
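The requirements file being described takes roughly this shape. The role names follow the namespaces discussed above, but the version pins here are placeholders; use the exact versions from the training materials:

```yaml
# requirements.yml -- version numbers are placeholders, not the pinned ones
- src: galaxyproject.galaxy
  version: x.y.z
- src: galaxyproject.nginx
  version: x.y.z
- src: galaxyproject.postgresql
  version: x.y.z
- src: galaxyproject.postgresql_objects
  version: x.y.z
- src: geerlingguy.pip
  version: x.y.z
- src: galaxyproject.miniconda     # exact Miniconda role name may differ; check the training
  version: x.y.z
- src: usegalaxy_eu.certbot
  version: x.y.z
```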
Okay, so we are ready to install these roles, which we do with this command here: ansible-galaxy install -p roles -r requirements.yml. You can see it goes out, downloads all of these different roles, and extracts them to the roles directory. If you look at roles, you'll see that all the ones we listed are installed, and in fact installed at the versions that we specified in the requirements file. Next, we're going to create a config file for Ansible, ansible.cfg. This allows us to define some defaults that are kind of nice and handy to use, and that will make our life easier as we run our playbook. In the same directory, I'm going to create ansible.cfg and paste this stuff in. What this does: with this interpreter_python option, we've forced Ansible to use Python 3, even if it finds Python 2 on the system. This should save us some headaches later, as Ansible can depend on certain Python modules that have to be installed on the system, and we're only going to install the Python 3 versions of those modules. We've also set the inventory option to hosts, which is a file name. What this does is mean that we don't have to specify the inventory with the -i flag on the command line every time we run the ansible-playbook command; this just sets it as a default. Then retry_files_enabled: any time that Ansible fails, it leaves behind these retry files so that you can pick up from the point where it left off, but they don't work all that well, and they clutter up your directory, so we disable them. Okay, and we're done in there. Note here in the training that if you're running Ansible over SSH, which you probably will in a production environment, you might want to enable the pipelining option for SSH. It's very nice; if you click the link, it'll explain more about it.
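The ansible.cfg being described ends up looking something like this:

```ini
# ansible.cfg
[defaults]
interpreter_python = /usr/bin/python3   ; force Python 3 even if Python 2 is found
inventory = hosts                       ; default inventory file, so no -i flag needed
retry_files_enabled = false             ; don't clutter the directory with .retry files

; Optional: when running Ansible over SSH, pipelining speeds up playbook runs
[ssh_connection]
pipelining = true
```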
It essentially will make your playbook run much faster, most likely. So we create the hosts file. This should have an Ansible group in it called galaxyservers. This file is basically INI format, so you can see the group is an INI section name, and the members, or member in our case, just one, of that group is going to be our Galaxy server. In my case, I'm using gat-42.training.galaxyproject. In here you have to put the full, actual host name that you want your SSL certificates to be issued for, and so this needs to be the address that you'll access your Galaxy server at over the web. If your server's DNS host name is different from the host name that your Galaxy server is going to run as, then you have to make some changes here. This is very important: it has to be the correct host name. We also want to set ansible_connection to local. This means that instead of the default, which is SSH, we're just going to operate on the local system only, because otherwise Ansible would try to connect to itself over SSH, which we don't want. All right, and finally, we need to set ansible_user. This is the username that we connect to the system as, which has admin privileges, or sudo privileges, and that's going to be ubuntu in our case. Okay. I believe that's it. Yep: ansible_connection local, ansible_user ubuntu. Just make sure that if you are copying this stuff into your hosts file, you change the host name to be your correct host name; it's not going to be this one. That should be it. So we're done with the basic setup. You should see in your galaxy directory the ansible.cfg, the hosts file, requirements.yml, and the roles directory. Now we are ready to install Postgres, but first we'll take a quick diversion to talk about Postgres itself and why we use it. Okay, so a little bit of a chat about databases.
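The hosts file being described is plain INI; with a placeholder host name, it looks like this (replace the host name with your server's real DNS name):

```ini
[galaxyservers]
gat-NN.example.org ansible_connection=local ansible_user=ubuntu
```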
So, Galaxy uses the database for all the objects that you work with in Galaxy and all of their relations. That means things like users, histories, datasets, and workflows. Well, not the datasets themselves: as you can see down below, we don't actually store the datasets in the database, we store information about the datasets, metadata about them. The datasets themselves, which are generally files anywhere from a couple of bytes all the way up to many, many gigabytes, are stored on disk, and you can control where those live. So we store these objects in the database, and we use it for things like persisting the state of Galaxy jobs, so that if you restart the Galaxy server, we know what jobs are running at that time, and so forth. By default Galaxy, or not by default, the only way that Galaxy can run is using the SQLAlchemy database abstraction layer. What this does is allow us to write Python code that interacts with objects in the database without having to write SQL directly. This makes for a much more powerful ability to program, but it also means that, as an abstraction layer, it can have multiple different database back ends behind it. So by default, when you clone and start running Galaxy, it actually uses a SQLite database instead of Postgres, and if you were to start Galaxy without switching it to Postgres, you could find this database at database/universe.sqlite. This is actually really nice for development; most people doing development on Galaxy don't run Postgres, because it's quick and easy to get a SQLite instance of Galaxy up and running, and there are no external services that you have to deal with. But when you're running a production Galaxy server, you want a much more powerful and scalable database, and for that we recommend Postgres.
Because we use a database abstraction layer, it is possible to use other things, including MySQL, but we don't test on it, we don't use it, and we don't recommend it unless you have no other option. A note about sizing the database file system: we rarely delete things from the database. When you delete a user, or a history, in Galaxy, most of the time we're not actually deleting that row out of the database; we're marking it deleted. There's a column that says deleted, and we set it to true. Because of this, the database generally is going to grow, and never shrink, over the life of your Galaxy server. I would suggest that you start with at least 20 gigabytes for the volume where the Postgres database lives; if it would be difficult to expand that volume in the future, then maybe consider starting with at least 50 gigabytes. Eight to 16 gigabytes of memory is usually sufficient for Postgres. I recommend that you run it on a separate server from your Galaxy server. In general, the more things that you can separate out to different systems, the easier it is to figure out what's going wrong when you have resource problems; you have fewer issues with things colliding with each other and competing for resources. So I strongly recommend running Postgres on a separate system if that's possible for you. In the case of our training, we're of course going to run it on the same VM as Galaxy. For the configuration: in the galaxy.yml file there's an option called database_connection. This is just a URL, a string that is passed to SQLAlchemy, and that's how it determines what the database is. In the default option here, it's going to be the SQLite file under database/. For Postgres, you have this URL where you can specify a database name, and you can specify the host, or a different directory where the socket lives if you're running a local Postgres database.
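In galaxy.yml terms, the two cases just described look roughly like this; the host, database name, and credentials are placeholders:

```yaml
galaxy_config:
  galaxy:
    # Local Postgres over the UNIX socket directory, peer authentication:
    database_connection: postgresql:///galaxy?host=/var/run/postgresql
    # A remote Postgres server with a user and password would instead be:
    # database_connection: postgresql://galaxy:password@db.example.org/galaxy
```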
You can also specify the user and password, which is usually required if you're connecting to a remote Postgres database. So, when Galaxy first starts up, it creates its entire database schema, if the database is empty. When you upgrade Galaxy, sometimes we have to make changes to the schema, and these are versioned. Starting up Galaxy performs a migration process, or tells you that you need to perform one, which is done with this manage_db script. In the case of running or upgrading Galaxy with Ansible, this is taken care of for you, because there's a migration step built into the galaxyproject.galaxy role. There are some options that you can tune if you need to. There are a number of workers that are kept around to handle connections to the database, which you can increase if needed; typically you'll only know you need to do this if you start seeing messages about it in your Galaxy log. There's a nice option, server-side cursors: if you have a very, very large result from the database, by default it will all get sent back to the Galaxy server, which then has to load the entire thing into memory. If you're noticing that your Galaxy server processes are crashing because they're running out of memory, you can set this option, which keeps the result set in Postgres, on the Postgres server, and just iterates, returning sets of results as you need them. The downside of this is that it makes it slightly harder to see what queries are running at a given time; the upside is that it can drastically reduce the amount of memory being used on your Galaxy server. There is also an option to log when queries take over a certain amount of time to execute. This can be very nice for finding performance problems with your Galaxy server, to figure out where maybe there's an index missing.
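The tuning options just mentioned map to galaxy.yml settings along these lines; the values shown are illustrative, not recommendations:

```yaml
galaxy_config:
  galaxy:
    database_engine_option_pool_size: 10           # connection workers kept around
    database_engine_option_server_side_cursors: true
    slow_query_log_threshold: 5                    # log queries slower than 5 seconds
```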
We, as the developers of Galaxy, try to make sure that there are indexes on every table and every column that might need them, but there are occasions where you may find that you're having problems with a particular query, and this can help you find those. It is also possible to separate out the tool install database. When you install tools from the Tool Shed, the records for those tools are written into two different places: one is a file that you'll encounter called shed_tool_conf.xml, which instructs Galaxy on how to load the tool, and then there are a number of tables in the database that are used to do things like show which tools are installed when you're browsing the administrator interface in Galaxy, to see the list of currently installed tools. That install database can be separated out into a different database connection, so it can be another Postgres database, or, in my case, for usegalaxy.org, I make it a SQLite database. It's very low-traffic and low-transaction, so there's no contention problem with using it in production, like there is with using SQLite for the rest of Galaxy in production. And it gives you a nice way to deal with the cases where tool installation may fail, which was maybe a bigger problem a number of years ago, when that process was not as robust as it is today. It means that if you back up this database and the shed tool file before you install a tool, and that installation fails, you can just restore the backup copies, which is pretty nice. It also allows you to do some other things, like bootstrap a Galaxy installation with pre-installed tools that you can then ship out to someone else, with those tools pre-installed, without having to preload anything in their Postgres database. If you need to, as an administrator, work with model objects in Galaxy directly (this is sort of an advanced thing), you can do so via the db_shell script that comes in the Galaxy server directory, and here's just a little example of how you do that.
But as an administrator, I find that it's much easier to work with the database directly using SQL queries. And at the same time, a number of us who are pretty heavily involved in Galaxy administration have gotten together and published our queries in this wonderful script called gxadmin, which Helena wrote, that wraps them all up and makes them very nice to deal with. You just run a single command from the command line, and it spits out lots of cool stuff about what's in your Galaxy database; very useful for debugging problems when you're trying to figure out what's going on with user issues. There is a training in the Galaxy Training Network about it, and if you're doing a full Galaxy admin course, we will talk quite a bit more about gxadmin. Okay, we've got our Ansible directory set up and we are ready to do some installing. The first thing that we're going to install and configure is PostgreSQL, the database. Just a bit about how this configuration works: Postgres has its own user database, separate from the system's user database, and it has lots of different ways that it can authenticate users to the database. When you're connecting to a Postgres database on the same system, the easiest method to authenticate users is the peer method, which just says: if the user on the system has the same name as the user in the database, then they are the same user, and they're let in without a password. So that's what we're going to do for the training course. Now, if you're running your Postgres server on a different server than the Galaxy server, you're going to end up doing something different, probably using password authentication, so make sure that you check the Postgres documentation to understand how all that is set up.
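If you do need password authentication for a remote setup, the Postgres role can manage pg_hba.conf entries for you. This is a sketch only: I'm assuming the `postgresql_pg_hba_conf` variable name from the galaxyproject.postgresql role, so check the role's documentation before relying on it:

```yaml
# Hypothetical sketch - verify the variable name against the role docs.
# Each entry is one pg_hba.conf line: type, database, user, address, method.
postgresql_pg_hba_conf:
  - host galaxy galaxy 10.0.0.0/8 md5
```

For this training, though, peer authentication on the local socket is all we need, and the role's defaults handle that.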
We're going to need a Postgres user in the database that matches the user we're going to run Galaxy as. The user we're going to run Galaxy as is just going to be named galaxy, and that user doesn't exist yet on our system. But we're going to use Ansible to create that system user, and we're going to use Ansible to create that Postgres user. Additionally, we're going to use the Postgres role to create backups of the Postgres database. Normally these would be stored on some external system; in our case, we're just going to stick them in this /data/backups folder and pretend that /data is some shared remote network file system where this all gets backed up, so that if our VM crashed, we would be able to restore the backup. So let's get started. We need to create our group variables file, which we will be in and out of a lot in this training, so you'll be very familiar with it. The first step will be to create the group_vars directory. Okay, there's group_vars, and then we'll edit this file, group_vars/galaxyservers.yml, and paste in the contents. So we've got these options here: pip_virtualenv_command will ensure that when we create a virtual environment (and we actually create a few of them throughout this training: one for certbot, later one when we do the Training Infrastructure as a Service tutorial, but also for Galaxy itself), we use the Python 3 interpreter and not Python 2. That's what all three of these options essentially are for; the third one is to install pip and make sure that we install the Python 3 version of it. Next, we're telling the postgresql_objects role that the name of the user we want to create on the Postgres server is galaxy. And then we've also said that we want to create a database named galaxy, owned by the user galaxy.
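The start of the group variables file described here would look roughly like this (a sketch; the pip variable names in particular should be checked against the training materials and role docs):

```yaml
# group_vars/galaxyservers.yml (sketch)
# Make sure virtualenvs are created with Python 3, and install Python 3 pip
pip_virtualenv_command: /usr/bin/python3 -m virtualenv
pip_virtualenv_python: python3
pip_package: python3-pip

# PostgreSQL user and database for Galaxy
postgresql_objects_users:
  - name: galaxy
postgresql_objects_databases:
  - name: galaxy
    owner: galaxy
```
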
Now, how do we know that these variables that we're defining, postgresql_objects_users and postgresql_objects_databases, actually do anything useful? It's because the documentation for the postgresql_objects role explains how to use these variables to do these things: to create users, to create databases, and so forth. You'll also notice that the variables used by a particular role are typically prefixed with the role name. So the postgresql_objects role uses variables that start with postgresql_objects, and the postgresql role uses variable names that start with postgresql. This isn't a hard, enforced rule, but for any role that I write, I make it the rule that any variable a user will set that affects that role should begin with the role name. Okay, so next we're going to configure the PostgreSQL backups that we talked about. We're going to set the directory where these are stored to /data/backups, and then we need a local directory where the role installs the scripts that are used to initiate the backups, and where it keeps a working copy of the in-progress write-ahead log, which is a Postgres-specific term for where it stores transactions that haven't been flushed out to be backed up yet. So, we're using a little bit of Ansible magic here to make this work. Again, we've got these two curly braces that denote that this is a variable expansion. You might be familiar with other languages where you'd expand a variable using, you know, a dollar sign or something like that, but Ansible uses a templating language called Jinja2, and variables in Jinja2 are templated with these double curly braces. Any time that a double curly brace like that is going to start a variable value, it has to be quoted, right?
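The backup variables being described look roughly like this (a sketch; verify these names against the galaxyproject.postgresql role documentation):

```yaml
# Where full backups are stored (pretending /data is a remote filesystem)
postgresql_backup_dir: /data/backups
# Local dir for backup scripts and the in-progress write-ahead log copy;
# expanduser resolves ~postgres to the postgres user's home directory
postgresql_backup_local_dir: "{{ '~postgres' | expanduser }}/backups"
```
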
So if I were to remove these quote marks, then the YAML interpreter would get confused, because in YAML you can have a dictionary like this: a dictionary under a variable named foo, with a key named bar evaluating to some value. So, to differentiate between a YAML dictionary and this Jinja2 template, we need the quote marks around anything that starts with a curly brace. Okay, so what else is going on in here? Well, we have the string '~postgres', and then this pipe and expanduser. What it does is take the '~postgres' and turn it into the actual path to the postgres user's home directory. We use this syntax so that, you know, this training can work on both CentOS and Ubuntu, which don't have the same home directory for their postgres user. If we had hard-coded the path to what it is on Ubuntu, then it wouldn't work on CentOS. So that's why we use this expanduser here. I should mention that expanduser is what's called a filter. There are a ton of filters in Ansible that do different things to the thing that they're modifying. The thing that they modify is on the left, the filter is on the right, and you can chain lots of these different filters together. You will see more of them throughout this training. All right, we are done in the variables file for now. We will be back many times. The next thing that we need to write is our playbook file. To do that, if you actually cd'd into the group_vars directory, make sure you back up so you're back in the ~/galaxy directory. Okay, so open up galaxy.yml, which is a new file. We need to do these things: we need to add a pre-task to install the Python 3 Postgres library, called psycopg2, and we need to tell the playbook to run the role called galaxyproject.postgresql that we've talked about.
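To make the quoting point concrete, here's a small illustrative fragment (made-up variable names, just to show the syntax):

```yaml
# Unquoted braces would be parsed by YAML as an inline dictionary and fail:
#   my_path: {{ '~postgres' | expanduser }}   <- YAML syntax error
# Quoted, it's a string that Jinja2 templates at runtime:
my_path: "{{ '~postgres' | expanduser }}"
# Filters chain left to right; each one transforms the result of the last:
my_dir: "{{ '~postgres' | expanduser | basename }}"
```
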
And then we need to run the postgresql_objects role, and we need it to run as a different user. So you may recall that in our hosts file, we told Ansible that the user we're going to connect to this host and run as is ubuntu, right? But when we go to add users to the database, neither the ubuntu user nor the root user actually has permission to interact with Postgres; we need to do that as the postgres user. And so that is what these become and become_user options are for. I'm going to grab the contents of this and stick it into my galaxy.yml. So, what this says: this is what's called an Ansible play. It's at the top level inside the playbook, and it starts off as a list element. You can have multiple plays inside of a playbook, but here we've only got one. So there's a list, and this element has hosts, which is telling Ansible: okay, connect to all of the hosts listed in this hosts directive. What's listed here is galaxyservers, and if you recall, that is the name of the group that we put our hostname into for our Galaxy server in the hosts file. So that's where that comes from. And we have a become option down here with the value true. This tells Ansible that we need to become a different user when we run these tasks, and that user is on the next line: it's going to be root. Obviously, when we're installing packages on the system and so forth, that has to be done as root, so this is how that's set up here. So we have a pre-task to install the Python 3 psycopg2 package. This task has a name, which is just a descriptive thing. And then there are a ton of Ansible modules; these are the things that ship with the Ansible software that do the actual work of modifying your system, copying files, all the other things. Tasks are what invoke these modules, and roles are just collections of tasks. The module that we're using is the package module, and it takes an argument.
It takes many arguments, but the only one that we need here is name. You can find the Ansible documentation on the package module to find out what the other options are and how to use it, and all that kind of stuff. So we want to install this one package, and then we want to run these roles. You can see here, first we put the postgresql role just by itself on this line. The reason for that is that it's sort of a shorthand: if you're not needing to specify any other arguments for a role, you can just put in the role name as a string, galaxyproject.postgresql. But down below, because we need to run the postgresql_objects role as a different user than the default one defined up here, we have to preface the role name with 'role:'. So we've got role and the name of the role, and then down here we have options to run this role as a different user than what we run the rest of the playbook as. And that's it for the playbook file. So what should we expect to see in our folder? Looking in here, we should have an ansible.cfg, galaxy.yml, that group_vars directory, the hosts file, requirements.yml, and then the roles. You might have noticed that we've had some true and false values and so forth in here. There are many ways to specify boolean values in YAML. I tend to stick to true and false; Vim likes to highlight them better than, for example, yes and no. But any of those values will work with any amount of capitalization. I just try to be consistent. Okay, now we're ready to actually do the first step of modifying our system. We're going to run ansible-playbook galaxy.yml, and then let's see what comes out. So this is the process of actually running an Ansible playbook. You can see the things that were written into the play then become these output lines, and Ansible is very good about telling you what things it did or did not do, and which things it did that caused changes.
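Put together, the play being described looks roughly like this (a sketch reconstructed from the narration; check the training materials for the exact contents):

```yaml
# galaxy.yml - one play, run against the galaxyservers group as root
- hosts: galaxyservers
  become: true
  become_user: root
  pre_tasks:
    - name: Install Dependencies
      package:
        name: python3-psycopg2
  roles:
    # Shorthand form: no extra arguments needed for this role
    - galaxyproject.postgresql
    # Long form: this role must run as the postgres user
    - role: galaxyproject.postgresql_objects
      become: true
      become_user: postgres
```
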
And so the very first thing that always happens on a host is to gather facts. That means Ansible figures out things about the operating system and so forth, and it creates a bunch of variables that you can refer to later. The next thing is our pre-task that we defined to install the Python 3 psycopg2 package. We named it 'Install Dependencies' and that's what shows up right here. It says changed: that means it did the thing that we asked it to do. And then next it moved down to the roles section of our playbook, where we said to run the galaxyproject.postgresql role, and it went to that role's tasks/main.yml file, looked in there, and said, okay, what do I do? You'll see that the role is set up for multiple operating systems; because it gathered those facts, it figured out, okay, this is a Debian-based system, Ubuntu, and I'm going to read the tasks that need to be read to install on a Debian system. It runs through a number of things here. It says, okay, we're not using these PGDG packages, so I'm going to skip that, and just install, using apt, the Postgres package that's native for this version of Ubuntu. So: changed, because it did that. All right. Now it looks up a bunch of information: okay, what version of Postgres got installed, I need to know some stuff about that. It's got some internal variables to figure out what it should do next, but what it ultimately decides to do is to set some configuration options that are defaults in the role, and also to specify how the backups are supposed to work. We can take a look at those in a minute. It installs pg_hba.conf, which is PostgreSQL's configuration file for access control, deciding which users can connect to the database. And then it runs through the tasks of setting up backups, so it creates backup directories. So this is that '~postgres', which on the Debian system resolves to /var/lib/postgresql.
And under there we have backups, and then we have a bin directory and active. And it created that /data/backups that we specified as the place where our backups would be saved. So it installs the scripts used to create backups, and then configures Postgres to actually create the backups. It made a cron job to actually schedule and run the backups, which happens nightly by default, but you can change that in any way that you wish. That's the full backup, but the backup system is constantly making incremental backups in between your full backups, so that if your Postgres server were to die at any point in the day, it should be recoverable up until pretty much the moment that it crashed. Okay. So it schedules keeping a copy of the currently active Postgres write-ahead log, the transaction log, and then we're done with the Postgres role. The last task in the postgresql role just makes sure that Postgres is running, which it is automatically when you install it on Ubuntu. Next it runs the next role that we defined in our play, the postgresql_objects role. And what this does is create users. You can see there's a number of tasks that it runs through in here, including dropping databases; well, we don't have any databases that we told it to drop, so it's not going to do that, it just skips it. So it creates a user, the galaxy user. And then down here it's going to create a database, the galaxy database, owned by the galaxy user; this is all the stuff that we defined in our group variables file. And then, at the very end, because changes were made to Postgres's configuration file, a handler triggers. Handlers always run at the end unless something forces them to run sooner. This handler tells Postgres to reload its configuration. That change was made in the postgresql role, but it happens at the end, because handlers always happen at the end. So your output should look something like that.
So you notice I walked through all of those changes. I don't normally look at absolutely everything that happens in here, but any time you see yellow or red text, which means changed or an error, you absolutely want to be making a note of that, and making sure that the changes that were made were the changes you expected to be made. The system is only going to do what you tell it to, but sometimes you tell it to do the wrong things. Okay, why didn't we use -i in our ansible-playbook command? We don't have to, because we specified the inventory file in our ansible.cfg, so that's taken care of for us. 'skipping: no hosts matched': you can see this sometimes when you run the ansible-playbook command and essentially nothing happens. Why is this? It happens usually because of a typo somewhere, and there is a list of troubleshooting actions you can take if that happens. Okay, so let's take a look at what our system actually looks like now. What did Ansible actually do? Because I can tell you that it did all this stuff, but we ought to actually confirm that it made some changes. So, first of all, we can see that there are now Postgres packages installed; I didn't show you that they weren't installed beforehand, but they weren't. And in /etc/postgresql/12/main, you can see that there are some configuration files now. Notably, there's this pg_hba.conf with a funny name on it, and this is Ansible's scheme for backing up files. Many of the roles that we use are designed not to just entirely overwrite the files that they change; usually, a role will try to make a backup first, so that if something went wrong, you can fix it. So, let's see what actually changed in here. Okay, you can see the old version had this big long comment at the top that we got rid of. In all the roles I write, I stick this blurb in here so that if anyone edits the file by hand on the system,
hopefully this will scare them off and let them know that any changes they make are most likely going to be destroyed the next time someone runs Ansible. And what else? It removed this network listening block because we don't need that for our server. Oh, sorry, this is replication, which we're not using. If that were necessary, we could add it back in using the role. Okay. Additionally, in the conf.d directory, the role has added options to enable backups. You can see this wal archiving: WAL means write-ahead log, so it means the write-ahead logs will be archived. It turns that on and then specifies this script, which was installed by the role, as the script to use every time the system wants to archive one of these write-ahead logs. Additional options get set in here. It created this file, but we didn't actually specify any extra Postgres options that we want to set. There is an option in the role to specify additional parameters, and as you tune up a Galaxy server you may find: oh, I need to increase the amount of memory that Postgres will use (that's pretty typical), or I need to increase the number of connections that it will allow. When you set those options using the Ansible role, they'll appear here in this file. Right. Additionally, the role created for us this /data directory, and in there created a backups directory. We can also see in the ~postgres directory that under backups there is the bin directory, where the scripts got installed for backups. It's also already making the running copy of the active transaction log; that's what this file in the active directory is, and it gets copied every minute out of the Postgres data directory here, where it gets backed up. We can also verify in the database that some of the things that we wanted created in the database are there.
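Setting those extra Postgres parameters through the role might look like this (a sketch; the `postgresql_conf` variable name and list-of-dicts shape are taken from my reading of the galaxyproject.postgresql role docs, and the values are examples only, not tuning advice):

```yaml
# Extra postgresql.conf settings, rendered into the conf.d file by the role
postgresql_conf:
  - max_connections: 250
  - shared_buffers: 2GB
```
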
So let's take a look here. psql -l, run as the postgres user, will show us that there is now a database in addition to the three that come with Postgres when you install it (postgres, template0, and template1): the galaxy database. It's set to Unicode encoding and all that good stuff, and it's owned by galaxy, exactly as we told it to be. Finally, we can look at the galaxy user. We'll just connect to the Postgres server using the psql shell and run \d. You can see that there aren't any relations yet, but you can see that the galaxy user exists. Okay, so our database is set up and ready to go, and we're ready to install Galaxy itself. We'll do that, and we'll also set a few basic configuration options at the same time; of course, we definitely need to tell it to use Postgres instead of SQLite. But there are a few other things that we'll do as we go along as well. We're also going to enable uWSGI mules. uWSGI is the application server that starts and runs Galaxy, and it provides the interaction between the web server, the nginx web server, and the Galaxy application. It's sort of the web component of the Galaxy server, but it also does about a million other things. One of the things that it does is process management and inter-process messaging, and we've leveraged this functionality, which it calls mules, to separate the job-running functions of Galaxy into a separate service that runs under uWSGI. On a normal Galaxy server where you don't configure any of this, uWSGI starts one process that runs the Galaxy server, and that process serves web requests, handles your jobs, schedules workflows, does all of these things in one process managed by uWSGI. So it's a good idea, for performance reasons, to separate the job-handling functions of Galaxy out from the web-serving functions, and that's what these uWSGI mules are used for.
When a user clicks the button in the form to execute a job, that creates a row in the database for the job, and then whichever web server process under uWSGI received that request signals to one of these mules to say: hey, I have a new job in the database, can you go deal with it? And so then the Galaxy job handler mule picks that job up and submits it to a cluster, probably, although in our first scenario here we aren't going to connect a cluster, so it'll just run locally. Either way, the job handler is going to be responsible for that job from start to finish. So, we will set up those uWSGI mules; it's very simple to do, and best practice to do it. Then we need to create the actual galaxy system user, so that we're not running Galaxy as ubuntu or root; you definitely never want to run Galaxy as root. There is some information here about how mules are not the only option; there are other ways to run Galaxy servers, and some production sites don't use mules. usegalaxy.org and usegalaxy.eu don't; we do still run separate job handlers, we just don't run them as uWSGI mules. Be sure to check out the documentation for more information. So, let's get to work actually getting Galaxy installed and running. We're back in the galaxy directory in our home directory, and we open up the playbook file again, galaxy.yml. Now, instead of the single package that we needed to pre-install, we have a bunch more things to pre-install. The reason for that, and I'm not going to get too deep into this: acl is necessary for Ansible to be able to run things as different users in certain scenarios, and then bzip2, tar, and so on are needed for the miniconda role.
And git is of course needed because Galaxy itself is installed by cloning from git; make is needed to build the Galaxy client application; and virtualenv we've already talked about. So, all of those are needed. By the way, the roles specify that these things are dependencies, so if you were building this up by hand, without me or the training material telling you how to do this, the galaxyproject.galaxy role, for example, would tell you that it needs these things as dependencies. Okay. So now we need to add some new roles: the pip role, the galaxyproject.galaxy role, and the miniconda role. You can see here that become_user is in fact a template variable again. It's not just 'galaxy', although the user that we're creating is named galaxy. One good practice to get into is to try not to have the same value defined in multiple places: the system user is going to be named galaxy, and that's going to be set in our variables file, so we don't want to duplicate it here by typing 'galaxy'; we're going to refer to the variable that we're going to create in that file. That way, if someday down the line we want to deploy a Galaxy server and we have to change the Galaxy username to something else, we don't have to find it in multiple places and change it in multiple places. All right. Next, we're going to edit our group variables file and actually set those things that I was just talking about. So down here at the bottom, let's paste in the stuff in green. Remember, now use release 22.01. So, what does this say? All of the variables that start with galaxy control the Galaxy role, and then we have the miniconda ones that control the miniconda role, both of which we just added to our playbook. So we'll go through each one of these. galaxy_create_user is pretty self-explanatory: it says, if the galaxy user doesn't already exist on the system, we want to create it.
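The expanded play being described might look like this sketch. The dependency package list and the third-party role names (geerlingguy.pip, uchida.miniconda) are assumptions on my part; use whatever your requirements.yml actually pins:

```yaml
# galaxy.yml - expanded play (sketch; role and package names may differ)
- hosts: galaxyservers
  become: true
  become_user: root
  pre_tasks:
    - name: Install Dependencies
      package:
        name: ['acl', 'bzip2', 'git', 'make', 'tar', 'python3-psycopg2', 'virtualenv']
  roles:
    - galaxyproject.postgresql
    - role: galaxyproject.postgresql_objects
      become: true
      become_user: postgres
    - geerlingguy.pip
    - galaxyproject.galaxy
    - role: uchida.miniconda
      become: true
      # Templated rather than hard-coded, so the username is defined once
      become_user: "{{ galaxy_user.name }}"
```
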
Next, galaxy_separate_privileges is a security best practice that we've built into the Galaxy role. It means that things that do not need to be writable by the galaxy user should be installed as a different user. In a non-privilege-separated Galaxy installation, you might just install the Galaxy server and all of its config files and so forth as the galaxy user. But then when your users submit jobs, those jobs, whether they run on a cluster or wherever, also run as the galaxy user. The Galaxy server operating on the web runs as the galaxy user too, so in theory, if there were some kind of security issue with Galaxy itself that allowed a malicious person to write to the file system, they could overwrite the Galaxy code, they could overwrite the Galaxy config file, and do all kinds of stuff, potentially much more damaging than what they could do otherwise. So separate privileges is going to install the Galaxy code, the Galaxy dependencies, the config files, and so forth as a different user; in our case it's going to be root, and that way, if the worst case were to happen, they couldn't be overwritten. Okay, so galaxy_manage_paths is an option to make sure that all of the directories that are needed, you know, where the config is going to be stored, where the server is going to live, all that stuff, get created by this role. Occasionally it's not really possible to use this if you run on network file systems, where maybe the user that is running Ansible doesn't have permission to create files or directories on that network file system. That's just a sort of site-specific option that has to be enabled or disabled depending on what's going on on your server. Okay, galaxy_layout is set to root-dir there. We talked about this earlier: there are different ways that you can lay out the directories that Galaxy gets installed to, but in our case we want everything under /srv/galaxy, which is the next option. Next we have galaxy_user.
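Collected together, the layout and privilege options just discussed look roughly like this (variable names per the galaxyproject.galaxy role; double-check values against the role's defaults):

```yaml
# Galaxy role: user creation, privilege separation, and directory layout
galaxy_create_user: true        # create the galaxy system user if missing
galaxy_separate_privileges: true  # install code/config as root, not galaxy
galaxy_manage_paths: true       # let the role create needed directories
galaxy_layout: root-dir         # everything lives under galaxy_root
galaxy_root: /srv/galaxy
```
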
This is a YAML dictionary where we have the name key, whose value is galaxy, and then the shell, which is bash. So this is just going to create the Linux system user named galaxy and make sure its shell is set to bash. But you'll also see, back in that playbook, if I were to open it, when I was talking about how you've got this galaxy_user.name template variable: it refers to this galaxy_user name key. All right, galaxy_commit_id: here we're setting this to the release_20.09 branch of the git repository that Galaxy is installed from (remember, for this training, use release_22.01). There are releases three times a year, and the branch name corresponds to the year and month of the release. And then galaxy_force_checkout will obliterate any modified files in the Galaxy directory when it does a git update. All right. Finally, we're going to pre-install conda. Now, when you start up Galaxy, it installs conda itself automatically, but since we start up multiple Galaxy processes at the same time in a production service, we pre-install it just to make sure there are no race conditions. Galaxy has gotten pretty good about handling that itself, so this is more of a safety measure. Okay. Now we're going to set the galaxy_config option, which I did talk about before; this is essentially the contents of the galaxy.yml configuration file. We're going to paste it in first, and then we'll talk about the things that we've put in here. So, just at the bottom of the group_vars/galaxyservers.yml file, we're going to create this galaxy_config. My terminal doesn't render the pretty emojis, but you can see it here. And you might want to make some changes here: you can change the brand (that's what will display in the top left corner of the Galaxy website), and you can change the admin users, maybe to your own email address. And then the rest of this stuff: the database connection points at the galaxy database that we defined up here, in postgresql_objects_databases.
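The user, version, and conda variables being walked through would look something like this (a sketch; the miniconda_prefix line is my assumption about how the miniconda role is pointed at Galaxy's tool dependency directory):

```yaml
# System user Galaxy runs as; referenced elsewhere as galaxy_user.name
galaxy_user:
  name: galaxy
  shell: /bin/bash
# Release branch to deploy - use release_22.01 for this training
galaxy_commit_id: release_22.01
# Discard local modifications in the Galaxy directory on update
galaxy_force_checkout: true
# Pre-install conda into Galaxy's tool dependency directory
miniconda_prefix: "{{ galaxy_tool_dependency_dir }}/_conda"
```
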
The name is galaxy, and that's the 'galaxy' down here. And then, on Debian-based systems, they change the default location of the socket that you connect to if you're making a local connection to the database, and that's what this /var/run/postgresql is for. We've also said that we want to store datasets in the /data directory: if you recall, we said that we're going to pretend that /data is some big network file system mounted onto our Galaxy server, where we're going to store all of the big data, because this is where all of the user data is going to go. The Galaxy server itself doesn't take up very much space, a few gigs maybe for it and all of its dependencies, but file_path is where all the user data, hundreds of terabytes over time, is going to live. You want to think about that one when you're setting up your Galaxy server. It's not really worth talking about check_migrate_tools at this point; just set it to false. The tool_data_path: when you install Galaxy tools that need reference data that's managed outside of Galaxy, the files that tell Galaxy where that data is located go into the tool data directory. And then there's this object_store_store_by. So, until very recently, when datasets were created in Galaxy, they were stored in the file system by number. In the database, there's a table called dataset, and that just increments: the first dataset ever uploaded or created in Galaxy is number one, the second one is number two, and so forth. And the old way of storing data was to create a file on the file system, under file_path, called dataset_1.dat, and the second one dataset_2.dat, with some subdirectories to make sure that you don't end up with a billion files in one directory, because most file systems can't handle that.
But the point is that they were stored by the numerical ID. There have been some performance enhancements gained by instead storing them by the UUID that is generated uniquely for every single dataset and stored in the database. So this option is very much recommended for any new Galaxy server. If you have an old Galaxy server, you can also enable this for new datasets; come talk to us to find out how to do that. And the id_secret is going to be a randomly generated string that's used to generate all of the IDs in Galaxy, so you want to make sure that you set this to something and don't leave it as the default, otherwise attackers can guess certain IDs. All right. You may notice that there is a variable in this file, galaxy_tool_dependency_dir, that we have not defined in this file, right? The reason for that is that there are variables that get added to the Ansible runtime environment as the playbook runs. The first of those I talked about already: the facts that get gathered when Ansible connects to the server and figures out the operating system and all that kind of stuff. But then, as roles get executed, those roles may set variables, and in fact the galaxyproject.galaxy role sets a ton of default variables; this galaxy_tool_dependency_dir is one of them, and galaxy_mutable_data_dir is another. That's why we can refer to these things without having actually specified them; you can find all of them in the defaults file of the Galaxy role, to figure out what other things you can reference without having to set them. vault_id_secret is not set yet, but we're going to do that ourselves shortly. Okay, so we mentioned that we're going to use mules to handle our jobs, and we'll set up the configuration needed to do that now. We're still in the group_vars/galaxyservers.yml file, but now we're going to add a uwsgi section.
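An excerpt of the galaxy_config block being described might look like this (option names are real galaxy.yml options; the brand and email are placeholder values):

```yaml
galaxy_config:
  galaxy:
    brand: "My Galaxy"                  # shown in the top left of the UI
    admin_users: admin@example.org      # change to your own address
    # Local socket connection on Debian-based systems
    database_connection: "postgresql:///galaxy?host=/var/run/postgresql"
    file_path: /data                    # where all user datasets live
    check_migrate_tools: false
    tool_data_path: "{{ galaxy_mutable_data_dir }}/tool-data"
    object_store_store_by: uuid         # store datasets by UUID, not number
    id_secret: "{{ vault_id_secret }}"  # pulled from the vault, set up later
```
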
So let me paste this in here, and then talk about it a little bit. What I'm doing here: you need to make sure that the u in uwsgi lines up with the g in galaxy — they should be in the same place, so I have two spaces in front of each. And this is because there's a uwsgi section of the Galaxy config file and a galaxy section. If we didn't have this uwsgi section, the role would put one in there for us by default, but we want to change some things, and most importantly we want to add these mule options. This is all we have to do here, essentially, to say "hey, start up a couple of extra processes to do job handling for me." So we set mule to a list of two entries, and these are just the file inside the Galaxy code that actually starts and runs the Galaxy server: lib/galaxy/main.py. Then we assemble these into what uWSGI calls a farm. The farm's name is job-handlers, and we say that mules number one and two — these two up here — are part of that job-handlers farm, and by doing that they automatically become job handlers. There are a whole lot more options in here; the ones that you would have to change most often, or need to worry about: the first one is the socket. This is the port that uWSGI is going to listen on, and nginx, when we connect it to uWSGI in a future step, is going to have to know this port number, 5000. Then there are processes and threads: these are the number of processes and threads that will be started for handling web requests, so if you notice that your uWSGI server seems to be struggling to handle the number of web requests that users are creating, you can increase these values. Okay. So we talked about how there is this id_secret.
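As a sketch, the uwsgi section with the mule and farm settings just described might look like this. The port and the process/thread counts are illustrative; check the training materials for the exact values in use:

```yaml
galaxy_config:
  uwsgi:
    # nginx will later connect to uWSGI on this socket
    socket: 127.0.0.1:5000
    # Web workers: increase these if uWSGI struggles with request load
    processes: 1
    threads: 4
    module: galaxy.webapps.galaxy.buildapp:uwsgi_app()
    # Two mule processes, each running the Galaxy entry point
    mule:
      - lib/galaxy/main.py
      - lib/galaxy/main.py
    # Group mules 1 and 2 into a farm named job-handlers;
    # membership in this farm is what makes them job handlers
    farm: job-handlers:1,2
  galaxy:
    # ... galaxy section as before ...
```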
But we haven't actually created it yet, and all it needs to be is some small random string that we're going to generate. This has to be a secret, though, and generally your playbook is going to be stored somewhere that's somewhat public. For example, the playbooks that are used to deploy usegalaxy.org and usegalaxy.eu are all publicly available on the web, in GitHub. Because of that, we can't put our id_secret directly here in the file. For this we use an Ansible feature called Vault, which is essentially an AES-256-encrypted file that gets decrypted at runtime when you run the playbook, but otherwise keeps that data encrypted so that it can't be read. So, we need to create our vault file. The vault file is protected by a password, so the very first thing we're going to do is generate a password. I've done this with openssl: just randomly generate some base64 text and stick it into this vault-password.txt. Yours is going to look different from this, but that's what it generated in my case. Next, in ansible.cfg, you want to tell Ansible where that vault password file is. Of course, just having it as a file in the playbook isn't secure — it certainly wouldn't be secure if you committed it and pushed it to a public repository — so you want some other means to distribute this to the people who are going to run your Ansible playbook. In the training here we don't want to have to type this password in every time, so we're going to stick it into a file and tell Ansible to read it from that file. And finally we're going to create the actual vault file itself that will contain the id_secret. You do that with ansible-vault create. Now we've been dumped into an editor here, in an empty file.
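The commands are roughly as follows. The filenames are the ones used in this walkthrough (adjust to taste), and this assumes openssl is installed; note that appending `[defaults]` is only correct if ansible.cfg doesn't already have that section:

```shell
# Generate a random base64 password for the vault and restrict access to it
openssl rand -base64 24 > vault-password.txt
chmod 600 vault-password.txt

# Tell Ansible where to find the vault password (ansible.cfg in the playbook dir)
cat >> ansible.cfg <<'EOF'
[defaults]
vault_password_file = vault-password.txt
EOF

# Then create the encrypted vault file itself (opens your editor):
#   ansible-vault create group_vars/secret.yml
```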
So we're going to just put in this vault_id_secret — you recall this variable name, vault_id_secret, was the one referred to back in the regular group_vars file — and finally we generate a random string; I'll just use whatever is in the training, that's fine. So that's my secret now. If I look at that file, you can see that it's AES-encrypted junk; you can't read the actual id_secret that I put in there, but when I go to run my playbook, the vault_id_secret that was placed into that file will be substituted in and read into the Galaxy config as the id_secret option. Finally we just need to tell Ansible to load that secret file, which we do using the vars_files option. In a standard Ansible playbook layout, the files in the group_vars directory are automatically loaded if they match a group that the host you're operating on is in. You recall that in the hosts file — let's open up the hosts file — we have our server name here, and it's in the galaxyservers group; that's how the group_vars/galaxyservers.yml variable file automatically gets read. But the secrets file is not automatically going to be read, and so for that we add this extra vars_files option to the root of our play, here, to be consistent with the training. This tells it: okay, load those variables, even though they don't correspond to a group. All right. And now we should be ready to run the playbook. What's it going to do? What did we do? We added new roles. It's going to run the PostgreSQL and postgresql_objects roles again, but there shouldn't be any changes. We've now added the pip role, the Galaxy role, and the miniconda role; those are all new and should run after PostgreSQL. So let's see. So far I don't see any yellow — that's a good sign, nothing changed. Okay, there's the first thing that changed: create Galaxy user.
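In outline, the top of the play then looks something like this. The role list is abbreviated here — use the roles you added earlier in the tutorial — and the secret filename matches the one created with ansible-vault:

```yaml
# galaxy.yml playbook (sketch)
- hosts: galaxyservers
  become: true
  vars_files:
    # Not a group_vars file, so it must be loaded explicitly
    - group_vars/secret.yml
  roles:
    # ... the postgresql, pip, and miniconda roles as added earlier ...
    - galaxyproject.galaxy
```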
So now there should be a Linux system user named galaxy. The next thing that happens is that it created the Galaxy root directory. We'd set that to /srv/galaxy, and then it created a bunch of paths under that. You can see it created some privilege separation: these venv, server, config and local_tools directories are all going to be owned by the root user, whereas the directories that Galaxy itself needs to write to are all going to be owned by the galaxy user. The next thing it did — scroll back up here — was to clone Galaxy. There was no Galaxy installed to begin with, and you can see after it finished, it told us that the Galaxy version changed from nothing to this commit ID, which should be the latest commit on the release_20.09 branch of the Galaxy repository. It created the Galaxy virtualenv, made sure that pip and setuptools were the latest version in that virtualenv, and then removed the .pyc files that I mentioned quite a while ago. All right. It then created the Galaxy configuration file, galaxy.yml, and now it's installing the base dependencies, so it's doing essentially one very large pip install of all of the dependencies in Galaxy's requirements file. This process will take a while, so while it's doing that, we can have a look at what has been done so far. There's /srv/galaxy, and in here you can see these directories. Inside of config is a galaxy.yml, and you can see the options that we put in there — not exactly with the same spacing and ordering and everything, but the options that we put into the galaxy_config variable in our group variables file have now been placed into galaxy.yml, Galaxy's config file.
But there are some things in here that we definitely didn't define: shed_data_manager_config_file, integrated_tool_panel_config, the job metrics config file, and so on. We didn't define these things, and that's because the role does it for us. It has to set the paths to all of these files that are not in the default place where Galaxy would look for them, because we have a separate config directory outside of the Galaxy directory, and because of this it knows to set all these additional options for us. That way we only have to set the ones that we care about in our group variables file. Additionally, you can see that it substituted that id_secret in from the vault file. So that went successfully. All right, let's go back to Ansible now, which has done a bunch of things while we were gone. Can't scroll back that way; scroll back this way. Okay, so you can see: it installed the base dependencies, then installed conditional dependencies — for example, it installed the Python psycopg2 library into the virtualenv so that Galaxy can connect to the database. Then it created the mutable configuration files. It does that once, when the new Galaxy server is created, and then it should never touch them again; it shouldn't overwrite these files because Galaxy is going to start writing to them, but it has to have them in place to begin with. The next thing it does is check to see whether or not the client is built, and it finds out that it is not. Then it goes through the steps needed to do that: it installs Node.js, then Yarn, and then it runs the client build (make client-production) to build the client application. This process will take quite a while — it depends on the size of the system that you're running on — so you'll have to wait a bit for this. Okay, and we're back. I cheated and cut out a little bit in between there while we were waiting. But as you can see, after finishing the client build...
We move pretty quickly on to the miniconda role, which installed Miniconda. And then at the very end you can see that there is a handler that ran, but the handler is actually just a message that says restart is not implemented, please restart Galaxy manually. Galaxy is not running yet, and the reason for that is that in a bit we're going to set it up to run under systemd, and doing so will automatically set up a handler for us. As of now, though, the role doesn't know how we want it to start Galaxy, so we'll fix that shortly. Okay. We've looked at the Galaxy config file already, and we can go and poke around a few other things in /srv/galaxy. You can see the Galaxy server directory here, which of course is just the Galaxy clone. There is a jobs directory created for us; that's where tools will actually execute. Then there's this var directory that contains a few things like config files that Galaxy is going to modify itself, and then there's the virtualenv. Let's take a look in var specifically: this contains things like the Galaxy tool dependencies, so the Conda that was installed is under here. And then in the config directory, these are things like the shed_tool_conf.xml, where, when we install tools from the Tool Shed, their definitions will be written. Right now it's just empty; it defines basically just the path where tools will be installed — the tools themselves, not the definitions — and that directory has been created for us as well. So we're nearly ready to go with our Galaxy server; we just need to get a web server in front of it so that we can actually access it. Oh, but before that, we need to actually set it up to run. We're going to use systemd. If you're not familiar, systemd is essentially the Linux init system now: it's basically the first thing that starts after the kernel, and it manages all the processes that run after that.
So, we're going to schedule Galaxy to start up when the system starts, the same way that you would for any other server process. This is done by the Galaxy role, so let's clear this out, go back into our playbook, and then in group_vars/galaxyservers.yml we can tell the galaxyproject.galaxy role that we want to use systemd to manage Galaxy. Now, this wouldn't be applicable on systems where you don't have the ability to set up system services to run at boot, or if you're not using systemd, something like that, but it probably applies to 99% of Galaxy servers at this point. So we've added that one option that tells the role: yes, go ahead and use systemd. Then we'll go ahead and run the playbook again, and you can see, as expected, there aren't changes to most of the tasks that are being run here — basically everything that's happened so far has been "no changes." Sometimes it does take some time, because it has to verify that there is nothing that needs to be changed, but yeah. Right now we're down to the point where it is installing the Galaxy unit — that's systemd talk for the service file that describes the service that's supposed to be run. Let me point out first that it now ran a handler that restarts Galaxy, which is the handler that has automatically been set up for us. Now any time we make a change to Galaxy's configuration, or anything else that requires a Galaxy restart, it can actually be restarted for us. So if we take a look in /etc/systemd/system, the galaxy.service file is a service unit file, and it tells systemd how to run the service: it sets the working directory to where the Galaxy code lives, and then runs uWSGI with the --yaml option pointed at Galaxy's config file.
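The single variable in question, added to group_vars/galaxyservers.yml, is:

```yaml
# Let the galaxyproject.galaxy role install a systemd unit for Galaxy
# and register a restart handler that fires on config changes
galaxy_manage_systemd: true
```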
We set some other options here — the role documentation will tell you more about things that you can set — but this works for our purposes. Now if I do a systemctl status galaxy, it should show me that the Galaxy service is running. It's running with four processes, which, if you look back at the uwsgi config section of our variables, is what we told it to run with. You can also see the memory limit in the service unit is defined as 16 gigs, so if it were ever to exceed 16 gigs it would automatically be killed and restarted. But this is great, this is good news, and you can see there's also the tail of Galaxy's log here, which you can view with journalctl. This is another part of systemd: its log collection service. You do this with journalctl -eu galaxy; the -e tells it to go to the end of the log, and the -u tells it which service unit to look at. And you can see, this is Galaxy's log. We won't go into detail about it here, but if you've ever started up a Galaxy server before, this should look familiar to you — only now, instead of a file, it's in the systemd journal. Alrighty. Another point that is made here is that you can also restart Galaxy by hand for whatever reason; you can always do that with systemctl. If you're familiar with managing services with systemd and systemctl, it's the same as with anything else, so systemctl restart galaxy would restart it. I use journalctl -eu galaxy to open the log in less and be able to scroll around. If you want to follow it like you would with tail -f on a log file, you can do journalctl -fu galaxy, and if there were things actually appearing in the log, you would see it scroll. Of course, nothing's interacting with our Galaxy server right now, so there are no new log messages. One thing that's important to note about handlers:
now that we're using a restart handler, if your Ansible playbook fails partway through execution, and something happened before the failure that would have triggered the handler to run at the end of the playbook, and you go and fix whatever caused the error and run the playbook again — that triggering task has already been done the first time you ran it. When you run it a second time, there's no change, and so Ansible doesn't know that the handler needs to run again. This is a gotcha with running Ansible playbooks, and you just need to be mindful that if a playbook runs, changes some stuff, and fails in the middle, there may have been handlers that should have run afterwards, and you're going to have to manually do the things that those handlers would have done. Okay, we have Postgres running. We've got Galaxy up and running, managed by systemd and connected to Postgres, but unfortunately we can't access it yet. uWSGI can serve HTTP for us directly, but that's not a production configuration; running nginx — or any other reverse proxy, but we use nginx — gives us some additional features that are very nice to have: compression, caching, static content serving, and so forth. So we're going to move ahead with setting that up. The first step is to open up our playbook file and add a new role to the bottom. This time we're going to add the galaxyproject.nginx role, which, as you might imagine, is going to install nginx for us. That's the only thing that needs to be done in here. Now we need to configure what that role is going to do, and as is the case with all the roles, this is done by setting variables in the group_vars file. Let's open that up, and down here at the bottom we need to add a new section that will control certbot, which will get us SSL certificates, and the nginx role.
Now, you might be wondering why we're setting all these certbot options when we only added the nginx role to our playbook. The reason is that certbot and Let's Encrypt certificates work in a very specific way: you have to have a running web server serving on the internet, then you initiate a connection to the Let's Encrypt servers using certbot, and it has to verify that the web server is functioning; only after that point can you set up SSL on that server. So we need to run the nginx setup tasks in two stages, so to speak: one to set up basic HTTP, and then a second time to set up SSL. Because of this, we have integration in the nginx role where it will run the certbot role in the middle, to allow this process to happen. Explaining these certbot options: you can see that we have some control over when it renews certificates. We also have control over the authentication method for getting new certificates, and we're going to use the simplest, which is webroot, where certbot places a file somewhere on the web server that the Let's Encrypt servers will look for when they go to validate that you own that web server. This is the certbot_well_known_root, which is defined down here, and it's a path that gets served by the nginx web server. We're going to install certbot into a virtualenv, so that we can run the latest version of it, and set a couple of options to make sure that the certificates stay up to date. The really important option under here is certbot_environment: you can see that we have this set to staging. The reason for that is that there are limits on the number of production — which is to say, trusted, real — Let's Encrypt certificates that we can get in a given time period.
So we use the staging environment, which will give us invalid certificates, but we'll actually be able to get enough of them for everyone participating in the training this way. Also, it's just good practice to use the staging environment until you have a production-ready web server that you know is ready to use, because sometimes you have to wipe things out and fetch new certificates, and you don't want to be doing that multiple times against the production environment. Okay. We've configured certbot to share keys with the nginx user; that means that when nginx starts and runs, it'll be able to read the SSL certs and keys on the system, and when new certs are obtained, nginx will be restarted automatically. And, importantly, we say here that the domain we want our certificate for is this inventory_hostname, which is an automatically set Ansible fact that matches what we've put in the inventory for this server. This is why it was very important that the DNS name be correct in our inventory file: the certificate name has to match the name that we're going to put into our address bar when we go to view our Galaxy server, and if they don't match, then the certificate is invalid. Finally, we have an option to automatically agree to the terms of service. Then, in the nginx options section, we have an option to make the proxy work if SELinux is enabled — that's not the case on the training instances, the VMs that we're using here, but it's necessary in other environments. We also then have two options: one defines the virtual hosts that will be available prior to when certbot runs, that's nginx_servers, and this is just a basic HTTP virtual host that will serve the certbot well-known root directory, so that Let's Encrypt can verify that we own this server.
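Put together, the certbot block described above looks roughly like this. This is a sketch from memory of the training materials — option names follow the usegalaxy_eu.certbot role, and the well-known path is illustrative — so check the role's defaults before relying on it:

```yaml
certbot_auth_method: --webroot
certbot_install_method: virtualenv
certbot_auto_renew: yes
certbot_auto_renew_user: root
certbot_environment: staging          # switch to production when you go live
certbot_well_known_root: /srv/nginx/_well-known_root
certbot_share_key_users:
  - nginx                             # let the nginx user read the keys
certbot_post_renewal: |
    systemctl restart nginx || true
certbot_domains:
  - "{{ inventory_hostname }}"        # must match the DNS name in the inventory
certbot_agree_tos: --agree-tos
```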
All other requests will then be redirected to the SSL virtual host, which is set down here under nginx_ssl_servers, and this is the actual Galaxy virtual host — this is what serves Galaxy. We want to disable any default server: most distributions of Linux, when you install nginx, set up a default server, and we don't want that. And we set one generic option that goes into the http section of the nginx config, and this is to increase the maximum body size so that you can upload large files. Although in a modern Galaxy application the file is split up into chunks and those chunks are uploaded, so you shouldn't have uploads this large by default anymore, it's still a good idea to increase it, because there are some circumstances where it may be necessary to upload larger files without the chunking. Okay, so as I mentioned, the nginx role has this ability to run a role in the middle to fetch certs, and that's defined by this nginx_ssl_role option, where we've said we're going to run the usegalaxy_eu.certbot role. We give the paths to the SSL certificate and the corresponding private key that will be installed by certbot when it runs. Okay. It is possible to use different SSL certificates; that's explained in this section here. It is also possible not to use SSL at all, although that's very much not recommended, and since we're trying to help you set up a production Galaxy server, we don't cover it. Okay, so that's it for the settings, but I mentioned that there are these virtual hosts — I'm borrowing Apache terminology here — a non-SSL and an SSL virtual host that need to be set up, and we have to create the configurations for them. We do that by creating templates. So, this will be our first set of templates; we need to make a directory, templates/nginx.
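The nginx-side variables just walked through might look like this as a sketch; the variable names follow the galaxyproject.nginx role as I recall it from the training, and the certificate paths in particular are illustrative, so verify them against the role documentation:

```yaml
# Allow the proxy to make local connections when SELinux is enforcing
nginx_selinux_allow_local_connections: true
# Virtual hosts available before certbot runs (templates/nginx/*.j2)
nginx_servers:
  - redirect-ssl
# Virtual hosts enabled once SSL is set up
nginx_ssl_servers:
  - galaxy
nginx_enable_default_server: false
nginx_conf_http:
  client_max_body_size: 1g            # allow larger uploads
# Role to run mid-setup to obtain certificates
nginx_ssl_role: usegalaxy_eu.certbot
nginx_conf_ssl_certificate: /etc/ssl/certs/fullchain.pem
nginx_conf_ssl_certificate_key: /etc/ssl/user/privkey-nginx.pem
```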
And this is in the galaxy playbook directory, so again, if you cd'd into group_vars, make sure you cd back to the playbook directory. I made this directory, and now I want to edit this new file. Whoops, let me do this. Okay. Now I'm going to paste in the contents here. What we have is: listen on the default web port; the server name is that inventory_hostname again, our server; and then we have this special location for the well-known place that the Let's Encrypt servers are going to look. That location points to a place on our file system given by the certbot_well_known_root variable. So certbot is going to place a file there, and then it's going to say "hey Let's Encrypt, go check — I put this file here, so I can prove that I own this web server." Let's Encrypt will do that, and then we'll get certificates. Every other request that doesn't go to the well-known directory is then redirected to the HTTPS version of the website. So that is it for the basic configuration; now on to the more interesting one. We have a second template, galaxy, which is the virtual host configuration for our actual Galaxy server; we'll just copy that in. Back up here at the top, you can see that we have a listen directive, listening on the SSL port, and then most requests are going to be proxied, using this uwsgi_pass directive, to the value from the galaxy_config option in our group variables file — the uWSGI socket. So again, open that up: we have this variable up here that we set, galaxy_config, then the uwsgi section, and then the socket. We've only defined this once, right here, and then we reference it in the template. I could hard-code the port in here directly, but then I'd have to change it in two places if I ever need to change that socket address. So most routes go to Galaxy using the uWSGI protocol.
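As a sketch based on the description above, templates/nginx/redirect-ssl.j2 contains something like this (directive details may differ slightly from the actual training template):

```nginx
server {
    listen *:80;
    server_name "{{ inventory_hostname }}";

    # Serve the ACME challenge files that certbot drops here
    location /.well-known/ {
        root {{ certbot_well_known_root }};
    }

    # Everything else goes to the HTTPS site
    location / {
        return 302 https://$host$request_uri;
    }
}
```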
It's a native protocol that's supposed to be higher performance than a straight HTTP connection between nginx and uWSGI. In addition, we have these other routes where static content is going to be served directly by nginx: requests to anything under /static — so that includes the style sheets, the JavaScript and so forth — will not be served by Galaxy; they'll be intercepted by nginx and served directly. We've also got the welcome page, which actually lives at a .sample file, but you can change it later, and if you do, you just remove this directive. And then there's some setup for interactive environments, and robots.txt and favicon.ico just to round it out. Okay. That's it for the Galaxy virtual host, and now we should be able to run the playbook again. This is going to run through all of the stuff that we've done so far, but it should mostly — or entirely — be "no changes" until we get to the nginx steps. Okay, here we are in the nginx role. It's taking its time here to install the package, picking the right one for our operating system. All right, what else did it do? It disabled the default virtual host — remember, we set a variable that told it to do that — set additional config options such as that one-gig body size, and then installed the non-SSL virtual host configs. There: installed it and enabled it so that it was live, and then forced the handlers to run. I mentioned that handlers always run at the end, but it is possible to force them to run earlier when you need them to, and in this case the role needs them to run earlier so that the non-SSL config gets read and is available when certbot runs. Okay, so at this point it hands off control to the certbot role, and certbot gets installed into a virtualenv, as we told it to be.
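The interesting parts of templates/nginx/galaxy.j2 look roughly like this, abridged; the socket value is pulled from the galaxy_config variable so it's defined in exactly one place, and galaxy_server_dir is a default set by the Galaxy role:

```nginx
server {
    listen *:443 ssl default_server;
    server_name "{{ inventory_hostname }}";

    # Most requests are proxied to uWSGI over its native protocol
    location / {
        uwsgi_pass {{ galaxy_config.uwsgi.socket }};
        uwsgi_param UWSGI_SCHEME $scheme;
        include uwsgi_params;
    }

    # Static assets are served directly by nginx, bypassing Galaxy
    location /static {
        alias {{ galaxy_server_dir }}/static;
        expires 24h;
    }
}
```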
It creates that well-known directory, and then proceeds to request the certificate, and you get the output here, which is very helpful: you can verify that it got what you wanted, it tells you where the paths of the certificates are, and so forth. Then it writes the script that makes sure copies of the certificates end up in the places where nginx expects them, runs that script, and sets up a cron job to keep the certificates up to date and renewed. And now you can see here that certbot has finished what it needed to do; the certbot role is finished and hands control back to the galaxyproject.nginx role, which then proceeds with the SSL setup: it configures options and so forth, installs that SSL vhost that serves Galaxy, and then triggers the handlers to run once again to reload nginx, and it should be done and ready. You can see the changes that were made if you take a look in nginx's sites-enabled. This is a Debian-ism, I suppose: there's a directory called sites-enabled under nginx that contains symlinks back to the same files in the sites-available directory, so you don't have to remove your configs — you can just add or remove symlinks to control which virtual hosts are enabled. Anyway, as you can see, there are the redirect-ssl and galaxy virtual hosts, and if we take a look at galaxy, for example, you can see this is that template, but with the values filled in: it's filled in my server name, and the host and port for the uWSGI connection to Galaxy, and so forth. So I should now be able to access my Galaxy server. Let's see if it works. Yeah! Okay, this is a good sign; if it didn't make a connection at all, that would be a bad sign.
We're getting a privacy warning, and this is because we're using the Let's Encrypt staging environment; your browser doesn't trust certificates issued by staging — they're not trusted certificates — but in our case that's fine. We're going to be working with these certificates throughout the training, because we don't have the option of getting real certificates, and these are throwaway instances anyway, so it wouldn't be a very good thing on our part to be using real ones. So, I click through and allow that certificate, and you can see: here's my running Galaxy server, it's got the brand that we set, and everything looks fantastic. Okay. The next step will be to log in to Galaxy, and you can do this if you know what your admin user was. I set mine to my own email address; if you just copied the Galaxy config out of the training materials, it's admin@example.org, but if you're not sure you can always look in group_vars/galaxyservers.yml under admin_users, and you can see here mine is my email address. Galaxy, when you install it, doesn't pre-create these admin accounts for you: the first person to actually go and register with that address is the one who gets it, so of course you have to register with the email address that you put in admin_users. So, log in — oh, no wait, I meant to register, not log in — with a very secure password, and there we go. Because I'm logged in as an administrator, I now have this admin option up here in the masthead, and I click through, and now I'm in the administration side of Galaxy. So we have a running Galaxy server. The next thing we want to do is set up the job configuration file. In this exercise we're actually not going to configure it to do anything special, any more than it would normally do with its default job configuration, but we're going to use this file a lot throughout the week — the training — and because of this, and because it's one of the most regularly modified files,
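For reference, the admin user is just another key under the galaxy section of group_vars/galaxyservers.yml; the address below is the training default, so substitute your own:

```yaml
galaxy_config:
  galaxy:
    # Comma-separated list of emails that receive admin rights on registration
    admin_users: admin@example.org
```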
the first thing you're going to do when you're setting up a Galaxy server at your site is modify this file, so we'll cover it right now. This is the file that controls how Galaxy actually executes jobs. Galaxy has been designed to be very extensible and to be adaptable to almost any modern cluster or job execution system out there. In a later training we cover connecting Galaxy to Slurm, but there are many, many different options, and you can look through the Galaxy documentation to learn about all of them. As mentioned down here, the default option is to run jobs on the local server, which is our VM where we're running Galaxy. But common options include DRMAA, which is a library to interface with lots of different cluster systems — it supports many things such as Slurm, Condor, Torque, Grid Engine and more — and then there is Galaxy's own remote execution engine, called Pulsar. Pulsar is the only way that you can execute jobs on systems where you don't have a common shared file system between Galaxy and the cluster. By default, Galaxy expects that all of the datasets that it sees on its file system are also available at the same path on the cluster, and it sets up jobs to run as if it were sitting on a cluster head node, writing a job script and executing it right there. If you don't have that — if your Galaxy server is running somewhere else — you can use Pulsar to stage the job data in and out of your cluster, and that will be covered in another tutorial. Okay, so when you're looking at the job configuration file, it has three basic sections: there are plugins, there are destinations, and there are tools. The plugins section defines which job runners are going to be loaded. Job runners are essentially the different interfaces between Galaxy and your cluster scheduler, so there's a job runner for Slurm, there's a job runner for PBS, there's a job runner for Condor.
And there's a job runner for local jobs, which is the default one. Those are what get loaded in the plugins section. Destinations define how jobs run through those plugins. Every destination says "I run with this particular plugin," and then allows you to define additional parameters about how jobs should run on that plugin. Destinations map many-to-one onto plugins, so you might have five destinations: one just submits to the cluster with one core and whatever the default amount of memory is; a second uses four cores and twice the memory or more, but submits to the same cluster; a third goes to a special queue, maybe. That's what destinations are for: defining how jobs run on those different plugins, or runners. Then the tools section allows you to map specific tools to destinations. Most tools in Galaxy fall into two, or maybe three, categories. The majority of tools are not multi-core; they only need a single core and somewhere between two and eight gigs of memory, depending on the inputs, the parameters, and the tool. Typically I just map those to a default destination that gives them a small amount of resources. Then there are some tools that typically run with more cores, because they perform better if you can allocate more cores to them, and at times those tools need more memory too. And then there are the large tools that need huge amounts of RAM, typically the same number of cores as the multi-core tools but a lot more memory. The way you map these tools to destinations that provide different amounts of resources is via the tools section of the job config. So this is what it looks like; this is in fact the default job config. It defines one plugin, called local.
That's its identifier; it's a runner plugin, and it loads this code in the Galaxy codebase, which is the local job runner. There's one destination, and because there's only one of them and no default is defined on the destinations tag, it is the default. The destination's name is local, and it uses the runner plugin named local, so this "local" here maps to that "local" there. And we haven't set up any specific tool mappings, because our Galaxy server has essentially no tools. So we're going to go ahead and create a job config file for our Galaxy server to use. We need a directory called templates/galaxy/config; we should already have a templates directory, but we probably don't have the galaxy directory underneath it, so just run mkdir -p templates/galaxy/config. In there we'll create job_conf.xml.j2 and paste in the contents of the job config file, and that's it. Now, we've modified this slightly to make it a little more obvious what's happening here. We've explicitly stated that the default destination is the local destination, and we've set the ID of the local destination to local_destination and the ID of the plugin to local_plugin, just so it's clear. In the previous example both the destination's ID and its runner were named local, and it wasn't clear what referred to what; that's the reason for the differences here. So that is our job configuration file. Oh, one thing to point out here: you can see we have this option on the plugins tag, workers="4". What does that mean? For the local plugin, it means the plugin can execute four jobs concurrently, and that's per handler, so in our case, if you remember, we ran two mules, which is two job handlers.
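Putting that together, the templates/galaxy/config/job_conf.xml.j2 we just created looks roughly like this. This is a sketch based on the description above; the plugin load path is the local runner that ships with Galaxy, and the explicit local_plugin/local_destination IDs are the renamed versions just discussed:

```xml
<job_conf>
    <plugins workers="4">
        <!-- Load the local job runner; workers="4" means up to 4 concurrent
             local jobs per handler -->
        <plugin id="local_plugin" type="runner"
                load="galaxy.jobs.runners.local:LocalJobRunner"/>
    </plugins>
    <destinations default="local_destination">
        <!-- The only (and explicitly the default) destination, which runs
             jobs through the plugin named local_plugin -->
        <destination id="local_destination" runner="local_plugin"/>
    </destinations>
    <!-- No <tools> mappings yet: our server has essentially no tools -->
</job_conf>
```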
So that means up to eight jobs could run concurrently, if users submitted them. For every plugin other than the local plugin, workers instead means the number of Python threads that will be started, so that Galaxy's job handlers can use that thread pool for all the tasks related to setting jobs up and finishing them. Sometimes there are things that block on I/O in there, and having a thread pool to handle that work increases the throughput of handling jobs. Right, so we've created that file, but we need to tell the role where that file should be installed to, and then set that path in the Galaxy config. That will be in our group_vars/galaxyservers.yml file, and we're going to go down here, underneath id_secret but above uwsgi, in the galaxy_config variable. This has to go in the galaxy section, so make sure you're in the galaxy section and then paste it in. It looks like it doesn't line up if you look at the diff here, but that's just because the diff is hiding some lines from us; if you paste it in, you should see that it lines up. This job_config_file option needs to be at the same level as id_secret. Okay, this configures Galaxy to look for the job config file in the config directory, at job_conf.xml. Then we need to tell the role to install it, that is, to copy the template we just wrote from the playbook over to the Galaxy server, templating it on the way. I'm going to do that right under the galaxy_config variable, creating a new variable, galaxy_config_templates. The diff is a little off here too: you need this galaxy_config_templates line, then copy the two lines below it and paste them in. So you have galaxy_config_templates and then a list that starts with two spaces, then a dash, then this src and dest. I showed you this construct before, but let's go over it again.
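The two additions to group_vars/galaxyservers.yml being described here might look like this. This is a sketch with the surrounding keys abbreviated; galaxy_config_dir is a variable defined by the galaxyproject.galaxy role:

```yaml
galaxy_config:
  galaxy:
    # ... other options, e.g. id_secret, go here ...
    # Tell Galaxy where its job configuration lives on the server
    job_config_file: "{{ galaxy_config_dir }}/job_conf.xml"
  # uwsgi: ...

# Tell the role to template our file from the playbook onto the server.
# dest reuses the option above, so the installed path is defined only once.
galaxy_config_templates:
  - src: templates/galaxy/config/job_conf.xml.j2
    dest: "{{ galaxy_config.galaxy.job_config_file }}"
```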
So the src is the file we just created, with the job config in it, and the dest refers to galaxy_config.galaxy.job_config_file, which is the option we just added, so the path to the file on the Galaxy server is only defined once. We should now be able to run our playbook, and as the training says, at the very end we should see that it has restarted Galaxy for us, because we've added a new config file and changed galaxy.yml. Any time you change galaxy.yml, the galaxyproject.galaxy role is going to say, okay, I need to restart Galaxy to reread that configuration. So what did it change? It changed the Galaxy config file, which you can see there, scrolled off the screen, and it copied the config template from the playbook, with the values templated in, to /srv/galaxy/config/job_conf.xml. There's the handler; it ran not quite at the end of the playbook, because there's a task in the nginx playbook that flushes handlers, so it runs then, but that's fine. It ran. You can also verify that the job config file was installed by running cat on it at its path in the server's config directory. There's nothing new to view in Galaxy itself, because nothing has changed as far as it's concerned, but we can at least confirm that this step completed. Now, there are some additional options that we might want to set for a production Galaxy server, and we can copy these into our group_vars, and then I will explain what they are. I mentioned server-side cursors in the database slides, and there's the slow query log threshold. And then there is this option, nginx_x_accel_redirect_base. This is a nice feature in nginx: if someone makes a request to a proxied web application, as Galaxy is, and the response is just going to be a file off of disk, then instead of responding with the contents of the file, Galaxy just sends back a header in the response. It's called X-Accel-Redirect, like this, and nginx will intercept that, see that it's in the response, and return the file named in that header, which might point at a dataset's path. nginx will send that file back probably far faster than Galaxy, a Python application, would. It also means that if you restart Galaxy in the background, a download that a client is making won't be interrupted. You get this pretty much for free; there's a little bit of configuration that needs to go into the nginx virtual host for this, but this is the only option that has to be set on the Galaxy side to make it work. Okay, we have some additional niceties here. Later on, when we start adding dynamic job rules, which I'm not going to explain right here, we'll explain them later, we can make Galaxy watch them and automatically reload any time they change, so you don't have to restart Galaxy just to update them; that can be done with this option. Then Galaxy data libraries, which are centrally stored data that anyone can access and import into their histories, and that Galaxy administrators can add to; this path-based option makes it so admins can just paste in a filesystem path to import data directly into those data libraries. Quotas: Galaxy has a full quota system that keeps track of the data users use and create, and you can set quotas on them through the admin interface in Galaxy or via the API. By default users can't be deleted, so this option allows that. The expose options are privacy-centric options: it's possible that they could expose certain information that some sites may not want to. These are all explained in detail in the documentation: go to docs.galaxyproject.org, click on Admin, and the configuration section explains these options.
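Gathered in one place, the production options just discussed might look something like this in the galaxy section of galaxy_config. This is a sketch: the option names are from the Galaxy configuration, but the values (and the library import path) are illustrative choices you'd adjust for your site:

```yaml
galaxy_config:
  galaxy:
    # ... existing options ...
    database_engine_option_server_side_cursors: true  # server-side cursors (database slides)
    slow_query_log_threshold: 5                       # log DB queries slower than 5 seconds
    nginx_x_accel_redirect_base: /_x_accel_redirect   # let nginx serve dataset downloads
    watch_job_rules: auto                             # auto-reload dynamic job rules on change
    library_import_dir: /libraries/admin              # assumed path for library imports
    allow_path_paste: true                            # admins can paste filesystem paths
    enable_quotas: true                               # enforce per-user disk quotas
    allow_user_deletion: true                         # allow deleting users (off by default)
    expose_dataset_path: true                         # privacy-sensitive; your site's choice
```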
There are some issues with NFS attribute caching that can cause problems when jobs finish, and this option here, retry_job_output_collection, means that if attribute caching is causing problems, Galaxy will just retry after a small waiting period until things work. And then finally there are the cleanup_job options, which control whether the job directories where Galaxy's jobs run get cleaned up after they complete. By default they will get removed no matter what, whether the job finished successfully or failed. If you set cleanup to happen only on success, the directories of failed jobs are left behind, which makes it way easier for you as an administrator to debug the problem. In addition to that, when you're helping users debug problems, this allow_user_impersonation option is a huge help, because it allows you to go into the admin interface as an administrator, log into their account, and see exactly what they're seeing in Galaxy, which is a huge help when doing support. Finally, we have this outputs_to_working_directory option. Normally, when a tool runs as a Galaxy job, it writes its outputs directly to the file path that we set at the very beginning, further up here, which is set to /data. If you enable outputs_to_working_directory instead, the tool's outputs will be written to the job directory rather than the files directory, and Galaxy will move them back afterwards. This is a crucial setting for running jobs inside containers later, because the file path will be mounted into the container read-only, for safety, sandboxing, and security reasons, and outputs_to_working_directory allows the tool's outputs to be written to a path that is still writable. Okay, so I mentioned that you have to set up the X-Accel-Redirect option in the Galaxy virtual host.
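On the nginx side, the two directives we're about to paste into the virtual host template look roughly like this. A sketch: the location names and the GTN proxy URL follow the training materials, but check your own template's layout:

```nginx
# Dataset downloads: Galaxy responds with an X-Accel-Redirect header, and
# nginx intercepts it and serves the file from disk directly. "internal"
# means clients cannot request this location themselves.
location /_x_accel_redirect {
    internal;
    alias /;
}

# Proxy the Galaxy Training Network so GTN-in-Galaxy tutorials can run
# their hands-on steps against this server.
location /training-material/ {
    proxy_pass https://training.galaxyproject.org/training-material/;
}
```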
And here is the way to do that. Open up templates/nginx/galaxy.j2, which we've already created, and down at the bottom, inside the last closing brace but outside the one above it, paste this in. There are actually two directives here. The first one does the X-Accel-Redirect serving that I talked about, and the second one allows you to use the GTN in Galaxy. If you load up the GTN, the Galaxy Training Network, from the masthead menu up here and enable, whoops, it's the Training option, and you enable it, then with this option set in your nginx config, when users view tutorials that actually execute tools in Galaxy, they can click the run buttons inside the tutorial and it will run the jobs on your server, which is a really nice feature for people following trainings on your Galaxy server. So, with all of that set, we can now run the playbook one last time; all of those options will be set, Galaxy will be restarted, and it will be a nicely functioning production Galaxy server. So what happens if disaster strikes? Well, we can simulate that. In some trainings, not online asynchronous trainings, but sometimes in person, we like to do this, so I can show you here in a minute. Suppose I remove everything we've done today, not the playbook itself, but everything the playbook has installed, for example all of Galaxy and the stuff under /srv/galaxy. As long as the things that can't be recreated, such as the user data and the PostgreSQL database, are backed up somewhere, or are on a separate machine, then I can blow up this entire VM and restore it essentially just by running the playbook again. So, if that /data directory were actually on an NFS mount, and this VM ceased to exist and we brought it back up completely wiped out,
all I would have to do is run the playbook again, restore the database from the database backups, and restore /data, and everything would be back to normal: a functioning Galaxy server, all the way back from the beginning. So, why not: rm -rf. As you can see, everything is gone. All of Galaxy is gone. And all I have to do to get it back is run my playbook again. That's the real value in this. It may seem complex, and putting together the playbook and making it all work probably takes more time than it would to just type apt-get install postgresql, apt-get install nginx myself. But the whole point is that now everything I've done has been recorded into this procedural set of actions that will be replayed again, exactly the same way, if I ever need to run them again. Every change I make to my Galaxy server, all of the config changes, they're all recorded, so it can always be recreated exactly the same way, and that is a huge, huge benefit. This run will take a while, especially as it needs to rebuild the client, which is a long process, so we'll move on to talking about some other things. For example, maintenance and the time needed for it. Running a very large Galaxy server like usegalaxy.org or usegalaxy.eu is a full-time job, although, you know, we do find time to work on other projects as well. But for a smaller server that maybe supports a lab, you could expect to spend maybe a day or two a month dealing with Galaxy, cumulatively, over the course of the month. As you saw earlier, there's the galaxy_commit_id option that controls the version of Galaxy being run. We set it to release_20.09, so from September of 2020. And upgrading your Galaxy release is typically as simple as updating galaxy_commit_id to the current release, or the one you want to run.
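In group_vars/galaxyservers.yml, that upgrade is a one-line change. The recording uses release_20.09; per the note at the start of this recording, you should set the current release, e.g.:

```yaml
# Pin the Galaxy version the galaxyproject.galaxy role deploys.
# Bump this value and re-run the playbook to upgrade Galaxy.
galaxy_commit_id: release_22.01
```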
Generally you'll want to go and read the release notes to find out everything that has changed, but it really is that simple when you're using the Ansible playbook. There is a separate training you can go find that covers upgrading Galaxy in more detail. For support, we have built-in links in Galaxy that go to our Discourse site at help.galaxyproject.org, which is a great place for user support. And of course, for administration support, development, and also user support, we have channels on Gitter that are pretty much always populated by people from the Galaxy community and the Galaxy team, so we're happy to help and be there for you and your users. I talked about user impersonation already, so I won't cover that again. One final thing here: running Galaxy on a cluster, which almost all of you will certainly do. The way we've installed Galaxy currently, everything lives under /srv/galaxy, except for the datasets, which are under /data. There are Ansible variables you can set, defined by the Galaxy role, and these are the ones that have to be on a shared filesystem, that is, a filesystem mounted on both the Galaxy server and the cluster where you're running jobs, and mounted at the same path on both. They are: the shed tools directory, where Galaxy Tool Shed tools get installed; the tool dependency directory, where the dependencies of the tools themselves get installed, usually Conda packages; the file path, which is that /data directory; the job working directory, where jobs execute; the server directory, the Galaxy code itself; and the Galaxy virtualenv directory. If you set all of these to paths somewhere on your shared filesystem and redeploy Galaxy, then it should be able to operate on a cluster.
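As a sketch, those role variables might be set like this in group_vars; the variable names come from the galaxyproject.galaxy role, but the /shared mount point is an assumption for illustration:

```yaml
# Everything below must live on a filesystem mounted at the same path on
# both the Galaxy server and the cluster nodes that run jobs.
galaxy_shed_tools_dir: /shared/galaxy/shed_tools      # Tool Shed tool installs
galaxy_tool_dependency_dir: /shared/galaxy/tool_deps  # tool dependencies (Conda)
galaxy_file_path: /shared/galaxy/data                 # datasets (our /data)
galaxy_job_working_directory: /shared/galaxy/jobs     # where jobs execute
galaxy_server_dir: /shared/galaxy/server              # the Galaxy code itself
galaxy_venv_dir: /shared/galaxy/venv                  # Galaxy's virtualenv
```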
And there is some additional information about running on a cluster, shared filesystems, and so forth in this section, which is good to read if that's what you're doing. Okay, so that is pretty much it. If you have made it to the end here, then you have essentially what is in this picture: a Galaxy application that's running, connected to storage, connected to compute (although our compute is just local at this point), connected to a PostgreSQL database, served by uWSGI, and that uWSGI is proxied by nginx. So congratulations, and thank you for joining me.