Hello everybody and welcome back to the Galaxy server administration course. My name is Simon Gladman. I am one of the administrators of Galaxy Australia and I work at the University of Melbourne in Victoria, Australia. In this session, we're going to go through monitoring Galaxy using Telegraf, InfluxDB and Grafana.

To start off today, we go to training.galaxyproject.org, and you'll see this website. We're going to go to Galaxy server administration. Then if we scroll down, we can see here, Galaxy monitoring with Telegraf and Grafana, and we're going to click on the hands-on tutorial here. Hopefully everybody has already watched the slideshow. If not, I suggest you do that first.

Okay, so just a bit of an overview of this tutorial. Some of the things we hope to teach you are how to monitor Galaxy with Telegraf, how to set up InfluxDB, and how you can make graphs in Grafana. And importantly, how you can be alerted on important metrics in Galaxy. For example, you may want to know when your Galaxy server is doing something or is broken, and Grafana can alert you to these kinds of things. The objectives for this tutorial are: we're going to set up InfluxDB, we're going to set up Telegraf, we're going to set up Grafana, and we're going to create some charts in Grafana so we can monitor what's going on inside Galaxy.

There are some requirements that we hope you have completed already. We hope you understand what Ansible is, because we're going to be making extensive use of Ansible again. Hopefully you've completed the Galaxy installation with Ansible. If you've already completed Galaxy monitoring with gxadmin, that's good. If you haven't yet, that doesn't matter: I'll be going through how to install gxadmin in this tutorial as well. But if you have already done it, feel free to skip that part. This tutorial will take around two hours to complete.
And as I said before, there are some supporting slides to go with this.

Okay, a bit of an overview. As a Galaxy administrator on a large Galaxy server, I am very interested in monitoring what happens inside Galaxy. It's my job to keep the Galaxy server running, so I have to understand how many jobs I'm processing, how many users I've got, whether there are any problems with the web server or with any of the other parts of Galaxy, and then be notified of those problems so I can go and fix them. And because my Galaxy server is funded using public money, I also have to report on the use of my Galaxy server to the funding bodies. Grafana, gxadmin and Telegraf all help me do those kinds of things.

Okay, a bit about data flow. Every computer or virtual machine that I use as part of my Galaxy server, be it the web server, the database server, any of my storage controllers or my worker nodes, reports data back to my monitoring database. On each of my nodes, I run a data collector called Telegraf, which sends it all to a central collecting point that runs a specialist time series database called InfluxDB, and all of the data gets put into that. Then I use Grafana to provide a visual interface to that data. I can query the data from Grafana and make all these really nice charts.

So, a bit about InfluxDB. This is the database that I use to collect all of the data that's coming off Telegraf. It's a time series database, designed specifically for storing time series data like monitoring and metrics. Everything that goes into this database has a timestamp associated with it. That's really cool because it means we can set retention policies: we can easily say keep this data for two weeks, or keep this data for a month. Or, don't worry about this series, but for this other series or feature, I'd really like to keep it for a year.
There are Ansible roles for installing InfluxDB. However, they don't support configuring databases, users or retention policies. Ansible itself contains several modules you can use to write your own roles, but there's nothing generic available. At UseGalaxy.eu, they wrote their own role for setting up their InfluxDB database, but it's not really reusable here. So if you plan to automate your entire setup, this tutorial will provide you with inspiration for how to do various things. Unfortunately, though, it's not a one-stop shop for setting up monitoring for your particular setup.

Okay, so the first thing we're going to do today is set up InfluxDB on our Galaxy server. It's just a little database program that will run in the background, and we'll be sending data to it. The first thing we need to do is download the Ansible role for InfluxDB. The way we do that is the same way we've been doing it all week: we add the Ansible role to our requirements and then install it. So let's do that.

Firstly, I need to log into my machine. So I'll SSH to my machine, which is gat-5, and I'll give it my password. Okay, now as you can see, I'm logged in. I'll go to the galaxy folder, which is where we have all of the Ansible files that we've been working on all week. Hopefully, everyone doing this has also seen all of these things. If not, I suggest you go back and do all the prerequisite tutorials.

Okay, so our tutorial says the first thing I need to do is add this to our requirements.yml file. So I'll do that now, add that InfluxDB role, and save it. Then I will use ansible-galaxy to install that role to my roles directory. Hopefully you've been doing this a few times during the week. All right, so now we check to see if that's there. Yep, you can see here we have usegalaxy_eu.influxdb. Okay, we'll go back to the galaxy folder again and I'll clear the screen.
Okay, now, you don't necessarily want to put InfluxDB on your Galaxy server. In a production setup, you probably have another virtual machine or another machine sitting around somewhere just running Grafana and InfluxDB, and you'd get everything to send data to that. However, in this case, because we're doing training, we only have one machine available to us. So we'll be setting up InfluxDB and Grafana on our Galaxy server. But we still want to create a new playbook for it anyway.

So we're going to create a new playbook called monitoring.yml. Obviously, if you don't want to use Vim as your editor and you want to use Nano or Emacs or whatever you'd like, feel free to use it. I'm comfortable in Vim, so I'll be using it. Right: hosts is monitoring, and become is true. And then under roles we have usegalaxy_eu.influxdb.

Okay, so this is our playbook for the monitoring. You can see we're going to work on the hosts group monitoring. At the moment we don't have any of those hosts in our hosts file, but we will add them soon. Become true just means that, if we need to, the ubuntu user can run sudo commands, so we can install things like apt packages, etc. And then the role that we want to run here is usegalaxy_eu.influxdb, which is the one we just downloaded. Okay, so we'll save that.

Now we need to add the monitoring group to our hosts file. If we have a look at our hosts file at the moment, you can see here that we have galaxyservers and we have our Galaxy server listed here, and then we have ansible_connection=local. Now that's great. But as I said before, we may not want to run this playbook on our Galaxy server. We may want to run it on a different server. So we're going to create another group in our hosts file. In this case, we're just going to be adding the same machine under that.
In the future, if you had another machine available that you wanted to run InfluxDB and Grafana on, you could just change the name underneath it. So we'll do that now: vim hosts. Okay, so we're going to add a group called monitoring, and we're going to give it the full name of our machine, like the one above, with ansible_connection=local. It's running on the local machine, so that's fine. That's all we need to do. All right, that's good enough for me.

Now the next thing we need to do is run this playbook. So we run ansible-playbook monitoring.yml, and because we need to specify the user, we'll put -u ubuntu. All right. As you can see, it created a user and a group for InfluxDB, and it's added the repository to our apt repositories list. Now it's installing the InfluxDB package out of that. And something failed. Let's have a look and see what that was. It ignored the error. It didn't install the InfluxDB client. Okay, I'm not sure what that error was.

So basically what's happened is it's set up an InfluxDB server, and this server is listening on port 8086. It's currently unauthenticated, but it's only listening on localhost, so we don't really care. If we send anything to localhost:8086, that would be sent to InfluxDB.

Okay, we can access the InfluxDB service just by typing influx. We can say show databases. There's an internal database, called _internal. We can say use _internal, and we can say show measurements. You can see here, there are a bunch of measurements. All right. So basically InfluxDB is installed and running and listening for us on localhost:8086, and anything that we send to it in its correct format will be stored. We can then query it later on, using an SQL-like language, to pull data out of it. Okay, so let's quit and get out of that, and clear the screen. Right. The next thing we need to do is install and set up Grafana.
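The interactive influx shell is one way to ask the server what it holds; InfluxDB 1.x also answers the same statements over HTTP on port 8086. The following is a minimal Python sketch, not part of the tutorial, that only builds the request URL for the /query endpoint. The helper name influx_query_url is hypothetical, and the base URL assumes the unauthenticated localhost setup described above.

```python
from urllib.parse import urlencode


def influx_query_url(statement, database=None, base="http://localhost:8086"):
    """Build a URL for the InfluxDB 1.x HTTP /query endpoint.

    `statement` is an InfluxQL statement such as SHOW DATABASES.
    `database` is optional and maps to the `db` query parameter.
    """
    params = {"q": statement}
    if database is not None:
        params["db"] = database
    return f"{base}/query?{urlencode(params)}"


# Same question we asked in the influx shell: which measurements
# exist in the built-in _internal database?
url = influx_query_url("SHOW MEASUREMENTS", database="_internal")
print(url)
```

Fetching that URL with curl or a browser on the server itself should return the measurement list as JSON, provided InfluxDB is running as set up above.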
Grafana is going to be our visual interface to all of our metrics. It's going to be the web application that we use to look at all of our stuff. When we create a visualization in Grafana, we create a thing called a dashboard, and dashboards can each have multiple graphs on them, etc. A lot of the usegalaxy.* servers, so Galaxy Main in the US, Galaxy Australia, which is the one I administer, and Galaxy in Europe, have already created a whole lot of dashboards that you can just download, copy, and then use on your own server with a bit of modification.

There are some really nice examples of dashboards from public Galaxies, and you can have a look at them. Because I built the usegalaxy.org.au ones, I'm going to click on this. The URL for this is actually stats.genome.edu.au, but I'll click on this link and show you some of these. As you can see, the first one that comes up is a summary of our users, etc. You can see how many jobs have run, how many users we have, how many datasets we're holding on to, etc. Then if you go up to here at the top and click, you can see we have a lot more. If you click on this one that says Galaxy detail, you can then see a lot of detail about my Galaxy server. It tells me pretty much everything I need to know, including where things might be slow in my web server, so I can see if there are problems just by looking at this. Another good thing is if I go to the loads page, I can see the load on all of the machines in my cluster. I can see all of my network. I can see the disk use. I can see the Slurm states. I can see how long they've been up for, etc. So yeah, there are a lot of things you can do with Grafana.

All right, so I'll go back to the Galaxy monitoring with Telegraf tutorial. UseGalaxy.eu has a very similar page to that. Because I'm in Australia, sometimes it takes a bit of time to load. But basically here's Galaxy Europe's summary page.
You can see what their load is like, how many jobs they're running, how many active users they've had in the last 10 minutes, etc., and a whole lot of other things. This is really good for looking at our Galaxy servers and understanding how they're being used.

Okay. So to install Grafana onto our Galaxy server, we once again edit our requirements.yml and add in yet another Ansible role. This one's by a mob called cloudalchemy: cloudalchemy.grafana. So we're going to copy that, and we're going to add it to our requirements.yml. Save. And then we're going to run, where are we, ansible-galaxy install again, like we have done a lot of times this week. That will then install the Ansible Grafana role into our roles directory. Check to make sure it's there. Here it is at the top. So yeah, it's been installed. Fantastic. Go back to the galaxy folder and I'll clear the screen.

All right. Now we need to add this to our monitoring playbook. Not the Galaxy playbook, the monitoring playbook. Let's do that: cloudalchemy.grafana. Close that. Now the Ansible role for Grafana needs a few variables to be set. First, what URL are we going to be operating from? So where is Grafana going to be located? In our case, it's going to be located at our inventory hostname, slash grafana. We also need to set some security things like a username and a password, and we're not going to use password as the password. We're going to change it. And then we need to add a data source, and I'll explain what that means in a minute.

But the first thing we need to do is create a new file called group_vars/monitoring.yml and copy all of this stuff into it. So let's copy that. Make a new file, group_vars/monitoring.yml, and then we'll paste in our things. So, inventory hostname, slash grafana. Now our inventory hostname comes out of our hosts file.
That will be the long form, or the fully qualified domain name, of the machine that I'm currently running on, which is gat-5.os blah, blah, blah. Grafana security: we don't want to use password as the password. I'm going to change that to something that I remember, like I like beer. It's not a very good password, but hey, at least it's not password.

Okay. We're going to set up a data source for Grafana. This is basically so that Grafana knows where to go and get data from. We're going to call our data source Galaxy. Its type is influxdb, so we're telling Grafana to go to InfluxDB to get all of its data, and it's a proxy connection. The URL is localhost at port 8086, and it uses HTTP to access it. InfluxDB runs a REST API, and Grafana uses that REST API to access it. We want to set this to be the default data source and set a version. We don't want to be playing around with it, so we don't make it editable. And the database that we want to get to is one we haven't created yet, but we're going to call it telegraf. Later on in this tutorial we'll be creating a database inside InfluxDB called telegraf, and that's the one we want to point to. Okay, so let's save that.

Now what we're going to do is run the monitoring playbook again: ansible-playbook monitoring.yml. And once again, we'll set the ubuntu user. I know why that has an error. But never mind. All right, we're up to installing Grafana. It's started. So down here it started Grafana. It's a web application that's running. That's well and good.

We have a web server running on this machine, and now what we need to do is tell nginx about our new web app called Grafana, so that we can point to it. So we need to update the nginx configuration in our templates. All right, so let's do that. Clear the screen again. We need to edit templates/nginx/galaxy.j2, and at the end of it, we need to add a new location.
It's nginx/galaxy.j2. And here's our thing. We have our server. We have our root location, which will point to Galaxy. Some static locations, et cetera, et cetera. We'll go all the way to the bottom, just above the last curly brace, and we'll add in another location. Learn how to type. Now, it's important you put slash grafana slash there. Open curly brace. Then we'll do proxy_pass and then the web address that we need to go to, which will be http://127.0.0.1, which is localhost. The web app is accessible on port 3000, followed by a slash. And then a semicolon. Don't forget the semicolon, or it breaks things. And then close the curly brace.

Right. Now we need to rerun the Galaxy playbook, because that will update the nginx config and restart nginx for us. We've just edited the nginx template, and if you remember from the installing Galaxy with Ansible tutorial, hopefully that was a couple of days ago now, when we make an update to nginx, we need to run the Galaxy playbook for that to take effect. So let's do that now. And once again, as user ubuntu. What this is going to do is basically rewrite the nginx config and restart nginx for us.

We could have done it manually, but that's not the point of using Ansible in the first place. The whole idea of using Ansible is that we never do anything manually on our machine again. That way, by using the Ansible code to do everything for us, we have a record of everything we've done. It's really good when you combine the Ansible scripts with GitHub or a Git repository, because it means that you've got version control of your Ansible scripts. You've also got a record of everything you've done and all the changes that you've made. This is really important if you're in a production environment.

All right, so it just set the vhosts and reloaded nginx. Okay. So now if we go to my Galaxy server, which is here, I'll copy this URL into a new tab.
At the end of it, I'm going to put grafana and then a slash. Here we go. Loading Grafana. Okay. We need to log in, so we can just type in admin, which is the username. And my password is I like beer, because I do. And I log in. No, please don't add that to LastPass. Here is my Grafana web page. Okay. Well, that's pretty cool. Let's go back to our tutorial. What does it say to do? Okay. So the web application is running, but there's no data available to it. We'll get back to it shortly and configure dashboards once we start sending data into our InfluxDB.

So now we're going to install Telegraf onto our machine. Normally, in a production setup, we would install Telegraf on all of our machines and configure it so that it sends data to our central collection point. Today we're only going to install it on our one machine, and we're going to tell it to send data to InfluxDB, which is running on this machine as well.

Telegraf has a lot of documentation on how to configure all the different types of inputs. It can support a large array of different inputs. For example, it will monitor a PostgreSQL database for you. It can monitor MySQL. It can monitor hardware settings. It can monitor temperatures. It can monitor all sorts of things. But you can also write a program, like a bash script or something, and have Telegraf execute it regularly, just like a cron job. Then whatever data Telegraf collects from that executable, it will send off to InfluxDB.

It puts it into a line protocol format, which looks like this box here. Each measurement has a name attached to it, and then it has some key-value pairs. In this case, our key-value pairs are country Germany, city Freiburg, temperature 25 and wind zero. And then the last section here is the timestamp.
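To make the line protocol layout concrete, here is a minimal Python sketch that assembles one line from its parts. It is an illustration only: the function name to_line_protocol, the measurement name weather and the timestamp are assumptions, and the sketch skips the escaping and string-quoting rules that real line protocol requires. It also assumes that country and city are tags (indexed labels) while temperature and wind are fields (the measured values).

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=val,... field=val,... timestamp

    Simplified: no escaping of spaces or commas, numeric fields only.
    A bare number like 25 is stored by InfluxDB as a float; integers
    would need an `i` suffix, which this sketch does not add.
    """
    # Tags are appended to the measurement name, each prefixed by a comma.
    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    # Fields are comma-separated and follow a single space.
    field_part = ",".join(f"{key}={value}" for key, value in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


line = to_line_protocol(
    "weather",                                   # hypothetical measurement name
    {"country": "Germany", "city": "Freiburg"},  # tags
    {"temperature": 25, "wind": 0},              # fields
    1600000000000000000,                         # made-up nanosecond timestamp
)
print(line)
```

The three space-separated sections (name plus tags, fields, timestamp) are the same three sections pointed out in the box in the tutorial.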
If you recall me talking before about InfluxDB being a time series database, this is really important. Okay. And you can see here that the temperature in State College was 33. That must be Fahrenheit, because I don't think it ever gets that hot in State College. Okay. Yeah. So you can get all sorts of different data. This one here is the disk data. You can see here that we have vda2. It's XFS. It's on the hostname. What mode it's in. How much space it has free, inodes, et cetera. A lot of data can be sent in. And then once again, we have that timestamp at the end, which is really important.

gxadmin, which hopefully you've seen before. If you haven't, don't worry, we'll talk about that shortly. gxadmin has a special type of query called an iquery which, when you run it, will produce InfluxDB-formatted output, which is really nice.

Okay. But what we need to do first is configure Telegraf. Telegraf is going to be our little daemon that runs on our machine, records lots of different metrics from the machine and sends them off to InfluxDB. We're going to use Ansible to do that, just like we have with everything else we've done this week. So the first thing we're going to do is install the Ansible role, called dj-wasabi.telegraf, and the version is 0.12. So once again, copy that. We need to edit our requirements. Now we'll add this at the end. Okay. Close this. And as per normal, we'll install it using the ansible-galaxy install command. And it's done. So we'll check to make sure it's in there. You don't really have to do this, but I like to each time. So, dj-wasabi.telegraf. Very good, it downloaded it.

All right. So now we need to do a little bit of setup. Because normally we would install Telegraf onto all of the machines that we have, it's a good idea to put the Telegraf settings into group_vars/all.yml. Well, certain Telegraf settings.
The things that you want to monitor on each machine, basically the machine statistics like the CPU usage, the disk usage, how many processes are running and how much memory is being used, are common to pretty much every machine that you have. So it's a good idea to put this kind of setup information into group_vars/all.yml. That way, when you install Telegraf on your machines, it will end up on all of them.

All right. So we're going to edit group_vars/all.yml and add this to the end of it. We'll copy that. Now, as you can see, I've already done the CVMFS tutorial, so I already have something in my all.yml. But I'll just go down here and paste this in.

Okay. So some of the things we've just set are: we want to download and install the latest version of Telegraf. This one here says where we want to send the output of Telegraf. We want it to be of type influxdb, because that's where we're sending it. The URL that we want to send it to is actually an array, but the first thing in our array is localhost at port 8086. If you recall from before, that's where our InfluxDB is listening. And we want to send all this data to the database called telegraf.

Okay. And then these are the input plugins that we want. By default, we want to install the cpu plugin, the disk plugin, kernel, processes, io, mem, system, swap, net and netstat. CPU obviously monitors how much CPU is being used. Disk monitors how much disk is being used. Kernel monitors all the different kernel processes, et cetera. IO is a good one: it tells you how fast your disks are running. Memory is another good one because it tells you how much memory you're using, and whether you're using swap or not. And then netstat and net can monitor your internet traffic, or your network traffic, for you, which is really good. Okay. So now that's done. We're going to close that and go back to our tutorial.
You can see here it explains why we're putting it into the all.yml file, which I talked about before. However, because this one here is the Galaxy server, we also want to add some Telegraf configuration to our galaxyservers.yml file, and we want to call this one telegraf_plugins_extra. Look, up here we said telegraf_plugins_default in our all.yml, because all of our machines are going to get these plugins. But for this particular machine, our Galaxy server, we want to have telegraf_plugins_extra. Because we're putting it into our galaxyservers.yml file, when we run the playbook, only the galaxyservers group will pick up these extra settings.

Okay. So we'll copy this and add it to our group_vars/galaxyservers.yml. We're going to go all the way to the bottom of all these settings and paste that in there. Okay. So what we're basically saying is that there are some extra Galaxy things that we want to listen to. We want to use the statsd plugin. Now, Galaxy is pretty cool. If we ask Galaxy to start collecting stats, it will emit all sorts of Galaxy metrics in statsd format, and Telegraf's statsd plugin will listen for them on localhost at port 8125. We're going to use a metric separator of dot, and we're going to allow up to 10,000 pending messages at once, which is pretty cool. All right. So we just do this, and this allows us to monitor everything that's going on inside Galaxy.

All right. So one last thing we need to do is tell Galaxy that it needs to start collecting the statsd stuff for us. The way we do that is in our Galaxy config: we just say that statsd_host is localhost and that we want all of our statsd output to be formatted for InfluxDB. So we just need to add these two lines into our Galaxy configuration at the top of our galaxyservers group file.
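To show what kind of message travels to port 8125, here is a minimal Python sketch that formats a statsd timing packet with InfluxDB-style tags and, when run as a script, sends it over UDP. The helper names statsd_timing and send_packet are hypothetical, and the metric name galaxy with a path tag is an assumption about what Galaxy emits when InfluxDB-style formatting is enabled; the sketch is not Galaxy's own code.

```python
import socket


def statsd_timing(name, millis, tags=None):
    """Format a statsd timing metric, e.g. 'galaxy,path=web.root.index:48|ms'.

    Tags use the InfluxDB-style 'name,key=value' extension that
    Telegraf's statsd input understands.
    """
    tag_part = "".join(f",{key}={value}" for key, value in (tags or {}).items())
    return f"{name}{tag_part}:{millis}|ms"


def send_packet(packet, host="localhost", port=8125):
    """Send one packet over UDP; statsd is fire-and-forget, no reply expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet.encode("ascii"), (host, port))


# A page load that took 48 ms, tagged with a hypothetical route name.
packet = statsd_timing("galaxy", 48, {"path": "web.root.index"})

if __name__ == "__main__":
    print(packet)
    send_packet(packet)  # harmless if nothing is listening on 8125
```

Because the transport is UDP, the sender never waits for the listener, which is why enabling these metrics should add very little overhead to each request.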
This will allow Galaxy to start collecting statsd data and send it to localhost in the InfluxDB format, which Telegraf can then find and use. Right. So let's copy that. We need to go to the top of this file, up in our Galaxy settings, which is up here, and add it here. Okay. So we've added two things: statsd_host is localhost, and statsd_influxdb. Right. Close that.

Now the last thing we need to do is add Telegraf to the roles in our Galaxy playbook. So we'll do that: vim galaxy.yml. And as the last one here, we will add dj-wasabi.telegraf. Okay. And we'll save that. So now that we've added the role to the playbook, what we need to do is run the playbook. This will install and configure Telegraf for us.

You'll notice it's pausing a little bit here while it installs a conditional dependency. You see there was a bit of yellow that flashed past, and that was it installing the statsd software. It needs to restart Galaxy for us. Okay. We're installing Telegraf, and it's completed. You see here we copied the extra plugins, which is important. That means we can listen to the Galaxy routes. It restarted Telegraf, and everything should be good.

Okay. So now if we run influx on our command line and then show databases, you can see there's a new database here now called telegraf. If we say use telegraf and then show measurements, you can see that now we are measuring cpu, disk, diskio, a whole lot of Galaxy stuff, kernel, mem, etc. And if we say show series, you can see here that we are collecting a lot of data from the CPU about my machine. We're collecting all sorts of things, which is pretty cool. So my hostname is a bit broken. Okay.

So now what we're going to do is start monitoring our machine with Grafana. The first thing we're going to do is look at all the detail of our node.
We're going to borrow, or steal, a dashboard from UseGalaxy.eu's page. Right. That's the easiest way we can do it. So we're going to import the dashboard. We'll go to UseGalaxy.eu's node detail dashboard. Here it is. The tutorial says what we need to do is look for the sharing icon at the top and click it, and then under the export tab, click save to file. All right. So I'm going to go here, click share dashboard, click export and then save to file. You can see I've just downloaded a file. It's gone to my downloads.

Okay. The next thing I need to do is, in my own Grafana server on the homepage, hover over the plus icon and use import from the menu. So here's my Grafana server. We go to the plus and I'll go to import. Then it says click upload .json file and choose the one we just downloaded. All right. So upload .json file, and in my downloads, here it is, and we'll import it.

And here it is. As you can see, it's already picked the right host. Unfortunately, this is a weird hostname. It probably needs to be changed in my hosts file, but never mind. You can see here that it's already looking at the load on the machine: the average load per minute, the average load per five minutes and the average load per 15 minutes. It's not collecting any CPU data. It's showing some disk usage and some process states, and showing us some memory, how long it's been up for, et cetera. I might just zoom in a little bit so it's a little bit easier to see. You can see how many context switches the machine is doing, the interrupts, all sorts of extra things that we may want to look at. How many IOPS are currently in progress, et cetera. All right.

Okay. Let's go back to here. So now that we've done that, our first dashboard is live. It's kind of like htop on all of our systems.
If we had more than one machine that we'd installed things onto, we could just click on this host up the top here, and then change to a different host and see the detail of that host. But we've only got one, so that's fine. And we've been collecting data for about what? A little bit now. It's showing three hours up here. If we change that to the last five minutes, you can see here, we've been collecting data for about five minutes now. Cool. And you can see the current disk usage, et cetera. So yeah, lots of interesting information on this page. Let's switch it back to three hours.

Okay. Now what we're going to do is actually set up a dashboard ourselves, and we're going to try and figure out how to get Grafana to display something for us. To create a dashboard, we need to click on the plus icon in Grafana. Then we're going to click add query, and we're going to do a whole bunch of things. So let's follow this part of the tutorial along. Just to make it a bit easier, I'm going to drag this into a different window, so we can have them both open at the same time.

Okay. So this is my Grafana. I'm going to add a new dashboard. And now we have an empty dashboard. It's called new dashboard. The tutorial says click add query, but I'm going to click add new panel, because that's the same thing. This is a little bit different from what you might expect. But basically, we now have that set up. You can see down there, this is the query building interface.

Now let's build a query. In the from row, where it says select measurement, we'll click here and set that to galaxy. Then in the select row we have field(value); let's change the aggregation to mean. Then down here in group by, where it says fill(null), change it to fill(none). And we'll add a new one: we'll click on the plus and choose tag(path). And then we will alias by tag path. Now at the top of the page it says last six hours.
Let's click this and set it to last 30 minutes. So, last 30 minutes. Then we will save this dashboard by clicking save at the top. We're going to give it a name, and we're going to call it Galaxy. Okay. So now you can see here we're collecting a bit of data. This is actually the internal Galaxy job handler's monitor step. We haven't actually done anything on our Galaxy server for it to record anything yet.

But let's check how long it takes the interface to respond. Let's just move around Galaxy on our Galaxy server. Let's go to analyze data again. I'll look at my file, download it. And hopefully this will start appearing in here. I'll click on refresh the dashboard. You'll notice that now we have a new thing. It says web root index. So basically what we did then was load the page, and it took 48 milliseconds, which is pretty quick. Okay.

All right. So this little graph here is just measuring how long it takes our Galaxy server to do things. I'm actually going to set this up so that it updates automatically every five seconds. I'll change it to 10 seconds. Every 10 seconds this is going to update now by itself. If we hover over that, we can see how long bits and pieces of the Galaxy interface are taking to respond to our requests. This is really, really handy data to have if you're trying to figure out what's wrong with your production machine.

Okay. Going back to the tutorial, we're going to add a second query to an existing graph. First we're going to change the title. So we click on panel title and go to edit, and we can change the panel title by clicking in this box here and typing it. Galaxy UI. That will do. That's probably a good title for it. The next thing we want to do is add another query. To do that, we go down here and we click on plus query. Right, now we want to select a measurement: galaxy again. In the select row, field(value) with mean.
For this second query we want to change fill(null) to fill(none), and we want to alias by percentile. Oh, I'm sorry, we also want to add a new selector: in the Select row, we add Selectors, percentile. All right. We hit Save first, then Escape out of that. You can make the panel bigger, by the way, just by clicking and dragging on it. So that's pretty easy to do.

Okay. Now we're going to style the graph a little bit and make this 95th percentile line stand out. To do that, we add a series override. We go back and click Edit, and down here where it says Series overrides we click Add series override. The alias we want is percentile. Then we click the plus button after that and say we want to change the color. I kind of like purple; let's do dark purple. Then we click the plus again and change the line width to five, so it's nice and thick. As you can see, the line is now nice and thick and really stands out from the rest of our graph.

All right. In the next section down, we can edit the axes, because we don't have any time units on the graph at the moment; the unit just says short. Let's change the unit to Time, which is usually down near the bottom, and pick milliseconds. So now you can see how long these things are actually taking to respond. Our Galaxy is running pretty fast: 95% of all our requests are taking less than 80 milliseconds, which is really cool. You can also change the scale, for example to linear, change the decimals, set the Y min and max, and even put a label on the axis if you want to. All right. Now we're going to change the legend.
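As an aside, the two queries set up in the panel editor (the mean of the mean field grouped by the path tag, and the percentile selector on the same field) correspond roughly to the InfluxQL below. This is only a minimal Python sketch that assembles the query text; the helper name `influxql` is made up for illustration, the value 95 is taken from the "95th percentile" mentioned above, and `$timeFilter` and `$__interval` are the macros Grafana fills in when it runs the query.

```python
def influxql(measurement, select, group_tags=(), fill="none"):
    """Assemble the InfluxQL text for one query row (hypothetical helper)."""
    # Grafana always groups by the time interval first, then by any tags.
    group_by = ["time($__interval)"] + [f'"{tag}"' for tag in group_tags]
    return (
        f'SELECT {select} FROM "{measurement}" '
        f"WHERE $timeFilter "
        f"GROUP BY {', '.join(group_by)} fill({fill})"
    )


# Query A: mean of the "mean" field, one series per value of the "path" tag.
query_a = influxql("galaxy", 'mean("mean")', group_tags=["path"])

# Query B: percentile selector over all paths (95 is assumed from the talk).
query_b = influxql("galaxy", 'percentile("mean", 95)')

print(query_a)
print(query_b)
```

Switching a panel to the raw text editor in Grafana is one way to compare this against what the query builder actually generated on your own server.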
To change the legend, we close the Axes section and go to the next one, which is Legend. We set Show to yes, we want to show it as a table, and we want to show it to the right. So now you can see the legend is on the right instead of all hodgepodge on the bottom. Then down here we want to show some values: the minimum, the maximum, and the average. And so now we have a table of all the different things that are occurring inside our Galaxy server with their minimum, maximum, and average times. This is really good for troubleshooting when things go wrong in your Galaxy server. Sorry, the tutorial shows the old version here. We go back up to Panel and change the panel title from Galaxy UI to Galaxy UI request times. All right. So we save that dashboard and hit Escape, and we're back at our dashboard. I'll just resize it a bit. Now we have our graph showing all of the web UI time series for what's going on inside Galaxy, and here we have a table of everything that's currently being displayed, with the minimum, the maximum, and the average amount of time. There we go: new bits of information have shown up, with the times it took for all of those things to happen.

Okay. Enough of that. Let's get on to monitoring. Monitoring is where we can get Grafana to let us know when things have gone bad. So we're going to add an alert to our graph. We edit the panel again, and on the left-hand side there's the Alert icon. We click that, and then we click Create Alert. All right. Alerts consist of a rule, usually with a name, that is evaluated every so many seconds, for a period of time. The For setting can be an important parameter, which you can read more about in the documentation.
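To give a feel for what "evaluated every so often, for a period of time" means, here is a minimal Python sketch of a threshold rule with a For period. It is a simplified model and not Grafana's implementation: the function name `alert_states`, the sample values, and the threshold are all made up for illustration. The idea is that a breach first puts the rule into a pending state, and it only starts alerting once the breach has lasted longer than the For period.

```python
def alert_states(values, threshold, for_evals=0):
    """Return the rule state after each evaluation (simplified model).

    values    -- the reduced query value (e.g. an average) at each evaluation
    threshold -- the rule breaches when the value is above this
    for_evals -- the For period, expressed as a number of evaluations
    """
    states = []
    breaching = 0  # consecutive evaluations above the threshold
    for value in values:
        if value > threshold:
            breaching += 1
            # Only alert once the breach has outlasted the For period.
            states.append("alerting" if breaching > for_evals else "pending")
        else:
            breaching = 0
            states.append("ok")
    return states


# Hypothetical averages in milliseconds, one per evaluation.
samples = [40, 55, 60, 45, 70, 80, 90]
print(alert_states(samples, threshold=50, for_evals=2))
```

With a For period of zero the rule alerts on the first breach; a longer period trades a slower alert for fewer false alarms from short spikes.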
Then we add some conditions, which say when we want this alert to activate. So we say: when the average of query B, averaged over a minute, goes above 50 milliseconds, then maybe something's going on. And then we would configure a notification. Right, so let's do that. We're going to evaluate every minute, for five minutes or so. For the condition, we want the average, we want it to be of query B, and we want the time range to be one minute. Then we say when this gets above 50. So the condition is: average of query B is above 50. And then we want to do something about it.

All right. We are not going to configure a notification channel in this tutorial. However, what you could do is send a notification to something like a Slack channel, send it to Gitter, send it in an email, or make it do whatever you like. We're not going to set that up in these tutorials; it's a little bit involved and complicated, but there's a lot of documentation that explains it for you, and there are lots of different services you can use. So we're just going to save this dashboard for now and Escape. All right. So now you can see there's a red line. If this purple line goes above the red line and stays there for a minute, it will send an alert out.

Okay. Final part of this tutorial. We're going to talk about how we can use gxadmin and Telegraf together to collect information from our Galaxy database, send it into Telegraf, and therefore into InfluxDB, and then show it in Grafana. So we need to install gxadmin if you haven't already done it. If you've already done the gxadmin tutorial, feel free to skip this. But I haven't done it on mine, so I'm going to add the usegalaxy_eu.gxadmin role. I'll go back to my terminal, open the requirements file in vim, add in this entry, and save it.
Then we install the role like we did with everything else, and add it to our galaxy.yml playbook: usegalaxy_eu.gxadmin. Save that, and then run the playbook. Okay, there's gxadmin installed. We can test it by becoming the galaxy user and running gxadmin. We'll try gxadmin query; let's do monthly-jobs. And look, we've run a job. Fantastic. So gxadmin is working. Go back to the ubuntu user and clear the screen.

Okay. So gxadmin is installed and working. Now we need to configure Telegraf for gxadmin. To do that, we need to give Telegraf permission to run gxadmin, and also let it talk to the Postgres database. So we need to add telegraf with a null password to our postgresql_objects_users, and we need to add privileges for the telegraf user on the galaxy database. But we don't want it to modify things; we only want to let it select, that is, view, data. To do that, we edit our group_vars/galaxyservers.yml. If we go all the way to the top of this file... that's not the top. What's it doing? Here we go, all the way to the top. Here we have our postgresql_objects_users. We add a new one called telegraf, with a null password, so that it doesn't need a password to log in. Then underneath all of this, we add a new section called postgresql_objects_privileges, and I'm just going to cut and paste this from the tutorial.

Okay. Now we also need to configure Telegraf to run gxadmin. Under our telegraf_plugins_extra, we need to add some more settings. We want to be able to look at the Galaxy queue, for example, to see how many jobs are sitting in it. So we copy this section from the tutorial, go all the way to the bottom of the file to our telegraf_plugins_extra, and paste it there.
So now basically we've got a new plugin called monitor_galaxy_queue. It's an exec-type plugin, which means it runs an executable for us. In the commands, we set the environment variable PGDATABASE to galaxy and then run /usr/bin/gxadmin. We run an iquery, because we want the data coming out of gxadmin to be formatted for InfluxDB. The query we want is queue-overview, and we ask for the short tool ID instead of the full tool ID. We want to run this every 15 seconds, and we give it a timeout of 10 seconds, so if it hasn't finished within 10 seconds it will just time out. And the output data format is influx. Okay. So we'll save all of that and run the playbook. I'm not sure if you saw it, but it just flashed past in yellow that we've added a Postgres user and given it some permissions. Soon we'll be adding the plugin to Telegraf. You can see there we've configured the extra plugins and restarted Telegraf. And that's done.

Okay. Now we're going to build a new graph in Grafana to display our current queue. So we go back to our Grafana interface, and we're going to add a new graph to the same dashboard. I might just make this one a bit smaller, so it's not taking up so much real estate. There we go.

Okay. Just before we move on, there's actually an error in this telegraf_plugins_extra. The path needs to be /usr/local/bin/gxadmin. So we go back to our galaxyservers.yml file, change it to /usr/local/bin/gxadmin, and rerun the playbook. It's located here. We change this to /usr/local/bin/gxadmin, quit out of that, and run our Ansible playbook again. We're almost up to Telegraf. Here we go: we've made a change and restarted Telegraf. All right. Let's check to see that Telegraf has run properly.
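To make the exec plugin less of a black box, here is a minimal Python sketch of the same idea: run a command with a timeout and read line-protocol records from its output. It is not how Telegraf is implemented. The function names are made up, the stand-in command just prints one hypothetical record (on a real server the command would be the gxadmin call described above), and the parser assumes simple records with no escaped spaces or commas.

```python
import subprocess
import sys


def run_exec_input(command, timeout_s=10):
    """Run a command and return its non-empty output lines, or [] on timeout."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s, check=True
        )
    except subprocess.TimeoutExpired:
        return []  # like the plugin, give up on this collection round
    return [line for line in result.stdout.splitlines() if line.strip()]


def parse_line(line):
    """Split one simple line-protocol record into (measurement, tags, fields)."""
    parts = line.split(" ")  # measurement+tags, fields, optional timestamp
    measurement, *tag_pairs = parts[0].split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = dict(pair.split("=", 1) for pair in parts[1].split(","))
    return measurement, tags, fields  # field values are left as strings


# Stand-in for the real command; the record below is a made-up example.
sample = "queue-overview,tool_id=upload1,tool_version=1.1.6,state=running count=1"
command = [sys.executable, "-c", f"print({sample!r})"]

for record in run_exec_input(command, timeout_s=10):
    print(parse_line(record))
```

Telegraf repeats a collection like this at every interval, which is why the timeout is kept shorter than the 15-second interval.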
To check that Telegraf is running properly, we'll just do a systemctl status telegraf. And it's working fine. All right, very good. So now we go back to our Grafana server, and we're going to add a new panel, a new graph, to our current dashboard. To do that, we click the little Add panel button on the top here and choose Add new panel. Okay. Now, back in the panel editor, we go down to where it says Select measurement and change that to queue-overview. We're going to aggregate it with sum. We group by time interval, changing that to time(15s), and we change the fill to none. So we've added the time. Now we add a tag of tool_id and another tag of tool_version: we go here and say tag(tool_id), and then add another one, tag(tool_version). Then in Alias by we put [[tag_tool_id]]/[[tag_tool_version]], with each tag inside square brackets so Grafana knows what it is. Right. As you can see, we've got no data showing up here. That's because we haven't run any jobs on our Galaxy server for a while. From now on, if we run jobs and they don't finish within 15 seconds, they should appear on this panel. At the moment, though, we obviously don't have any.

Over here in the settings, where it says Visualization, under Display: Bars no, Lines yes, Points no. Down here under Mode, we switch Staircase on. Then under Stacking and null value: Stack yes, and Null value as zero. Then we go down to the Legend section: show as a table, to the right, and show Max, Avg and Current. Then we hide series with only nulls or with only zeros. Okay. And then back up the top here, we change the panel title to Galaxy Queue Overview.

This tutorial needs to be updated for this new version of Grafana, which only came out about a week ago. Right, so it's done. You can see here we have our Galaxy Queue Overview; I'll make it full width. There's nothing here because we haven't run any jobs. If we go back to our Galaxy server and start running some jobs, they should start showing up in this queue overview. I've got BWA installed, so I might grab some data from Zenodo. I'm going to import a couple of FASTQ files. I'll use Paste/Fetch data, grab the first URL from there and paste it, and then grab R2. This is just a tiny little dataset from Zenodo, if you want to go looking for it: it's the hands-on data from the microbial variant calling tutorial. I'll set the type of these to fastqsanger and click Start. Hopefully this will take a little while to run and they won't finish within 10 or 15 seconds, so they may appear here for us. You can see here we had an upload happen, and we had one running. So there you go.

All right. Now we're going to do a mapping. I'll go to Mapping, Map with BWA-MEM, and map against E. coli. The reads aren't E. coli, but that doesn't really matter. I'll run that. So we have another job here. It's gone to Slurm. Here it goes; any minute now Galaxy will pick it up. You can see here it says, oh look, you're running BWA-MEM. Cool, so that's showing up as well. We have one of them running. And we can see here that some times have appeared, and one of them took 5 seconds. I wonder what that one was. Let's have a look. Sort by this column; there should be one that says 5 seconds. Oh yeah, there it is. So the Slurm job runner took 5 seconds to queue the job for us. So there you go: you get some interesting information from these things. Okay. So you can see we've run some jobs, and we've seen them appear and disappear again.
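As a rough picture of what the queue panel is doing with those jobs (sum the counts, group by a 15-second time interval and by the tool_id and tool_version tags), here is a minimal Python sketch of the same aggregation. The function name and every sample point are made up for illustration, and timestamps are plain seconds rather than real InfluxDB timestamps.

```python
from collections import defaultdict


def sum_by_bucket(points, bucket_s=15):
    """Sum counts per (bucket start, 'tool_id/tool_version') series."""
    totals = defaultdict(int)
    for timestamp, tool_id, tool_version, count in points:
        bucket_start = timestamp - timestamp % bucket_s  # floor to the bucket
        series = f"{tool_id}/{tool_version}"  # mirrors the panel's alias
        totals[(bucket_start, series)] += count
    return dict(totals)


# Hypothetical points: (seconds, tool_id, tool_version, count).
points = [
    (0, "upload1", "1.1.6", 1),
    (3, "bwa_mem", "0.7.17", 1),
    (9, "bwa_mem", "0.7.17", 2),
    (16, "bwa_mem", "0.7.17", 1),
]

for key, total in sorted(sum_by_bucket(points).items()):
    print(key, total)
```

A job that starts and finishes between two collection rounds never produces a point, which is why only jobs that stay in the queue for a while show up on the panel.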
We have done a lot of work in this tutorial. We installed Telegraf, InfluxDB and Grafana onto our Galaxy server. We've configured Grafana to monitor our Galaxy for us and to monitor our machine, and we've configured it to monitor our queue as well, so we can see when jobs are appearing in our queue. Some of the key points to take away from this: Telegraf is really cool, and you should probably install it on all of your servers. Galaxy can send its own internal metrics to Telegraf. Telegraf can run arbitrary commands like gxadmin. InfluxDB can collect all the metrics from Telegraf from all of your machines, and then we can use Grafana to visualize all of these metrics and monitor their values. And that is the end of this tutorial. If you could, please fill in the feedback form for this tutorial, just so that we know if you enjoyed it and if it was useful, and if there were any problems, please let us know there. Great, thank you very much and goodbye.