G'day, and welcome to my introduction to the ToolFactory session. The ToolFactory is a Galaxy tool wrapper generator, and it's designed for developers who are new to Galaxy. So in this talk, I'm going to cover, from a developer's point of view, tools as a general concept in Galaxy. I'm going to talk about programming the abstract tool interface that the framework offers. Then I'll talk a bit about how automation for simple tools is possible and what it can do. And then I'll discuss some of the limits, and also some aspects of the value proposition the ToolFactory offers to developers in Galaxy.

So from the point of view of a user, if you ask them what Galaxy really is, they'll probably talk about the tools, because that's really what they see. They're drawn to Galaxy if there are tools available that support their data and their analysis. Otherwise, not so much. And from the point of view of any given user, depending on their scientific domain of practice, the tools available in a Galaxy instance will be valued purely for their utility as analysis components for their data. So I'd argue that tools are really what users see as Galaxy. The framework itself is generic and most users probably don't even notice it. It's just there.

And because of the way Galaxy works, tools effectively provide a kind of scientific domain flavor to any given Galaxy. Some tools are very, very generic. Nearly all users are going to use them, such as the upload tool. Most tools, I'd argue, are probably fairly specific to some kind of data or some kind of scientific practice. If you think about a mapper or an assembler, it's got utility in a couple of different broad scientific domains, but in climate science, perhaps not so much. So that leads to the idea that in order for Galaxy to grow by reaching new scientific communities and new users, we need domain specific tools. As you all know, Galaxy originated in genome science.
And as a result, early on, the only tools that we were wrapping were genomic tools, because that's all we were using. However, more recently, it's used in many other branches of science. Those doing climate science require quite a different toolkit from those doing natural language processing. And the value of any given Galaxy to any given user will depend on the mapping between their interests and the capabilities of the tools. So I'd argue that if you're interested in seeing Galaxy grow, as I am, you'd be interested in attracting users from new scientific domains, and attracting users from existing scientific domains, by extending the range of relevant tools. In other words, tools are what drive Galaxy uptake. The more people there are who are capable of building tools, the more tools there are likely to be. And thus, the faster Galaxy can grow and spread, leading to world domination, which is of course our main aim.

So in order to build tools, you as a developer, and I assume you're a developer if you're watching this talk, are going to have to come to terms with the abstract tool interface that Galaxy offers. The framework is very efficiently and effectively decoupled from tools and tool execution by an abstract tool interface. That interface allows virtually any Linux command line package to be wrapped as a tool. And the framework doesn't care. The framework is agnostic to all of the complications that happen in tool space, including things like the language that the tool package is implemented in, or the domain in which the tool is used. None of these things are of any importance to the Galaxy framework. It just sees a tool at the end of an abstract interface. It just sees the interface. The wrapper that's written against the abstract tool interface for each package, to turn it into a tool, more or less plugs the package into the framework.
And that kind of pluggable interface specifies all of the things that the framework needs to know about the operation of the tool, including things like what dependencies have to be present, what the inputs are from the user's history, and what happens to outputs from the package that's running inside the tool. And of course, most importantly, what the command line for the package that's emitted by the tool looks like. We're only going to talk about XML here, but there is a whole infrastructure for the Common Workflow Language, about which I know nothing, so I won't say anything.

Because Galaxy has been around for 15 years, there already exists a pretty comprehensive infrastructure for making new tools. And the basis for most complicated Galaxy tools is the command section of that XML document, because it allows templating and Python code to be included in a kind of bash-like environment. The big advantage of doing this inside the command section of the document is that you have access to the tool namespace, because the complex conditional logic is actually embedded in the document. It's very efficient, because you've got access to all of the tool's variables without any parameter passing. They're just there in the namespace, and you've got access to Python, so you can do some pretty fancy logic. That approach is far preferred by experienced tool builders. It's what they prefer to use, and it's really what most tools are built on. And the project supports that style of tool building with Planemo and the new Galaxy Language Server. And of course there are other tutorials in this training week, which you can take advantage of to learn about how all that infrastructure works.

However, today, I'm just talking about the ToolFactory. Now when we talk about tool wrappers, we're talking about XML documents, and this is one of them. This is the Bowtie2 wrapper from the IUC repository.
And there are a couple of features I'd like to point out, just in case this is the first one you've ever seen. So there's a command section. And that's where all of the interesting logic is usually placed. There are some other places, but that's the most important one. And in that section you'll find expressions like this one. It starts with a hash, and it's actually a Python expression. That is inline Python using Cheetah templating, and it's pretty handy for writing complicated logic. What makes it handy is that this dollar sign preceding reference_genome.own_file refers to the property own_file of the reference_genome variable, somewhere deep in this form. So we've got access to all sorts of variables that are part of the namespace. They're part of the tool, and they appear on the tool form. And that templating makes this whole process pretty efficient, because that variable is there when this code is executed, since that's when the tool is executing.

Okay, so we've talked a little bit about tools and the framework from a user's point of view and from a developer's point of view. And I'd now like to move on to the main thing that interests me, which is automating the building of simple tools as far as possible. So to what extent is it possible to generate that XML wrapper that I showed you for a fairly complex tool? Well, we can't generate those complicated ones. But if you think about what's needed, what I found for the ToolFactory is that there are really five things that you've got to think about, as well as lots of other things.

The first thing you need is something that's going to do the work of the tool: the workhorse, the thing that does the calculation or the manipulation of input data and writes the output files. In the ToolFactory the focus is on working scripts, that is, things that you've proven work on the command line with small data sets.
It will also handle Conda dependencies that can be used by the script, or there can be no script at all. That's an option in the ToolFactory, but the focus is really on scripts, because this is where you can put the kind of logic that you would otherwise need to write by hand for your tool if it requires complex conditional logic involving the parameters.

The second thing you need, aside from the workhorse, is the inputs. You need to know what the script will consume in terms of inputs from the user's history, that is, which data files are going to be needed. And for each of those you should have a small test sample, because every Galaxy tool should have a test, and the ToolFactory will build one for you if you supply them. In fact it won't build the tool unless you supply samples of each input data set. You also need to know what the script is going to emit, because whether you use the ToolFactory or work manually, you have to provide a route for those newly created files to appear in the user's history when job execution completes. You need to know what the user is going to be able to supply to the script. Some variables may be user controllable, others may be baked into the script. Finally you need some logic and a pass criterion for some kind of automated test, because every good Galaxy tool has at least one automated test.

So let's think a little bit about how we might tackle something trivial, like the demonstration Hello World program that everybody writes the first time they meet a new programming environment. We need a script. So we could create a text file called hello.sh containing echo "Hello $1", all in quotes, and we could test that in the shell. The reason we want to test it in the shell is that if the script does not work, the tool will not work either. It can't. The ToolFactory will not fix broken code, unfortunately. Pull requests are being accepted.
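As a rough sketch of the script just described, and assuming a scratch directory where a file named hello.sh can be written, the following creates the one-line script and runs it with a single positional parameter. The final comparison is only an illustration of the kind of pass criterion an automated test needs; the variable names output and expected are made up for this example.

```shell
# Create the one-line script. The quoted heredoc keeps $1 literal
# so that it is expanded when the script runs, not when it is written.
cat > hello.sh <<'EOF'
echo "Hello $1"
EOF

# Run it in the shell first: if the script fails here, a tool built on it fails too.
output=$(bash hello.sh ToolFactory)
echo "$output"

# Sketch of a pass criterion: the output must match the expected string exactly.
expected="Hello ToolFactory"
if [ "$output" = "$expected" ]; then
    echo "test passed"
else
    echo "test failed"
fi
```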
So if you run bash hello.sh ToolFactory on the command line, you should see Hello ToolFactory, because that's the first parameter. It should be substituted here, and then this string should be echoed to the output, so you should see that in your shell.

So, given that you've got a working script, what inputs do we need for Hello World? Well, we don't need any files from the user's history. We do need an output, because we want to produce a new text file which contains the emitted string. We also need one parameter, and that will be the user supplied string that follows the word Hello in the output. We need to build a test, and the logic of that test should be that the output from the tool, when it's run with the default value, should be exactly the string Hello ToolFactory.

So, given that we've got all that, let me just run you through a quick demo. There's another one online, but I'll do one live, because that's one of the traditions of Galaxy products. I'm in the ToolFactory appliance. It's all detailed in the introductory tutorial in the GTN, so you can get your own. And I'm going to fire up the ToolFactory and very quickly walk you through the Hello demo. Naming the tool, I'll just call it hello. We don't need any Conda dependencies because we're going to use bash, which is pretty much always available. We could use a Conda dependency. In fact bash is available as a Conda dependency, and if you really want to fix the version of bash in your tool, you can. You can just put bash here, and it will always use the same version, the most up to date one. The script will be, as I said before, this trivial Hello $1, with an exclamation mark for added bonus points. And the ToolFactory is now asking how you want to pass the command line parameters to your script. Well, we need positional parameters. We now get to the input section of the ToolFactory form. We don't need any input files from the user's history.
We do need an output file, and that's going to be called hello_out. And it will be of type text. And it needs a position on the command line, but we're going to use a ToolFactory trick, which is to put STDOUT, standard out. And the ToolFactory will take the output from your script and put it into this output file in the user's history when the script runs. We don't need any more of these things. We do need one command line parameter of the type that the user can set. And that's going to be called say_hello_to, and it will be a text string, and its default value will be ToolFactory. The user will need some information about what this parameter does, so we'll put a label on the form next to the text box which will say "say hello to", and help goes here if the user needs any. And in this case the position is one, because we want this to be $1 in the script. All good. We don't need to fiddle with any of these. We'll give it a synopsis. This will appear beside the name in the tool menu. And for what it does, we can put explanatory text here. I'll show you where all these fields end up on the form.

Okay, we press execute. It's just a normal Galaxy tool, and here it is running. It's not quite a normal Galaxy tool, but it's basically a Galaxy tool. And if I click on the eye icon of the output XML, I can see a pretty normal Galaxy tool. This is not quite as complex as the Bowtie2 example I showed you. But all this has been generated, and it contains all of the stuff that we put on the form. And I hope you find that quite interesting.

However, let me show you something really interesting, particularly if you know something about Galaxy. If I click on Analyze Data, you may notice something's changed. Namely, in this tool menu I've got a new submenu, and that submenu has a new tool. And if I click on that new tool, I will see the tool that we just created. The text here is "say hello to", and that came from the form.
The "what it does" section has put more explanation here, because that's what we typed into the form. And it's just a normal Galaxy tool, so why don't we run it? I'll change this, just to show you that it really does work, by saying hello to myself. And what I expect is that when the generated tool that we're running now is completed, I should see Hello Ross. Let's check. Yes, it appears to have worked. So here's the output from a tool, and it's a redoable, normal Galaxy job. It's got all the normal things that a Galaxy tool has, because it is a normal Galaxy tool. It's just been generated by another Galaxy tool, the ToolFactory.

Now, if you want, you can rerun the generating job and change it. And so we could, for example, add something to the script. We have echo Hello $1, and we could add something like Goodbye. And to add another parameter, let's go down here and add one more command line parameter. We've got one, we're going to have another one, and this will be the goodbye parameter. And its value will be Ross. And the label will be "say goodbye to". I won't put in any help. We'll make it parameter two, and once again we don't need to do much. I'm just going to execute this. I didn't change the tool name this time, so what happens is that the tool that's already installed in the ToolFactory appliance is going to be overwritten with this new XML, which has a new parameter, say goodbye, and a new script, since I've actually updated the script. So this time, when we execute the generated tool, it's still saying hello to the ToolFactory. I'm going to change this around, and say goodbye to all your troubles, and execute that. I'm pretty sure you can guess what I'm hoping the execution of this revised tool will show: hello Ross, goodbye all your troubles. Well, it works. So that's really the ToolFactory in a nutshell.
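The revised two-parameter script from the demo might look roughly like the sketch below. The exact script typed during the talk is not in the transcript, so the file name hello2.sh, the two echo lines, and the variable result are assumptions made for illustration.

```shell
# Hypothetical revised script: the second positional parameter feeds the goodbye line.
cat > hello2.sh <<'EOF'
echo "Hello $1"
echo "Goodbye $2"
EOF

# Quote each argument so that a value containing spaces stays a single parameter.
result=$(bash hello2.sh "Ross" "all your troubles")
echo "$result"
```

The quoting matters here: without it, "all your troubles" would be split into three positional parameters and only the first word would reach $2.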
It's really an entirely trivial and confected kind of example, but I hope that's convinced you that the ToolFactory actually does automate generating tools from trivially simple scripts. It can also generate more complicated ones, but we'll get to that.

So what is the ToolFactory? Well, now you know. It's a tool that runs in Galaxy, in a special Galaxy, for reasons that are explained in the GitHub repository. It's a tool that generates new tools, effectively inside Galaxy. The developer is responsible for specifying all the elements that match the script that's being wrapped, and they must match precisely or it obviously won't work. The script has to work. Otherwise the tool won't work. But when you press execute, the ToolFactory will generate new XML with all those inputs, outputs and parameters baked in, as you specified them on the form. The tool is immediately installed, which is probably one of the more interesting aspects of the ToolFactory. And you can run it immediately to get instant feedback on what your form is going to look like, and how your tool is going to behave from a user's perspective.

Probably one valuable aspect of all this is the redo button, because it's quite unusual to have an integrated development environment for tools. My goal is to produce a really simple way for newcomers to build new tools, and this is probably about as simple as it's ever going to get, because you can use the ToolFactory to generate a tool, modify it, regenerate it, modify it again, or generate a new tool. The possibilities are, of course, open ended. The tools that the ToolFactory generates are perfectly normal Galaxy tools. They're first class Galaxy tools, no different from ones that are written by hand. They contain all of the form settings and all of the inputs and outputs, and they're functionally equivalent to anything that's written manually.
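To give a feel for the kind of document involved, here is a hand-written sketch of a minimal wrapper for the hello script, written out to a file from the shell. It is not the XML that the ToolFactory actually emits, which is not reproduced in this talk. The tool id, the version, the way the script is called through $__tool_directory__, and the has_text assertion are all assumptions chosen for illustration, using the parameter and output names from the demo.

```shell
# Write an illustrative wrapper; the quoted heredoc keeps every $variable literal
# so that Galaxy, not the shell, substitutes the values at job time.
cat > hello.xml <<'EOF'
<tool id="hello" name="hello" version="0.1.0">
  <description>says hello</description>
  <command><![CDATA[
    bash '$__tool_directory__/hello.sh' '$say_hello_to' > '$hello_out'
  ]]></command>
  <inputs>
    <param name="say_hello_to" type="text" value="ToolFactory"
           label="Say hello to" help="Help goes here"/>
  </inputs>
  <outputs>
    <data name="hello_out" format="txt"/>
  </outputs>
  <tests>
    <test>
      <param name="say_hello_to" value="ToolFactory"/>
      <output name="hello_out">
        <assert_contents>
          <has_text text="Hello ToolFactory"/>
        </assert_contents>
      </output>
    </test>
  </tests>
  <help><![CDATA[
Explanatory text about what the tool does goes here.
  ]]></help>
</tool>
EOF
echo "wrote hello.xml"
```

Each piece of the form shows up somewhere in a document like this: the parameter name, label and default in the inputs section, the output name and type in the outputs section, and the test logic in the tests section. If Planemo happens to be installed, planemo lint hello.xml is one way to check a wrapper of this kind.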
And if you have wrapped a useful script, not something trivial like Hello World, it can be converted into a Tool Shed ready archive using the Planemo test utility that's built into the ToolFactory appliance. The finalized archives contain the generated test, and they are ready to be shared on the Tool Shed like any other proper Galaxy tool.

Okay, so training material for the ToolFactory is available in the Galaxy Training Network. There are introductory and advanced tutorials in the developer section. The ToolFactory is supplied as a ToolFactory flavored docker-galaxy-stable appliance, so it's easy to pop up and throw away when you're done. There are samples provided in the appliance in Rscript, Python, bash, Perl, and even Lisp and Prolog, for those of you who know what they are, old people like me. There are features in the ToolFactory that are explained in the advanced tutorial, including how to configure repeats, selects and collections, and demonstrations of simple filter tools.

And I guess the most important thing is that if you're not a developer, if you don't write your own scripts, the ToolFactory is not going to do you any good, and the training materials won't do you much good either, because you have to bring your own programming skills. It's assumed that you're an experienced, script writing developer, because otherwise the ToolFactory doesn't do much. The design of the training is unusual. It's not like most of the other Galaxy training. Because writing tools is so open ended, it's a self study guide. I think the best way to start is by changing the samples and seeing what happens, regenerating the sample tools with modifications, and then you can start bringing your own scripts and wrapping those by providing appropriate parameters.
And that way, I think, if you're not already convinced that it could be useful, with a bit of experience and a bit of learning by doing, you'll quickly be able to evaluate whether it's going to be any use in your work for Galaxy.

All right, so we're getting to the end. I'm just going to make a small pitch about the value proposition and the limits of the ToolFactory. Obviously it has profound limits, because it's an automated code generator. It's not a skilled developer. A skilled developer can do much, much more, and cover many more requirements, than the ToolFactory can. It can only automate certain constrained requirements. For complex conditional parameter logic, like what is needed for the sample that I showed from the IUC, you're going to need to do manual coding.

Now, the interesting thing about the ToolFactory, aside from being useful for learning about and building simple tools, is that the complex parameter logic, the manual coding, whether or not you're using the ToolFactory, can actually be expressed in ways other than the normal way of building Galaxy tools to date, namely writing the complex logic inside the tool document itself as Cheetah and Python templating. There's nothing stopping us from writing that complex logic in R or Python or whatever takes your fancy. The cost is that the parameters have to be passed. So, comparing that with what's done for complex tools now, it's possible that many of the existing tools could have been constructed using other scripting languages. It's just logic, so it can be moved to another language. And I guess the point that I'm trying to make is that many people who are new to Galaxy are capable scripters. They're perfectly capable of writing useful scripts, but don't yet know how to write scripts that involve templating inside the tool namespace. Well, for those newcomers, the ToolFactory offers an opportunity to write complex logic in a script.
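As a small illustration of moving conditional logic out of the tool document and into a script, the sketch below shows a script that receives its parameters positionally and branches on one of them. Everything in it is hypothetical: the file name filter.sh, the three positional parameters, and the mode values upper and sorted are invented for this example and do not come from any sample in the appliance.

```shell
# Hypothetical workhorse script: conditional logic lives in the script,
# and the wrapper only has to pass three positional parameters.
cat > filter.sh <<'EOF'
infile="$1"    # input data set from the history
mode="$2"      # user-selected option
outfile="$3"   # path where the new history item is written

if [ "$mode" = "upper" ]; then
    tr '[:lower:]' '[:upper:]' < "$infile" > "$outfile"
elif [ "$mode" = "sorted" ]; then
    sort "$infile" > "$outfile"
else
    # Unknown mode: pass the data through unchanged.
    cp "$infile" "$outfile"
fi
EOF

# Try it on a small sample before wrapping it.
printf 'pear\napple\n' > sample.txt
bash filter.sh sample.txt sorted out.txt
cat out.txt
```

The trade-off is the one described in the talk: the script has to read and interpret its own parameters, where a templated command section would have them available in the tool namespace.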
And exactly how far you want to go with that just depends on what suits your style of development. Certainly, once you've gone to all the trouble of filling in the form, the ToolFactory will take care of any underlying parameters that need to be passed. If you go this route, you'll still have to parse those parameters in your script. But that's not necessarily a big deal. And if you can't write a complex tool using the conventional infrastructure, you may be able to do it in the ToolFactory. It's less efficient than templating, because of the need to deal with the parameters. But it is more familiar, and possibly more accessible to more developers, than the existing conventional infrastructure. The really good news is that you can ignore everything I've said and just choose whatever suits you best, because ultimately that's what developers normally do.

So, takeaway messages. Just to summarize the ToolFactory: it's a Galaxy tool, and it's an unusual one because it makes new tools from scripts. It only runs in a specialized environment, which is the appliance that's provided. It won't run outside the appliance, because it installs new tools, which is normally forbidden in Galaxy. If that's what you want to do, it allows you to use Galaxy as an integrated development environment. It's kind of clunky, but oddly satisfying, in my opinion. And it certainly potentially expands our collective tool building capacity, because we're introducing an easy way to build simple tools through the Galaxy graphical user interface, turning it into a kind of clunky IDE. So this potentially helps to speed up tool suite development for new scientific domains. And it may also be useful as a learning tool as new developers come on board.
They can start out with the GUI, Galaxy kind of approach to building tools, and build some simple ones, Hello World and things more complex than that. And looking at the emitted XML will, I think, be a boon for learning the dark arts of manual Galaxy tool wrapper construction.

So I'm going to stop there. Once again, as I always do, I'd like to thank you for using Galaxy, and just say that I hope the ToolFactory proves of some use in your work. Okay, thanks for watching.