Welcome to the second session of the BioExcel building blocks lecture of this BioExcel summer school. This second session is called Computational Biomolecular Simulation Workflows using the BioExcel building blocks. It's going to be around 30 minutes, and after that remember that we have a question and answer session, so please take notes and be prepared to ask, comment or suggest anything that you wish to know about the BioExcel building blocks. After that we'll have a virtual break of 25 minutes, and then we have the final session today, which is the hands-on session where we are going to work all together on a Jupyter notebook, trying to set up a molecular dynamics simulation of a protein-ligand complex using the BioExcel building blocks. From the previous presentation you already know what the BioExcel building blocks are and what the philosophy behind the software library is; I hope I have convinced you that you can easily build biomolecular simulation workflows using these building blocks, and you also know that you can control and orchestrate these workflows using different workflow managers: graphical user interfaces such as KNIME or Galaxy, Jupyter, which is the one that we are going to use today in the hands-on session, and also HPC-based workflow managers such as PyCOMPSs or Toil. The tutorials that we have on the website are the four that I already presented in the previous presentation. They are all built as Jupyter notebooks. We like Jupyter notebooks a lot, and I will tell you in a moment why we chose them for our demonstration workflows. We chose Jupyter notebooks in general because we think they are a fantastic tool for training events like the one that we are going to have in this afternoon's session. We think it's really nice to be able to inspect intermediate results, like the one that you have here, which is a three-dimensional structure seen using the NGL viewer.
It's really nice to be able to interactively modify the parameters of the different steps in the workflow: imagine you want to change the type or the size of the box in your simulation, or the number of steps in your final simulation, for example; it is really easy to modify the parameters and run it again. You know that you can run the different steps of the workflow cell by cell. I know that because you already used Jupyter notebooks for the morning session on GROMACS, so you are already familiar with them. And there is also the possibility to run the whole workflow in MyBinder; for the ones that are not familiar with MyBinder, we will see an example in this presentation and I will tell you about it. But it's not just these general features of the Jupyter notebooks. There are also particular things that are really interesting for us and the BioExcel building blocks. One of them is getting familiar with the BioExcel building blocks syntax: with a Jupyter notebook it's really easy to understand the syntax, and you will see that in a moment. It's also easy to understand how to build workflows using these Jupyter notebooks, and you will see that too, using the tutorials that we have already prepared for you. And it is also easy to understand how one can package the workflows with the conda packaging system, to export these workflows and make them reproducible and shareable in the community. We will see examples of all of that. We will start with getting familiar with the BioExcel building blocks syntax. Here you have an example, and it's really easy. This is an example of one particular building block, which is called Editconf. As you can imagine, it is a wrapper of the editconf tool of the GROMACS MD package; that is the one generating this fantastic system box for the MD system. The way in which you use the building blocks is always the same. You have a first part where you import the module that you need.
In this case we are importing Editconf from the biobb_md category, the MD module. After that you define inputs, outputs and properties. In this case we are defining this output property, which is basically a path, the name of a file that will be our output file, our result. And then the properties, which are the parameters of, in this case, the editconf tool that is being wrapped by the building block. In this case we are selecting the box type, a cubic box, and a distance to the molecule of one nanometer; remember that GROMACS always works in nanometers. This property is possible thanks to a layer that we have for input parameter adaptation: these parameters are adapted to the tool execution parameters, the local input parameters. That means that this 'cubic' will be transformed, if you remember from the morning session, into a -bt parameter for editconf, and the distance to the molecule will be transformed into a -d. But this happens internally; you don't need to know that, you just need to know the properties that you can put in the building block. And after that it's just launching the building block. Inputs, outputs and properties; it's always the same: the name of the building block and the launch at the end, always the same. Usually the input of one step of a workflow takes the output of the previous step of the workflow, and that's exactly what is happening here. So this input, this output of pdb2gmx, is, as you can imagine, the output of a previous step where we ran the pdb2gmx tool to prepare the topology for the GROMACS simulation. We put the output path, which is the one that we prepared here, and we put the properties, which is this dictionary that we have prepared here. Always the same, and you will see inputs, outputs and properties in the rest of the workflow and the rest of this presentation. Examples of that: you have seen an example of Editconf; here is an example of how to mutate a residue. This is a particular example.
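The idea behind this input-parameter adaptation layer can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the actual biobb implementation; the flags `-bt` and `-d` are real GROMACS editconf options, but the mapping table here is a simplification:

```python
# Illustrative sketch of the input-parameter adaptation layer:
# human-readable biobb properties are translated into the
# command-line flags of the wrapped tool (GROMACS editconf here).
# This is NOT the real biobb code, just the concept.

PROPERTY_TO_FLAG = {
    "box_type": "-bt",             # e.g. cubic, octahedron, dodecahedron
    "distance_to_molecule": "-d",  # in nanometers (GROMACS works in nm)
}

def adapt_properties(properties):
    """Translate a biobb-style properties dict into CLI arguments."""
    args = []
    for name, value in properties.items():
        flag = PROPERTY_TO_FLAG[name]
        args.extend([flag, str(value)])
    return args

cmd = ["gmx", "editconf"] + adapt_properties(
    {"box_type": "cubic", "distance_to_molecule": 1.0}
)
print(" ".join(cmd))  # → gmx editconf -bt cubic -d 1.0
```

The user only ever sees the readable property names; the translation to tool-specific flags stays hidden inside the building block.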
We are mutating an arginine to an alanine, as you can see here, using the Modeller tool. So we are wrapping the Modeller tool. Here you have the import of the module; here you have the definition of the inputs, outputs and properties. Again, only an output and the properties, because as input we are taking the output of the previous step, which is this fixed PDB. We call the Mutate building block with inputs, outputs and properties; again, it's exactly the same syntax as before. Here is an example that I took from a real case; actually, that's why you have here the different comments, etc. But it is the same: importing the module. In this case, we are trying to generate a cluster of an ensemble of structures from a trajectory using the GROMACS cluster tool. We import the module. We define the inputs, outputs and properties here. In this case, we define as inputs a trajectory and a topology; we are taking our PDB as the topology. We define the output, ensemble.pdb, just a path to a file. And we define the properties: in this case, that we want to fit the selection to just the protein atoms, and that we want to output the ensemble of structures with just the protein atoms again. And we launch the Cluster building block with inputs, outputs and properties again. Always the same. As you can imagine, these are now different tools, Modeller and GROMACS, but the syntax for the BioExcel building blocks is exactly the same. More examples; this one is a bit more elaborate. In this case, what we are doing is calling different building blocks and joining them together. How are we joining them together? It's really easy: we are taking the output of the first one and using it as the input of the second one. Let's see the example. We are importing the modules here; in this case, we are importing a couple of modules, although actually we only need this one, which is the Ligand one. We define inputs, outputs and properties.
And we launch the building block. In this case, the building block is wrapping a REST API that downloads a ligand structure from one of the PDB mirrors. So in this case, we are downloading the ligand structure with the code IBP, which is the ibuprofen drug. This downloads the structure in PDB format. This input structure is the name of the file that we are using as an input for the second step of the workflow, which in this case is adding hydrogens using the Open Babel tool. Again: importing the module, defining inputs, outputs and properties. Here we are just defining one output and leaving all the properties at their defaults; we just need to add hydrogens. We put the inputs and outputs here, we don't care about the properties, and we launch the building block. This adds the hydrogens to the ligand that we downloaded in the first step. Here we are just taking a look with the NGL viewer at the structure with the hydrogens added. In the third step, we are taking the output of the second step: what we are doing is minimizing the energy of the structure with the newly added hydrogen atoms. Again the same: we import the module, define inputs, outputs and properties, and launch the building block. Finally, in the last step of this mini workflow, we are using the ACPYPE tool to parameterize this particular ligand, the ibuprofen ligand, that is, to obtain parameters for it to be used in a molecular dynamics simulation. In particular, we are interested in the parameters for the GROMACS package. We import the module, we define inputs, outputs and properties, and we launch the building block. This example is a mini workflow, really easy, as you can see. We are using building blocks wrapping REST APIs, wrapping Open Babel, wrapping ACPYPE, different tools, but with exactly the same syntax.
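The shape of this four-step mini workflow can be sketched with stub functions standing in for the real building blocks (the biobb wrappers around the PDB REST API, Open Babel and ACPYPE). The stub names are made up for illustration; the real point is the chaining, where each step's output path becomes the next step's input path:

```python
# Skeleton of the four-step ligand workflow. Stub functions stand
# in for the real building blocks; what matters is the chaining:
# the output path of one step is the input path of the next.

def run_step(name, input_path, output_path):
    """Stand-in for a building block launch: consume input, produce output."""
    print(f"{name}: {input_path} -> {output_path}")
    return output_path

lig     = run_step("download_ligand",      None,    "ligand.pdb")
lig_h   = run_step("add_hydrogens",        lig,     "ligand_h.pdb")
lig_min = run_step("energy_minimize",      lig_h,   "ligand_min.pdb")
params  = run_step("acpype_parameterize",  lig_min, "ligand_gmx.top")
```

Replacing each stub with a real building block (inputs, outputs, properties, launch) gives exactly the workflow described above.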
Basically, this workflow is the demonstration workflow that you have here as an example, which is called automatic ligand parameterization. What you have here is more information, and you will see it in a minute, but the workflow is exactly the same, as easy as that. The second point that makes Jupyter notebooks so interesting for us and the BioExcel building blocks is that they allow you to learn how to build our workflows. This is basically because we have already generated these tutorials for you, and you can take a look at them. This is another one of the tutorials, this one here, the protein MD setup. You can see that the Jupyter notebook has a lot of information and documentation about what the workflow is doing, what it takes as input, the modules that we are using, the auxiliary libraries, etc. So there is a lot of information, thanks to the Markdown support in Jupyter notebooks. Here, again, there is information about one particular step. This is the step where you create the protein system topology using pdb2gmx; this is the building block wrapping the pdb2gmx tool in GROMACS, and look at that: it's importing the module, defining inputs, outputs and properties (in this case no properties), and creating and launching the building block. It's exactly the same syntax. This generates the topology using the pdb2gmx tool. You can, of course, visualize the structures generated using, for example, NGL, and you will see examples of that in the hands-on session this afternoon, but you can also see intermediate results in the form of plots, like this energy minimization: you can see how nicely the energy of the system is being reduced during the energy minimization in GROMACS. Here is just pure Python. This has nothing to do with the building blocks, but it illustrates that you can mix the building blocks with pure Python, and this is included in the protein MD setup example.
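What that pure-Python cell does can be sketched like this: parsing a GROMACS .xvg energy file (where comment lines start with `#` or `@`) into two lists ready for plotting. The file format is real; the sample numbers are invented, and the one-liner style is a nod to the map/list/zip idiom mentioned next:

```python
# Pure-Python sketch of what the notebook does after gmx energy:
# parse .xvg text (comment lines start with '#' or '@') into
# step and energy lists ready for plotting. Sample data invented.

xvg_text = """\
# GROMACS energy output (example data)
@ title "Energy Minimization"
0 -1000.0
100 -4500.5
200 -6100.2
"""

def parse_xvg(text):
    rows = [line.split() for line in text.splitlines()
            if line and line[0] not in "#@"]
    steps, energies = map(list, zip(*[(int(s), float(e)) for s, e in rows]))
    return steps, energies

steps, energies = parse_xvg(xvg_text)
print(steps)     # [0, 100, 200]
print(energies)  # [-1000.0, -4500.5, -6100.2]
```

Feeding `steps` and `energies` to any plotting library then produces the minimization curve shown in the tutorial.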
Actually, for the ones that are able to understand it, there is a special prize, because this is from one of our developers, Pau Andrio, who likes a lot to mix map, list and zip from Python in one single instruction. It's not easy to understand, but it's basically taking the information from the energy, these numbers here, and generating this plot using the plotting library. This is something that you can do with the Jupyter notebooks, which is really nice. This particular example is the one that we are going to work on this afternoon, and it's basically the sum of the protein MD setup and the ligand parameterization: we will parameterize the ligand, we will join the ligand and the structure, and then we will run the molecular dynamics setup of everything, the protein-ligand complex. Finally, the Jupyter notebooks allow an easy way to understand how to package the workflow and make it shareable and reproducible. Let's see what I mean by that. If you want to start with a biomolecular simulation workflow using the BioExcel building blocks, the usual thing to do is to create a new conda environment; that is just one instruction like this, conda create. Of course, you need to have conda installed in your system, but we also have tutorials for that: you can go to the website, and you have tutorials to install conda on Windows, on Mac, and on Linux operating systems. It's really easy. So you create a conda environment with this command line here, with the particular name that you want. You activate the conda environment, and you will see that it is activated because it will appear in your prompt, and then you install the packages that you need. For example, I'm going to install the biobb_chemistry module of our package. You know that we have the BioExcel building blocks divided into different categories, so this is just one, the chemistry one, and I'm going to install it.
After that, I can just type jupyter notebook, open the Jupyter notebook, and start working with it. This conda install will install, remember, all the dependencies needed for the chemistry building blocks to run properly. Here you have all the modules that we have, and you need to look at the different biobb packages we have here, understand which ones you need for your workflow, and install those packages accordingly. You can also go to the Bioconda repository, and you will find all the biobb packages there. How can you export a workflow using these conda packages? Really easy. What we do is generate one particular file called environment.yaml, and this file contains all the dependencies needed for a particular workflow. If you remember the automatic ligand parameterization that I showed you just a minute ago, we needed biobb_io to extract the ligand from the PDB mirror; we needed biobb_chemistry, which is wrapping Open Babel and ACPYPE; and we also needed biobb_common, because we always need the common package for all the modules in the BioExcel building blocks. Then we just need nb_conda_kernels, which is an auxiliary package that allows Jupyter notebooks to see the conda environments; nglview, to see the molecular structures; and conda itself. This environment.yaml tells conda that all these dependencies need to be installed before running the particular workflow. We generate this environment file for each of our tutorials, and we put it, as you can see here, in the GitHub repository, and that makes the installation and launch of our demonstration workflows as easy as these lines here. So for any of the tutorials that we have, you just need to clone the repository on your machine, enter the folder which contains the source code, and create an environment.
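An environment.yaml for the ligand parameterization tutorial looks roughly like this. The package names follow the dependencies just listed; the environment name, channel order and any version pins are illustrative and vary between tutorials:

```yaml
name: biobb_wf_ligand_parameterization
channels:
  - conda-forge
  - bioconda
dependencies:
  - biobb_io          # REST API wrappers (ligand download)
  - biobb_chemistry   # Open Babel and ACPYPE wrappers
  - biobb_common      # shared base code, always required
  - nb_conda_kernels  # lets Jupyter see conda environments
  - nglview           # 3D molecular visualization
```

Passing this file to conda reproduces the whole software stack of the workflow on any machine.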
How do you create the new environment? Using this environment.yaml file, which is able to install all the dependencies that you need for the particular workflow. So this conda create will install everything that you need. Then you activate the new environment, you just enable a couple of extensions for the Jupyter notebooks to be able to see the structures with nglview, for example, and you execute the Jupyter notebook with the particular tutorial. Those are all the steps that you need to share a workflow built using the BioExcel building blocks; as easy as that. And this environment.yaml also helps us to use these demonstration workflows on platforms like MyBinder, which I was telling you about before. Binder, given just a GitHub repository URL, goes to the repository, understands that this environment.yaml contains all the dependencies for the particular workflow, installs all the conda packages, and then automatically starts the Jupyter notebook with all the dependencies already installed. So you can play with the workflow in this public MyBinder interface. This public interface, of course, being public, doesn't have a lot of computational power; sometimes you get timeouts, because they are offering this for free to everybody. And that's why we are working in BioExcel to integrate a local MyBinder installation into our BioExcel cloud portal. Our BioExcel cloud portal will contain all the Jupyter notebooks, which will be automatically deployed using this MyBinder in our cloud infrastructure on the EMBL-EBI premises. This is coming very soon. And we will use that to control the number of machines that we are able to deploy, the number of workflows that we can work with at the same time, and basically to run this type of training event like the one you are in today. So, BioExcel building blocks workflows: how can you start to create the workflows?
As I was telling you before, you need to create a conda environment; it's the easiest way to start building workflows using the BioExcel building blocks. Once you have the conda environment created, you launch a Jupyter notebook installed inside the conda environment, and then you start connecting the building blocks. You know that there's a unique syntax for all the building blocks; you just need to write them and connect them. I think there are two different ways you can start building workflows using the BioExcel building blocks. One is to clone existing tutorials, such as the four that we already have on the website, and then start playing with them; it's a way to start understanding how you can modify the building blocks and how you can connect them together. The other is starting a new workflow from scratch. I will start with the first one, which is easier, and you already know how to do it, because all of our tutorials contain a set of lines on how to install and launch the workflows. You know these ones; they are exactly the same in all of the different workflows. The only things that change are the dependencies needed for the particular workflow to run properly. Here you can see that we have more modules: we have biobb_analysis for the analysis building blocks, we have biobb_model to fix and mutate structures, and we have biobb_md to run molecular dynamics simulations. You clone the workflow on your local machine, you start looking at the documentation that is inside, and you start playing. For example, the easiest way to start playing with this is taking the protein MD setup tutorial and modifying the PDB code, which is the input parameter of the whole workflow. The one that is there by default is the lysozyme protein. If you run the entire workflow with this lysozyme protein, it will generate a system completely prepared and equilibrated to start running molecular dynamics simulations for this lysozyme structure.
But if you modify this one, you can go and put something like a kinase, a really complex protein, and see if it still works with the complex protein; or you can put a DNA, or a protein-DNA complex. So just by modifying this PDB code here, you can play with the workflow. Of course, you can also modify the building blocks directly. In this case, for example, it's really easy: you take the properties of the building blocks and try to modify them. You can modify the type of the box for the molecular dynamics system, from cubic to, say, octahedron; you can modify the distance to the molecule; you can modify the ionic concentration, for example, of the system. You can, of course, modify the number of steps of the unrestrained simulation, the last step of the MD setup workflow. You can try to extract different observables from the information produced by the GROMACS energy building block: here we are extracting the potential, but you can extract the temperature or the density just by modifying the keyword here, and see what happens. Just a note here: inside the building blocks we have some things that are a little bit smart. There is, for example, the MDP. If you remember the molecular dynamics configuration file for GROMACS from this morning's session, it has many, many different properties. Actually, it has so many properties that we have a dedicated webinar in BioExcel, one hour talking just about the different options for the MDP configuration file in GROMACS; you can go to the bioexcel.eu webpage and try to find it. So it has many different options, but some of these options are usually repeated in the processes that are part of each and every molecular dynamics setup pipeline: the energy minimization, the NVT equilibration, the NPT equilibration, and the unrestrained, free molecular dynamics.
We built these preconfigured MDP files for you, and you just need to indicate in the MDP property the type of simulation that you want: a minimization, an equilibration, or an unrestrained simulation. And this will be prepared for you automatically. Of course, you can then overwrite these properties, these parameters here. For example, here we are overwriting this emtol parameter, which was 1,000 by default, and we put 500. This is something that is inside the Grompp BioExcel building block. New building blocks, of course: instead of modifying, you can start playing with adding new building blocks, and you can do something a little bit more complicated, such as introducing a simulated annealing equilibration step, which is not present in the tutorials that you have on the website, just by putting all the needed keywords into the MDP configuration file. A little bit of caution here, because this only runs in GROMACS versions 2019.2 or higher, because there was a bug in the simulated annealing. But this is an example of what you can do; or you can, for example, introduce a mutation in the protein MD setup workflow and see if you can still run the whole pipeline with a mutated structure and see what happens. It's really easy to add a new building block. If you want to go from scratch, what will you need? You will need to take a look at the table that you have here, which is always up to date, because we have this table on the website connected to the GitHub repo using hooks; that means that every time we modify a version of the source code, this table is automatically updated. And you need to think about what you want to do. For example: I want to start working with a ligand. I will put here, in the search-by-text box, that I'm interested in ligands, and I will find something like a building block that downloads a ligand file from a REST API. That's fantastic, I want to start using this. Then I should click on the documentation for this particular building block.
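The preset-plus-override mechanism described above can be sketched in a few lines. The preset values here are illustrative, not biobb's actual defaults; `emtol`, `nsteps`, `integrator` and `tcoupl` are real GROMACS .mdp keywords:

```python
# Concept sketch of the preconfigured MDP files: choosing a
# simulation type selects a preset, and any user-supplied mdp
# keywords override the preset values. Preset contents are
# illustrative, not biobb's real defaults.

MDP_PRESETS = {
    "minimization": {"integrator": "steep", "emtol": 1000, "nsteps": 5000},
    "nvt":          {"integrator": "md", "tcoupl": "V-rescale", "nsteps": 5000},
}

def build_mdp(simulation_type, overrides=None):
    mdp = dict(MDP_PRESETS[simulation_type])  # copy the preset
    mdp.update(overrides or {})               # user keywords win
    return mdp

mdp = build_mdp("minimization", {"emtol": 500})
print(mdp["emtol"])  # 500
```

This is why a simulated annealing step is possible even though no preset exists for it: the user-supplied keywords always take precedence over, or extend, the preconfigured ones.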
Actually, the documentation is for the whole module, the whole input-output package. When you click on the documentation, you will find the typical Read the Docs documentation. You click on the Python API and you will find the building block that you are interested in, the Ligand one. You click on Ligand, and you will obtain, guess what: inputs, outputs, and properties. That's the only thing that you need to know to run the building block. So in this case, you know that there is no input needed for this particular building block, but there is an output, and there are properties where you can put, for example, the ligand code, which is the interesting one. And then you can start writing: importing the module (actually, we just need the Ligand one here; this Pdb shouldn't be here), then defining inputs and outputs. Remember, we don't need inputs; we just need an output. So we define it, and although it is called input structure, it is an output; it will be used as an output here in the launch. And we also define the properties, in this case the ligand code, which is the ibuprofen. We put the properties here, and we launch the building block. This is how you write one building block, and then it's just a matter of connecting the different building blocks, one with the other. Examples of that, different examples. The Editconf building block, for example: you click on the Read the Docs documentation and you will find all the information, inputs, outputs, and properties. Here you have the properties, the distance to the molecule and the box type; one is a float and the other is a string, and here in parentheses you have their default values. So we go to the Jupyter notebook, we import the module, we define inputs, outputs, and properties according to the documentation that you have here, and we run the building block.
Don't worry, we will have time this afternoon to understand all of that and to go one by one through the different steps of the protein-ligand complex MD setup simulation. The same goes for another example, the mutation example in another module. It's exactly the same: you go to the documentation, inputs, outputs, and properties; in this case, you have different inputs, as you can see, and properties, and you do the same here. What is different from the other one is that here we are using a Docker container. We are not wrapping a particular tool installed in the system, but wrapping a Docker container; the building blocks also have this possibility, and you will find here all the properties to use a container, a Docker container, with the building block. This afternoon in the hands-on session, we are going to prepare a protein-ligand complex. We are going to split the protein and the ligand from the PDB file, parameterize the ligand, obtain the protein topology separately, then join the protein and ligand topologies and structures together, and then run the setup of the whole system. After that, there is a quality check analysis, RMSD and radius of gyration, that we will use, for example, to see the results in a graphical way. And before closing this second session, let me just introduce really briefly what you can do with the BioExcel building blocks workflows in a command-line way. As you can imagine, you can export the Jupyter notebooks directly: you go to 'Download as Python script', and you will have a Python script with your whole workflow, and you can, of course, run this workflow from the command line. This has some disadvantages: there are no graphical cells, of course, so you are losing interactivity; but you are gaining high throughput, so you can automate it and run it many, many different times, because this is now a command-line workflow.
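One way to keep such an exported script re-runnable with different parameters is to move each step's inputs, outputs and properties out of the Python code and into a YAML file that the script reads. A fragment could look like this; the step names and file paths are illustrative, loosely following the shape used by the biobb command-line tutorials:

```yaml
working_dir_path: md_setup_run
step1_pdb:                         # illustrative step name
  paths:
    output_pdb_path: structure.pdb
  properties:
    pdb_code: 1aki                 # lysozyme, the tutorial default
step2_fixsidechain:
  paths:
    input_pdb_path: structure.pdb
    output_pdb_path: structure_fixed.pdb
```

Running the same workflow 100 times with different PDB codes then only means editing (or generating) 100 small YAML files, never touching the Python script.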
The problem with that is that if you want to modify a parameter, if you want, for example, to run this 100 times, you need to modify the parameters for a certain step inside the Python script. And for that, we have another way of doing it, which is the command-line interface with Python and YAML: splitting the workflow into the workflow script, in Python, and the workflow parameters (remember: inputs, outputs and properties, and the dependencies between the different steps) in the YAML file. If you are interested in that, we are not going into details today, but you have a command-line workflow tutorial that explains how to do it, how to convert from the Jupyter notebooks to this command-line way of running the BioExcel building blocks workflows. And with that, just a summary of everything: building biomolecular simulation workflows is really easy with this BioExcel building blocks library. I hope that I have convinced you of that; if not, we have this afternoon's hands-on session, where we will try to convince you even more. You just need to create a conda environment, get a little bit familiar with the BioExcel building blocks syntax, and start connecting the building blocks together. The Jupyter notebook graphical user interface, as you have seen, is a really good tool to start playing with these building blocks. You can package all the workflows with the conda packaging and export the workflows in a really easy way. And finally, you can export your workflows to a command-line way of executing them if you are interested in high throughput. And now, please join us for the question and answer session, together with Pau Andrio and Genís Bayarri. And thank you all for participating in this BioExcel Summer School.