Welcome to the second session of the BioExcel Building Blocks lecture at the BioExcel Summer School, 2021 edition. In this lecture, I'm going to talk about computational biomolecular simulation workflows using the BioExcel Building Blocks. Remember, this is the second introductory lecture on the BioBB library. And remember also that tomorrow we will have a live session where we are going to run together a tutorial on a protein-ligand complex molecular dynamics setup with GROMACS and a Jupyter Notebook interface. We are also having a couple of question-and-answer sessions tomorrow: one for the doubts that you have on all the theory and introductory sessions, and another one at the end of the session for the tutorial and hands-on.
Okay, so I hope that you still remember from the previous lecture the basics of the BioExcel Building Blocks library. The idea is that we have developed a library which is a collection of Python wrappers on top of biomolecular simulation tools. These wrappers give interoperability to all the different tools, so that we can use them and join them together, building our own biomolecular simulation workflows that can then be launched and run using different workflow managers, such as, for example, a graphical user interface called Jupyter Notebooks. I hope that you're familiar with Jupyter Notebooks, but if not, I will try to explain why we are using them and what a powerful and useful tool they are.
Remember that we have a website where you can find all the information that you need to start using the BioExcel Building Blocks library. In particular, you have an About section where you will find all the information that I gave you in the first introduction session. But you also have a couple of other sections that are really important.
The first one is the Tutorials section, where you will find basically a set of tutorials to install the Anaconda packaging system on your machine, no matter if you have macOS, Ubuntu or Windows. You can install the Anaconda packaging system, and then it's the only software that you need to reproduce all the workflows that are in the Workflows section of the website. All of these workflows, which we implemented for you, are demonstration workflows using Jupyter Notebooks, and you can download them and execute them in your own local infrastructure using the Conda system. Or you can just execute them in Binder; we will take a look at that in this session. The important thing here is that, from the different demonstration workflows that we have, the protein-small molecule complex molecular dynamics setup using GROMACS is the one that we are going to use tomorrow in the live hands-on session.
Okay, why are we using Jupyter Notebooks? Why did we choose Jupyter Notebooks for our demonstration workflows? I will try to convince you that Jupyter Notebooks are a very powerful and useful tool for training in general, because they let you run the different cells in the workflow step by step. You can run one cell, check intermediate results, interactively modify parameters, run the cell again, and see what the differences are after modifying just one parameter or one input. You can inspect intermediate results, not just the text or the log files, but also structures in interactive 3D and plots of different analyses. And on top of that, you can run them on remote platforms like MyBinder, which I'm going to introduce in a minute. Using MyBinder, you don't have to install anything on your own computer.
If you just want to start trying and playing with one particular workflow, and see what you can do with the workflow and the BioExcel Building Blocks library, you can just run it remotely, play a little bit, and then, if it is interesting for you, you just need to download it and start working on it in your own infrastructure.
That is true in general, not just for the BioExcel Building Blocks. But for the BioExcel Building Blocks in particular, Jupyter Notebooks are a really good tool to start getting familiar with the syntax of the library. They also help in learning how to build the workflows and in understanding what the workflows are doing step by step, and I will show you that a little bit in the next sections. And they will also help you understand how to package the whole workflow using the Conda packaging system and the BioExcel Building Blocks library.
So, one by one, starting with getting familiar with the BioExcel Building Blocks syntax. This is what I showed you in the previous lecture, but let me repeat it; I will repeat it today and tomorrow so that you can save it in your brain. This is the syntax that you need to understand, and this is the syntax that you are going to use in all the different building blocks of the library: importing the module, defining inputs, outputs and properties, and launching the building block. Nothing else than this. So importing the module, Editconf in this case, the one that generates the box, defining inputs, outputs and properties, and then launching the building block.
If we take a closer look at this example, which I was rushing through in the first lesson, you will see that you have four different building blocks connected one to the other. The first one is downloading a small molecule: it is importing the module, defining inputs, outputs and properties, and launching the building block. The output that is generated here is called input_structure.
input_structure is just a Python variable that holds the code of the ligand plus ".pdb"; it's a string, the name of a file in your file system. This variable, which is the output of the first building block, is used as an input for the second building block. And the output of the second building block is used as an input for the third building block, and so on and so forth. This is how we connect the different building blocks: we define inputs, outputs and properties and launch the building block.
We are downloading a PDB file here. We are adding hydrogens here, so the properties for adding hydrogens with Open Babel are a particular pH that we can define, and then we say that the input format is PDB and the output format is MOL2. So we are converting the format of the file that we are producing from PDB to MOL2, for example, and we are using this MOL2 file here to energetically minimize the hydrogens that we have just added in the previous step. And then, in the last step, we are parameterizing the molecule. But look at that: importing the module, defining inputs, outputs and properties, and launching the building block. It's always the same. So it's a way to familiarize yourself with the syntax of the BioExcel Building Blocks.
Actually, the workflow that I have just introduced is a real demonstration workflow that you can take a look at, because it's one of the workflows from the collection that you have in the Workflows section of the website. It's called automatic ligand parameterization, and it's doing a little bit more than what I have just shown you, but the basics are exactly the ones that you see.
Okay, second part: learning how to build the workflow and understanding what the workflow is doing and what each of the cells of the workflow is doing. For me, this is fantastic, because Jupyter Notebooks are not just for writing and executing code; they are also used to document the code. And here you have an example of that documentation.
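To recap the syntax from the ligand example before moving on, the chaining pattern (define inputs, outputs and properties, launch, then pass the output file name to the next step) can be sketched in plain Python. This is only an illustration under stated assumptions: `fetch_ligand` and `add_hydrogens` are hypothetical stand-ins that write small text files, not the real BioBB building blocks, and the ligand code `IBP` and the pH value are example values.

```python
import tempfile
from pathlib import Path


# Hypothetical stand-ins for building blocks (NOT the BioBB API).
# Each one takes file paths plus a properties dictionary, like the
# pattern described in the lecture.
def fetch_ligand(output_pdb_path, properties):
    """Pretend to download a ligand and write it to a PDB file."""
    Path(output_pdb_path).write_text(
        f"REMARK ligand {properties['ligand_code']}\n"
    )
    return 0


def add_hydrogens(input_path, output_path, properties):
    """Pretend to add hydrogens at a given pH and convert the format."""
    text = Path(input_path).read_text()
    Path(output_path).write_text(
        text + f"REMARK hydrogens added at pH {properties['ph']}\n"
    )
    return 0


workdir = Path(tempfile.mkdtemp())
ligand_code = "IBP"  # example value

# Step 1: define outputs and properties, then launch.
input_structure = str(workdir / f"{ligand_code}.pdb")
prop = {"ligand_code": ligand_code}
fetch_ligand(output_pdb_path=input_structure, properties=prop)

# Step 2: the output of step 1 is the input of step 2.
output_hydrogens = str(workdir / f"{ligand_code}.H.mol2")
prop = {"ph": 7.4, "input_format": "pdb", "output_format": "mol2"}
add_hydrogens(
    input_path=input_structure,
    output_path=output_hydrogens,
    properties=prop,
)

print(Path(output_hydrogens).read_text())
```

The only thing connecting the two steps is the `input_structure` string, which is the point of the pattern: each block reads and writes files, and the workflow is the sequence of file names handed from one block to the next.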
This example is one particular demonstration workflow from our collection: the molecular dynamics setup tutorial for a particular protein. You have a summary of what the workflow is doing, the modules that are used in this workflow in case you want to reproduce it, and the auxiliary libraries that are used. All of this is Markdown. This documentation for the workflow is not something that you can execute; it's just for you to understand what the workflow is doing.
Step by step, we also have documentation about what each step is doing. Here you have pdb2gmx. If you're familiar with GROMACS, you will recognize this as the tool that GROMACS has to generate the topology of the system. For the ones that are not familiar, don't worry: tomorrow you will have a session introducing the GROMACS MD package. But basically here you see that we have documentation about what the cell is doing: it is building the GROMACS topology, using a particular force field and a particular water model, and generating these different output files. The building block that we are using is this one, and you have a link. All of this information can be included in the Jupyter Notebook. I think that is really important for understanding our biomolecular simulation workflows in particular.
And, as I was telling you before, there is also taking a look at the intermediate results in an interactive way. This is another fantastic feature that Jupyter Notebooks offer. In this case, we are using the NGL viewer tool to take a look at the protein structure in 3D. You can rotate, zoom, and inspect the structure that you have just downloaded in the previous step. You will see that tomorrow in the live session. And not only structures but also, as I was saying before, plots like this one, for example a 2D plot of the potential energy along the energy minimization process in the molecular dynamics setup. This is the demonstration workflow that we are going to use tomorrow in the live session.
And in this one, you will see all of these examples: documentation, intermediate results, how to play with the different cells, how to modify the parameters. You will see that in the hands-on session tomorrow.
And finally, Jupyter Notebooks are also useful to understand how to package the whole workflow using the Conda packaging system. For example, what you can do to start working on a workflow using the BioExcel Building Blocks library is to create a new Conda environment like this. You just give it a name and a version of Python, and it will create this closed environment, remember from the first lecture, in your own computer. Then you activate this environment, so you go inside it, and now you are able to execute everything with all the dependencies that are installed in this particular environment. We don't have any yet, but we can type conda install with a particular Conda package, for example biobb_chemistry, and it will install all the dependencies that the biobb_chemistry module needs inside this particular environment. And then you can run Jupyter Notebook, and you will be able to start using all the building blocks that are contained in this particular module. It's as easy as that.
Remember that for each of the different modules that we have in the library, there is a Bioconda package: you can go there, look up the name of the Bioconda package, and run conda install in your own environment, so that all the dependencies needed by the module will be installed automatically. Those are all the packages that we have. If you just type biobb in the search form, you will retrieve all the different modules available as Conda packages related to the BioExcel Building Blocks library.
But even more important than that, and as I was telling you before, you can wrap the whole workflow using the Conda packaging system.
And for that we are using this environment.yaml syntax. It's just one single YAML-formatted file where you can tell Conda: this is the name of the environment that I want; these are the channels where I have all the different modules, in this case Bioconda and conda-forge; and these are the dependencies that my workflow has. We are using the automatic ligand parameterization example that I was showing before, so the dependencies are the BioExcel Building Blocks input/output module, to download the small molecule, and the chemistry module, to add hydrogens, minimize the hydrogens and parameterize the system, and then just three auxiliary ones that are always there. But the important ones are these. And with this environment.yaml file, you can run something like conda env create using this environment.yaml, and all the dependencies for all the modules will be installed directly in this new environment, which is called by the name given here, which you can modify.
With this and with these lines here, you can reproduce any workflow that you want using the BioExcel Building Blocks. This is something that we are going to see tomorrow, but look at it, and I'm sure that you will understand it: six or seven different lines. So, a git clone of the repository in GitHub that has all the information and the Jupyter Notebook, which is the workflow: you clone the workflow to your own machine, you go to the folder that is created, you create the Conda environment, you activate the environment, then you enable a couple of extensions, just to see widgets and to be able to see the interactive intermediate results, in particular the structures, and then you run Jupyter Notebook with the notebook that is contained inside the git repo. Seven lines to reproduce one particular workflow, but also seven lines to reproduce any workflow that you want from the list of demonstration workflows, or a workflow that you build using the BioExcel Building Blocks. As easy as that.
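The shape of the environment file described above (a name, a list of channels, a list of dependencies) can be sketched with a small Python helper that renders it as text. This is a minimal illustration, not the file shipped with any demonstration workflow: the environment name is hypothetical, the package names `biobb_io` and `biobb_chemistry` are assumed to be the input/output and chemistry modules mentioned in the lecture, and the three auxiliary dependencies are left out because the lecture does not name them.

```python
def render_environment(name, channels, dependencies):
    """Render a Conda environment file (YAML) as a string."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dependency}" for dependency in dependencies]
    return "\n".join(lines) + "\n"


env_text = render_environment(
    name="biobb_ligand_demo",  # hypothetical name, choose your own
    channels=["conda-forge", "bioconda"],
    dependencies=["biobb_io", "biobb_chemistry"],  # assumed package names
)
print(env_text)
```

Saving this text as environment.yaml and passing it to `conda env create -f` is what turns the list into an installed environment; the helper itself only shows how little information the file has to carry.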
Okay, there's also the opportunity for you to start the workflows not by downloading everything to your machine and installing the whole Conda environment there, but just by going to a remote execution URL. This is an online tool, Binder, where you just write the GitHub repo where the notebook is, and it will automatically go and find the environment.yaml file, which gives it all the information that it needs to install all the dependencies and to run the Jupyter Notebook, and it will do it for you, completely free and online. The only thing is that this Binder is completely open to everybody, so you need to compete with the rest of the people. Sometimes there are no computational resources available because all of them are busy, and for that reason we developed our own Binder for the BioExcel Building Blocks, thanks to our partners at EMBL-EBI. With that, you just need to sign in with your GitHub account, and you will be able to execute any of the demonstration workflows from the website in this BioExcel Building Blocks Binder. I invite you to take a look at the different demonstration workflows, click on Execute in Binder, and see what you can do, because it actually gives you the opportunity to play with the Jupyter Notebooks and with the workflows without the need to install anything on your own computer.
Okay, I hope that I have convinced you about the importance of Jupyter Notebooks to start with the BioExcel Building Blocks library, understand the syntax, start playing with the workflows, and start developing your own workflows. But now, BioExcel Building Blocks workflows: how can you start a workflow, basically? Using Jupyter Notebooks, of course, I think that you have two different ways, and both of them are important. One of them is, for me, the first one and the easiest one, and it's just cloning an existing tutorial from the ones that you have in the list of demonstration tutorials on the website and playing with it.
And actually, this is what we are going to do tomorrow. Or you can go and start a new workflow from scratch. I will give you just five minutes of information on both approaches.
So the first one, cloning a tutorial; you have already seen that. If you go to one of the demonstration workflows on the website and you scroll down a little bit in the documentation of the Jupyter Notebook, you will see in all the different demonstration workflows this section, which is called Conda Installation and Launch. If you follow these steps on your own machine after installing the Conda packaging system, you should be able to reproduce the workflow there. It's as easy as that. And remember, Conda understands directly from the environment.yaml file that it needs to install all of the requirements and dependencies of the particular workflow. Once you have that on your computer and you run the Jupyter Notebook, you can start playing with the notebook. Remember that you can also do that without installing everything on your machine, just by opening it in MyBinder or in the BioExcel Binder. You will be able to do exactly the same.
Once you have the Jupyter Notebook open, you can play. And that means, okay, you have a PDB code here, which is a lysozyme. It's a toy protein. It's easy. It's there because we know that it works most of the time with really basic molecular dynamics setup pipelines. But what happens if I start playing not just with the lysozyme, but with pyruvate kinase, a complicated tetramer like this one here? What happens if I change the PDB code here? What happens if I play with DNA instead of a protein, or with a protein-DNA complex? This is a really good way to start identifying issues and understanding why you cannot use the same workflow for all the different structures in the PDB database.
You can also modify the building blocks.
So, for example, Editconf, to build the box, the water box surrounding the molecule. I can change the type of the box. I can change cubic to octahedral and then look at the intermediate structure with NGL and see what has changed. Of course, the type of the box should have changed. The distance to the molecule, I can change that too. If I add counterions to the system and I want to neutralize the system, I can modify that and say, no, I don't want to neutralize. I can modify the concentration. It's just playing a little bit to understand how that works.
You can also, of course, look at the documentation. Take a look at the properties that you can modify, and add new properties or just remove properties. One particular example, which is important, is this MDP property. MDP is the molecular dynamics parameter file from GROMACS. It's a pity that the session about GROMACS is only tomorrow, but those of you who are familiar with GROMACS will, I'm sure, recognize this property. The molecular dynamics parameters for GROMACS are so complicated, and GROMACS has so many different types of parameters, that we have a whole webinar in BioExcel, more than one hour, talking about the different parameters in the MDP file. So imagine how complicated this can be. You can take a look at the GROMACS manual to find the meaning of all the different parameters. But in the BioExcel Building Blocks documentation, you will not find all the different parameters that you can add here. This particular MDP property is open for you to add whatever parameter you want. For example, the ensemble in which you want to run the simulation, NVT or NPT, the type of pressure algorithm that you want to use, or the temperature algorithm: all of this information can just be placed here. But you need to know about the MDP file.
Okay, there are also easier things that you can modify here.
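Coming back to the MDP property for a moment: the idea of an open dictionary whose entries end up as lines in a GROMACS parameter file can be sketched as follows. This is an assumption-laden illustration, not how the library implements it: `render_mdp` is a hypothetical helper, and although the keys (`integrator`, `nsteps`, `tcoupl`, `pcoupl`) are real GROMACS MDP options, the result is not a complete or recommended parameter set (a real NPT run needs more, such as reference temperature and pressure).

```python
def render_mdp(defaults, overrides):
    """Merge user overrides onto defaults and render 'key = value' lines."""
    merged = {**defaults, **overrides}  # overrides win on duplicate keys
    return "\n".join(f"{key} = {value}" for key, value in merged.items()) + "\n"


# Illustrative defaults: no temperature or pressure coupling.
defaults = {
    "integrator": "md",
    "nsteps": 5000,
    "tcoupl": "no",
    "pcoupl": "no",
}

# User-supplied entries, in the spirit of the open MDP property:
# switch on a thermostat and a barostat (towards an NPT ensemble).
npt_overrides = {
    "tcoupl": "V-rescale",
    "pcoupl": "Parrinello-Rahman",
}

mdp_text = render_mdp(defaults, npt_overrides)
print(mdp_text)
```

Because the dictionary is open, nothing checks that a key is a valid GROMACS option or that the combination makes physical sense, which is why you need to know the MDP file before using it.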
Among the easier things: for example, I am extracting the potential energy in an energy analysis, but I can just write more terms here, such as temperature or pressure, to extract all of this information from the GROMACS energy execution. So, just modifying the building blocks and playing a little bit.
And finally, you can also add building blocks. Once you start to feel comfortable with the syntax of the BioExcel Building Blocks, you can add a particular simulated annealing equilibration, for example, or you can add a mutation in the MD setup process before starting the setup. You can mutate one residue of the protein and see what happens after that. So I recommend playing a little bit with the Jupyter Notebook. This is a very good way to start understanding how to use the BioExcel Building Blocks.
Okay, so the second approach is building a workflow from scratch. As I was telling you before, you need a Conda environment for that. But first, to identify which building blocks you need, you have to think about the workflow. For example, I'm thinking about the automatic small molecule parameterization. The small molecule is usually a ligand. So in the table that we have on the website, I type, for example, "ligand" in the text filter, and I find a building block that is called Ligand, which is a wrapper for downloading a PDB ligand. Okay, I will start with this one. I click on Ligand here, and it will open the documentation of the Ligand building block. In the Ligand building block, you have the parameters that you can or must write: the output PDB path, which is mandatory, and then the properties, which are optional. The output, of course, is the PDB file that will be written to your file path. The ligand code is the code of the ligand that you want to download. And these are the different APIs that you can use to download it. But you have an example here, so you can just copy and paste the example into a Jupyter Notebook.
You do that after installing the Conda package that contains this Ligand building block, which is the BioExcel Building Blocks input/output module, biobb_io. You run a conda install of this in your environment, and then you can run the building block in your Jupyter Notebook. The syntax is here, and you can just copy and paste it. As easy as that. And you have the first building block, and you start adding new ones, one after the other.
It is the same process for the rest of the building blocks. So you identify that you need to add a box: Editconf. You click on Editconf and you go to the documentation page. You have an input GRO path and an output GRO path that you need to define. We define the output GRO path here, which is this one, and we use as an input the output of the previous building block, and then all the properties that you can define. So you can use distance to molecule, box type, center molecule, and you start populating the properties dictionary and you run it. And the same, for example, for the pmx free energy block to mutate a residue. In this case, this is just an example to tell you that we are also compatible with using Docker containers, but this is for another session. We don't have time for that now.
Okay, in the hands-on session tomorrow, we are going to do exactly that: open a Jupyter Notebook, play with the notebook, and try to go from the beginning to the end, understanding the different cells, the different steps of the workflow. These are basically: splitting the protein and the ligand from a PDB file that we download using the PDB API; parameterizing the small molecule, the ligand, and obtaining the protein topology, so in two different parts; after that, joining the protein and ligand topologies and structures; and then running a typical basic molecular dynamics setup, which is minimizing the energy of the system, equilibrating the system, and finishing the workflow with a short unrestrained molecular dynamics simulation.
And after that come the small quality checks that we always run, which are the root mean square deviation and the radius of gyration over the frames generated by the short molecular dynamics simulation.
Okay, before finishing this introduction lecture, let me tell you a little bit about the command-line usage of the BioExcel Building Blocks, which is the approach that we used for the success story that I presented in the previous lecture. You can do that in different ways. The first and easiest way is from the Jupyter Notebook that you have already created. You have your workflow there, and you can download the workflow in Python format, just a file, with Download as Python, and it will generate Python code on your system that you can then run. The disadvantage of this is that, of course, you lose the graphical cells, because you are now going the command-line way, and you lose interactivity. But you have a great advantage: you gain high throughput. So you gain automation and repetition. You can run it as many times as you want.
The problem with running that workflow as many times as you want is that if you want to modify the parameters for a certain step, or just, for example, the input file, you need to modify or adapt the Python script. So what we thought is that for a command-line execution of the building blocks we needed another approach, and for that we developed a new idea, which is to split the workflow into two parts: the workflow script, which is the Python code with the loops and the conditional logic behind the workflow, and the workflow parameters, which are the inputs and outputs of each step, the dependencies between the different building blocks (so the output of the first one is the input of the second one, etc.), all the parameters and properties, and the global inputs and global parameters of the workflow.
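The split described above, with logic in one place and parameters in another, can be sketched in a few lines. This is a simplified stand-in rather than the library's own mechanism: the parameters are written as JSON only to keep the example within the Python standard library (the lecture's approach uses a YAML file), the step names and the `dependency/<step>/<key>` notation are illustrative conventions, and no building block is actually executed.

```python
import json

# Workflow parameters: paths, dependencies between steps, and properties.
# In the approach described in the lecture this would live in its own file.
PARAMETERS = """
{
  "step1_download": {
    "paths": {"output_path": "ligand.pdb"},
    "properties": {"ligand_code": "IBP"}
  },
  "step2_hydrogens": {
    "paths": {
      "input_path": "dependency/step1_download/output_path",
      "output_path": "ligand.H.mol2"
    },
    "properties": {"ph": 7.4}
  }
}
"""


def resolve_paths(config):
    """Replace 'dependency/<step>/<key>' entries with the referenced path.

    Steps are assumed to be listed in execution order, so a step can only
    depend on one that appears before it.
    """
    resolved = {}
    for step, settings in config.items():
        paths = {}
        for key, value in settings["paths"].items():
            if value.startswith("dependency/"):
                _, dep_step, dep_key = value.split("/")
                value = resolved[dep_step][dep_key]
            paths[key] = value
        resolved[step] = paths
    return resolved


# Workflow script: only the logic; every value comes from the parameters.
config = json.loads(PARAMETERS)
paths = resolve_paths(config)
for step in config:
    print(step, paths[step], config[step]["properties"])
```

With this layout, changing the ligand code or the pH means editing the parameters only; the script stays untouched, which is what makes repeated, high-throughput runs practical.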
All of these workflow parameters will be in a separate YAML-formatted file, whereas the workflow logic will be in a separate Python script, the workflow script. If you want to know more about that, please take a look at the tutorial that you will find on the website, which is called Command Line Workflows. There we have extended documentation about how to run the workflows using the BioExcel Building Blocks library in a command-line way, using two different files, the script and the workflow parameters. We are not going to see that in the summer school, because we don't have time, but if after the hands-on session you think that you are interested in that and that this tool suits you, please get in contact with us, and we can help you run your workflows in a command-line way.
Okay, as a summary, we have seen how to build, deploy and run biomolecular simulation workflows. We have seen that it is really easy with the BioExcel Building Blocks. I hope that now you understand the syntax that we are going to use tomorrow in the live session. We need to create a Conda environment, or define the Conda environment that the workflow needs in a YAML file, and use the syntax to build the workflow and connect the building blocks. That's all. The Jupyter Notebook graphical user interface is a really good tool to start playing with the BioExcel Building Blocks. I hope that you also agree with me on that now. It's really easy to package the whole workflow in a Conda environment and export the workflow, and that is really good for reproducibility. So now you can start thinking about what you can do with your biomolecular simulation workflows and how you can export them and share them with your colleagues, or with the scientific community if you want. And also about how to export the workflow to the command line for high-throughput executions: I have just briefly explained that there is a way to do that, and we can talk more about it if you are interested.
So remember, tomorrow we have the live session. We will have engineers there monitoring the chat for all the questions that you have. Remember, we will run the protein-small molecule complex molecular dynamics setup. In the morning, tomorrow, you will have the lectures introducing GROMACS. Here we will also use GROMACS, so I'm sure that you will already be familiar with the GROMACS syntax, and we will be able to go, in just a couple of hours, from the beginning to the end of the tutorial. We will include the question-and-answer session there. So please write down all the questions that you have, all the doubts, and also suggestions or feedback on this introductory lecture, and just ask us in the session tomorrow before we start with the hands-on.
And with that, thank you all for participating in the BioExcel Summer School. I hope that we will see you tomorrow and that we will be able to run the hands-on session on the Jupyter Notebook and the BioExcel Building Blocks library. Thank you.