Welcome to BioExcel webinar number 72. Today with us is Thomas Löhr from AstraZeneca. I'm Alessandra Villa from the Royal Institute of Technology, and with me is Otto Andersson from CSC, the Finnish IT Center for Science. Thomas will speak about computational chemistry workflows with cycles and conditions using Maize. The webinar will be recorded. During the webinar you are most welcome to ask questions; to do so, use the Q&A function at the bottom of your Zoom panel. The symbol may look slightly different depending on your operating system. You can click there, type your question, and the question will be read by one of us at the end of the webinar. You also have the possibility to ask your question yourself: just write whether you want us to activate your microphone, or whether you prefer not to use the microphone right now. We also have something new this time: an after-webinar session. After the webinar there will be a category dedicated to it on ask.bioexcel.eu, the BioExcel forum, where for one week you can ask Thomas questions about this webinar. So please check there after the webinar. Now, about Thomas. Thomas is currently a senior scientist in the Molecular AI group at AstraZeneca. He works on combining physics-based approaches to protein-ligand binding with generative machine-learning models. He received his PhD at the Centre for Misfolding Diseases at the University of Cambridge, where he used computational methods to study the kinetics and thermodynamics of disordered proteins. He has also contributed to the development of open-source software such as PLUMED, a package for enhanced molecular dynamics simulations. And now I will stop sharing and give the word to Thomas, please. Thanks for the kind introduction and the invitation.
Today I'll be talking about our efforts to build a workflow manager that can handle more complex workflows containing cycles and conditions. The motivating factor for this is really the integration of physical methods with generative machine-learning models. So first I'll give you a bit of a motivational introduction in terms of early-stage drug discovery and how we can use generative models for it, and then show how this implies the need for a more advanced workflow management system. And I'll talk a bit about how you can use Maize in practice: what it looks like, what is implemented, and some basic capabilities. Machine learning crops up in all parts of the drug discovery process. Of course we have late-stage development and clinical development, where machine-learning methods are also routine, but also early discovery: target identification and initial drug design. I'm going to focus on drug design today, and on the department I'm working in at AstraZeneca, the Molecular AI group. We have three main focuses. First of all, we're working on synthesis prediction: given a small molecule, how can we find an ideal synthesis route for it in an automated way. Second, de novo molecular design: coming up with completely new molecular structures that are drug-like and could be good drug candidates. And the third one, the sub-team I'm working in, is the physics-informed screening aspect, and this is very tightly integrated with the de novo molecular design efforts, because if we come up with a new drug compound we want to be able to evaluate it as efficiently as possible. The core of the drug discovery process in the early stages is the design-make-test-analyze cycle. Generally a drug discovery project starts with a drug target, some kind of protein that is implicated in a disease, and we will typically have some kind of initial hit compound as a starting point.
This hit compound will often not be very active or selective, and may have poor toxicological properties. We want to get to a candidate drug, which is more potent, has a good side-effect profile, and is as effective as possible. Between those two points we go through the design-make-test-analyze cycle: we come up with a design for a molecule, synthesize it, test it in some assay, and then analyze the results and use them to inform the next iteration of the cycle. How can we make this faster? What we're working on in my group is largely de novo molecular design, as I mentioned before, coming up with new structures, and synthesis prediction; what is also done at AstraZeneca is automation, to make the make, test, and analyze steps as efficient as possible, but I'm not going to talk about those today. De novo molecular design is a fairly new field, and it has really been made possible by innovations in machine learning especially. I think a lot of you are now familiar with large language models, and the main software that was developed in-house and is available for this purpose, called REINVENT, is actually based on ideas from language models. What we can do is take a small molecule, like this example here, and use its SMILES representation, which is basically just a string representation of the molecule. SMILES codes can really be seen as a language in some sense: they have a specific syntax and grammar to them. So we can take the power of language models and apply it to the task of generating small molecules. In our case this is done with recurrent neural networks, which are a somewhat older method, but they work really well here.
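Since SMILES strings are plain text with a grammar, the language-model analogy can be made concrete with a toy tokenizer. This is an illustrative sketch only, not REINVENT's actual vocabulary handling, and the token patterns cover only a simplified subset of SMILES:

```python
import re

# A simplified SMILES tokenizer: multi-character tokens (Cl, Br, bracket
# atoms) must be matched before single characters. Generative models like
# the one described sample such tokens one at a time to build a molecule.
TOKEN_PATTERN = re.compile(r"(\[[^\]]+\]|Br|Cl|[A-Za-z]|[0-9]|[=#()+\-/\\@%])")

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens a language model could emit."""
    tokens = TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize: {smiles}")
    return tokens

# Aspirin: an RNN would generate this token sequence step by step, each
# token conditioned on the tokens sampled before it.
print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
```

The key point the sketch illustrates is that a molecule becomes a sequence of discrete symbols, exactly the kind of object a language model is built to generate.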
Once the recurrent neural network has been trained on a suitable set of drug-like molecules, we can sample from it and gradually build up our molecule. At each step we get a probability vector and sample from it; in this case we sample an oxygen atom. And because we sampled an oxygen atom, we get a different probability distribution at the second iteration, say for a double bond, and eventually we build up the whole small molecule by sampling from the output of this network, ending with a drug-like molecule, something like what you see on the right side. This tool is called REINVENT, and it's really the core of our design efforts. As I just mentioned, the idea is that we propose some new chemical structures, but ideally we also want some idea of how good those small molecules are at a particular task, for example binding to a particular drug target. So what we want is some kind of scoring function that evaluates them, and then feeds those scores back into the generative model, so that at the next iteration the generative model can produce a better set of compounds. This process is called reinforcement learning, and you might have heard of it in many other high-profile machine-learning projects. The advantage of this is that instead of screening through existing libraries, which might have a few billion compounds at most, we can actually screen a much, much larger section of chemical space. The number that gets thrown around for the size of chemical space is 10^60 compounds. Of course a lot depends on how the model is trained, but we can definitely generate many compounds that have never been seen before, so in theory this gives us a lot of flexibility in our drug discovery process. And you can see this reinforcement learning at play if you use docking as a scoring function.
In this case you can see the docking score on the y-axis and the epoch on the x-axis, and you can see that the model gradually learns to create compounds that have a more favorable docking score. The idea now, of course, is that we want to go a step further with this scoring process. Docking is quite nice, but it's not necessarily the most accurate way of determining the binding energy of a small molecule to a protein target. A far more accurate approach, but also a very computationally expensive one, is relative binding free energy (RBFE) calculations. These have been used by us quite extensively, and they generally correlate fairly well with experimental binding free energies, so they are a great tool for making well-informed decisions on which molecules to prioritize for synthesis, for example. But of course we would like to have this kind of scoring in a loop with the generative model I've just mentioned. The problem is that RBFE calculations are quite slow: they can take several hours for a single compound, or for the difference between two compounds, whereas docking can be done in seconds to minutes. There is a nice way of making this process more efficient, and that is active learning. The idea behind active learning is that we take a subset of the compounds we want to score. Say we have a large library of 20,000 compounds; we select just a few hundred or so and run an RBFE calculation on just this subset. Then we take the scores we got and train a machine-learning model to predict the scores for all the compounds we didn't sample. If we select our subsets wisely, this can be quite effective, because the machine-learning model in this case is domain-specific, so it doesn't have to be transferable. We can significantly improve this process, and get away with doing only a few hundred free energy perturbation simulations, for example.
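The select-score-retrain loop described above can be sketched in a few lines. Everything here is a toy stand-in: the "oracle" is a cheap function playing the role of an RBFE calculation, and the surrogate is a trivial nearest-neighbor lookup rather than a real domain-specific ML model:

```python
import random

def oracle(x: float) -> float:
    """Stand-in for an expensive scoring step such as an RBFE calculation."""
    return (x - 3.0) ** 2

class NearestNeighborSurrogate:
    """Toy surrogate: predicts the oracle score of the nearest labelled point."""
    def __init__(self) -> None:
        self.data: list[tuple[float, float]] = []

    def train(self, pairs: list[tuple[float, float]]) -> None:
        self.data.extend(pairs)

    def predict(self, x: float) -> float:
        if not self.data:
            return 0.0  # no oracle information yet
        return min(self.data, key=lambda p: abs(p[0] - x))[1]

random.seed(0)
library = [random.uniform(0.0, 6.0) for _ in range(2000)]  # "20,000 compounds"
surrogate = NearestNeighborSurrogate()

for iteration in range(3):
    # Cheap step: score the whole library with the surrogate.
    predicted = {x: surrogate.predict(x) for x in library}
    # Acquisition: send only the most promising candidates to the oracle.
    subset = sorted(library, key=predicted.get)[:20]
    labelled = [(x, oracle(x)) for x in subset]   # expensive step, 20 calls only
    surrogate.train(labelled)                     # retrain on the new oracle data
    # Merge oracle scores with surrogate predictions for the rest.
    scores = {**predicted, **dict(labelled)}
```

After each pass the surrogate has seen more oracle data, so its predictions for the unlabelled compounds improve, which is exactly the effect the active-learning setup relies on.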
That potentially saves hundreds of thousands of hours of compute time and makes the whole process a lot more efficient. So this is the motivation for having a robust way of combining all these models together; we want an efficient way of doing this, and this is where a workflow manager comes in, and where we move to the engineering side of the whole process. Why would you want to use a workflow manager? Well, first of all, you can make a lot of gains in terms of reproducibility of your scientific workflows: if there is a very standardized way of running them, that makes things easier not just for yourself but also for others who want to replicate your workflow. Another aspect is configuration. You can separate the configuration of your software from that of your actual scientific workflow, which makes it very clear which parameters are for your workflow and which ones are only there because you're running on a particular machine somewhere. And of course you can gain a lot in terms of modularization, because many components you use in workflows will be reused in other places, and you don't have to rewrite them or share code in some other way. It can also be a lot easier to automate these workflows, because the interfaces can be kept very consistent. There is also the concept of abstracting away very complex workflows, so that you can really focus on the essence of your workflow and don't have to worry about details of some particular step that have already been fleshed out before. And finally there is the hope that you can parallelize quite significant parts of your workflow and thus make your whole pipeline more efficient. Here you can see an example workflow that is very typical, I think, in molecular simulation.
If you want to simulate a protein with a ligand, you would have one step for preparing the ligand and one for preparing the protein; these can run in parallel, so there is some efficiency saving here. Then you set up your system, minimize it, equilibrate it, and simulate it. This is a very typical setup, and these steps can be abstracted away into boxes that look like this. So this is a very typical use case you might be familiar with, but it's actually not entirely what we want to do in many cases, because we might do our analysis at the end and then find the simulation has not actually converged and we need to run a bit longer. In that case you immediately have a cycle in the workflow, and that's where existing workflow managers break down, because basically none of them can actually run a circular workflow like this. The graph structures you saw can be classified as directed acyclic graphs or directed cyclic graphs. In a directed acyclic graph we have a very linear workflow: any node depends only on the nodes that came before it. In the directed cyclic graph you can see that node B depends on node A, but also on the downstream node D. So there is a cycle here, which cannot easily be modeled by workflow managers built around directed acyclic graphs. There are certain algorithms, such as topological sorting, that are very optimized for directed acyclic graphs and cannot be applied to directed cyclic graphs, because they depend on this ordering. There is quite a lot of software that can run directed acyclic graphs, some larger pieces being Airflow and Luigi. On the other hand, for directed cyclic graphs there is not very much that I found that we could base our workflows on; the only big one I found is written in Scala and has a somewhat different use case. So what do we want out of our workflow manager?
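The distinction between the two graph classes can be made concrete: topological sorting (Kahn's algorithm), which DAG-based schedulers rely on to find an execution order, simply fails when the graph contains a cycle. A small sketch:

```python
from collections import deque

def topological_order(nodes, edges):
    """Return a valid execution order, or None if the graph contains a cycle."""
    indegree = {n: 0 for n in nodes}
    for src, dst in edges:
        indegree[dst] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for src, dst in edges:
            if src == n:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    queue.append(dst)
    # If some nodes were never freed of dependencies, the graph is cyclic.
    return order if len(order) == len(nodes) else None

# Linear workflow (DAG): A -> B -> C -> D has an obvious execution order.
print(topological_order("ABCD", [("A", "B"), ("B", "C"), ("C", "D")]))
# Add a feedback edge D -> B, like a convergence check: no ordering exists.
print(topological_order("ABCD", [("A", "B"), ("B", "C"), ("C", "D"), ("D", "B")]))
```

This is exactly why a scheduler that needs a topological order up front cannot express the "run longer if not converged" loop, and why a different execution model is needed.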
I think I mentioned most of these things before, but what we wanted, mostly, is a reproducible system, and we wanted it written in Python, because all our other software is written in Python and it's an excellent language for scientific computing. We wanted the workflow definitions to be very simple and portable, so you can easily run them on other systems. That goes into the next point, configuration: we wanted to keep the configuration for the system you're running on completely separate from that of the workflow. As long as your system is configured correctly, in terms of what software and libraries are available, you can just take a workflow and run it on that system without any changes to the workflow. We also wanted some way of modularizing our workflows, of course: we want to be able to reuse workflow nodes, and we can do that by giving them very well-defined inputs and outputs, so that they can be put together like little building blocks. And of course we wanted some ways of automating this, so good interfaces to external software, and we can use it in Jupyter notebooks, for example. A very big point here is also some way of abstracting our workflows into what I'm calling subgraphs. The idea is that you can group many nodes together to form a subgraph, which in turn acts just like a normal node in a workflow would. That means you can build up workflows hierarchically, by combining very small steps together, abstracting them into one big step, and then using those bigger steps in a larger workflow, and you can do this arbitrarily deeply. This, I think, makes it easier to reason about the very complex workflows that inevitably crop up in computational chemistry. And finally, it would be nice to have parallelization by default: being able to run all these steps in parallel and let them communicate with each other.
This brings me to what I used as the basis for this whole system, and that is the concept of flow-based programming, which was invented back in the 1970s but then never really caught on. It is a very elegant way of solving this problem, because the idea is that each node in your workflow graph runs as a single system process. All nodes in your system are running at the same time, but they all communicate with each other through specific channels, based on how you connected them. One node can send data through its output port, and some other node will receive this data and do some processing with it. Every node just waits for data to be received, does some processing, sends the result on, and can then do something else. The advantage of this is that you can now execute any kind of graph structure; it can have arbitrary cycles or some kind of conditional execution. Also, all nodes are running at the same time, which means everything is parallel by default. And by being very strict about what the nodes are sending and receiving, you can completely avoid problems like race conditions that you might normally have when trying to run software in parallel. So this allows us to solve quite a few problems and gives us some interesting patterns that we can implement that wouldn't have been possible otherwise. The first one I can show you is on the left side, and that's batch processing. We have some long list of data that we want to process through two compute nodes, compute A and compute B. The way we can do this is to chunk our data into batches and send our first batch to compute A, which will process it and send it to compute B, which processes it and sends it onwards. But say compute node A is some kind of process that only runs on the CPU, and compute B is something that only runs on the GPU.
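The node-and-channel model can be sketched with standard-library threads and queues. This is an illustration of the flow-based idea only, not Maize's actual process-based implementation:

```python
import threading
import queue

SENTINEL = object()  # signals that no more data will arrive

def node(inp: queue.Queue, out: queue.Queue, func) -> None:
    """A flow-based 'node': wait for data on the input channel, process, send on."""
    while True:
        item = inp.get()          # blocks until upstream sends something
        if item is SENTINEL:
            out.put(SENTINEL)     # propagate shutdown downstream
            return
        out.put(func(item))

# Wire up a two-node pipeline: source -> compute A -> compute B -> sink.
a_in, ab, b_out = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=node, args=(a_in, ab, lambda x: x + 1)),   # compute A
    threading.Thread(target=node, args=(ab, b_out, lambda x: x * 2)),  # compute B
]
for t in threads:
    t.start()

for item in [1, 2, 3]:    # both nodes run concurrently, so batches overlap
    a_in.put(item)
a_in.put(SENTINEL)

results = []
while (item := b_out.get()) is not SENTINEL:
    results.append(item)
for t in threads:
    t.join()

print(results)  # [4, 6, 8]: each item passed through both nodes, in order
```

Because each node only ever touches data it received on its own channel, there is no shared mutable state, which is what makes the "parallel by default, no race conditions" claim work.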
Then we can sometimes use our resources far more efficiently: while compute B is running the first batch on the GPU, compute A is already running the second batch on the CPU to prepare input for compute B. That's a far more efficient use of resources. Similarly, you might have two GPUs in your system, but software that can only use one GPU at a time. What you can do is parallelize your workflow so that you have two compute nodes doing the same thing, just running on separate GPUs, for example. You split your data, send one half to the first node and the other half to the second node, let them do their computations, and then merge the results. And finally, the problem I mentioned earlier with having some kind of convergence check is essentially just iteration: we have some compute process happening, and we check if we're done. If we're done, we exit the loop, that's fine, and if we're not done, we send the data back to the compute node and keep going, then merge the resulting data. This is an example of what you could do if you had a simulation running and wanted to do a convergence check. So this brings me back to active learning. The idea here, as mentioned before, is that we have some kind of library, or in our case a generator, some way of generating small-molecule candidates, and we sample from this generator. Before we actually run the whole thing, I'm assuming we have some kind of model that has been trained to predict scores, even if not in a very accurate way yet. We send some samples, say 1000 structures from REINVENT, to this model, which predicts their scores.
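The convergence-check pattern, run, test, loop back if not converged, is just iteration when written imperatively; in a flow-based graph the same logic becomes a cycle between a compute node and a decision node. A minimal sketch of the logic, with a toy "simulation" standing in for MD:

```python
def simulate_chunk(state: float) -> float:
    """Toy stand-in for running one more block of simulation: moves toward 1.0."""
    return state + 0.5 * (1.0 - state)

def converged(prev: float, curr: float, tol: float = 1e-3) -> bool:
    """Toy convergence check: has the observable stopped changing?"""
    return abs(curr - prev) < tol

state, steps = 0.0, 0
while True:
    new_state = simulate_chunk(state)   # the "compute" node
    steps += 1
    if converged(state, new_state):     # the "decision" node: exit the cycle
        break
    state = new_state                   # the edge looping back to compute

print(steps, round(new_state, 4))
```

In a DAG-based manager this loop would have to be unrolled to a fixed number of steps in advance; with running nodes and channels, the data simply travels around the cycle until the decision node routes it out.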
Based on these scores we can select a subset, say 100, and those hundred are sent to the so-called oracle, which is the expensive function we actually want to model. The oracle does its calculation on this subset of 100 compounds, and the scores it calculated, the free energy of binding, for example, are sent back to the model, and the model can be retrained with this data. At the same time, of course, we merge the results we got from the oracle, say the hundred, with the other 900 we got from the cheap model, and feed these scores back to, in this case, REINVENT, closing the reinforcement-learning loop. This way we hopefully get a better set of small molecules at the next iteration, and at the same time our model is going to get a lot better over time, because it receives more and more training data from the expensive oracle function. That is the general idea behind active learning. And here is an example of how this would look in Maize. We have some generator that generates a list of SMILES. These are sent to the surrogate model, which predicts scores for all of them, so we get some set of scores. These go to an acquisition function, and this acquisition function decides which molecules to send to the very expensive oracle function and which ones to just send back to the generator eventually. In this case we have these two compounds: one selected for the oracle and one selected to just send on. The oracle does its calculation and sends the result to a node that simply copies it. On the one hand, this result is sent to the surrogate-training node, so we retrain our model on the new score that has been calculated, and the retrained model can go back and be used in the next prediction cycle.
At the same time, everything is merged back together and sent back to the generator. So that was the general motivation for designing this: specifically, using it in active-learning workflows, which incorporate many different types of software. We have generative models, we have scoring models, and we need a lot of what I'm calling plumbing steps in between. It allows us to be very flexible in terms of what kind of workflows we build: we could think about putting an additional model into the active-learning workflow, for example, or additional scoring steps, or using some other method and then using those scores to decide which compounds to send to an RBFE calculation, and so on. There are lots of different combinations we could use here to optimize these workflows and make them fit a very particular purpose. And there is already a fair amount of software we've implemented as workflow steps that can be used right now. One of them is, of course, REINVENT, our molecular generative model. We also have a few scoring steps, with a big docking focus, so AutoDock GPU and Glide by Schrödinger. We also have shape matching with ROCS; this is an alternative to docking in some sense, and it can be more accurate in some cases. We have a lot of utility functions for loading and writing molecules. We have an RMSD filtering step, so after docking we can, for example, filter compounds according to how close they are to some kind of reference. We have embedding steps, which allow us to go from SMILES to an actual embedded small-molecule isomer with 3D coordinates and a defined state, and things like preparing the docking grids for the docking software.
On the other side is the more expensive stuff. We have a fairly simple interface to the relative binding free energy methods implemented in PMX, through a previous workflow manager called Icolos that we had, which is only for linear workflows. We're working on implementing GROMACS; that's a big priority here. And we also have some people working on more semi-empirical and ab initio things, such as xtb and Gaussian. Among all these other things there are lots of utility functions: filtering data in some way, distributing to different branches, file I/O, and things like this. My co-worker Lily is working on implementing GROMACS in this framework, and this is where the subgraphs I mentioned earlier can really come into play, because we can implement the individual GROMACS commands, but normally you would be grouping things together, running the same set of commands every time you perform some kind of action. So we can do exactly this: group lots of nodes together and handle the file copying and state manipulation internally, and then have very simple workflows at the end that give us a very coarse view of what is going on, which makes it a lot easier to reason about things. On the other hand, two of my co-workers are working on predictive synthesis models. The idea here is that we have some reaction metadata from some other resource, and we want to evaluate certain properties of this reaction. What we can do with Maize is set up a circular workflow: generating conformers, doing some semi-empirical simulations and possibly DFT simulations to evaluate a certain reaction, and then eventually using this data to train an in silico model to actually predict the feasibility of a certain reaction, for example.
So now I will just switch over and show you some actual code and how this works. Okay, I want to quickly go through a classic example, which is a simple docking workflow. This is a Jupyter notebook, and we're going to run a simple linear docking workflow: we're going to take a few compounds as SMILES, embed them to give them actual 3D structures, and then dock them to a protein and get the results back. So let me show you the real basics of Maize and what using it looks like. There are certain things you need to import; I won't go into too much detail here, but there are certain nodes available that we are going to use, specifically AutoDock GPU and Gypsum-DL, which is a method to embed small molecules, that is, to go from SMILES to different isomers and different conformers. And something I should maybe mention before continuing much further: as I said earlier, Maize is written in Python, but it's also designed to be split into packages in some sense. We have the core Maize package, which is completely domain-agnostic, and we have the so-called maize-contrib package, which contains the code specific to computational chemistry and other domain-specific code. So Maize is not only there for computational chemistry; you could use it for any kind of workflow management. I mentioned earlier that we keep our configuration completely separate; just for this example, because we're running a notebook, it's a bit easier to show the configuration right here. This is what it would look like: you would specify that AutoDock GPU needs specific modules, which you might be familiar with from HPC systems, where you would do a module load CUDA, module load GCC.
So this is basically that, and we tell it where to find the software. This comes in handy when you have software that doesn't come with a fixed name; GROMACS is another example, as it can be built with or without MPI and have a different name. This is to avoid any hard-coding of things like that. Now, for docking we need a grid; this is already there. We're going to start by defining our workflow, which is done by instantiating a workflow object, and we're going to tell it where the configuration is. In this case we're just going to use this example configuration, which is what you just saw above. In a real case you would have one file for your whole system; you can put it in one spot and it will be found there, and you would never have to change it depending on the workflow. Then we can start adding nodes to our workflow. Maize makes use of the typing features in Python, a fairly new addition, which basically allow you to give type annotations to any object in Python. Maize uses this to determine what kind of data is being sent around from node to node, and in my experience it can really help with finding errors, in cases where you're sending the wrong kind of data from one node to another, and it will catch this before the workflow is even run. In this case we have a node called LoadData, which can load any kind of data, but Gypsum as a node needs a list of strings as its input, because it's receiving SMILES codes. So we need to tell LoadData that it's going to load a list of strings; this is the data type for this node. We just add these nodes, and the same applies for the Return node, which is a way of returning the data back to the current Python interpreter after it has been run through the graph. Then we can set up the data we want to load; in this case we're going to dock these two SMILES strings.
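The pre-run type checking described here can be illustrated generically: if each port declares its data type, a framework can compare the types of connected ports before launching anything. This sketch shows the idea only; it is not Maize's implementation, and the node and port names are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Port:
    """A typed input or output port on a workflow node."""
    name: str
    datatype: type

@dataclass
class Graph:
    connections: list = field(default_factory=list)

    def connect(self, out_port: Port, in_port: Port) -> None:
        # Refuse mismatched connections before the workflow ever runs,
        # the same kind of early error that type annotations enable.
        if out_port.datatype is not in_port.datatype:
            raise TypeError(
                f"Cannot connect {out_port.name} ({out_port.datatype.__name__}) "
                f"to {in_port.name} ({in_port.datatype.__name__})"
            )
        self.connections.append((out_port, in_port))

graph = Graph()
load_out = Port("load_data.out", list)   # a loader emitting a list of SMILES
embed_in = Port("embed.inp", list)       # an embedding node expecting a list
score_in = Port("log.inp", float)        # a node expecting a single float

graph.connect(load_out, embed_in)        # fine: both sides carry a list
try:
    graph.connect(load_out, score_in)    # caught at build time, not mid-run
except TypeError as err:
    print(err)
```

Catching the mismatch while the graph is being assembled is much cheaper than discovering it hours into a docking or free-energy run.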
We're going to tell Gypsum, which handles the isomers and conformers, the maximum number of variants to generate, and we're going to tell AutoDock GPU where the grid is. Then we can connect things together: at this point we haven't actually connected anything, we've just said that these nodes exist in the graph. Now we're actually saying this output is connected to this input, and so on. This will be a bit clearer if you look at the visualization of the workflow we've just made. We have a node loading data here, and it shows the data type: we're loading a list of strings and sending them to Gypsum. Gypsum generates a list of IsomerCollection objects, which are essentially just a thin wrapper around RDKit molecules. These are sent to AutoDock GPU. AutoDock GPU has two outputs; in some sense they're a bit redundant, but there is always a bit of overhead in parsing the conformers from the docking results, and we don't always need them, so there is also a separate scores output. This can be a NumPy array that just has the scores for the docked compounds, alongside the list of docked compounds containing the conformers. Then all we need to do is call execute on the workflow, and it will run and print some information. We can get the results from the last node we defined, the Return node; we can just pull the information out of it. This contains the actual molecules that have been embedded and docked. So we have scores: in the first case a molecule failed to dock, and in the second case it worked, though the docking score is a poor one. We can then visualize the result, and of course it looks a little bit odd, because it's a docked conformer. And finally, I thought I'd show you what you would do if you wanted to add a custom node. In this case we're just going to need an Input and the special Node class.
When we define a node, it looks something like this: we define a new class, give our node a name, and subclass from the Node class in Maize. We declare an input, which is where we say what kind of data this node receives; in this case it's a NumPy array. Then we always define a simple run function, which contains the logic of the node. In this case we're just receiving from this input, and this call will block until it gets some data. Once it gets data, the function will immediately continue, we'll get the scores, and the node can do whatever it needs to do with them. In this case we're just going to log whatever scores we got to the console. Again, we can build up our workflow as before, and it will be basically the same, except that here we're adding the node we just defined instead of the Void node that was there earlier, which just discarded whatever it got as input. And again we can run it, and we see this logging message telling us what the docking scores were. I think I have just a few more minutes, so I want to show one of the other examples I mentioned earlier, using Maize to do some kind of batch processing or parallelization. In this case we can do this with docking, and docking is a good example for this because we're running AutoDock GPU, which, as the name implies, runs only on the GPU. But we also need to embed the molecules first, and the embedding can, in my experience, take just about as long as the docking itself, and it runs on the CPU. So it would be good to parallelize these steps. What this means is that we can build up our workflow, and I'll just show you the graph because it makes things a little clearer: again we load our data in, and we batch it.
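The custom-node pattern, subclass, declare a port, implement run(), can be sketched generically. The classes below mimic the shape of what the notebook shows, but they are a self-contained toy, not Maize's real base classes:

```python
import queue

class Input:
    """A toy input port: receive() blocks until data arrives on the channel."""
    def __init__(self) -> None:
        self._channel: queue.Queue = queue.Queue()

    def send(self, data) -> None:
        self._channel.put(data)

    def receive(self):
        return self._channel.get()  # blocks until a value is available

class Node:
    """Minimal node base class: subclasses implement run()."""
    def run(self) -> None:
        raise NotImplementedError

class LogScores(Node):
    """Custom node: receive a batch of docking scores and log them."""
    def __init__(self) -> None:
        self.inp = Input()   # declares what this node accepts
        self.logged = None

    def run(self) -> None:
        scores = self.inp.receive()          # blocks until upstream sends data
        self.logged = scores
        print(f"Docking scores: {scores}")   # the node's only job: log them

node = LogScores()
node.inp.send([-9.1, -7.4])  # an upstream docking step would send this
node.run()
```

The point of the pattern is that the node's logic never cares where the data comes from; swapping the upstream docking step for anything else requires no change to the node itself.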
So we send out chunks of it at a time, and we run these two nodes in the middle in a loop: they continuously receive data and continuously process it. Then whatever comes out of this whole thing is sent to a node called Combine that merges all these chunks back together again. With this Batch and Combine pair the interface is exactly the same, so we could collapse this into a subgraph and it would behave just the same as the two nodes on their own. And this makes it possible for AutoDock-GPU to be docking one batch while Gypsum is embedding the next batch of small molecules. And it will run just the same. So, have you finished, Thomas? Not quite. Okay, so you have the last few minutes. Yes. So, just to summarize: I think Maize might be interesting for anyone who has to work with somewhat odd software, or software with conflicting interfaces or conflicting environments. Something I want to quickly mention is that, because each node runs in a separate Python process, Maize actually allows you to have a completely different Python environment for each node. In some cases, and I've noticed this especially with machine learning, there's some package that absolutely needs a certain version of PyTorch, and some other package is completely incompatible with that version. You can still use them together in the same workflow by giving them separate environments, so that's another case where this can be useful. And of course it's useful for any kind of parallelization requirement; in quite a few situations you're not completely aware of some opportunity for parallelization, but it crops up naturally in the graph. All the code I just showed is open source under a permissive license, and it's all in our MolecularAI GitHub; I think there will be links in the chat as well. And if you have any questions on this you can get in touch or use the forum.
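The Batch, embed, dock, Combine pipeline just described can be sketched with standard-library queues and threads, so that docking of one chunk overlaps with embedding of the next. Again, this is a toy stand-in rather than Maize itself; all names and scores are hypothetical:

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream

def batch(items, size, out_q):
    # Split the input into chunks and stream them downstream
    for i in range(0, len(items), size):
        out_q.put(items[i:i + size])
    out_q.put(SENTINEL)

def embed(in_q, out_q):
    # Stand-in for Gypsum: loops over chunks, CPU-bound embedding
    while (chunk := in_q.get()) is not SENTINEL:
        out_q.put([(mol, "3d") for mol in chunk])
    out_q.put(SENTINEL)

def dock(in_q, out_q):
    # Stand-in for AutoDock-GPU: can dock chunk N while chunk N+1 embeds
    while (chunk := in_q.get()) is not SENTINEL:
        out_q.put([(mol, -7.0) for mol, _ in chunk])
    out_q.put(SENTINEL)

def combine(in_q):
    # Merge the chunks back into one list, same interface as before
    merged = []
    while (chunk := in_q.get()) is not SENTINEL:
        merged.extend(chunk)
    return merged

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
smiles = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]
threading.Thread(target=batch, args=(smiles, 2, q1)).start()
threading.Thread(target=embed, args=(q1, q2)).start()
threading.Thread(target=dock, args=(q2, q3)).start()
merged = combine(q3)
print(merged)  # four (molecule, score) pairs, in the original order
```

Because every stage only talks to its neighbours through queues, batching and combining are invisible from the outside, which is why the subgraph keeps the same interface as the two nodes alone.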
So yeah, thank you very much for listening, and I'm happy to take any questions. Thank you very much, Thomas, that was a great presentation. We have a couple of questions. The first one is from Valentin. Valentin, if you're still there we will unmute you; if you can talk, just speak and ask your question. Hello, is it working? Yes, it's working, please go ahead. I was just wondering how you make sure that you get a small set of ligands from one family that is important for a specific project into the machine learning model. How do you narrow it down to really specific ligands for the protein, and avoid the bulk of ligands that you don't really need or use? Yeah, that's a good point. Through the reinforcement learning process the model learns to generate molecules that are actually suitable for the target protein, provided your scoring function is good. And there are different ways of doing it: you could sample very small batch sizes and just wait a longer time, or you could sample huge batches and let the scoring function sort them out, so it's super clear what is actually better. Then, in terms of narrowing it down even further: I just mentioned very specific scoring functions like docking, but of course in practice you would use a lot of other things as well. You can start applying filters like Lipinski's rule of five, for example, which is a very simple one, or have some kind of limit on the molecular weight, or toxicological models and other QSAR models. So there are a lot of options, and if you start combining them you really shrink the available space quite dramatically. Okay, that was great. If I may ask another question: you generate molecules from scratch, right? You don't start from an already-made database? Yeah, that's correct. Okay, thank you. Thank you.
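As an aside, the simple property filters mentioned in this answer, such as Lipinski's rule of five, are easy to sketch without any cheminformatics library once the molecular properties are already computed. The thresholds below are the standard rule-of-five ones; the candidate names and property values are made up:

```python
def passes_rule_of_five(mw, logp, h_donors, h_acceptors):
    """Lipinski's rule of five. Strictly, Lipinski flags compounds with two
    or more violations; for simplicity we require all four criteria here."""
    return mw <= 500 and logp <= 5 and h_donors <= 5 and h_acceptors <= 10

# Hypothetical candidates: (name, MW, logP, H-bond donors, H-bond acceptors)
candidates = [
    ("cand-1", 342.4, 3.1, 2, 6),  # passes all four criteria
    ("cand-2", 612.7, 4.8, 3, 9),  # too heavy
    ("cand-3", 298.3, 6.2, 1, 4),  # too lipophilic
]
kept = [name for name, *props in candidates if passes_rule_of_five(*props)]
print(kept)  # -> ['cand-1']
```

Chaining several such cheap filters before expensive scoring (docking, free energy calculations) is the kind of space shrinking described in the answer.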
So now the next question is from Cesar. Okay, Cesar, I will allow you to speak. Would you like to speak? Hello. Hi. Yes, so I got interested in how you implemented the FEP in this whole workflow. I mean, if you use relative binding free energies you cannot assess all the molecules, because it's relative; it works within a congeneric series. So how do you manage to run all of them and compare between all these different compounds? Yeah, that's a good point. At the moment what we're doing is generating based on a constrained inner scaffold, so the actual space is fairly limited, and it means we have a good reference, and the steps we take in our RBFE calculations are going to be relatively small. That is what we're currently doing, but we're definitely also considering absolute binding free energies at some point, to make scaffold hopping possible. All right. And this is all available, right? Okay, thank you. Thank you. So now we have another question from Erika. I will unmute you, just give me a moment. I guess Erika is not online anymore, I cannot find her, so I will read the question: hello, very interesting, are you treating each frame of your simulation as a single data point, or the entire trajectory per replica? I'm not quite sure what the exact context of this was, but generally, for RBFE, we treat the whole simulation as a data point. So in the case of RBFE we might do several replicas of the same system and then treat that whole thing as one data point. Okay, thank you. Let me just check the next question. This one is from Bruna; Bruna, I will allow you to talk. Sure, thank you, and many thanks for the presentation. I was wondering if you plan to implement OpenMM in this application. Yeah, that's on my list, yes, I would like to add that. Oh, great.
Then we have a question from BL. I will allow you to talk. Can you hear me? Yes, please go ahead. Thank you for such a nice presentation. I was wondering if Maize can be used together with a job scheduler like Slurm or PBS Pro; in our environment a lot of colleagues are running jobs in a shared compute environment. Yes, that was a big thing we definitely wanted to have in there. I didn't show it just now, but there are built-in functions for running any command, and you can dynamically decide whether you want to run it on Slurm, for example, or locally. Of course, you could also submit the whole workflow to the scheduler, but that doesn't really have much to do with Maize itself; it's not part of Maize, but it can be done. Okay, thank you. We have a last question that I will read because I don't see the person online: what are the minimum software requirements to run the program on a normal computer, or do I need a supercomputer? It will run on any computer. I developed it mostly on my workstation, which has a consumer GPU, but it really depends on the workloads you're running; I don't think there is any real limitation here. So, thank you a lot. If you could stop sharing, I will share my screen again, because I want to make a final announcement. Just give me a moment to get to my PowerPoint. The next BioExcel webinar will be on the 10th of October at three Central European Time, and it will be on the Competency app by Martin Lorenzo Minaris; in particular, it is about the Competency app that lets you browse competency and career profiles and training resources linked to computational biomolecular research in BioExcel.
I thank you everybody for attending, and I hope to see you next time. Thank you. Bye.