After these days of learning about GROMACS and free energy calculations with PMX, it's my pleasure now to introduce Christian Margreitter from AstraZeneca, one of the co-organizers of this event. Christian will talk about how these applications are also used from the industry point of view for the development of novel drugs, which is a good example of their usage. So now we can see Christian's screen. All right, thanks, Ross. Thanks for the opportunity to give this talk. Preparing for it, I thought about what would be interesting to you as a different angle from industry. We have been applying MD simulations, so I'll talk a bit about that and also about how it fits into our pipelines. This will be a very biased pipeline, obviously, because it's what we do in-house, but if you replace individual steps with other tools, the overall workflow scheme stays the same. So bear with me for a bit; at the end I'll show you where MD simulations can actually impact drug discovery. Just as a bit of an introduction: AstraZeneca is a global pharmaceutical company, what's usually considered big pharma, and we have three research sites: in Gaithersburg, in Cambridge, and where we are located, in Gothenburg. More specifically, I am part of the Molecular AI department, Molecular Artificial Intelligence, and we deal with two main questions in drug design. The first one is what to make next. This is compound idea generation, what could be a potential new drug, often referred to as de novo design, and it is also the team I'm in. So I'll talk a bit more about the exploitation and exploration scenarios that we usually have and about the main tools, one of which is REINVENT. All of our tools are published, so if you're interested, you can look at those. These are idea generation tools really using deep neural networks. The other question is how to make it.
And that's something that's often not fully appreciated outside the industry, but it's actually a big problem to synthesize the ideas that come out of our idea generation pipeline. The best idea doesn't help if you don't know a way to make it, right? This is what the synthesis prediction, or deep chemistry, team is dealing with. They built a tool called AiZynthFinder that allows you to predict reaction pathways in order to get the compound out in the end. We're also involved in other activities, most prominently MELLODDY, a privacy-preserving data sharing effort across the industry to build better predictive models, but that's not part of the talk today. On the first day we had a very nice talk about the complications in drug discovery, which saves me the hassle of going through it in too much detail again. But it's safe to say that drug discovery is still very slow and expensive. Newer methods have helped a bit there, but if you look at the graph on the right, it's still the case that we start with a lot of compounds in the early stage, a lot of them are filtered out along the way, maybe because of liabilities and toxicity, and in the end you have one marketed medicine after many years and hundreds of millions of dollars. So this is still a problem, and if we can improve our predictions, if we start with better compounds, then the chance that we get something through is higher. That's the whole idea. Another development that has gained momentum recently is that we can divide patients into smaller groups, what we call patient stratification, because we understand much more about the causes of disease. That is great because then we can generate bespoke medicines, but on the other hand it also leads to an explosion in the number of targets that we have to address. So to sum that up, our business success requires a reduction in cycle time.
I'll introduce the DMTA cycle on the next slide, and also the cost of the famous LI-LO, the lead identification and lead optimization phases. So the DMTA cycle: you design a new compound set, then you make it, then you test it in certain assays, and you analyze the data. Whatever you learn from there, you feed back into the design stage, and then you iterate until something is good enough to be propagated downstream. Our team, or department, works on the design stage, so I'll focus mostly on the idea generation bit here. We try to publish our work and we also try to publish the source code, so if you're interested, you can have a look into those. But I haven't really explained what makes it so difficult for us to generate new ideas, and I think this might help. If you look at chemical space, the collection of all the compounds that we could potentially make (and this is just an illustration here), it is enormous: it's estimated to be on the order of 10 to the power of 60 compounds that we could in principle look at. However, realistically, we can only access something on the order of 10 to the power of 10 or 10 to the power of 12. So what we end up with is a very, very sparse set of sub-spaces in chemical space where we know something, and large areas where we have absolutely no idea. Or, as it says here, we have access to only a negligible fraction of chemical space. What can we do? One way to generate new ideas is, for example, REINVENT, which I'll introduce; that's a tool we developed in-house, which is why I'm mentioning it, but there are a lot of different tools that you could apply as well. You take a very, very large dataset of molecules, for example from a public database, and then you train a neural network on it to reproduce it.
If you do that properly, the network will not learn to reproduce the training examples, which would be relatively meaningless, but it will learn the rules of chemistry. That means it will deduce implicitly, from all the examples it has seen, that it's not a good idea to add six substituents to a carbon atom, for example. What we end up with, and that's the changing color of these weights here, is a generative model with very good generative capacity. So if you think back to the picture of chemical space that I just showed, we're able to access a lot of different areas, because our model has a high generative capacity. But if you start with that trained model, it will just spit out molecules that conform to the rules of chemistry. On the one hand that's fine, but they may not be relevant for the project at hand. Let's say we want to fit a molecule into this binding cleft here and inhibit some target that's relevant for a disease. Then it's very unlikely that just by chance we get something out that actually fits. So what we do with our prior, or naive, model is subject it to reinforcement learning. We sample compounds, run them through some scoring function (I'll come back to that on the next slide), and the score is fed back to update the internal weights. Once in a while, just by chance, the model will receive a medium score, something slightly better than rubbish, and from that it can learn and optimize its internal weights. If you do that long enough, and typically we're looking at 1000 to 3000 iterations, you end up with a bespoke model that is able to generate compounds relevant for the project. For the next project you do the same approach again. One thing I should mention: these models are actually really good at generating compounds, so the compounds look good and conform to certain rules. The problem is the scoring.
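The prior-plus-reinforcement-learning loop described above can be sketched very compactly. This is a deliberately toy illustration of the idea, not REINVENT itself: the "model" here is just a set of categorical distributions over a tiny token vocabulary, and the scoring function is a made-up stand-in for something like a docking score.

```python
import math
import random

# Toy stand-in for a generative chemistry model: one categorical
# distribution over a small token vocabulary per sequence position.
# (A real model uses a deep network over SMILES tokens; everything
# below is an illustrative assumption, not the actual REINVENT code.)
VOCAB = ["C", "N", "O", "F"]
LENGTH = 5

def sample_token(weight_map):
    total = sum(weight_map.values())
    r = random.uniform(0, total)
    acc = 0.0
    for tok, w in weight_map.items():
        acc += w
        if r <= acc:
            return tok
    return tok  # float-rounding fallback: last token

def sample(weights):
    return [sample_token(weights[pos]) for pos in range(LENGTH)]

def score(seq):
    # Hypothetical scoring function: reward "N"-rich sequences, the way
    # a real run would reward e.g. a good docking score.
    return seq.count("N") / len(seq)

def update(weights, seq, reward, lr=0.5):
    # Crude policy-gradient-flavoured update: boost the weight of every
    # sampled token in proportion to the reward the sequence earned.
    for pos, tok in enumerate(seq):
        weights[pos][tok] *= math.exp(lr * reward)

random.seed(0)
weights = [{t: 1.0 for t in VOCAB} for _ in range(LENGTH)]
for _ in range(1000):            # the talk mentions 1000-3000 iterations
    seq = sample(weights)
    update(weights, seq, score(seq))

print("".join(sample(weights)))  # the agent now favours rewarded tokens
```

The loop shape is the point: sample, score, nudge the weights, repeat; it stays the same when the sampler is a deep network and the score comes from docking or property predictors.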
There are all sorts of different scoring components that you can fill in here, but binding affinity especially remains a bit elusive. This is how we construct the scoring function; ignore the equation on the bottom, it's really just a composite of different components. Each of these components describes the molecules in a different way. One could be molecular weight, say: you penalize everything that's too small or too big, and within a certain range it receives a good score, so you end up with compounds of a certain size. Another could be the number of hydrogen bond donors, which you might want to limit so it doesn't go crazy, and so on and so forth. Now, what would be really cool here is to get structural information in, and that's something we've been working on quite a bit. In our first attempt, we wrapped several docking backends; that's from a recent preprint we put on arXiv. Docking is just one of the scoring components: you embed the compounds, you dock them, and the docking score is fed back to inform the agent. Over time it learns how to generate molecules that produce a better docking score. That's illustrated in this graph here. I won't go into the details, but this was a public dataset, and we didn't tell the agent anything about the structures to generate other than that they should have good docking scores. If you look at the Tanimoto similarities, the similarity measure of the compounds produced to known actives, things that we know are binding, you see that at least a couple might very well have come from the active set, which the agent has never seen. And although the decoy set was much, much larger, you don't see that here. That's the idea. Okay, so far I haven't talked about MD simulations at all.
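The composite scoring function just described, per-component desirability transforms combined into one number, can be sketched as follows. The specific transforms, thresholds, and the choice of a weighted geometric mean are illustrative assumptions, not the exact published formula.

```python
def in_range(value, low, high):
    """Desirability transform: 1.0 inside [low, high], linear decay outside."""
    if low <= value <= high:
        return 1.0
    dist = (low - value) if value < low else (value - high)
    return max(0.0, 1.0 - dist / 100.0)   # 100-unit tolerance, arbitrary

def at_most(value, limit):
    """Hard filter: 1.0 up to `limit`, 0.0 beyond it."""
    return 1.0 if value <= limit else 0.0

def composite_score(components, weights):
    """Weighted geometric mean of component scores in [0, 1].
    A zero component (a failed hard filter) zeroes the whole score."""
    total = sum(weights)
    prod = 1.0
    for s, w in zip(components, weights):
        prod *= s ** (w / total)
    return prod

# Hypothetical descriptors for one generated molecule:
mol = {"mol_weight": 342.0, "hbd": 2}
total = composite_score(
    [in_range(mol["mol_weight"], 250.0, 500.0),   # size window
     at_most(mol["hbd"], 5)],                     # cap on H-bond donors
    weights=[1.0, 1.0],
)
print(total)   # 1.0: both components fully satisfied
```

A docking backend slots in as just another component: a transform maps the raw docking score into [0, 1] and it enters the same product.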
But if you think about the limitations that docking has as a proxy for binding affinity, it becomes clear where we are going. Just to give you an overview, I thought about what kinds of applications we have internally for MD simulations at this moment, and also asked colleagues what they do in everyday life; this is a small overview. All of what I've talked about so far concerns small molecules, the things we've just seen, but there's also a big push towards new modalities, or biologics: this could be peptides or oligonucleotides, for example. In the small molecule world, we use MD simulations for receptor configuration sampling. So instead of having just one X-ray structure for docking in a virtual screen, for example, or even when using docking in REINVENT as a scoring component, you can have multiple configurations taken from an MD simulation. Another thing a couple of colleagues do is use MD simulations as a sanity check for docking poses. Others are in the process of using them for pocket identification. That is very important if you have to decide whether a target is "druggable", as we say: whether it's actually suitable to be targeted by small molecules or not. That's still a very tricky question, because to actually know, you would have to have the product, the molecule, already. So you need to define whether a pocket would in principle qualify as druggable or not. But the biggest impact MD simulations have had, and will have, is relative and absolute binding free energy calculations, and I'll come back to that; actually, it's very nice that Vytas's talk just introduced that concept so neatly. For new modalities, it's more about structural understanding. This is very early days; we're applying it to a lot of different new targets that are not druggable by small molecules, but it's still the Wild West, not a streamlined process, and that's across the industry.
It's not just us. You could also use it, in principle, to predict properties such as the melting temperature from MD simulations. But going back to small molecules, this is just what I introduced: receptor configurations. Instead of just taking a crystal structure, you run an MD simulation over the course of 100 nanoseconds or so, you cluster the trajectory, and then you select representative conformations from these snapshots. That's actually something we recently implemented to do ensemble docking in the reinforcement learning loop of REINVENT's idea generation, and it should enrich the configurational sampling a lot. The other thing is simply running holo simulations, and we have automated workflows to do that using GROMACS. This is from one of them; it's just a test case. You have a protein and a docked, or in this case its native, ligand attached to it, and you see how this part of the protein changes over the course of the simulation. You can use that to check whether a docking pose is reasonable or not: if you see the ligand flying away after two nanoseconds, it's probably not a good sign. So this helps enhance our structural understanding, and as I said, it serves as a sanity check for docking poses. One thing that chemists like a lot is that it also shows you time-resolved interactions. If there is a hydrogen bond, for example, that is supposedly very important, then you see it appearing and disappearing over the course of the simulation, and that is something they can relate to a lot. What is often underappreciated is that we need to convince chemists to actually invest resources in doing what we propose they should do. We're competing with a lot of other techniques and inputs; we have limited resources and we need to allocate them as best we can. So MD simulations are an excellent communication tool as well.
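The cluster-and-pick-representatives step can be illustrated with a tiny, self-contained sketch. A real workflow would run something like `gmx cluster` on backbone RMSD over a ~100 ns trajectory; here the "frames" are made-up one-dimensional projections, and the algorithm is a simplified greedy scheme in the spirit of the GROMOS clustering method.

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between two flattened coordinate sets."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def gromos_cluster(frames, cutoff):
    """Greedy clustering: repeatedly take the frame with the most
    neighbours within `cutoff` as a cluster centre, then remove it
    together with its neighbours, until no frames remain."""
    remaining = list(range(len(frames)))
    clusters = []
    while remaining:
        counts = {
            i: [j for j in remaining if rmsd(frames[i], frames[j]) <= cutoff]
            for i in remaining
        }
        centre = max(counts, key=lambda i: len(counts[i]))
        members = counts[centre]
        clusters.append((centre, members))
        remaining = [i for i in remaining if i not in members]
    return clusters

# Toy "trajectory": 1-D projections of snapshots around two conformations.
frames = [[0.0], [0.1], [0.2], [5.0], [5.1], [0.05], [5.05]]
clusters = gromos_cluster(frames, cutoff=0.5)
representatives = [centre for centre, _ in clusters]
print(len(clusters), representatives)   # two clusters, one centre each
```

The cluster centres then serve as the receptor ensemble for ensemble docking, with cluster populations indicating how much weight each conformation deserves.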
But the biggest impact of MD simulations, at least in the small molecule world, is relative binding free energies, at least to this day. What we often end up with is a representative set of novel ideas, or a series: we have an initial hit, let's say this compound here in the middle, and then one way or another we get a whole set around this initial hit. That could be by enumeration, that could be by REINVENT, whatever. And these are 2D. What we want is some insight into whether it would be a good idea to go to this or this or this derivative of the initial hit, and binding affinity is a very important criterion. So the first step is to embed these 2D molecules in 3D; you also take care of things like stereochemistry and protonation states, et cetera. Once you've done that, you can start docking to fit these molecules into the binding cleft you have defined, usually using the reference ligand. What we start with internally is most often one or multiple X-ray structures and some experimentally determined binding affinities. From that, and I'm brushing over a lot of details here, we construct what we call a perturbation map. That's how we communicate internally, but you could also see it simply as a set of pairwise transitions: from here to there, from here to there. And again, for this central compound, let's say we have an experimentally determined binding affinity. Then you run your MD simulations, and from the trajectories you obtain the annotated perturbation map. That's what we communicate to the chemists, so they can see that going from here to there might be preferable.
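The annotation step, propagating one experimental affinity through the computed pairwise relative free energies, can be sketched as a small graph walk. The map below is entirely hypothetical (made-up compound names and values), and real maps also contain cycles that are used for consistency checks, which this sketch ignores.

```python
from collections import deque

# Hypothetical perturbation map: edges carry computed relative binding
# free energies ddG(A -> B) in kcal/mol (negative = B binds better).
edges = {
    ("hit", "analog1"): -0.8,
    ("hit", "analog2"): +1.3,
    ("analog1", "analog3"): -0.4,
}

def annotate(edges, reference, ref_dg):
    """Propagate one experimental dG through the map by summing edge ddGs
    along a breadth-first traversal from the reference compound."""
    graph = {}
    for (a, b), ddg in edges.items():
        graph.setdefault(a, []).append((b, ddg))
        graph.setdefault(b, []).append((a, -ddg))  # reverse transition
    dg = {reference: ref_dg}
    queue = deque([reference])
    while queue:
        node = queue.popleft()
        for nbr, ddg in graph[node]:
            if nbr not in dg:
                dg[nbr] = dg[node] + ddg
                queue.append(nbr)
    return dg

# Say the central hit has an experimental dG of -9.2 kcal/mol:
dg = annotate(edges, "hit", -9.2)
print(sorted(dg.items()))
```

The annotated map then reads directly as "going from here to there gains you 0.8 kcal/mol", which is the form chemists see.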
Often they look at the snapshots we get from the MD simulations and say, okay, this is picking up a new interaction here, or this is important because there was a water there, things like that. So this is really good for hypothesis building, and it works better than docking because it takes more into account, for example receptor dynamics and a whole bunch of other properties. As I said, it helps in making a case. Unfortunately, I cannot show you internal project data, but I can assure you that we have successfully applied this to a number of projects from different disease areas in the last two and a half, maybe three, years. This is a project a colleague of mine was involved in; it has been published. You see this scaffold here, and what they did was exchange this R group with fluorine, chlorine, and so on, and they were able to identify a couple of nice compounds. It was published recently, but it's older work, I believe. It paved the way for follow-up studies: they identified a pocket and were also able to use FEP on those compounds. And this is great. The main problem, as you can probably already see here, is that these methods require a large chunk of the molecule to stay the same; that's just what Vytas alluded to, you have to have a certain phase space overlap. In other words, these modifications here are small. If you look at another compound like this, and let's say this is our starting molecule, then it's an easy task to go from here to there; this number here indicates how far apart they are structurally. But it might be more tricky to do these jumps, and going from here to there might already be out of scope for relative binding free energy techniques.
But unfortunately, that's exactly what we are more and more interested in: doing these careful hops, or jumps, into unknown chemical space. This is very important because sometimes we have liabilities, toxicity issues, off-target effects, or other things that just won't work. It's actually quite disappointing if you have compounds that bind well and show good enzymatic inhibition, but then, because of some downstream problem like toxicity, will never make it to the market, right? Then we're looking for a backup series, and it would be great if we could find one simply by going from here to there, or even by ignoring completely where we started in the first place. To drive that point home in a slightly different representation, consider this UMAP embedding. It is based on structural descriptors, fingerprints of compounds, so it represents chemical space; each of these dots is one compound. These are completely arbitrary dimensions, don't worry about them; it just means that these points here are closer to one another than to these points over here. Let's assume we start here in the lead optimization stage, and this is the compound we're looking at. Using relative binding free energy methods, we could access maybe something within this circle, so we could explore these other molecules in here, and that's great. But the biggest problem is: what about the other scaffolds that might be interesting as well? Some of them might be more easily accessible, but for others these jumps might be just too far. Sorry, my pointer disappeared. What we do at the moment is ask the synthesis teams to make something in here and get a new crystal structure, and then we take it from there. This takes time, this takes money, and you might be unlucky and pick one of the candidates in here that are just not active.
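The structural distances behind such a fingerprint-based embedding are typically Tanimoto similarities, which are simple to compute once fingerprints are in hand. The bit sets below are made up for illustration; real fingerprints (e.g. Morgan/ECFP-style, as produced by RDKit) have hundreds to thousands of bits.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two binary fingerprints, represented
    as sets of on-bit indices: |intersection| / |union|."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0

# Hypothetical on-bits of three compound fingerprints:
hit      = {1, 4, 7, 9, 12}
close    = {1, 4, 7, 9, 15}   # small R-group change: mostly shared bits
scaffold = {2, 3, 8, 20, 31}  # scaffold hop: no shared bits

print(round(tanimoto(hit, close), 2))     # high: same series
print(round(tanimoto(hit, scaffold), 2))  # low: far apart in chemical space
```

In the embedding picture, the "circle" reachable by relative binding free energy methods corresponds roughly to high-Tanimoto neighbours of the starting compound, while scaffold hops land in the low-similarity regions.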
So you would discard this whole area, which could be a treasure trove, just because you made the wrong pick, and that's not great. What we would prefer is to be able to access these things right out of the box. So much for small molecules; now I have one slide on new modalities. We're also investing a lot into oligonucleotides because they have a different mode of action: they hybridize with some RNA, and in this case, for ASOs (antisense oligonucleotides), the RNA they hybridize with in the cell is then degraded. By that, you can target things that are not druggable by small molecules. As I said, this is very early stage; we're still trying to find the right setups, both experimentally and computationally. These were some initial simulations we ran with our automated pipeline: you have the simulation time here, 100 nanoseconds, and three identical replicates differing only in their initial velocities. What we did was a backbone clustering. If you look at the overall populations, you see that the first cluster is almost 50%, and this is the CMS, the central member structure, of that cluster. It's a fairly stretched configuration, and you see it appear and disappear over the course of the simulation quite a bit, at least in the first two replicates. So here we have a couple of problems that we need to address, and we're recruiting people to do that, but we're also reaching out to academic collaborators, because, as Vytas said, the force fields are not that great and we need to make all sorts of modifications here. This is really a tricky bit. Also, we don't reach convergence on these short timescales, and there are a lot of other issues here. I just wanted to bring it up because this is a different angle, and these simulations will have an impact in the future, because structurally those things are not very accessible.
So that is already the wrap-up slide, bringing together the requirements for making the biggest impact in pharma. First of all, we need "accurate" results, and I put accurate in quotation marks because we're not necessarily concerned with a certain threshold. We don't need the results to be within half a kcal/mol or so to be useful; for us, it's often enough to just be able to rank compounds. We have pipelines that require a certain feed-in at any given time. If I am able to rank, say, a hundred compounds and say, take these 20, put them forward, and there are a couple of hits among them, that would often suffice completely, right? It's also good for building convincing hypotheses. That has changed a lot in recent years because we have had successful project impact; people consider MD simulation a more trustworthy technique than it was maybe five or ten years ago. We also need sufficient throughput, and that's maybe not immediately obvious: these DMTA cycle iterations have become very, very fast, so projects move the goalposts a lot. If it takes me a couple of weeks to come up with an answer to problem A, problem A might not be relevant anymore once I have my answer. So if I had to balance accuracy and throughput, I would often come down on the side of throughput, just because having a reasonable answer now is superior to having the best answer later down the road, right? Then, and this is more of a practical thing, we need automated and stable workflows. MD simulations haven't had a reputation for being incredibly stable, and that's a problem, because computational chemists have a lot of other options for coming up with solutions, and if a tool doesn't work a couple of times, they won't come back; they simply drop it and do something else.
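The "ranking is enough" point can be made concrete: a systematic offset in predicted binding free energies is useless in absolute terms but leaves the ranking, and therefore the top-N selection, untouched. A minimal sketch with made-up numbers and a tie-free Spearman rank correlation:

```python
def ranks(values):
    """Rank position of each value (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order):
        out[idx] = rank
    return out

def spearman(x, y):
    """Spearman rank correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Made-up experimental dGs and predictions off by a constant 1.3 kcal/mol,
# i.e. far outside any half-kcal accuracy threshold:
experimental = [-9.0, -7.5, -10.1, -8.2]
predicted = [dg + 1.3 for dg in experimental]
print(spearman(experimental, predicted))   # 1.0: ranking fully preserved

# The "take these N, put them forward" shortlist is then just a sort
# (most negative predicted dG first):
top2 = sorted(range(len(predicted)), key=lambda i: predicted[i])[:2]
print(top2)
```

This is why rank correlation, rather than absolute error, is often the more relevant quality measure for feeding a design pipeline.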
That's also something we are working on: providing tools that wrap what comes out of academia or from proprietary software developers, and making that work for our scientists. And that brings me to the next point: ease of use. You have to remember that a lot of different people use your software; it will not just be computational chemists who invest a lot of time in setting these things up. One of the biggest edges that proprietary software has over academic codes is ease of use; just being interfaced with a GUI or something helps a lot. And last but not least, we have to, and want to, collaborate with academia, because we're not in a position to build all our own protocols. The non-equilibrium approach that was just introduced in the previous talks, we wouldn't have had the resources to come up with something like that. So we rely on academia to develop that with and for us, and that's why BioExcel is so great from our perspective. We're also shifting a lot of interest to GROMACS and especially PMX from the de Groot lab, and we hope that we can collaborate further with Vytas and Bert. Yeah, and that brings me to the end of the talk. This is the department at the moment, and if you have any questions, I'm happy to answer them. Christian, this was a very good overview of the different methods used by pharmaceutical companies. There is a question in the chat: which method are you using for relative binding free energy calculations? Do you use FEP+? As of now, yes, FEP+ is the one we mostly rely on in-house. We're trying to explore different things, both from other vendors and from academia, but FEP+ has certain edges, which I cannot comment on, that we need at the moment. Yeah, as you were explaining, it's very difficult. One needs to use a lot of different methods and try to find consensus across different applications. It's very challenging, it's not straightforward.
We're still not at a point in science and technology where you can just push a button and it will give you a drug that works. But overall, I think science has made tremendous progress in this direction, and hopefully we'll get closer and closer. Absolutely. And as I said, there are a lot of projects in-house running a main or a backup series that was derived using those methods. So this is not hypothetical anymore; this is not what could be done, this is being done. Yes. All right, I don't see any more questions in the chat. Well, thank you, Christian, for the presentation, and thank you, everyone. Now we are going away for a break until 2 p.m. Central European time, when we will continue with docking, specifically a presentation by Alexandre Bonvin about HADDOCK, one of the very popular applications in the field. So thank you for the morning session, thanks to all the presenters, and we'll see each other in one hour and 30 minutes.