So thanks, Patricia, for this nice introduction and the invitation to be part of this wonderful workshop today. Our group is developing computational methods for drug discovery applications, and as you can imagine, deep learning plays an increasingly important role in this domain. However, our domain has quite some issues if you compare it, for example, to image recognition or to large language models: we have much smaller data sets, usually on the order of hundreds or thousands of data points, and training a deep neural network on such a small data set is quite challenging. That is why the effort in our group is to incorporate as much physicochemical knowledge as possible into deep neural networks; so to say, to combine the best of two worlds: on one side, the knowledge we have gathered over the last decades in physics, chemistry, and biology, and on the other side, learning from the data that exists. The hope is that with such an approach we can train models using a much smaller training set and generate models that generalize much better to unseen data. This, I think, is a huge challenge, and if you look at a lot of the papers coming out, many of these models are actually very poorly validated with respect to generalizability.

The type of knowledge we try to include in our neural networks is our knowledge about physical interactions between molecules, effects like solvation, or even aspects of protein–ligand dynamics. I will show on a few examples how we try to incorporate this knowledge into deep neural network models. As an outline, I will start with a short introduction to the fundamental questions we face in computer-aided drug design, then talk a little bit about the types of neural networks we are working with in our group, and then show you on two examples how we can combine this physical world with the data science world: first high-content screening and de novo design, and then protein–ligand docking in order to predict how drugs actually interact with their target protein.

Drug discovery is a very time- and cost-intensive process, starting from the identification of the mechanism of a disease, to the identification of the target protein, to finding hit and lead structures which might be able to modulate this target protein. Then we typically have a very long cyclic optimization process where we design modifications, synthesize them, and evaluate them biologically. With the optimized lead compound we go into preclinical and clinical studies, and hopefully, after on average 10 to 15 years, into the approval phase of our drug candidate. Now, AI plays a more and more important role along this whole trajectory: starting from the identification of target proteins, for example based on omics data; AlphaFold, which we have heard a lot about, to generate a structural model of the target protein; and then the aspects I will particularly talk about today, identifying hit and lead structures, for example by screening billions of molecules and using docking to determine how those compounds interact with the protein. We also use more and more AI models to assist in the synthesis planning of those compounds, and we are now working on methods to efficiently estimate the binding affinity of compounds.
And it goes even further, into preclinical studies, where we now have models to predict so-called pharmacokinetic properties, that is, how the drug is distributed in our body; models for toxicity prediction, toxicity being a very important hurdle in compound development; and even models for the selection of candidates for clinical trials. So you see that in all these aspects, AI and machine learning play a more and more critical role. The questions our group is particularly interested in lie in the drug design part, which ranges from hit identification up to the optimization of lead compounds, as well as early toxicity prediction.

So let's imagine you have the 3D structure of your target protein available. The first question we have to address is: where is actually the binding site on this protein? If you have an AlphaFold structure or, for example, a crystal structure, where can ligands actually dock, bind, and interact with the protein? Then you typically have a large library, nowadays billions of potential compounds which can be synthesized relatively rapidly. So the question is: which of those ligands is actually able to bind? And then, how does one of those compounds bind to the protein? That means: what is its orientation, its position, and its conformation? This is very critical if you then want to make, for example, rational modifications to optimize the interaction of those compounds with your target protein. And finally, we want to be able to predict which chemical modifications lead to an improvement, or a deterioration, of the binding affinity in order to optimize the interaction strength.

As I said, there are a lot of classical computational methods which try to address these questions, but what we are trying to do is to combine AI models, deep learning methods, with physical chemical modeling. The types of networks we are using I will just summarize here on four slides; I will use them throughout the different subprojects I will discuss with you. The first is the convolutional neural network, which you have already seen before. It is usually used for processing images: in principle, you run a filter, like a magnifying glass, over parts of a text here in order to analyze the whole phrase. You can do the same with images, where so-called kernels or filters run over the image to generate more and more abstract features of the original image, which you can see here at the bottom. And then you can use this combination of features to make, for example, a classification of whether the input image is a female or male face, here at the bottom.

Another type of network we quite frequently use in the context of chemistry is the graph neural network. Graph neural networks share quite some similarities with convolutional neural networks: a convolutional network in principle uses information from neighboring pixels to get a more characteristic featurization of the pixel of interest. You can do the same with graphs, where you have of course a much more heterogeneous structure: you can, for example, learn about the environment of a node from the neighboring nodes which are connected to it by edges. And this node-and-edge graph representation is of course very well suited to handle chemicals, where the nodes are the atoms and the chemical bonds are your edges.
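To make this concrete, here is a minimal, illustrative message-passing layer (a hedged sketch in plain PyTorch, not our actual production model): each atom's feature vector is updated by aggregating the features of its bonded neighbors.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One round of neighborhood aggregation on a molecular graph.

    node_feats: (num_atoms, dim) per-atom features (e.g., element, charge).
    adj:        (num_atoms, num_atoms) 0/1 bond adjacency matrix.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.message = nn.Linear(dim, dim)   # transform neighbor features
        self.update = nn.GRUCell(dim, dim)   # combine with the atom's own state

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Sum messages from bonded neighbors (adj @ transformed features).
        msgs = adj @ self.message(node_feats)
        # Update each atom's embedding with the aggregated messages.
        return self.update(msgs, node_feats)

# Toy molecule: 3 atoms in a chain (e.g., C-C-O), 8-dimensional features.
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])
x = torch.randn(3, 8)
layer = MessagePassingLayer(8)
x = layer(x, adj)  # atom embeddings now encode their 1-bond environment
```

Stacking several such layers lets each atom see progressively larger bond neighborhoods, which is exactly the sophisticated featurization of the atomic environment described next.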
Using these graph neural networks, you can obtain a very sophisticated featurization of the environment of a certain functional group, or of the whole molecule itself, which can then be used for downstream tasks in the modeling.

One model we use quite heavily is the transformer neural network, which you have already heard about in the context of AlphaFold. ChatGPT is based on transformer models, as is Google Translate. They work on the so-called attention principle. For example, if you want to translate a sentence from English to German, then in multiple encoding stages, using the attention principle, you learn how, for example, the word "it" is connected with the other words in the sentence, or in the whole paragraph if you think about ChatGPT. After going through multiple of these encoding layers, you get an abstract representation of your paragraph in a reduced embedding, which we also call the latent space. From this latent space you then go through a decoding stage, translating word by word: you start with a start token and then, based on the embedding you have generated, produce the translated German sentence word by word. ChatGPT, for example, uses only the decoder stage, where instead of the start token you in principle use the prompt you have provided, and ChatGPT then generates the response word by word.

Another model which has become very popular in the last two years is the diffusion model, typically used for image generation. If you generate images with DALL·E or Stable Diffusion, they use this type of generative model. The idea is the following: you start with an image and add stepwise more and more noise to it until, in principle, you have a completely noisy image. Then you train a network which denoises the image step by step, so it learns in principle how much noise it should remove to get back to the original image. And if you condition this network on a text prompt, you can then generate completely new images based on the text provided. Here are just a few examples: if you ask for an impressionist landscape of a Japanese garden in autumn, with a bridge over a koi pond and the leaves of the trees in vibrant colors, you get this wonderful picture. Or if you want cats eating ice cream in front of a Japanese temple, you would get this beautiful image here. This is just learning, from the text prompt, to generate images with the diffusion model. And we will see how you can use all these models in a biological or biochemical context.

So we are using these different types of techniques to build a completely new computational drug discovery platform, again trying to combine physicochemical knowledge with deep learning. This platform in principle covers the whole drug design process: generating, by de novo design, a library of compounds, which then undergoes virtual screening to find initial hit compounds that bind to the target protein of interest; then running them through pose prediction and pose ranking to obtain the structure of your protein–ligand complex; and then going on to affinity prediction to get an optimized lead.
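Since diffusion models will come back later in the talk, here is a toy illustration of the core idea (a hedged, DDPM-style sketch, not any specific production model): the forward process mixes data with Gaussian noise according to a schedule, and the network is trained to predict that noise so it can be removed step by step.

```python
import torch
import torch.nn as nn

# Noise schedule: alpha_bar[t] is the fraction of signal kept at step t.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

denoiser = nn.Sequential(  # stand-in for a real U-Net / graph network
    nn.Linear(784 + 1, 256), nn.ReLU(), nn.Linear(256, 784)
)

def training_step(x0: torch.Tensor) -> torch.Tensor:
    """One DDPM training step: corrupt x0, ask the net to predict the noise."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].unsqueeze(1)
    # Forward (noising) process: x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps
    xt = ab.sqrt() * x0 + (1 - ab).sqrt() * eps
    # Condition the denoiser on the timestep (here simply concatenated).
    eps_pred = denoiser(torch.cat([xt, t.float().unsqueeze(1) / T], dim=1))
    return ((eps - eps_pred) ** 2).mean()   # simple noise-prediction loss

loss = training_step(torch.randn(16, 784))  # e.g., flattened 28x28 images
loss.backward()
```

At sampling time you run the learned denoiser in reverse, from pure noise back to data; conditioning it on a text prompt, or later in this talk on a protein pocket, steers what it generates.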
Today I will focus on the first parts of this pipeline. So let me start with the first example: how we can use these types of architectures for high-content screening and de novo design. In a high-content screen, meaning screening billions of compounds, we hope to identify some compounds which target our protein of interest. What we typically do is so-called virtual screening, for example similarity-based virtual screening. That starts with one known active compound, which functions as the query molecule. Then we have our huge library of molecules, and we take compound by compound from this library and measure its chemical similarity to the query molecule. The idea is that compounds which are similar in chemistry to our query molecule should also have similar biological activity. So we rank our library of billions of compounds by similarity to the query molecule, and the top-ranked molecules are then hopefully additional active compounds which we can further optimize in our drug design process.

The question is: how do you measure this similarity? How can you represent chemistry in the computer in order to compare how similar two compounds are from a chemical standpoint? I just want to give you two widely used examples. One is fingerprints. For fingerprints, you analyze the topology of your compound and translate it into something like a barcode you know from the supermarket. Then you just compare the similarities and differences of the barcodes to say which compounds are similar and which are different from each other. Or you can do this based on shapes or pharmacophores. And this is important, I will show you an example in a second: sometimes compounds which are chemically very different actually have a very similar interaction with the protein of interest. Here you see the crystal structures of two such compounds, and in the superimposition you see that they share their shape and they share their interactions with the protein of interest. So sometimes you need to go beyond a simple 2D representation of chemistry to a 3D representation.

What I also should say is that this process is relatively slow, in particular if you think about 3D similarity, which means you first have to align one ligand onto the other in 3D space and then measure the similarity of the compounds in 3D. This is quite a long process if you consider that you have to screen, or want to screen, billions and multiple billions of compounds.

So we had the idea to do this in a completely different way. Our idea is to translate this very high-dimensional chemical space into a low-dimensional latent space, and to model this latent space in such a way that chemical similarity is conserved. That means compounds which are similar in chemistry lie close by, and molecules which are different lie more distant from, for example, these compounds of interest. This then allows us to just use Euclidean distances in this space: you simply take all the points in the neighborhood of your query molecule, and you directly know all the compounds which have a high potential to also be active against the target of interest.
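Before showing how the latent-space version works, here is what the classical 2D fingerprint comparison looks like in practice (an illustrative RDKit snippet; the exact fingerprints and settings in our pipeline may differ):

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Query: a known active; library: candidate SMILES (toy examples).
query = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")       # aspirin
library = ["c1ccccc1O", "CC(=O)Nc1ccc(O)cc1", "CCO"]

# Morgan (circular) fingerprints: a 2048-bit "barcode" of the topology.
fp_query = AllChem.GetMorganFingerprintAsBitVect(query, radius=2, nBits=2048)

scored = []
for smi in library:
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
    # Tanimoto similarity: shared bits / union of set bits, in [0, 1].
    scored.append((DataStructs.TanimotoSimilarity(fp_query, fp), smi))

# Rank the library by similarity to the query, most similar first.
for sim, smi in sorted(scored, reverse=True):
    print(f"{sim:.2f}  {smi}")
```

This is fast per pair, but across billions of compounds, and especially once 3D alignments are involved, it adds up; that is exactly what the latent-space shortcut avoids.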
And that is an extremely fast process, which can also work in three-dimensional space. So how do we do this translation from the high-dimensional space to the low-dimensional space? Well, we use what is also used for natural language processing: an autoencoder based on transformer models. Here the language is not the natural language of words and sentences but a chemical language called SMILES, a sequence of letters which completely describes the chemistry of your molecule. We use, as I mentioned, a transformer model to translate this language into a latent space, a reduced embedding, and from this embedding we can then also decode back to the SMILES of the compounds we put into the autoencoder. What we developed is an additional loss function which shapes the embedding such that it actually conserves chemical similarity, but now in this reduced latent representation. Then, as I said before, we can just look in the neighborhood of a query and identify all the existing compounds nearby; those are our predictions of potentially new active compounds.

Or we can use it for de novo design: we can work in the same latent space but now sample points corresponding to compounds which have never been made before. That allows us to generate completely new chemistry which nobody has seen before, but which might have a high likelihood to bind to our target protein. So this is really true generative modeling of chemistry, targeted at specific proteins.

Does it work in practice? Here is an example where we screened 1.5 billion compounds against one reference compound. We did this for ten very different systems, and the aim was to identify the ten most similar compounds, for which we know the ground truth. What you see in the results: without our additional loss function shaping the latent space by similarity, we are not able, even at a very large number of retrieved compounds, to identify all the similar compounds. With our method, within the top 15,000 compounds, which is 0.001% of the full database, we can identify all compounds similar to a reference structure. This gives you, in principle, an enrichment factor of 10,000 for identifying active molecules.

We have now also gone to 3D. I showed you the slide before about shape and pharmacophore similarity; we trained a model in a similar way. As you can see in these first tests, and this is relatively fresh from my graduate student Manuel Sellner: if you start, for example, with the molecule raloxifene, which is a drug binding the estrogen receptor and an anti-cancer drug, then within the top 10 structures our transformer model identifies compounds which are completely different in their chemical scaffold. But for three of these compounds, for example, we actually know that they also bind to the same receptor, sometimes even with a different pharmacology, which is quite interesting.

Now, in the second part of my talk, I will discuss protein–ligand docking. Just to connect to the previous part of my presentation: what this high-content screening typically does is reduce the large chemical space to maybe only 10,000 or 100,000 molecules.
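Before moving on to docking, here is a hedged sketch of the similarity-preserving training loss mentioned above (simplified; the actual formulation in our model may differ): SMILES reconstruction plus a term that pushes latent similarities to match fingerprint similarities.

```python
import torch
import torch.nn.functional as F

def similarity_preserving_loss(logits, targets, z, fp_sim, weight=1.0):
    """Autoencoder loss with a latent similarity-conservation term.

    logits:  (batch, seq_len, vocab) decoder outputs over SMILES tokens
    targets: (batch, seq_len) ground-truth SMILES token ids
    z:       (batch, dim) latent embeddings from the encoder
    fp_sim:  (batch, batch) pairwise Tanimoto similarities (in [0, 1])
    """
    # 1) Standard reconstruction: can we decode the SMILES back?
    recon = F.cross_entropy(logits.transpose(1, 2), targets)

    # 2) Similarity conservation: latent cosine similarity should
    #    mirror chemical (fingerprint) similarity.
    z_norm = F.normalize(z, dim=1)
    latent_sim = z_norm @ z_norm.t()
    sim_term = F.mse_loss(latent_sim, fp_sim)

    return recon + weight * sim_term
```

With the latent space shaped this way, finding similar actives reduces to a nearest-neighbor query, and de novo design reduces to decoding sampled latent points back to SMILES.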
Then we can use more sophisticated techniques like protein–ligand docking to reduce this chemical space even further before we actually start testing compounds in experiments. So what is protein–ligand docking? Protein–ligand docking is, in principle, the method of choice if you don't have experimental crystallography or cryo-EM data available to determine the structure of a protein–ligand complex. AlphaFold right now can only predict protein structures; predicting protein–ligand structures is still a very hot topic of investigation.

Classically, we take the protein structure of interest, with a certain shape and certain physicochemical properties, and the ligand, with its shape and physicochemical properties, and we generate millions or billions of different configurations. Each of these configurations, which we also call poses, is evaluated with a scoring function which in principle measures how well the ligand fits the binding site of the protein. The pose with the most negative score is then the predicted protein–ligand conformation. As you can imagine, generating millions or billions of configurations is computationally quite intensive if you want to dock hundreds of thousands or millions of compounds. Typically, you have to find a balance between the number of configurations or poses you generate and how precisely you evaluate the protein–ligand interaction strength with your scoring function, and typically this is tilted towards generating poses at the cost of a precise evaluation of those complexes. You also typically keep the protein rigid, because otherwise the conformational space would just explode and you would not be able to sample protein–ligand complexes at all.

So we asked whether we can overcome these limitations of classical protein–ligand docking methods. The first approach we developed, this is my student Matt and my senior scientist Amr in our group, takes a completely different view of protein–ligand complex prediction. The idea: you take a training set of protein–ligand structures from the PDB and abstract the protein structure by its C-alpha atoms, which allows you to model the side chains implicitly; so you actually include some protein flexibility in this approach. From each known protein–ligand complex you can calculate the so-called Euclidean distance matrix: all the distances between all the atoms of your ligand and all the C-alpha atoms of the residues in the binding site of the protein. This fully characterizes how the ligand interacts with the protein. You represent the ligand in 2D, because you typically don't know what the conformation of the ligand is, feed this protein and ligand information into your network, and train it to reproduce this Euclidean distance matrix. Once you have trained such a model, you just take the protein structure abstracted by its C-alpha atoms and the ligand in its 2D representation, run them through the neural network, and it generates the Euclidean distance matrix. This allows you, in one shot, to reconstruct how the ligand sits in the binding site of the protein.
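A hedged sketch of the reconstruction step (illustrative only; the optimization in our pipeline is more involved and also restrains the ligand's internal geometry): given the predicted ligand-to-C-alpha distances, each ligand atom can be placed by least-squares multilateration against the known C-alpha coordinates.

```python
import numpy as np
from scipy.optimize import least_squares

def place_ligand_atoms(ca_coords: np.ndarray, pred_dists: np.ndarray) -> np.ndarray:
    """Recover 3D ligand atom positions from predicted distances.

    ca_coords:  (n_res, 3) known C-alpha coordinates of the binding site
    pred_dists: (n_atoms, n_res) network-predicted atom-to-C-alpha distances
    returns:    (n_atoms, 3) estimated ligand coordinates
    """
    coords = []
    center = ca_coords.mean(axis=0)  # start each atom at the pocket center
    for d in pred_dists:
        # Residuals: mismatch between trial-position distances and predictions.
        res = least_squares(
            lambda x: np.linalg.norm(ca_coords - x, axis=1) - d,
            x0=center,
        )
        coords.append(res.x)
    return np.asarray(coords)

# Toy check: 4 pocket residues, 2 ligand atoms with exact distances.
ca = np.array([[0., 0., 0.], [10., 0., 0.], [0., 10., 0.], [0., 0., 10.]])
true_atoms = np.array([[2., 3., 1.], [4., 1., 2.]])
dists = np.linalg.norm(true_atoms[:, None, :] - ca[None, :, :], axis=2)
print(place_ligand_atoms(ca, dists))  # should be close to true_atoms
```

In practice one would place all atoms jointly, with bond lengths and angles as additional restraints, rather than independently as in this toy version; the point is only that the distance matrix pins down the pose.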
There is no sampling: in one run through the neural network you get the protein–ligand complex. You then of course need to regenerate the side chains in some refinement step, but in principle it is no longer a sequential search; it is a very fast process.

Just a little more detail on how we actually do this. We take the ligand as a 2D graph and the protein as a three-dimensional graph, feed them through graph neural networks, and use the resulting featurization, the embedding of each ligand node and each node of the binding site. We put these together and run them through another neural network to get the protein–ligand distances, which you can then use, together with some optimization, to reconstruct the protein–ligand complex. This works quite well in practice. We compared it, for example, with standard techniques such as AutoDock Vina, a very commonly used program, and we see that we can improve on the performance of these classical models, and, which is also important, at significantly reduced cost compared to classical docking methods.

You also see that we can incorporate flexibility quite nicely in our model. For example, if you look here at the androgen receptor: this is the known complex with the ligand bound, and here in red you see the unbound state, the crystal structure without the ligand. You can see that it would actually overlap with the ligand, so if you don't include this flexibility you will never be able to place this ligand correctly. But in our method, these side chains were flipped over from the red, unbound conformation, which then allows the ligand to interact perfectly with the binding site. So the method is not just fast; it also incorporates the necessary protein flexibility.

Now, this method so far only works if we actually know the binding site, but often we do not. This is the problem of so-called blind docking, and the question is what we can do when we don't know the binding site. What I also want to show you here is that very often there is not only one binding site. This here, for example, is a GPCR, and you can see that it has multiple binding sites: an orthosteric binding site and multiple allosteric binding sites. Obviously, the relevant binding site depends on the ligand of interest: some ligands bind only to a specific site. So we do not just want to identify one binding site; we want to identify, for a given ligand, which binding site is actually relevant for that ligand.

The method we developed is called PocketNet, and it again tries to combine physical information with deep learning. We use classical pocket prediction methods, based for example on physicochemical information, which generate an ensemble of pockets, and we then feed this information into a neural network, conditioned on the ligand, to get a re-ranked ensemble of binding pockets. How does this work in practice? What we get from these classical tools are, in principle, multiple candidate pockets.
From each pocket we again generate a graph of the binding site, of the residues surrounding the pocket, together with a graph of the ligand and some additional features: for example, the scores we obtain from the different classical methods, but also physicochemical features of ligand and binding site. We feed this into a simple neural network which gives us a score that can then be translated into a rank for each binding pocket. And that works quite nicely: our model outperforms existing pocket identification methods quite significantly, and in contrast to those methods we can also identify the specific binding site for a given ligand, which the other methods cannot; with them, you still need to figure out which binding site each ligand would actually bind to.

However, there is a new kid in town: diffusion models. Diffusion models have become very popular in the last year in the biology and pharmaceutical area, and I just want to show you one example: DiffDock from MIT. I showed you previously how a diffusion model works for images, where you noise an image and then denoise it. You can do the same with protein–ligand complexes. You start with a known protein–ligand complex from the PDB, where the ligand binds, say, here, and you then noise this complex by adding translational, rotational, and torsional noise to the ligand, which gives you the ligand in any place around the protein surface. Then you learn a model which goes in the backward direction: you start from many different configurations of the ligand and in principle denoise them back to, hopefully, the correct protein–ligand complex. This is a completely new way to predict protein–ligand complexes, and in the initial paper very good results were shown, but on a very small test set; generalization was not tested. We then tested it on a much more challenging data set, and the method essentially failed. This is another indication that these types of models, trained on a relatively limited amount of data, cannot generalize to new proteins or protein–ligand complexes.

Our idea is now to integrate physics into these diffusion models, and we do this on three different levels. First, we calculate solvation information in the form of hydration sites; I will talk about this later in more detail. This is a critical component if you want to understand protein–ligand interactions. Second, we predict interaction sites, so-called pharmacophores, which tell you in principle which kind of chemical entity of a molecule binds at a specific place in the binding site. And third, we add physical interactions throughout the diffusion process, for example repulsive forces so that the ligand cannot collide with the protein. We tested this on a much more realistic data set, with a three-to-one split constructed so that the test set is as different as possible from the training set, because you don't want to test on a system for which you already have data; that is not interesting for drug discovery applications. And what you see is that on this more challenging data set, the success rate of DiffDock, originally published to be around 40%, drops to below 10%.
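To give a feel for what adding physics during diffusion can look like, here is a hedged, highly simplified sketch (my own illustration, not the actual implementation): the gradient of a soft-sphere clash penalty steers each reverse-diffusion step so the ligand avoids colliding with the protein.

```python
import torch

def clash_penalty(lig_xyz, prot_xyz, r_min=3.0):
    """Soft-sphere repulsion: penalize ligand-protein pairs closer than r_min (A)."""
    d = torch.cdist(lig_xyz, prot_xyz)            # (n_lig, n_prot) distances
    return torch.clamp(r_min - d, min=0.0).pow(2).sum()

def guided_denoise_step(lig_xyz, prot_xyz, denoiser, t, guide_weight=0.1):
    """One reverse-diffusion step with a physics-based steering term.

    `denoiser(lig_xyz, t)` is a stand-in for the learned denoising update.
    """
    lig_xyz = lig_xyz.detach().requires_grad_(True)
    penalty = clash_penalty(lig_xyz, prot_xyz)
    grad = torch.autograd.grad(penalty, lig_xyz)[0]  # direction that worsens clashes
    with torch.no_grad():
        # Learned update, minus a step downhill on the clash penalty.
        lig_xyz = denoiser(lig_xyz, t) - guide_weight * grad
    return lig_xyz
```

The solvation and pharmacophore information enters in the same spirit, as additional conditioning signals or gradient terms during the reverse process.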
But by integrating physics we can actually rescue the performance of these diffusion models: we now reach a success rate of around 40%, at least for the top-1 pose. That is still not 100%, and we are still working heavily on this, but you can see that integrating physics substantially improves the generalizability of these neural network models.

In the last few minutes, I am going to talk about the second part, which is the scoring problem. So far I talked about how we can sample protein–ligand complexes more efficiently and more precisely; how can we now improve the quality of scoring, that is, estimating the strength of the interaction between protein and ligand? For that we use classical convolutional neural networks, which are usually applied to pixelated images: the RGB colors give you multiple input layers, representing, for example, images of birds, cats, dogs, and ships. You run them through multiple convolutional layers to get abstract features, which then go through a fully connected neural network to classify, for example, that this image is a boat and not a bird, cat, or dog.

We can take a similar approach for protein–ligand complexes, but in this case we are not talking about 2D images but three-dimensional images. In practice, we take known protein–ligand complexes, take the ligand out, and dock it back into the protein structure, generating 25 poses, at least one of which is the correct configuration. We then represent the protein by different layers of a three-dimensional image: for example, one layer is the density of carbon atoms, one the density of oxygen atoms, et cetera. We do the same on the ligand side, with carbon densities, oxygen densities, and so on for each of the different poses. We combine these three-dimensional layers from protein and ligand as input to the convolutional network, and the network learns to classify which of those poses is correct and which are incorrect.

We trained this on only 600 protein–ligand complexes, a very small training set, because we really wanted to see whether we can generalize to new protein–ligand systems, and we used a test set of 2,000 protein–ligand complexes. What you see is that by re-scoring with the convolutional network we can improve the quality by about 10% compared to classical scoring functions, which is good, but not what we had hoped for.

So we asked: what is missing in our model? So far we have only considered the direct protein–ligand interaction, because we have only used images of the protein and images of the ligand. However, we know that in protein–ligand binding, the free energy of binding, which determines whether and how a ligand binds, is the free energy difference between the bound complex and the unbound structures. And in the unbound state, protein and ligand are not in vacuum; they are surrounded by a lot of water molecules. Before the ligand can interact with the protein, these water molecules have to be removed from the binding site and from around the ligand, which we call desolvation. And this desolvation is actually very critical for protein–ligand binding.
It is very often the driving force for protein–ligand binding, as well as for protein–protein interactions and other association events in biochemistry. And sometimes water also mediates interactions between protein and ligand. So how can we incorporate this information, more of the physics, into our neural network?

What we did is predict this hydration information. We run MD simulations and use a tool we developed some time ago, WATsite, to measure the density of water, which gives us high-density clusters, the hydration sites. From the trajectory we can then also predict the entropy and the free energy for each of those water molecules. That gives us a clear picture: all these red water molecules are waters which are not happy, so replacing them would actually be beneficial for the binding affinity, while in blue you have waters which are happier, so you don't want to replace them. Based on this profile, you can estimate which binding pose is more favorable than others, together with the direct protein–ligand interactions.

How do we incorporate this? We take this information and generate additional input layers, for example water occupancy or the free energy of desolvation, add them to the input layers of protein and ligand, and again train a convolutional network. When we applied this on the same training and test sets, we suddenly saw a significant improvement in docking performance: we can reduce the error from about 40% to close to 10%. This shows that if you really integrate your knowledge into the neural network, here the importance of solvation and desolvation, you can significantly improve these models even with a very small training set.

One more thing we did with this model: a lot of people think neural networks are black boxes, but this is actually no longer true; you can do a lot of interpretation of a neural network. We used something called layer-wise relevance propagation, where in principle you learn which of the input features are most relevant for the output of the network. From that you can obtain a lot of valuable information: for example, you can identify the importance of different parts of the ligand, or the most important residues. But what we were particularly interested in are the desolvation contributions. For example, we identified water molecules coinciding with water-mediated interactions, which you see here in the crystal structure. We saw water molecules with positive enthalpy; this is not surprising, because these are typically in hydrophobic regions where water molecules don't like to be, and these regions are typically occupied by hydrophobic portions of your ligand, nothing else than the hydrophobic effect we are quite familiar with. But we also identified something a little surprising: water molecules with negative enthalpy which are important for judging the correctness of a protein–ligand complex. We call these enthalpically favorable first-shell water layers. This had not been recognized in much detail before this study, or only in very few studies.
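To make the grid featurization with solvation channels more concrete, here is a minimal, hedged sketch (channel choices and grid parameters are illustrative, not our exact setup): atoms and hydration sites are splatted as Gaussian densities into a multi-channel 3D grid that a 3D CNN can consume.

```python
import numpy as np

def gaussian_splat(grid, origin, spacing, centers, weights, sigma=1.0):
    """Add Gaussian density for each center (e.g., atoms or hydration sites)."""
    nx, ny, nz = grid.shape
    axes = [origin[i] + spacing * np.arange(n) for i, n in enumerate((nx, ny, nz))]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    for (x, y, z), w in zip(centers, weights):
        grid += w * np.exp(-((X - x)**2 + (Y - y)**2 + (Z - z)**2) / (2 * sigma**2))

# 24 A box at 1 A spacing; channels: protein C, protein O, ligand C,
# water occupancy, water desolvation free energy (illustrative channel set).
n, spacing, origin = 24, 1.0, np.zeros(3)
channels = np.zeros((5, n, n, n), dtype=np.float32)

prot_C = np.array([[12., 10., 11.], [14., 12., 10.]])   # toy coordinates
lig_C = np.array([[11., 13., 12.]])
waters = np.array([[13., 11., 13.]])
dG_water = np.array([1.8])   # kcal/mol; positive = "unhappy" water

gaussian_splat(channels[0], origin, spacing, prot_C, np.ones(len(prot_C)))
gaussian_splat(channels[2], origin, spacing, lig_C, np.ones(len(lig_C)))
gaussian_splat(channels[3], origin, spacing, waters, np.ones(len(waters)))
gaussian_splat(channels[4], origin, spacing, waters, dG_water)
# channels now stack protein, ligand, and solvation information for a 3D CNN.
```

Each pose gets its own ligand and water channels, and the 3D CNN classifies poses exactly as in the 2D image example, just with these physics-derived channels added.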
One of those few earlier studies was by Gerhard Klebe in Marburg, who identified in crystal structures that, for example, one ligand binds better than another, and that this has nothing to do with direct interactions between protein and ligand, or with desolvation. It has to do with the fact that the weaker ligand destroys this enthalpically favorable water network, which acts something like a cap you put on top so that the water cannot get out. And here it is the same principle; to say it in simple words, if the ligand wants to break out of the prison, the binding site, it has to pay a higher price, because it would in principle have to destroy this water network.

And this brings me to the end of my presentation. I hope I could demonstrate that if you actually integrate your physical knowledge into neural networks, you can significantly improve, in particular, the generalizability of these models. This is a huge challenge, not just in our field, and sometimes it is a little underappreciated, because you really want to apply these networks to new systems, not just to the interpretation of things you already know. With this, I would like to thank the people who actually did the work; I showed you results predominantly from Manuel, Amr, and Matt today. Thank you for your attention, and thanks again for the kind invitation to this workshop.