So, the topic, as you can read: deep learning for excited states and molecular design. Please. Thanks a lot for the introduction, and thank you for the invitation as well. My name is Julia Westermayr, and I'm a postdoc in the research group of Reinhard Maurer. I mainly work on machine learning for excited states, but I want to show you how we can use it to enable molecular design.

I want to start by asking why we actually need to speed up molecular and materials design. If you think of the main challenges of the 21st century, there are two pressing issues. One is global warming, and I guess we all know this heat map, which shows by how many degrees the temperature has risen on Earth. But we are also in an emerging energy crisis, where our energy needs are constantly increasing, and the energy sector is one of the main causes of greenhouse gas emissions, so the two are directly related. What if we suddenly hit the point where we need to design a material fast, like scientists did when they developed a COVID vaccine? Would we be able to find a solution to the current problems? Looking at history, I guess we would not, because most materials design takes about 10 years and costs more than $10 million.

To find out why this is the case, I want to guide you through the usual materials design process. We usually start with a material concept. Let's say our task is to design a new optoelectronic material. We all know that photovoltaics are extremely promising, because they are likely to meet a large share of our energy needs in the coming years. But if you want to design a new material, you have to satisfy several criteria: it should be easily accessible and sustainable, it should have an ideal band gap, and it should be better than the materials we already have. Most often these requirements are competing, so it is very difficult to find a material that satisfies all of them in the best possible way. So how can we find such a molecule?

Let's assume for a moment that we have done this and found a perfect material. We can ask our colleagues in the lab to synthesize it, and then several questions need to be answered: Can we synthesize the molecule? Is it stable? Is the synthesis sustainable? And so on. There could be challenges that make us reconsider our concept. But if this works, we can move on to device construction. Again we could face challenges, but then we can move on to the final application stage: device testing and so on. At this stage it is extremely important that we as theoreticians get feedback from the experimentalists, to find out how we can improve our initial concept. There might be several such loops, and this takes a lot of time. Eventually, in the end, we hopefully reach the manufacturing stage and get to industry. Unfortunately, we most often get stuck before that, because the whole process really takes a lot of time. So in this talk I want to show you how we can hopefully get better initial material concepts to speed up this process.

There are several ways to come up with a material in the beginning. The most traditional, or most intuitive, way is to sit in front of the computer, or draw on paper, a molecule that already exists.
We could then do an experiment or a calculation and obtain the property of interest that we defined as important for our target. Next, we could slightly change the molecule, for instance add a carbon atom here, do another experiment or calculation, get another property, and compare the two. That tells us what we need to change in order to move toward the property we want. But this takes a lot of time, so we would not improve the overall concept much.

There is another way to do it: high-throughput screening. We could take all the molecules we know and run calculations on them; if we wanted to do experiments instead, we would likely need robotics and automation. Then we could map all these molecules into our property space and actually find the molecule best suited for our task. But there are problems. This approach is very biased, because all the molecules we draw from already exist, and as you heard in Francesca's talk, we only know about 10^4 molecules, while chemical space is much larger. So how can we get to the unknown molecules? If you think back, many important discoveries were made by accident, but we cannot wait; if you think of global warming, we cannot wait until we find the next material by accident.

So what if we have our property space and we know where the molecule with the multiple properties optimal for our task sits? Can we enable the inverse process and go from the target properties to the molecule that satisfies them? One way to achieve this is with generative machine learning, which you have heard a lot about in the last days. As you already know, it is an unsupervised learning technique, and the goal is to learn a data distribution. What we want to learn here is how to build new molecules: we want to learn the rules and apply them to generate new ones. I usually like to replace the brain in this picture with a magic hat, because once the model is trained, it acts like a magic hat: you can pull out as many molecules as you want. And once you have created a new dataset of millions of molecules, you need a way to screen them, because you want to find the most suitable molecule for your purpose. For that we need high-throughput techniques, that is, supervised learning.

So there are three ingredients in this whole process. The first, obviously, is a dataset. Then a generative model, with which we can predict new molecules that do not currently exist. And then a predictive model to screen these molecules. I want to go through each of them separately.

The first, and probably the most important, part of any machine learning study is the dataset. Our goal is to design new molecules that are potentially relevant to optoelectronics, and there is one dataset that is very promising for this purpose: the OE62 dataset. Among its authors are Milica Todorović and Patrick Rinke, one of the organizers here. The dataset contains 62,000 organic molecules extracted from organic crystal structures, so they are potentially relevant to optoelectronics. What is also very nice compared to other benchmark datasets is that it is extremely diverse: it has a rich chemical diversity, containing molecules that are very small but also very large ones with more than a hundred atoms, and up to 16 different chemical elements. In addition, every data point comes with orbital energies at the PBE0 level of theory, which we could later use to train a model for screening.
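As a minimal illustration of such a dataset inspection, one could tally element frequencies and molecule sizes directly from the geometries. This is only a sketch: it assumes the OE62 structures are available as an ASE-readable extended-XYZ file (the file name is hypothetical, and the actual distribution format may differ).

```python
# Minimal sketch: tally the element distribution and size range of a
# molecular dataset. Assumes an ASE-readable extended-XYZ file; the
# actual OE62 distribution format may differ.
from collections import Counter

from ase.io import iread  # pip install ase

element_counts = Counter()
sizes = []
for atoms in iread("oe62_structures.xyz"):  # hypothetical file name
    element_counts.update(atoms.get_chemical_symbols())
    sizes.append(len(atoms))

print("elements:", dict(element_counts))
print(f"molecule size: min={min(sizes)}, max={max(sizes)} atoms")
```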
With the dataset in hand, we moved on to the generative model. The goal was to learn the data distribution of the OE62 dataset and create molecules that are similar to it, but not identical. The model we used is G-SchNet, an autoregressive generative neural network developed by Kristof Schütt and Niklas Gebauer. The underlying representation is SchNet, which you probably heard about on the first day. What is really important about G-SchNet is that, unlike SMILES-based or two-dimensional approaches, it generates structures directly in three dimensions. This is essential here, because optoelectronic properties are extremely sensitive to structure: if you slightly change the geometry, the property can change a lot. So we really need structures that are close to equilibrium. I will skip the theory of G-SchNet, because you have heard a lot about it already.

We got a lot of help from Niklas Gebauer, and from Rhyan Barrett, a very talented undergraduate student who worked with me to adapt G-SchNet to the OE62 dataset, which contains much larger molecules than the datasets it was originally developed for. Once G-SchNet was trained, we could generate arbitrarily many new molecules that are potentially relevant to optoelectronics.

But now, in principle, we have trained the model and generated many new molecules; how can we be sure that these molecules are actually comparable to those in OE62? We therefore compared, for instance, the elemental distributions. Here you see the elements in the OE62 dataset: in dark the original data, and in light the G-SchNet-generated molecules, and they match very well. There are some differences, but this is a log scale, so they are still small. We also compared different bond lengths and angles, and those distributions also match quite well.

Finally, as I mentioned, since we are dealing with three-dimensional structures it is very important to obtain geometries that are really close to equilibrium. So we took some generated structures and reoptimized them with density functional theory. Here you see a molecule obtained from density functional theory overlaid with the G-SchNet structure, and they are almost identical. Most molecules have an RMSD of about 0.5 Å with respect to their DFT-reoptimized geometry. That might sound like a lot, but if you look at the overlaid structures, they are very similar. There are some outliers with a root-mean-square deviation of about 3 Å, and those do not look that good anymore, but they are outliers. Overall, we were very happy with the performance.
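To quantify how close a generated geometry is to its DFT-reoptimized counterpart, one can compute the RMSD after optimal rigid-body alignment, for example with the Kabsch algorithm. A minimal sketch, assuming both structures share the same atom ordering (the actual comparison protocol used in the study may differ):

```python
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal
    translation and rotation (Kabsch algorithm). Assumes identical
    atom ordering in P and Q."""
    # Remove translation by centering both structures.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the covariance matrix.
    V, S, Wt = np.linalg.svd(P.T @ Q)
    # Correct for an improper rotation (reflection), if present.
    d = np.sign(np.linalg.det(V @ Wt))
    D = np.diag([1.0, 1.0, d])
    P_aligned = P @ (V @ D @ Wt)
    return float(np.sqrt(np.mean(np.sum((P_aligned - Q) ** 2, axis=1))))

# Usage: kabsch_rmsd(generated_coords, dft_coords)  # both (N, 3) arrays
```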
So we also had our generative model, and finally we needed a predictive model to screen molecules for optoelectronic properties. That was actually the most difficult part of the whole concept.

Before I talk about the predictive model, I first want to discuss how we can characterize molecules that are useful in optoelectronics. The experimental technique is photoemission spectroscopy, where we perturb the system with light. In the talks before, you heard a lot about the ground state, but now we add light to the system, so we have many more states than just one, and we need to describe them. In photoemission spectroscopy we probe the occupied states: light shines onto the system and an electron is ejected. From the photon energy, the work function, and the kinetic energy of the electron, we get the energy level of the occupied state, which is equal to the negative of the ionization potential. There is also inverse photoemission spectroscopy, where an electron is added to the system; this probes the unoccupied states and gives the electron affinity. The gap between the two is called the fundamental gap. All of these properties, ionization potential, electron affinity, and fundamental gap, are really important for characterizing molecules.

As I already mentioned, the OE62 dataset provides orbital energies, and I was very happy when I realized this, because the HOMO and LUMO energy levels and the HOMO-LUMO gap can very often be used as approximations to the ionization potential, the electron affinity, and the fundamental gap. When I told the current PI of my research group about this plan, he said that fitting the orbital energies would never work, because he had tried it and it didn't work. But we are all very optimistic researchers, I guess, so of course I tried; why shouldn't it work? I wanted to use a neural network, SchNet for excited states, which I had developed during my PhD, and apply it to fit the orbital energies, so I wanted a multi-dimensional output, and we decided to fit about 50 values per molecule. You have already learned about SchNet; the nice thing is that it is a message-passing network, so it also learns the descriptor. From these orbital energies we then wanted to convolve a photoemission spectrum.

As my supervisor had predicted, the direct fitting of course didn't work. The problem we saw was that the orbital energies are not smooth functions. Here you see an arbitrary reaction coordinate, with the occupied orbital energies in blue and the unoccupied ones in red, and in certain regions the energy levels have cusps. These are non-smooth functions, but machine learning models are smooth by construction. So we needed a way to outsource the non-smoothness of the target values.

What we did was adapt the neural network so that the last layer represents a latent Hamiltonian matrix. This is similar in spirit to, for instance, diabatic representations of excited states or local atomic-orbital representations. The network predicts as many values as are needed to fill a Hamiltonian, the matrix is constructed to be symmetric, and diagonalizing it gives eigenvalues. We then mapped these eigenvalues onto the orbital energies. In this way we could outsource the non-smoothness of the targets, and the quantities the network actually learns are smooth.
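To make this idea concrete, here is a minimal sketch of such an output head in PyTorch. It is not the actual implementation: the per-molecule feature vector is assumed to be given, whereas the real model builds it with SchNet and pools atom-wise contributions.

```python
import torch
import torch.nn as nn

class LatentHamiltonianHead(nn.Module):
    """Output head that predicts a symmetric latent 'Hamiltonian';
    its eigenvalues are trained to match the orbital energies.
    Sketch only: a real model would construct `features` with SchNet
    from atomic positions and pool atom-wise contributions."""

    def __init__(self, n_features: int, n_orbitals: int):
        super().__init__()
        self.n = n_orbitals
        # Predict the n(n+1)/2 unique elements of a symmetric matrix.
        self.linear = nn.Linear(n_features, self.n * (self.n + 1) // 2)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        batch = features.shape[0]
        elems = self.linear(features)
        H = features.new_zeros(batch, self.n, self.n)
        iu = torch.triu_indices(self.n, self.n)
        H[:, iu[0], iu[1]] = elems
        # Symmetrize: mirror the upper triangle into the lower one
        # (subtract the diagonal once so it is not counted twice).
        H = H + H.transpose(-1, -2) - torch.diag_embed(H.diagonal(dim1=-2, dim2=-1))
        # The matrix elements can stay smooth even where eigenvalues
        # cross, which is exactly the point of this construction.
        return torch.linalg.eigvalsh(H)  # eigenvalues, ascending

# Training would minimize e.g. the MSE between these eigenvalues and
# the sorted PBE0 orbital energies (~50 values per molecule).
```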
And this approach finally worked. Here you see some matrix elements, and you can see that we allow crossings, so the functions we predict are smooth. With this we could predict photoemission spectra, which was very nice. But then we compared to experiment, and what we had not considered was that the accuracy of DFT compared to experiment is not very good. We needed better accuracy.

Another very nice feature of the OE62 dataset is that it also contains GW values. GW can correct DFT orbital energies, which are wrong due to the neglect of the self-interaction error. The level of theory is G0W0, a single-shot perturbation, where the perturbation is the electron that is added or ejected. This gives quasiparticle energies that are very accurate and can be compared to experiment. But there is one problem: we only have about 5,000 such data points, and whoever has worked with neural networks probably knows that this is not a lot.

So how can we make use of little data? You have also heard in previous talks about the Δ-machine learning approach, which is a very powerful way to learn from little data. Consider this plot, which you probably also know: it shows the number of data points against the prediction accuracy. With a conventional machine learning model, reaching a desired accuracy requires a certain amount of data; with 62,000 data points we reach it, but with 5,000 the accuracy drops, and we would need more data to achieve the same accuracy. So instead of learning the method itself, we learned the difference between two methods: the difference between the GW quasiparticle energies and the predicted DFT orbital energies. In this way we could get away with much less data. What we had in the end were two neural networks, one for the orbital energies and one for the correction on top of this first model. Together they give quasiparticle energies, which we could use to predict photoemission spectra.

Here you see the photoemission spectrum of fluorene, which is not included in the training set: again, PBE0 and experiment do not match, but if we add the second machine learning model as a correction, the accuracy becomes very good. We also wanted to compare molecules that are quite similar in structure, and we chose phenanthrene. Here are the orbital energies and the experimental spectrum, which do not match; but when we again add the Δ-machine learning model, the agreement is quite nice.
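To make the composition concrete, here is a minimal sketch of the Δ-ML prediction and the Gaussian broadening into a spectrum. The `model_dft`/`model_delta` objects and their `predict` methods are hypothetical stand-ins for the two trained networks; this sketches the idea, not the actual implementation.

```python
import numpy as np

def predict_quasiparticle_energies(structure, model_dft, model_delta):
    """Delta-ML composition: a baseline model trained on ~62k PBE0
    orbital energies plus a correction model trained on only ~5k
    GW - PBE0 differences. The `predict` interfaces are hypothetical."""
    eps_dft = model_dft.predict(structure)   # PBE0-level orbital energies
    delta = model_delta.predict(structure)   # learned GW - PBE0 correction
    return eps_dft + delta                   # G0W0-quality quasiparticle energies

def broaden_spectrum(energies, grid, sigma=0.2):
    """Convolve discrete energy levels with Gaussians of width `sigma`
    (eV) to obtain a smooth photoemission-like spectrum on `grid`."""
    diffs = grid[:, None] - energies[None, :]
    return np.exp(-0.5 * (diffs / sigma) ** 2).sum(axis=1)

# Usage sketch:
# grid = np.linspace(-25.0, 5.0, 600)
# spectrum = broaden_spectrum(predict_quasiparticle_energies(mol, m1, m2), grid)
```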
So we also had our predictive model, and now we needed to combine everything. We wanted to know whether we can use this predictive model on the structures generated with G-SchNet, because if G-SchNet generates structures outside of the original data distribution, it might not work. We therefore plotted the ionization potential, electron affinity, and fundamental gap of the OE62 dataset and of the G-SchNet-generated structures, and you can see that the distributions are quite similar; they are slightly shifted, but for high-throughput screening we were quite happy with that. We also assessed the accuracy with some reference calculations and found it to be in the range of 0.26 eV, which is honestly quite a lot, but for high-throughput screening we were still satisfied.

So we had our final setup with all these different machine learning models, and in principle we could then generate millions of new molecules and select the best ones. In principle, we could hand these to our synthetic chemists and call our task done. But this would not help us much, because, thinking back, why do we use a generative model? We learn a data distribution to generate a similar data distribution. But we don't want structures that are similar, we want structures that are better than what we had before. We wanted to design molecules with better properties that are not in the original dataset.

How did we do this? We started with the same setup: we trained the generative model, generated structures, screened them, and selected the molecules with the target properties. Then we used these selected molecules to bias the generative model. This had already been shown for the QM9 dataset and HOMO-LUMO gaps by Niklas Gebauer. What we did differently is that we biased iteratively, and we wanted to find molecules with a small fundamental gap, that is, optoelectronic molecules with the smallest possible values.

Here are the results. This is the distribution of fundamental gaps in the original dataset, and we chose only the structures in this low-gap tail to bias the model. The distribution we get after biasing is slightly shifted towards smaller values, as we envisioned. We then again selected the best molecules, biased the model again, and repeated this iteratively until we ended up with a distribution of molecules whose properties lie outside those of the original dataset. In the beginning we were extremely skeptical about this, because you probably all know that for data points outside the training set, a model can predict essentially random values. So we selected some data points from the last loops and ran reference calculations. Here you see the fundamental gaps in the OE62 dataset and those of the last iterations, and the latter really are smaller than anything in the original dataset. So in principle we can create molecules with properties outside the training set.
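A minimal sketch of the iterative biasing loop just described, assuming simple `sample`, `predict_gap`, and `fine_tune` interfaces; all of these are hypothetical stand-ins for the G-SchNet generator and the SchNet-based screener, not the actual APIs.

```python
def bias_generative_model(generator, screener, n_iterations=5,
                          n_samples=10_000, n_select=1_000):
    """Iterative biasing sketch: generate, screen, select the molecules
    with the smallest fundamental gap, and fine-tune the generator on
    them so the next iteration shifts toward smaller gaps."""
    selected = []
    for it in range(n_iterations):
        molecules = generator.sample(n_samples)
        # Screen with the fast ML predictor (baseline + delta-ML
        # correction in the actual study), not with DFT/GW.
        gaps = [screener.predict_gap(mol) for mol in molecules]
        ranked = sorted(zip(gaps, molecules), key=lambda pair: pair[0])
        selected = [mol for _, mol in ranked[:n_select]]
        # Bias: fine-tune the generator on the best candidates.
        generator.fine_tune(selected)
    return generator, selected
```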
We then wanted to know in which parts of chemical space these molecules live. Joe Gilkes worked with me on this part of the project, using unsupervised learning to analyze all of the data. We wanted to find the rules that give these molecules such small fundamental gaps. He defined two different kinds of descriptors: bonding descriptors, which include, for example, the aromaticity of the molecule, and structural descriptors, for which we used the SOAP descriptor. The dimensionality of these descriptors is extremely large, so we cannot just look at them and say, okay, that's what makes our molecules so good. We therefore performed dimensionality reduction with principal component analysis and plotted the data points. Here you see the chemical space spanned by the molecules, with the first and second principal components, which capture most of the variance in our data.

In light you see the molecules of the OE62 dataset, and the color code shows the iterations. What you can see is that with progressive iterations we hone in on a certain region of chemical space. So we do not actually leave the original chemical space, we only leave the original property space. This also explains why the predictive model still works: we are not in an extrapolative regime, we are still interpolating.

Next we wanted to know the bonding patterns and structural motifs of these molecules: can we find rules that we can give to synthetic chemists to obtain good molecules? So we did some clustering, using the principal components obtained from the PCA (see the sketch below). Here you see a principal component plotted against the fundamental gap, with the color code again indicating the loop, and the clusters shown in different colors. They are roughly, but not strictly, separated with respect to the energy. We extracted the centroid and its nearest neighbors from each cluster, and we found some very common groups that are actually used in optoelectronic molecules, for instance tetrathiafulvalenes, which are considered the bricks and mortar of optoelectronics. What was very interesting is that in almost all molecules we found a lot of sulfur, selenium, and cyano groups.

So we looked into this in more detail. Here you see the elemental distribution of the molecules: in blue the distribution of the original dataset, and then the changes with progressive iterations. The oxygen content decreases, while the sulfur and selenium content increases. This led us to the hypothesis that replacing oxygen with sulfur or selenium can lead to smaller fundamental gaps. We wanted to verify this, and we really found that replacing oxygen with sulfur decreases the fundamental gap by more than 1 eV, and the effect is even stronger when replacing oxygen with selenium. So we found a rule, which was nice. We also plotted the distribution of the carbon-nitrogen bond lengths in the molecules: here are those of the original dataset, and then the distributions with progressive iterations, and you can see that we really get many molecules with mainly carbon-nitrogen triple bonds rather than single bonds.
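For the unsupervised analysis described above, a minimal sketch using scikit-learn: PCA on a high-dimensional descriptor matrix (e.g., SOAP plus bonding descriptors, whose construction is omitted here), k-means clustering in the reduced space, and extraction of the molecule closest to each cluster centroid. The exact descriptors and clustering settings of the actual study may differ.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def analyze_chemical_space(descriptors: np.ndarray, n_clusters: int = 8):
    """`descriptors`: (n_molecules, n_features) array, e.g. averaged
    SOAP vectors plus bonding descriptors. Returns the 2D projection,
    cluster labels, and the index of the molecule nearest each
    cluster centroid (a representative to inspect by hand)."""
    # Reduce the very high-dimensional descriptor to the two
    # components that capture most of the variance.
    pca = PCA(n_components=2)
    projected = pca.fit_transform(descriptors)
    # Cluster in the reduced space and pick representative molecules.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(projected)
    representatives = []
    for k, center in enumerate(km.cluster_centers_):
        members = np.where(km.labels_ == k)[0]
        dists = np.linalg.norm(projected[members] - center, axis=1)
        representatives.append(members[np.argmin(dists)])
    return projected, km.labels_, representatives
```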
To conclude this part: we were able to generate molecules with properties outside the dataset, but all of these molecules contain unusual bonding patterns and are not easily synthesizable. If we went to our synthetic chemists and collaborators and told them, okay, you should just put a lot of selenium into the molecules, they might not be very happy. So we wanted to look at synthetic complexity. How long do I still have? Okay, ten minutes; but since we have lunch soon, I will speed up.

As I said, if we tell the synthetic chemists to put a lot of selenium into the molecules, they will not be happy with that. And in fact, the synthetic complexity strongly increases for the generated molecules. Here is a plot of the synthetic complexity for the original dataset, and all the molecules we generated had a larger synthetic complexity.

So we moved to multi-property biasing: we did the same as before, but now we selected molecules that satisfy both conditions, a small fundamental gap and a low synthetic complexity. If we do this, we again get molecules with smaller fundamental gaps than those in the original dataset, though not as small as before; on the other hand, we now get molecules that can actually be synthesized. So there is a trade-off. We also looked at the elemental distribution afterwards, and biasing towards multiple properties gives a Pareto-optimal solution: what we found, although you unfortunately cannot see it here, is that it decreases the occurrence of selenium atoms. So by biasing towards synthesizability as well, we really find molecules that are nicer to synthesize.
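Since multi-property biasing yields a Pareto-optimal trade-off, a natural screening step is to extract the Pareto front over the two objectives. A minimal sketch, assuming the fundamental gaps and synthetic complexity scores have already been predicted (the array names are hypothetical):

```python
import numpy as np

def pareto_front(objectives: np.ndarray) -> np.ndarray:
    """Indices of the Pareto-optimal points when minimizing all
    columns, e.g. columns = (fundamental gap, synthetic complexity).
    A point is kept unless some other point is at least as good in
    every objective and strictly better in at least one."""
    n = objectives.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        dominates_i = (np.all(objectives <= objectives[i], axis=1)
                       & np.any(objectives < objectives[i], axis=1))
        if dominates_i.any():
            keep[i] = False  # some other point dominates point i
    return np.where(keep)[0]

# Usage sketch:
# obj = np.column_stack([gaps, complexity_scores])  # hypothetical arrays
# candidates = pareto_front(obj)
```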
With this, I come to the conclusion. I hope I could show you that by combining generative and predictive machine learning we can design new molecules, and that by iterative biasing we can actually leave the property space of the original dataset. What is also very nice is that multi-property biasing is possible, but it yields a Pareto-optimal solution, so we have to find the best of both worlds. If you are interested in the predictive model, you can find it here. And some advertisement: together with Kristof Schütt and Michael Gastegger, we have written a perspective on machine learning for computational chemistry, and if you are interested in machine learning for excited states, I recommend this review here. With this, I want to thank my current research group and some amazing collaborators, and I thank you for your attention.

Q: Thank you very much, very nice talk. Sorry to dominate the questions, but could you give a few more details about the generative model? I didn't quite catch how the encoding or decoding actually works.

A: Sure. The generative model was developed by Kristof Schütt and Niklas Gebauer, and it is based on SchNet. What it does is model a probability distribution: we start with a focus token, and the model then assigns a probability that the next atom sits at a certain position, and also a probability for which element it is. In this way it builds up the molecule. There also exists a newer version, a conditional generative model, which I think was recently published in Nature Communications, if I remember correctly. There you have a joint probability and you can also condition on properties that you want to bias towards, so that would be very similar.

Q: Okay, so your model is an iterative scheme where you add one atom at a time. Bjørk Hammer has a similar approach. But I guess you have the invariances in there through SchNet, so your predictions are rotationally invariant. Okay, thank you.

Q: Hi, I'm happy you found OE62 so useful. I also have a question on this generative model. You showed that the elemental space of the generated molecules was very similar to OE62, but is there actually a way to get out of the elemental space? Could you predict molecules containing an element that is not in the dataset?

A: No, we can only predict molecules with elements that are in the original dataset.

Q: So is there maybe a way to step out of this? You found sulfur to be very good, but is there maybe another element you could add?

A: I think that if you added a few molecules with new element types to the training data, it would be possible, but otherwise the elements need to be in the original dataset. We can discuss this with Kristof later.

Q: One more question. Thank you for the nice talk. How hard would it be to add other properties that could be interesting, say, for industry, like how stable the molecule is under sunlight? I'm not saying we have those properties calculated now, but in principle, you would need to know how stable the HOMO, the excited state, and so on are.

A: As far as I understood the question, you are asking how difficult it is to add more properties to the biasing. This is one of the problems of this method: the more properties you want to bias towards, the more molecules you need to generate and the more expensive the whole procedure gets. One loop takes about one or two days, which is okay, but with every added property you need to generate and screen more molecules, and it gets more expensive. To improve this, the next step would probably be conditional generative models, which can directly generate molecules that satisfy both the structural and the property distribution. But in principle it should be perfectly possible.

Q (via Zoom): This is a bit of a loaded question: why does your generative model only interpolate the known chemical space, and how could one explore unknown chemical space? Answers on a postcard, please.

A: That is a very good question. On the one hand, we want the generative model to predict molecules that are similar, so I am not sure there is an easy way to get outside the structural space. And since we also have the predictive model, staying inside is actually good: if we moved out of that space, we would also need an active learning approach to adapt the predictive model, and then everything would get more complicated.

Moderator: Okay, I think there are no more questions, so let's thank again the speaker, and the speakers this morning. And...