 Mark, for I think almost everybody already know him, but for those that do not know him, he's professor at the New York University. And I mean, now I have here a list of all the editors that different place of positions of editor that he has and it's something like huge list. Yes, theoretical chemistry accounts, journal of physical chemistry, journal of chemical theory and computation. And he also has a position at the New York University of Shanghai, if I'm not wrong, where we meet for the first time, Mark. And I don't know if you want to remark anything else, I will let you start and please. All right, thank you so much, Alex. For me, it's a real pleasure to be able to give this seminar. My big regret, of course, is that since I come to Trieste almost every year for the last, I don't know how many years, it's been a while actually, that I can't be there this year, it's a wonderful place and I really miss it. But this is the situation we find ourselves in and so here we are doing a Zoom seminar, which I hope will work out well. As Alex said, I have a couple of different affiliations, including the chemistry and the math departments at New York University in New York. And so down here in the lower right, you see our campus in what I guess will be our upcoming season, the fall season in a few months, that's Washington Square Park and in the back there are some NYU buildings and then the NYU Shanghai, which again, I didn't get to go this year, I go every year, usually for a while and this year was called off, but this is the NYU Shanghai campus area in the eastern part of Shanghai called Pudong, very close to the river. It's a very exciting city if you've ever been, it's really a fascinating place. Anyway, so let's see how this works out with Zoom. What I'm gonna talk about is some developments that we've introduced for studying structure and phase behavior in mostly molecular crystals, a little bit of atomic crystals, the sort of benchmarks that we do on the way to being able to do molecular crystals. And these things involve pure mathematics, topology, which I'll talk a little bit about, molecular simulation, which we're gonna see actually brings us into what I think is the next frontier in this area and a little bit of machine learning. So we're now about a decade past when we were celebrating the 100th year anniversary of X-ray crystallography and crystal structures, experimental crystal structures. And of course, since that time, databases have opened up and we now have access to hundreds of thousands, if not millions of crystal structures for proteins. Here in the protein data bank, the crystallographic open database, and of course the biggest one being the Cambridge Crystallographic data center. So since this 1912 work at Maxwell Lauer, there's been a lot of follow-up in studying and producing crystal structures. And one of the biggest problems that we have to deal with, and this is by way of giving you some background as to why this is such an important area, and why computation is poised to play a big role here, is this problem of polymorphism and being able to predict crystal structures. So all this term refers to is the ability of a compound to form multiple crystal structures. And an interesting example of this is the pharmaceutical compound, rotigitine, and I have to admit, every time I look at this word, I get a confused with rigatoni, I think it has basically the same letters in it. So, but it's not pasta, it is actually this. All right, that's rotigitine, and it is a pharmaceutical that's used in constructing this thing called the NuProPatch. And you use this for people with Parkinson's disease to relieve their symptoms. Now, one of the nice things about this is that it actually came out of the academic setting, so it came out of Groningen in 1985. It was licensed to short pharmaceuticals in 1998, and it was finally approved for use in Europe in 2006 and the US in 2007. And at the time, they thought, this was the only crystal structure that you had. It's called form one, all right? And you put this crystal all over the patches as particles, they're absorbed by the skin, and this is how you get your medication. Well, it turns out that if you let these patches sit, you start to see these kind of snowflake-like patterns appear on them, and this is the appearance of another form, a very stable form actually called form two. It's a polymorph, all right? It's called a late-appearing polymorph because you don't see it when you make form one, all right? So this is basically predicted by Oswald's rule, which says that the least stable forms of a crystal will be the ones that form first, and then the most stable forms will form after that, and your hope is that at least your less stable ones are metastable, right? But in this case, that's not what happened. You had a conversion in a late stage to this form two, and it turns out that this form two is very stable and it's not absorbed by the skin, so this drug becomes bio unavailable at that point, and as a result, the drug had to be recalled in 2008 and it was unavailable for use for four years while they tried to reformulate it as some sort of amorphous matrix, all right? And this is not what you want to have happen. Another very famous example of this kind of late-appearing polymorphism is the very well-known case of Retronovir, which is one of these drugs used in anti, or in treating HIV, so AIDS therapies. It's a protease inhibitor. You can see the kind of dollar amounts that go into making one of these drugs. That's the molecule, and was originally put on the market in 1996 as ordinary capsules. They said you just put it on the shelf. You don't have to refrigerate it or anything because at the time they thought this was the only crystal structure that you had, all right? What it does is that once it comes apart, the molecules go in and they bind up the active site of the HIV protease, which is one of the proteins in the enzymes that is responsible for cleaving long polyproteins into active viral particles. Well, again, it turns out that if you just let this sit around, there's a late-appearing polymorph that shows up called Form 2, and it's very stable, and that means it's a very low-free energy one as for solubility, and therefore it's not bioavailable. And you can see the difference between these two things, right? One is a kind of blade-like structure. The other forms these needles, and so these are very different kinds of crystal structures. Well, again, the drug had to be recalled. It took four years to reformulate it as a kind of a gel cap to prevent this from happening. And obviously, any time you have to recall a drug, there's a big PR head, there's a lot of money that goes into it. You'd like to be able to anticipate when this is going to happen before it does so that you avoid this problem. And it's been suggested actually as recently as 2018 that somewhere between 15 and 45% of pharmaceutical compounds could have these late-appearing polymorphs, right? Now, during the crystallization experiments that you need to find all of these polymorphs and to figure out their free energy ranking, especially when some of them are late-appearing polymorphs is an extremely difficult thing to do, all right? So in material science, this is an outstanding challenge, you don't usually think of organic molecular crystal specters and pharmaceuticals as an area of material science, but it really is. It is an area where computation could actually play a very significant role if you have the right algorithms because they could anticipate when this is going to happen before you actually have to do all these time-consuming, expensive, and difficult experiments. Okay, of course, the other area where this comes in is in a patent law, all right? So let's suppose I have two bioavailable polymorphs. This is an example of Zantac, right? Renididine hydrochloride, which is used in the treatment of heartburn. This is actually what's recalled last year because of cancer scare, all right? The drug also has the ability to cause cancer, so they took it off the market. But it's an example where if you're not careful about the protocols that you put into your patent application for making one form over another, then when patents expire and generic drug makers come along and say, okay, we want to start getting into this game, and they try to follow the procedures given in the patent application, or they may find that they get a different polymorph, all right? That usually leads to legal proceedings, and this is one case where this actually happened, all right? So it was Glaxo who had the original patent on form one, they also tried to patent form two. Nova Farm came in and said, well, your patent really only covers one of them, we're gonna go for the other one, and it became a big legal battle, all right? And this is a drug that lots of people take. So this had sales in the billions of dollars, all right? So again, this is the kind of thing that you'd like to be able to anticipate before you have to go to court and go through long, arduous, and expensive legal proceedings. As I said, this drug is no longer available on the market because of its cancer risk, but it raises questions of, what constitutes a distinct polymorph and do the specified manufacturing procedures always give you the one that it says it's going to give you? So when you have multiple bioavailable polymorphs, this is something you have to worry about. Now, another area where, especially NYU has gotten very, very active in is pesticides. And so I like to use this picture as a metaphor for what I'm gonna be talking about. So this is in Na'ika Mines in Chihuahua, Mexico, where you can actually go visit these mines and you can walk on these gigantic gypsum crystals, right? But when you do that, you become a crystal walker. And so this is my metaphor for this very, very old cartoon. You can see 1946, right? For what they thought a fly foot might look like, walking on a surface that's been covered with this DDT, all right, this whole pesticide DDT. And the idea is that the molecules would be absorbed from the surface into the foot of the fly, into the pads that would go and then shut down the nervous system or do whatever it does to kill the insect, right? Now, one of the things we've been doing at NYU experimentally is trying to look at creating polymorphs of these pesticides that may have greater lethality, all right? So this is the known form of DDT. And this one over here, this B is another form. So form one is the one that's known. This form two is a new polymorph that was actually made at one of the labs in NYU. And it turns out that because it's less stable thermodynamically, it's metastable, it turns out to be more lethal. And you can actually prove that by doing a kind of controlled lethality test, all right? So I'm gonna show you, I hope this movie will sort of work. So these are fruit flies that have been exposed to either no pesticide as the control group and then the form one informed two crystals, right? And as this movie rolls, you'll see, of course, nothing happens to the control group. Those that are exposed to form two, you'll see a faster die off or a faster knockdown time than those exposed to form one, which verifies this idea that you can actually create a better insecticide just by exploiting polymorphism, right? So you can see the knockdown time or the kill time of form two is faster than that of form one. And you can do lots of these trials and you can actually plot what's called the kill time or the lethality curve for the different polymorphs. So here's form one in three different trials and the red is form two and look at the rise time, all right? So the lethality is much better for form two. And so this is a new area in agrochemicals and so I talked about Austro-Alt's rule of stages. This kind of thing led our collaborators to talk about Austro-Alt's rule of lethality, which is at the least of an... Mark, can I interrupt you for a second? Please. Yeah, so what's the difference in the solubility of form one and form two? Form two was going to be, well, it's absorbed faster. So I imagine that its solubility is probably the same as form one. I don't know if it's higher, but I think it comes apart faster. Okay. So it will dissolve faster. I see, I see, I see, I see, I see. Okay. Okay. If you're interested in this, there's actually a really beautiful essay that was written by my collaborators, Mike Ward and Ann Bard Carr, called the abuse of Rachel Carson and the misuse of DDT in science and environmental deregulation. Now, who is Rachel Carson? So Rachel Carson was a biologist. You can say that she lived, basically she died the year I was born. And she was a biologist who actually turned herself into a writer. And one of her most famous books is The Sea Around Us, and I only refer to this because in this book, she predicted the idea of rising sea levels as a result of industrial activities, what we now call climate change. And in fact, if you go back to popular science of 1951, you see illustrations like this. This is New York, right? This is where I live, right? New York City. And you can see this illustration of something from her book that said when things actually warm up, then you're going to see New York being submerged in water because of rising sea levels. And we already experienced something like this with Superstorm Sandy. So that was very prescient. But the book that's relevant here is this book called Silent Spring, all right? Another very good read where she basically cautions against overuse of DDT, this pesticide because of its high toxicity. And she worried about DDT getting into the oceans and runoff and causing all sorts of health problems, right? She didn't say don't use it. She said use it sparingly, right? As sparingly as you possibly can. Well, in this essay, they talk about the fact that even as recently as a couple of years ago, right wing media groups like Fox News and Breitbart and all of this have actually blamed Rachel Carson for large outbreaks of malaria, blaming her saying that she said that DDT should be completely abandoned, okay? She never said that, right? This is an example, again, of how these right wing media groups lie in order to make their point. So they really are abusing what she said in the Silent Spring book in order to make their case for, I don't know, academic liberalism or something like that, very, very dangerous. So I really encourage people to read this essay. It's a very interesting thing. Okay, so more recent work going on and why you was looking at some more common currently used pesticides like delta-methrin and emidicloprid, delta-methrin is this. And in this case, they created a polymorph again called Form 2, which has a strikingly fast knockdown time. Now the good thing about these changes in the kill time curves, all right? So is that you can use a lot less of the pesticide. And so for emidicloprid, this is now a brand new study because this thing has hexamorphic, like six polymorphs that they've been able to fight. And right now they're doing lethality tests to see how these different polymorphs perform. And the interest in this obviously is that the use of these pesticides does actually cause loss, for example, of bee populations when they're overused. And over here on the lower right, this US Geological Survey shows the use of this emidicloprid over the years and just how in the last decade, the use has really spiked, right? And you now have enough of this out there that you can see the effect on honey bee populations. So if you can use much less of it, you can help alleviate this problem while still having a very effective pesticide. So this idea of exploiting polymorphism and agrochemicals I think is a very, very fascinating thing. Okay, well, as I said, this is an area where computation is poised to play a really important role. And in order to encourage computational groups to get into this area, the Cambridge data center runs every few years, a, what's called a blind structure prediction test. And so what they do is they put out a set of target compounds. They've been doing this now for about 20 years. They're asking, look, is it possible for you to use computational and mathematical techniques to predict what the crystal structures are, right? And for me, the inspiration for doing this comes out of this work from the 60s of JJ Burkhardt, right? Where he talked about the history of discovery of the 230 space groups. And he said that basically, crystal structure prediction is something you should be able to do mathematically. And that's kind of what is going on here with these blind structure prediction tests, right? So the most recent one was a few years ago was their six tests. And these are the targets that they had. And so what I want to show you is one of the simple ways that you can actually go about predicting one of these crystal structures. So I'm gonna focus here on this target 22 molecule. It looks like that, okay? It's largely rigid. And so the protocol that one follows is the following. You start off by figuring out what's the geometry of your monomer, okay? You can use high level quantum chemistry to do that. And apart from the chemical diagram that you have, what you find out is that the monomer itself is puckered, right? And puckering actually influences what the crystal structures will become for this thing. Next, you have to create a foreseal, an interaction model. And this is actually very, very difficult as it turns out because often the energy spacing of the energy difference between different polymers can be as low as a few kilojoules per mole, very, very small. And so your foreseal has to be extremely accurate. And so in this case, what we did was we used SAP DFT. So you basically take your cone sham wave functions, your cone sham orbitals, and then you do perturbation theory on the real Hamiltonian. You do this on dimer configurations to create a pair-wise potential, all right? It's not gonna be perfect because this kind of fitting is done in the gas phase and you want this to work in bulk phases. But at the end of the day, that's what you're gonna get. Then you take your foreseal and you start hacking molecules into random structures, say, according to the symmetry rules of some subset of the 230 space groups, maybe the top 20 that appear on any of your databases, that's gonna give you something like 48,000 structures, which is way, way too many. And here are just some examples. These, of course, are minimized according to your force field. So you're getting to a local minimum, but you still have zillions of these things. Well, it turns out they're not all unique. So you can do some clustering in order to find out what the unique structures are. You still get thousands of them. We know that there are thousands of polymers in nature. But if you put an energy filter on there, say everything below, let's say 10 kilojoules per mole, we're gonna keep, that gives you something more manageable, like 131 structures. And then you subject these to finite temperature, finite pressure, MD. Let's see if my movie is gonna work or not. So usually I have to come back into this slide to get the movie to work, and there it is. And so what you see here is that when you do this, many of the structures that you find actually undergo a subtle, but very clear structural shift. Not all of them, but some fraction of them well. So this is about as exciting as MD of crystal structures gets, but you see a very clear change in the packing and in the shape of the supercell. What this is telling you is that you found a shallow minimum on your energy surface that is easily escaped at the conditions at which the experiment is actually performed. And if you subject all of these, let's say 130 structures to this, you'll eliminate about half of them in this case. You can use then the trajectories that you generate for the ones that remain to get a set of final structures and then you plot them on an energy density diagram like this, all right? And you can see kind of very clearly that there's a number of structures that appear up here in the high energy, low density range. And these are very unlikely candidates, but then there's a kind of clear separation between a very small number of structures that appear here in the low energy, high density range. And, you know, again, as I said, these differences in energy are on the order of a few kilojoules per mole. Now this handful of structures down here would be your candidate structures. When you enter the competition, you don't know what the answer is, right? That's being kept from you. So this is why it's called a blind structure prediction test. But you submit your candidates and then they'll tell you if you got a match to the experimental structure, which in our case, turns out to be this red box here, right? It's not the lowest energy structure and that could be because our force field is far from perfect, right? But we do get a match. Turns out it's our fourth range structure. And so just to remind you, you start here with this chemical diagram and what you have to get to is this prediction, right? And so that's what's being shown here. That is the, an overlay of the predicted structure and the experimental structure, right? Turns out it's a P21 on N structure. So these are the symmetry operations. The agreement between the experimental lattice parameters and the calculated ones is very good. The better measure of how well you agree is the RMST-20, which they take 20 molecules out of your structure and 20 molecules out of the experiment, they align them, and then they do an RMST of that alignment, sort of like what you do when you do protein structure prediction. Well, at 300 Kelvin, if you thought that was where the experiment was done, the agreement, the RMST-20 is about 0.187. It actually turns out that this experiment was done at 150 Kelvin. They never told us what the experimental temperature was at the outset of the competition. But at 150 Kelvin, the RMST-20 is 0.14, which is, for this prediction, it's really good. In fact, they mentioned this in the paper that came out on this, which is that when you do things at the right temperature, you get a value, you know, RMST-20.14, which in this case turned out to be the lowest of all the RMST-20 values submitted. Now, to give you an idea of some statistics from the competition, 21 groups who entered tried to predict the crystal structure of this simple molecule. About half of them got it right, all right? Which means that in this case, they had an RMST-20 of about 0.8, which is really permissive, right? But those who got it within their top five, there were only seven groups. And this shows you, that's about a third, right? So that shows you that this is still a very tricky business, by predicting crystal structures from nothing. Okay. Now, in going through this exercise, I will say that we spent about 75% of the time trying to come up with a good force field. All those SAP TFT calculations, that was expensive, right? Only quarter of the time was spent doing the actual structure predictions. That raises the question. And again, drawing inspiration from this paper of Burkhard, that you should be able to predict crystal structures mathematically, is there maybe a way to do this based solely on mathematical considerations, only on geometry and topology without having to introduce a force field. Now, if you start to analyze these databases, so if you go through like the COD, the Open Database or the Cambridge Crystal Database, and you pull out, let's say, hundreds of thousands of these structures, and you plot, in this case, we took the centers of masses of the molecules in the different crystal structures, and we just did a heat map of where these centers of masses actually lie. And so I'm gonna show you a projection of the distribution that we got from mining the database along the BC direction, right? Turns out that these are the crystallographic BCs. If you do AB or AC, the results are pretty similar. But what you realize for, let's say, some of the space groups is that the locations of these centers of masses, it's not uniform, all right? In fact, they have very, very preferred positions in the unit cell in fractional coordinates, all right? So here are some common space groups, PCA21, P21NC, P212121, et cetera, et cetera. And you can see that there's only a few, very few areas that are dark red, which is where you have the highest population. That's telling you that for the large majority of these crystal structures, the molecules really prefer always to be at these very special locations. And it turns out that these locations are actually predictable, all right? So I'm gonna show you an example. One particular example, this is a kumaran, and I picked this because they did some crystal structure prediction for this. They did some polymorph creation for this NNYU. They've actually found four new polymorphs of this molecule, which are called kumaran two, three, four, and five, in addition to the kumaran one, which was already known, all right? These are called the opaque, the blades, needles phases, and then the kumaran five doesn't actually have a name, but they're named for the way they appear on these optical images. Okay. Now, this last point is actually kind of an important one. It actually turns out most of these polymorphs are not very stable. They only survive for about a couple of minutes before they revert back to the stable kumaran one phase. But you can actually stabilize them long enough to get these images by adding the sap, the Canada balsam fir tree in there, all right? So they actually add this Canada balsam into the structures, and then they actually will stay around for a few weeks. Now, the idea of using geometry and topology to predict crystal structures is really based on a few fundamental principles. One is that these special positions that I was telling you about actually turn out to be minima that can be predicted using a set of geometric order parameters. For example, in this case, we picked the Zernike order parameters, all right? Which are often used in 3D image analysis, and they look like this, right? So to kind of me, if you know anything about Steinhard order parameters, by the way, that one half should be a two, not a one half. If you don't think about Steinhard order parameters, they look like that, all right? Except for the fact that they have a radial polynomial. And so the way that we create order parameters for predicting crystal structures is you take the generators of the group, all right? So these are all the group symmetry operations, G1 through GZ. C is the number of molecules in the unit cell. You take a reference molecule, all right? Which is the, you know, for any molecule in your reference, or any atom in your reference molecule, we call this R1, you then apply the symmetry operations and you plug them into this form here, all right? So you have a radial polynomial, you have a spherical harmonic, and what you have to do in order to create this thing is to set up a set of bond vectors or how you get to each atom in the unit cell from the reference point, let's say the center of mass, yeah? Then you apply the symmetry operations to get your various locations. The bond vectors give you the length and the angles that you need, all right? And so for example, if you were doing P21 and N, these would be your symmetry operations. You plug them in. Then it turns out that these special positions that we found by mining the database actually turn out to be very sharp minima of these Zernicki order parameters for different values of N, L, and M, okay? So you can see, for example, one of the carbon atoms in my kumaran crystal structure when you actually follow it along, well, let's say for example, the crystallographic B axis is a very sharp minimum at a value of B, which is around five point something angstroms, right? So this is now in physical coordinates, not fractional coordinates. Or you can also do it along some of the angles that go into your order parameters and you can see a couple of very sharp minima up here there. And in fact, if you do a histogram, overall lots of N, L, and M values, you go up to very high N, L, and M, right? You do a histogram. You see that, let's say along the theta direction where I have multiple minima, one really stands out, right? And so by doing these histograms, we can find these highly preferred locations appearing as high population minima in this histogram. And these become a way to use geometry to predict where these atomic positions are actually going to lie in your unit cell. So you can- Mark, I have a curiosity. So just going back to poor man's chemistry, would you be able to relate these topologically inspired parameters to simple things like hydrogen bonding, amount of van der Waals interactions between the different molecular groups? Is it trivial to do it or not? Well, I think you probably could. I don't know if it's actually trivial. I think you'd have, yes, obviously those are the kinds of interactions that you would have. Since these are organic crystal structures, you wanna look at things like high stacking. How does that lead to certain packing arrangements? Hydrogen bonding, yes, definitely. Whether or not it's, I don't think it's going to be trivial, but I think it must, there has to be physics there, right? You know there does, right? So we're just capturing the physics in a more convenient way. But yes, there is physics that will underlie all of this and one needs to uncover it. And obviously what you're talking about is what the physics is. Well, how to make that connection immediately, I'm not exactly sure how to do it, but it must be doable. Okay, so the next step is to exploit this topological order parameter of business to try to come up with a way to predict an actual crystal structure. So again, let's start with some random structures. So you take a space group, you generate some tens of thousands of random structures. And then you would formulate a kind of distance metric. So you take your special positions, right? That you've predicted from your order parameters. You take the atoms that are in your unit cell, all right? So the number of atoms is end. And of course, not all of them can be freely varied this way because there are constraints due to the geometry of the molecule, right? You have to satisfy the bond constraints and all of that. But among the degrees of freedom that you have, you can try to match the positions to these special positions in the unit cell. So you're looking to minimize this little distance metric. So you can evolve a set of random structures, try to find those with the smallest delta S, generate some new structures, let's say around the one with the smallest delta S and then go from there, right? And you just keep going until you find the smallest value of this delta S that you possibly can. And an example of this is this Kumaran structure, right? So this is the Kumaran one. Here's the blue clusters are an initial set of random structures, okay? Then the first step is to find the one with the smallest delta S. That's the one I circled there. And then you generate a new cluster and I'm only showing you the ones that reduce the value of delta S. So those are the orange ones. You do one last filtering based on trying to find the smallest value of delta S that you can. So you take the smallest orange one, you generate a new set of green ones and then finally you get to a prediction basically with a zero, almost zero delta S, right? And it turns out that you get the Kumaran one structure this way with an RMST 20 value of 0.1, right? Without ever having to do a single energy calculation, okay? We also did this for the another one with a Z equals four, that's the Kumaran five structure. Here I needed three generations of my call it a genetic algorithm if you want, all right? From blue to red to orange, all right? So we started there, we generated the red cluster, started there, generated the orange cluster, and then finally the last one gives me my prediction. The delta S isn't perfect in this case, but I do get the Kumaran five structure to P212121 with an RMST 20 value of 0.14, which is also very, very good. Now you can ask, all right, but you kind of had this information beforehand, what happens if you pick the wrong space group, right? So let's suppose you pick one where there were no polymorphs. Well, we tried that, we actually took the P1 bar space group and we did a search of 100,000 structures in this and doing the algorithm yielded about 1,000 possible candidates, but none of them actually were physically sensible. In the sense that if you actually look at some of these structures, they use a molecule sort of get pushed up against the edge of the unit cell and then of course you're going to have molecules basically crashing into each other from periodic boundary conditions. And so, you know, you would have an infinite energy. So if you did evaluate the energy, it simply wouldn't be sensible. So if you can't find any physically reasonable structures in a given space group, you check it out, right? And so that's sort of where we're going with this topological crystal structure prediction idea. And so far it's looking very, very promising. We've looked at molecules other than kumarine, we went back to the blind structure prediction molecule and we found that we could get again this experimental structure using only topological considerations. But I want to move on to something else, which is what about ranking polymorphs? So I say you can predict crystal structures, but real crystals are always in finite temperature and the real figure of marriage in terms of predicting and ranking them should be free energy, right? Now, what people often do, of course, in this field is they use the harmonic approximation. And so they use a formula like what I've given above here, that the free energy is going to be the electronic potential energy plus a PV term and then some harmonic vibrational term. And you can make this quasi harmonic if you want, try to minimize the volume or the unit cell parameters once you have the frequencies. Well, the problem is if I do this, let's say for the kumarine polymorphs, right? Which we were able finally to predict all of them using the simple protocol that I showed you way back with a force field. If you actually look at the harmonic free energy as a function of temperature, right? So the ranking should be one, two, three, four, five and it turns out that for the force field we used as you increase the temperature and apply the harmonic approximation, the ranking starts to go to hell at around 150 Kelvin. And as you approach room temperature, of course, things really go bad with the harmonic approximation. The ranking is lost, okay? And that's a failure of the harmonic approximation and the quasi harmonic approximation doesn't really do me better in this case, all right? And so there are going to be strong and harmonic effects. This is kind of becoming well-known now that at the experimentally relevant temperatures, this is going to happen. And this is a rigid molecule. Imagine what happens and you have something with some flexibility in it. Now the difficulty with predicting free energies beyond the harmonic approximation, of course, is the fact that with the potential energy that you're working with is a rough energy landscape. Something we all know something about, right? You have lots of minima, lots of barriers and so the difficulty of navigating over this creates what's called a rare event problem. All right, so we've thought of a way, of course, to attack this problem using collective variables, right now. This is a well-known approach in meta dynamics and other ways of predicting free energies and exploring free energy landscapes. And so if you wanted to go beyond harmonic approximation, what kinds of collective variables could you use for crystals, right? Well, the cell shape is certainly one. When you do constant pressure molecular dynamics, you actually have direct access to this thing called the cell matrix, which has the three cell vectors on the, you know, as the columns of the matrix. Another very nice set of variables that comes out of the Paranella group is just an entropy measure, all right? So this is a collective variable that you construct from a spatial distribution function, which is basically modeled using a set of the distance and orientational Gaussians. And then finally, you can use packing parameters like Steinhardt or Zernigge, if you prefer them. And, you know, as I showed you, you can create molecular analogs of these. So you can actually take these over to the molecular realm, right? Now, when you use all of these collective variables, there's a lot of them, right? You're generating a fairly high-dimensional free energy landscape. So one of the things that we've done is to use neural networks as a way to give you a compact and close form representation of these high-dimensional landscape using only the simulation data as your training data. The simulation, well, it has to visit a high-dimensional free energy landscape. Of course, only visits a little bit of it, but a neural network is very good at filling in the missing information. So this is an example of, you know, let's say your free energy that you might actually be able to generate now. Okay. Now, once you've chosen your collective variables, what do you do with them, all right? So what we do is we use this thing called temperature accelerated or adiabatic free energy dynamics, which is a good way of getting to these high-dimension, these high-dimensional free energy landscapes. And I think we all know what the general statistical mechanical problem is, right? You have to generate what's called the marginal distribution, okay? The marginal distribution is what you get by integrating over, you know, the Boltzmann distribution subject to a set of conditions that your chosen collective variables have certain values, S1 to Sn. You can generate that, of course, if you take the log of it, you get your free energy, all right? Well, what this equation says is that, you know, you should lay down an n-dimensional grid, set these conditions at each point of the grid and run a simulation for each one. Well, that can work if n, this little n, is like one or maybe two at the most, right? But when you're trying to generate these high-dimensional things, you're gonna have a dimensional explosion problem. You're not gonna be able to do this. So the best way to approach this problem is to sample the marginal, just like you sample e to the minus beta h. Problem is, I don't know what this marginal distribution is a priori. So how do I sample something whose form I don't know? Well, we're gonna do it on the fly. So the first thing you have to do is to clear these delta functions. So what you do is you replace them by Gaussians, which, you know, are kind of sharply peaked and very narrow. But when you do that, you have a term that you can absorb into your Hamiltonian. And so it comes in as a kind of harmonic coupling between the points of your marginal, these s's, and the collective variables, right? And you let these capital values get very large. Well, if you now introduce a kinetic energy for moving these s's around, okay? Now you have a way to generate a distribution of s's. It's not yet gonna be the marginal. I have to do a little bit more work, but this extended Hamiltonian gives you a route to doing this. In order to generate the marginal, what you have to do is to create an adiabatic separation between the physical variables and the s's. So you do that by making these masses that you associate with these s's very large. And that is enough to generate the marginal distribution because it's like doing this integral, if you will, on the fly, all right? The s's are slow, the physical variables are fast. So it's kind of like doing the integral on the fly, but of course that's gonna be very inefficient. So in addition to that, we introduce a boosting element, which is we make the temperature of these s's very high. So this is a little bit like inverse car parallel, right? So we have a high temperature and a large mass associated with the things whose free energy we want. And the physical variables are moving with physical masses at physical temperature. So these are the equations of motion that you have to solve and what you can prove is that with these two temperatures, T for the physical variables and T s for the s's, you can generate a distribution called the adiabatic marginal, which is the true marginal raised to a power of the temperature ratio, T over T s, okay? And so if I take the log of both sides, the log lets me bring down the T over T s factor, you multiply by K T s on both sides, the T s's cancel and you get the free energy at the temperature that you want. Okay, so let's do this now for this, this kumerin case, all right, where we said, you know, the harmonic approximation was breaking down. And here's let's say a typical trajectory using some of these collective variables where I'm able to generate a nice transition between the kumerin one and the kumerin five polymorphs, okay? So I get a trajectory. Then I can then use to generate the free energy difference. And you can do this on a bunch of these. So the ones that we've been able to get so far as a function of temperature are one to two, one to five and at 300 K, two to five. And you can see that the ranking is now coming out right, all right? So two is less stable than one by just a few kilojoules per mole, five is much more stable than, sorry, one is much less stable than five by a larger factor. And two is more stable than five by a factor that lies in between the one and two and the one and five. So that is the right ranking. So we haven't gotten all the transitions yet, but this is illustrative of the fact that, you know, this will now work up to the temperatures of experimental relevance, right? Around, you know, 150 and 300 kelvin, which is why these things are usually done. Here's another example showing that you can actually predict or explore the free energy landscape and get a set of relevant polymers. So this is resorcinol. And this is an interesting case because you have a conformational dependence on the polymers that you get. The resorcinol is a benzene derivative. It's actually a pharmaceutical that's used in the treatment of certain skin diseases. It has these, you know, these OH groups that can either be in one of two sort of trans or cis-like configurations. It turns out that the A and the B confirmation determine a set of structures that when you do a standard crystal structure prediction and you get your energy density diagram, again, really sit down here separated from this cluster up here. So the three in red are of the B type in this trans-like configuration. And the most stable one, the alpha polymorph, is the A type, all right? So I'm showing you here the different structures that you get. All the peaks have to line up and everything. Otherwise, your experimental colleagues will say you didn't get it right. So this is the kind of agreement you have to get. But I wanted to show, let's see if I can get this movie to work. So I wanted to show the actual exploration using this enhanced sampling approach of the polymorphic landscape, right? And as you watch this movie, you'll see it visit the different structures that you have here, right? So you can actually get a transition. Of course, you'll always end up in the most stable one ultimately, right? But this trajectory, for example, shows that it produces the conformational change that you need, which is a flipping of the OH group to go from one of these B polymorphs, epsilon beta or P-3-1, into the alpha polymorph, okay? And you visit a couple of different structures along the way, if you compare what you see in the trajectory to what you see down here. And there's a rather large energy barrier that you have to cross in order to get over to the alpha polymorph. So yes, while it is true that the energy difference is only a few kilojoules per mole, there can be large barriers that separate these different polymorphs from each other, right? Which is why it's difficult to see the actual structural transition, okay? Now, the last thing I wanna talk about here is a really beautiful area that's being worked on by my colleague at NYU Abu Dhabi. This is Panchenomov and he's creating a set of what are called jumping crystals. And these are crystals that undergo a polymorphic transition on a heating plate so you heat them from low temperature to high temperature. And you see, you go through a couple of polymorphs that release a lot of energy when the transition happens, all right, so this is pyroglutamic acid. And what you see is that as you heat them, the polymorphic transition causes the crystals to jump off the plate and yes, they kind of disintegrate when that happens. Of the three samples we had here, two of them jumped off the plate, one of them will never will because this is a not an isotropic type of transition. But what's happening in this, you read that you're seeing, is that there's a hydrogen bond network in this crystal which switches from let's say a short hydrogen bond in one place and a long hydrogen bond in the other, these switch places. And it's sort of like snapping a rubber band. When that happens, a lot of energy is actually released and it releases along one direction or one of several directions which causes things to jump off the plate when that happens. Okay, so we said we wanna be able to produce that. The mechanism that's sought to govern this is some sort of interface migration, right? And this is known to be a way in which you get these structural transitions but it's very difficult to see. So we said, well, before we get to molecular crystals, let's try to do this in a simple atomic crystal, right? So we picked molybdenum because it's known that there is a structural transition between A15 and the BCC phases that happens via interface migration. We tried all of the sort of usual collective variables for this and we could never get this transition to occur. All right, so we said, we have to turn machine learning on this problem. So what we did was we said, let's introduce a set of classifying functions that we can use to classify local environments. After all, we have an interface. So we need local information around this interface. And a good way to do this in a way that's translationally and rotationally invariant is to use these things called symmetry functions along with these kind of standard hacking sign hard order parameters, right? And these come in as sort of surrogates for your atomic coordinates. And these are the kinds of things that are also used in generating neural network potentials. Well, what we do is we're gonna sample local environments using these collective variables. We're gonna collect snapshots of these for training and then we're gonna build a classifying a classification neural network that can tell us whether it's a local and therefore, when you sum over all the local environments, you get a global classifier. What kind of structure you actually have? Well, in molybdenum, you have a bunch of different possible phases. You have FCC, BCC, HTT, A15, and then what we generally call liquid or disordered phases. And so we have a classifier for each one of those. So you run different examples of MD and these different phases, plus some interface phases. And then you train your neural network to output a classifier vector for you. Now, we're not gonna use the neural network or they're not gonna use the classifier vector per se as a collective variable. What we're gonna do is we're gonna build what's called a path collective variable to move between one phase and another, all right? So in this case, if you have P phases of interest and in these examples, P is equal to two because I have two phases I wanna study. Even though I have five possible phases, I just wanna explore the transition between these two. So I construct this called the collective, path collective variable in classifier space, right? So Q is my instantaneous classifier that I get for any arbitrary configuration. Q, I is the ideal classifier. And then this is a variable that has a particular integral value in one of the two phases, okay? And you can guide the sampling by using the orthogonal variable that looks like this, right? So this kind of scheme is usually used in configurational sampling, but in this case, with the aid of the neural network, we can actually use it in classifier space. So what you do is you run MD, enhanced MD, driving S or Z and or Z as your collective variables. You feed the instantaneous configuration into your trained neural network. You let it spit out the value of Q and then that Q is the thing that you put into this path collective variable. Of course, everything has to be differentiable because you wanna get forces and such from this. So, but your neural network is and so this is perfectly fine. Here's you an example of the kind of training curve that you got, all right? So in bulk phases, the training is really straightforward. Around the interface, it's a little harder because this is a heterogeneous environment, but at the end of the day, our trained network is able to give us about 99% accuracy on the bulk phases and about 93% in the interface environment. And that's actually good enough for driving this transition. So I wanted to show you an head sampling for this interface. And so you can see that the A15 phase turns into the BCC phase quite nicely by interface migration in this sample, but you can also get the interface to go back, back up hill and bring in the A15 phase again, all right? And so we ran both adiabatic free energy dynamics and meta dynamics on this just to make sure they were giving the same answer. And you can see a bunch of hills that you get from both of them, all right? And those hills are giving you an actual estimate of the barrier for each layer to undergo a transition. And the height of the barrier is about 0.5 EV. So not only are you getting the structural information, you're also getting kinetic information out of this. And in fact, you can compare this number to kinetic Monte Carlo studies that have been done previously. And in kinetic Monte Carlo, the idea is to get the right barriers out. And each, you can see from this connectivity diagram that each layer transition costs you about 0.5 EV. So now we're seeing this as a way to get polymorphic transitions that actually carry kinetic information with them, all right? Now we've been working, we've done this for the atomic crystals, but we think we can also take this over into the realm of molecular crystals. All right, I think I have to finish up. So just to give you a sort of perspective on future, all right? Where one really needs to go with this and why molecular simulation is such an important component of this. I want to give you an example of another pharmaceutical, which is Paracetamol. And I agree that I picked this. This is, again, they're working on this at NYU and they've created a bunch of new polymorphs of this thing, which is, you know, a common painkiller also called acetaminophen. And well, it turns out you can, so one of the polymorphs that they got, a very stable one, which is called phase six, I think, or phase seven. They wanted us to do some CSP on this, all right? And so they got their powder X-ray diffraction patterns and they tried to index them themselves and all of the things that came up when they tried to do this are in this P21 type of space group category, right? But they all, when they evaluated the energy using DFT, Density Functional Theory, they all came out with really high energies. They said, okay, you guys give it a try. And so we turned our CSP protocol loose on this and included searches with Z prime equals two and included, you know, low symmetry space groups and all of this. And finally, what you get is a P and A 21 structure that has a low energy, all right? And a really good match to the powder X-ray diffraction, right? So that's what this thing gives. But when you start to study some of these other polymorphs, all right? So that's fine for one where you can do this, but if you start to look, let's say at this number three polymorph, this one turns out to exist in a couple of different structures that involve a certain amount of disorder. Now three can either be monoplinic or orthorharmic depending on temperature. And in fact, it turns out that you have two different space groups. So this one polymorph actually is kind of itself multi-morphic. Now I'm showing you an MD simulation at 300 Kelvin and also one at 100 Kelvin, all right? Just to show you that the transition between the orthorharmic and the monoclinic phase is kind of a rare event. There's a large barrier there. At 300 Kelvin, it's easily crossed with a 100 Kelvin, which is actually a relevant temperature. Turns out it's not. But what I want you to see in these movies, especially the 100 Kelvin one, is that the actual crystal structures themselves involve a certain degree of disorder. And this is something that's never studied when people do crystal structure prediction. There are large segments of this 100 Kelvin trajectory you're seeing here where there's a very definite slippage of the layers, okay? And that's actually creating a disorder within the structure. That's something you'll never get from standard CSP. Now it turns out that of course, in order to sample at 100 Kelvin enough to get statistical relevance, you need the enhanced sampling. And so this is using temperature accelerated approach, all right, with the cell matrix as your effective variable. And you can see now very facile transitions between the orthorharmic and the monoclinic forms. And of course, if you calculate from your standard MD at 300 Kelvin, your enhanced sampling at 100 Kelvin, you can see that there's a change in the probabilities associated with one of the angles. It turns out it's the alpha angle in this case. So while at 300 Kelvin, there's a peak in the histogram at 90, at 100 Kelvin, the histogram peaks at values around 82. And of course there's an image of this around 107 or so. Okay, all right. Well, you can now use the simulation to plot a free energy diagram as a function of temperature that shows you what are the relevant phases, the relevant forms of this polymorph at different temperatures. And it turns out that at 100 Kelvin, there's a deep free energy minimum here in the orthorharm and the monoclinic phase. And up here at 300 Kelvin, you go into the orthorhombic phase, right? And what actually stabilizes these guys is the layer slippage, the disorder. So this is an example where disorder plays an important role. And this is a big challenge going forward because I think there are a fair number of crystals where you need to account for disorder in order to get a real prediction of what the different forms are. And the agreement with experiment as a function of temperature in terms of what is stable at high temperature and what is stable at low temperature is actually pretty good. You can see that at low temperature, it is the monoclinic form and at high temperature, it is the orthorharmic. Okay, so only molecular simulation can give you this, right? So developing the simulation techniques that are able to capture this kind of disorder is gonna be important going forward. So lots of challenges remain. This is gonna be my last slide. Basically, the CSP methods that people use still predict too many structures. Nature doesn't have as many polymorphs as we actually predict, we don't think, all right? The reason could be that, again, your force field isn't good enough and maybe topological crystal structure prediction will be better at really finding the real experimental polymorphs. Maybe, maybe not, right? But at some point, we have to be able to narrow down our search so that we really get the experimentally relevant structures. And that means the relevant structures that you would get if you did all the right experiments. So sometimes the reason that you don't get a particular polymorph is that you haven't yet done the right experiment to get it, right? So experiment has to meet theory in this case and we're not there yet. So when people say, like in the Saturday discussion, that crystal structure prediction has turned from a basic science into an applied technology, I strongly disagree with that view, all right? We're still not there yet. Will we stick with traditional structure generators? Will simulation become a more important tool here? I think you will. Will pure mathematics and topology become an important tool here? I don't know, right? This is in its infancy, we'll see. But at the end of the day, we still need a way to get disorder in our system, right? And that's gonna be really important. Another challenge that's such in pharmaceuticals that comes up is getting water, right? A lot of these pharmaceutical crystals have water in them. They're hydrates. I can talk about hydrates in here, but this is a challenge. How do you predict whether hydrates will form and how much water and will the water be in stoichiometric or non-stoichiometric ratios? And finally, how much is machine learning going to impact this field? I think it's going to impact it a lot, but I think the uses of machine learning in this area so pretty new, all right? So there's a lot of exploration to be done there. I'll just finish off by thanking my colleagues, all right? My students and postdocs who've done a lot of work in this area, all right? Especially Nikos, who's done some beautiful work on the topology, Leslie Vogue, who's done a lot of work in the free energy prediction and traditional crystal structure prediction, Krzysztof Szelewicz, who I work with to get force fields and my experimental collaborators, Brachar, Panchenamov and Maria Baez, these two are NYU Abu Dhabi. And the work on molybdenum was done by Yuda Rogel, who's actually joining my group who's supposed to be this summer, but we're going to have to put that off. She's coming on the Heisenberg Fellowship to work on this problem of predicting the kinetics of polymorphic transitions in molecular crystals. So I'm going to stop here. Thank you everyone for your attention and please, if there are any more questions, go ahead and ask them, thanks. Thank you, Mark. It was really nice. Thank you. I don't know if people have questions. I have a couple. I will let before the audience to ask. I don't know. I have one question. This. This is all ambient pressure. And I wondered if you looked at high pressure and crystal structures and if they've, if you can apply, they're saying nothing. They've exercised the particle metric. So does that change with pressure? Yes. So to short answer, the question is, yes, we have looked at high pressure phases as well. I didn't show any examples here, but pressure is an input into our simulations. So we've looked at, for example, benzene at high pressure. We've looked at urea and resource and all co crystals at high pressure. And the interesting thing there is that yes, sometimes you stabilize different polymorphs at high pressure. So it is, it is useful to do that. And it's a very easy thing for us to do. It's just it becomes another parameter in our simulations. And more men can actually tell you something about nature and high pressure in general, like inorganic or organic. Can you, like those heat maps you said of the site occupancy of certain space groups? Yes. That change, once you get to the Terra Pascal, they will give. So that is a good question. I mean, we haven't thought about how to get pressure in there, but when we do the topological crystal structure prediction, we actually look at a range of densities. And I mean, we haven't gone into densities that would correspond to really high pressures, but I think that my guess is that these predictors would probably work at high pressure. These topological predictors would work at high pressure as well. But it's something we need to explore. We haven't done it yet. All right, thank you. Okay, I think Alessandro now wants to make a question. So your topological order parameter actually is very, very impressive. So this idea that you are getting these extremely nice correlation plots without computing the energy at all is absolutely astonishing in a way. It's a very deep observation, but doesn't this approach somehow favor crystal structure with new molecules in the unit cell? So simply because these, so imagine that you have that in some, somehow you have a very, very big unit cell with, I don't know, 10 or 20 different guys in the unit cell. Wouldn't this simply be rolled out by your approach? So it's a very good question and I don't yet know the answer to that because we haven't explored those structures just yet. We're gonna have to mine the database, I think, for structures with higher Z values in order to see what happens there. It is possible that we won't be able to capture those as well, but we'll see. We'll see. Okay, I think Ali now. Yeah, Mark, thanks. Could you go back to your slide on the pyroblutamic acid? Sure. So I missed a bit of the details here. So L pyroblutamic acid has a short hydrogen bond. It has a short hydrogen bond. Yes, it seems like it, yes. And one of the polymers in the alpha form. Okay, and then this inhibits it from being able to jump, I didn't follow this exactly. So the mechanism is, it's only speculative at this point because they just observe the jump, right? But what they think is happening is that in the alpha form you have a network which has one short hydrogen bond and one long one, all right? And then as you heat it, it goes to this beta form in which these hydrogen bonds kind of switch roles, right? This is like a snapping, like a rubber band snapping in which the short becomes long and the long becomes short. And along one of the axes you release a lot of energy and snapping the rubber band. And that actually causes these things not only to jump but to shatter when they do. But it's speculative, right? They don't exactly know what the mechanism is but you need the computational studies. Yeah, so I just wanted to make a bit of a comment just to make a connection. There've been some very recent work studying the optical properties of pyroglytamic acid. And what's interesting is that there's some of these polymorphs with short hydrogen bonds that have anomalous fluorescence in the absence of aromatic groups. And people think it's related to the short hydrogen bonds. It's very similar to what Steve Boxer has discovered for green fluorescent protein. So yeah, just your slide maybe think about another parallel or something completely different, so. Interesting, yeah. Okay, I think now Ariel wants to ask also something. Yeah, hi, Mark. Hello Ariel, I just know you were there. Hi, it's good to see you. Yes, a very quick question. So your topological thing and everything that you have said so far of course applies to crystalline states. Now complicated molecules have a tendency to go glassy. Could we say anything about the glass structure of any chemicals that might be of significance? Topology or whatever. Does it tell us anything about the glass? It's relying on crystal structure and symmetry and all of that is missing. We're talking about something like a liquid. Is there anything that we can say about the structure of that glass? That's a really good question Ariel and I don't know the answer yet. Glasses is something we haven't explored at all yet with this approach. It's something that I would be interested to do, maybe going forward of below our focus really is more in predicting regular crystal structures. But I think that's a very intriguing question. I mean, we've only been working on this now for just a little under a year. So there's a lot of things we can do with it. But and that is one of them. I think that's really intriguing. I think in order to do it, we'd have to kind of disengage the method from the imposition of the symmetry group operations because I don't wanna have that constraining and I don't even know how to do that. So maybe everything is obvious but maybe something is left, right? Exactly, yeah. I don't yet know how to do that but it's something that I would like to do because I'd like to be able to do this just generally without having to rely on a set of symmetry operations. Yes. But I'm not sure how to do that. It's sort of still a crotch for us. So I keep your question of mine because it's an intriguing one. Very intriguing, yeah. Okay. Thank you very much. Thanks, Ariel, yep. Okay. Now I have to ask you something since I'm the last one. Please. I would want to ask you two things. One is related with the disorder crystals that you mentioned at the end of your slides. Yeah. Do you think this intrinsic disorder it's something related with the beta factor of the X-ray prediction of the X-ray extractors? I mean, when you obtain an X-ray you usually obtain also a beta factor and somehow it's related with the movement. So and how, well, I mean, I think it has to show up in your beta factor, right? If it's disorder and you should be able to tell but my feeling is that this is intrinsic, you know this disorder is really an important part of stabilizing. In other words, I think what you're seeing, for example, in this paracetamol example in the experiments is probably some kind of ensemble average over these different layer slippages that you have. These layer slippages, they just happen too easily, right? And so you see a kind of ensemble average of different possible layer slip configurations. That would have to show up somehow when you're in your beta factor, right? I think. So it looks like NMR, if it could be done, would be much better, right? That's interesting. Yeah, yeah, yeah, probably, yes. And the other question, it's related with probably your next slide, if you can go to the machine learning of collective variables, it's probably, I mean, how do you obtain the ground truth for your neural network? For your gaining? I mean, well, so, so, well, I'm not sure what part, well, maybe I should go through the protocol again. So the training data is just a series of simulation snapshots, but they're done in the different phases, right? So I do FCC and I get, you know, a set of thermally agitated FCC snapshots and I do BCC, ACPA 15, whatever, I get all these different things, right? Then I do some high temperature stuff to get the disordered structures. And then I have to present it also with some interface structures, all right? So we simulate also the interfaces that we want. So not only the one we want, but other interfaces as well. And the reason we have to do that is you don't know, so when you watch a movie like this, right? You don't know what all you're seeing in that interface. It's not necessarily just A15 and BCC. So we have to present it with all the, you know, all possible interfaces between different phases, right? So we pair them up and then we run this. And so that's how you get pure phases, interface phases, disordered phases. And then you just have the neural network, you know, spit out this vector, which will have different values. That makes sense, yeah, okay. Yeah, yeah, that makes sense. Sorry, I missed this point. For myself, there are no more questions. I think nobody else wants to ask. We can thank you, Agrae. It was a pleasure. I very much enjoyed this. This was fun.