The Boeing 777 was actually the first airplane, designed about 20 years ago now, to be designed entirely by computer-aided design before anything was built. In fact there is a somewhat famous story: they designed all of the parts, sent the plans off to tool-and-die companies, got all of the pieces back, millions of pieces, and brought them into the Boeing warehouse. Then they assembled the plane without having tried anything and without having put anything together beforehand; they just took all the pieces, brought them together, started bolting them together, and the plane did fly. So that's the picture for the 777; there were some three million separate parts. The reason to talk about that is not that it's anything you don't know. It is just to remind you, and to get you to think a little bit, about the degree to which we design most complex objects in the real world before we build them. Chemistry, on the other hand, looks like this. This is actually a synthesis from one of my colleagues, Paul Wender, of bryostatin, and it was not designed computationally in any way. It is basically trial and error: you take a bunch of molecules, stick them together, and see what happens, and you get some new molecules. Of course these people are very, very smart, so it doesn't take them forever, but it still takes quite a bit of time. And what we are doing as a field at the atomic scale stands, I would argue, in stark contrast to what we do at the macroscopic and mesoscopic scales. I am not the only person to say this, but I think it is time to start changing that and really to start designing things at the molecular scale. As theorists, we have put a great deal of emphasis on explanation: if I have some experiment, I want to understand what that experiment is doing, so I do some calculations, derive some theory, run some simulations, and begin to understand what is really happening to give rise to the particular effects that some experimentalist saw. There has been a lot less emphasis on the last two pieces of what I would consider a cycle. The first part of the cycle is that, with theory and simulation, you should be able to explain what you see. The second is that if you can really do that well, then you should be able to start predicting things, and "predict" here should mean predicting things you do not already know: not confirmation, but prediction. And once you know how to do that, you should be able to design things. The question is what we need in order to do this, in order to take theory and computation and make them genuinely capable of that. This slide shows one of America's favorite philosophers, with his statement about why that is going to be hard. So then the question is what to design, and here I don't want to go into a lot of detail. These are things I have been interested in: you would like to be able to design environmentally responsive materials, for example to take polymers and make them such that they have particular chemical responses to force.
This is a particular case of a molecule we were involved with, in collaboration with the Moore group at the University of Illinois. When you embed this polymer in the solid state and pull on it, the spiropyran unit shown here, in which the bond at the tetrahedral spiro carbon breaks, is converted into a merocyanine dye, and the material goes from transparent to red. You can see here, essentially, the stress field after the molecules have been broken. So how would you make other molecules that are mechanically sensitive and that carry out reactions you care about in response to mechanical stress? How can you design reporters, for example fluorescent proteins for optogenetics or for bio-imaging, that would show you, through fluorescence or absorption, what is going on at the molecular scale in living cells? And how do you build better photovoltaics, better batteries, better fuel cells? All of these things require something that is absolutely central: they require electronic structure, the ability to describe bond breaking and bond formation. That is the key, and the size of the molecules involved in all of these cases is at the level of hundreds of atoms and beyond, because that is what it takes to get that kind of complexity. So that is what you would like to do, and I don't want to belabor any particular application; let me instead say why this is hard. It is hard, as I foreshadowed, because the equations we need to solve are quantum mechanical, so even explaining is difficult. If we want to predict, we have to layer on top of that the fact that there are many, many molecules, many possibilities, and many candidate explanations, and we have to be able to exhaust them. To give you a sense of the complexity, you can step back and say: if I am going to start searching over molecules in order to predict function, then I care how many molecules there are in the design space. The estimated number of known medium-sized molecules is about 65 million, while the estimated number of possible drug-like molecules, again medium-sized and with certain properties we won't worry about right now, is 10 to the 60. For comparison, the number of atoms in the Solar System is estimated at around 10 to the 57, so there are more unknown molecules than there are atoms in the Solar System, or at least the numbers are of the same order of magnitude. All right, so what are we going to do? To go at this, we start with an approach that solves at least the chemical part of the problem: one that can describe bond rearrangement, electron and proton transfer, and excited electronic states. That means we are going to solve the Schrödinger equation, and we are going to couple the solution of the Schrödinger equation to the solution of the dynamical equations for the nuclei. So we will answer two questions, where are the electrons and where are the atoms, and we will answer them together.
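Schematically, that coupled electron-nucleus loop is just classical propagation of the nuclei with forces recomputed from the electronic structure at every step. Here is a minimal sketch; the electronic-structure call is replaced by a toy harmonic potential so the example runs, standing in for whatever quantum chemistry engine actually solves the electronic Schrödinger equation.

```python
import numpy as np

def electronic_energy_and_forces(coords):
    """Stand-in for the expensive step: in reality this solves the electronic
    Schrodinger equation at fixed nuclei (e.g. on GPUs) and returns the energy
    and the forces on the atoms. Here, a toy harmonic potential so the sketch runs."""
    k = 0.1
    energy = 0.5 * k * np.sum(coords ** 2)
    forces = -k * coords
    return energy, forces

def born_oppenheimer_md(coords, velocities, masses, dt, n_steps):
    """Velocity-Verlet propagation of classical nuclei on the Born-Oppenheimer
    surface: 'where are the electrons' (the forces) and 'where are the atoms'
    (the positions) are answered together at every step."""
    energy, forces = electronic_energy_and_forces(coords)
    for _ in range(n_steps):
        velocities = velocities + 0.5 * dt * forces / masses[:, None]
        coords = coords + dt * velocities
        energy, forces = electronic_energy_and_forces(coords)   # re-solve the electrons
        velocities = velocities + 0.5 * dt * forces / masses[:, None]
    return coords, velocities, energy

# e.g. three "atoms" with unit mass:
coords, vel, e = born_oppenheimer_md(np.random.randn(3, 3), np.zeros((3, 3)),
                                     np.ones(3), dt=0.5, n_steps=100)
```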
This is first-principles molecular dynamics, or ab initio molecular dynamics, and we can do it with classical mechanics for the nuclei or with quantum mechanics for the nuclei. Why is this a problem? Well, the problem is that it is hard. I just said we are going to do this, and do it lots of times, but it is very, very difficult. An example of how hard comes from about 10 years ago, when a group took a protein in solution and computed a single-point energy. It took two hours on 8,200 processor cores to get just the energy. What we need is dynamics; we need to do this over and over and over again, and we need to do it for many, many molecules. So this is going to be a problem, potentially a big one. What we turned to some time ago to try to solve that problem was video games. We noticed that video games were getting better and better; they look far more realistic today than they did when I started playing them in 1980, which for most of you is probably before you were born. At that time, this is what video games looked like, and I thought they were fun. Now they look like this, so something really happened over those 20 or 30 years. What happened is that computers got faster, but even more than that, many, many people wanted to play video games, a lot of money went into it, and the machines you play them on got much, much better. Ultimately that led to graphics cards, which now sit in high-performance computers. We took quantum mechanics and put it on graphics cards, and the end result was roughly a hundredfold speedup: you can now do electronic structure about a hundred times faster than on a CPU, and that is where things stand. I am not going to explain that work here; I am just giving it to you as background. Calculations on molecules with hundreds of atoms can now be done in seconds without any problem at all. But this led us to a further problem, which I will tell you more about: the GPU was still not enough. A hundred times faster is not what I need; I need 10,000 times the speedup, or 10,000 times more efficiency. How do you do that? Here I want to give you a flavor of the kind of idea that can make things better. Let's step back and ask what the real bottleneck is in quantum chemistry. In quantum chemistry you have electrons interacting with each other, which means Coulomb repulsions, which means you need integrals in which the charge density of electron one interacts with the charge density of electron two via the Coulomb force. Because it is quantum mechanics, you end up with products of basis functions giving rise to each of those charge densities: labels i and j for the charges describing electron one, and labels k and l for the charges describing electron two. That is the integral we are talking about, just the Coulomb interaction, and I can think of it as a tensor, an object with four indices.
The wave function itself is going to be written as an expansion over products of determinants, or products of orbitals, with numbers in front of them. Those wave function coefficients are also going to be tensors of some kind, objects with multiple indices: two indices for the coefficients of the single excitations, four indices for the coefficients of the double excitations, and so on. The problem is that I am going to have to manipulate these tensors. I will have lots of objects, arrays if you like, with four dimensions or even more, and I will have to manipulate them and contract them, that is, take sums over their indices. The question I want you to hold in your head right now is: do these tensors actually contain as much information as they appear to? If I have a matrix with n elements on each side, there are apparently n squared pieces of information in that matrix. But are there really n squared independent pieces of information, or are there redundancies, or zeros? That is the question to ask yourself when you see this. There are lots of numbers; do I really need to worry about all of them, or am I just wasting my time? That is really the question. To give you a sense of how we approached this, we started thinking about recommendation systems. You know these from Amazon, which constantly asks whether you want to buy something; if you use Gmail, it is probably always showing you other things you might want to buy; and if you watch movies on Netflix, it starts recommending movies to you. How do recommendation systems work? This is the way they work, and you will see in a minute that it relates right back to the tensor problem. You have a matrix; let's talk about movies. I have several movies that I could recommend to people, and I have several people, and for each person-movie combination I have an entry that tells me whether that person likes that movie. So Joe here doesn't like Dark City, has never seen Star Wars, kind of likes Zero Dark Thirty, doesn't like Steel Magnolias, and so forth. Fred loves Dark City, loves Star Wars, doesn't like Zero Dark Thirty. And then there are question marks: places where the system has no idea whether you like that movie or not. You didn't watch it, you didn't hover over it; it has no information. Your task is to complete this matrix, by which I mean take each of those question marks and put a number there. The assumption is that Joe does have some opinion about Star Wars; you just don't know what it is, and I want you to predict it for me. So is it possible to do this?
I give you a matrix with n squared elements, some of which are question marks, and ask you to reconstruct the question marks. The simple answer is that in general this cannot be possible: there are n squared unknowns and you are giving me only a small number of pieces of information, so how could I know what the other numbers are? But in this particular case, if you could solve this problem, the reward was worth billions of dollars. The normal academic approach would have been: obviously we can't solve this problem, let's go find another problem. The approach in industry was: obviously we can't solve this problem as posed, but maybe there is a sense in which the problem is soluble, and we should try to solve it anyway. The industry people had it right, and I will show you how. Take that same set of data. Joe, I know, doesn't like Dark City and doesn't like Zero Dark Thirty. I go down and look at Sarah and see that Sarah doesn't like Dark City and also doesn't like Zero Dark Thirty; in fact they give exactly the same ratings for those two. I also notice that Joe rated Steel Magnolias but Sarah didn't, so I can take his rating and copy it into her row. I know that Sarah saw Star Wars and Joe didn't, so I take that three and put it in Joe's row, and I keep going. I look at Fred and Jane and see that they are anti-correlated: one of them hates Zero Dark Thirty and Steel Magnolias, the other loves them. Since Fred loved Dark City and Star Wars, I can predict that Jane will probably hate Dark City and Star Wars. So I told you this problem was impossible, and hopefully you agreed with me two minutes ago, and yet it seems like I just solved it. You can keep playing this game through the whole matrix and make an argument for every entry. And this is actually the machine learning strategy, which is really important; if you are a youngster, pay attention to the philosophical statement here. The machine learning strategy is: I am given a problem that is not soluble, that I can prove is not soluble. What I do is assume the problem is well posed and then solve it as though it were. In other words, when you are given a problem you cannot solve, one approach, the academic approach, is to find another problem. The other approach is to ask under what circumstances, with what assumptions, the problem would be soluble, and then assume those to be true. That is exactly what has been done here: ten numbers out of twenty were enough to determine the remaining ten in this example.
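Here is a toy version of that completion step, with made-up ratings, just to show the mechanics: assume the matrix is approximately low rank, fit two small factor matrices to the entries you do know, and read the question marks off the product. The data and the chosen rank are illustrative assumptions, not anything from the actual Netflix problem.

```python
import numpy as np

# Toy ratings matrix (people x movies); np.nan plays the role of the question marks.
R = np.array([
    [1.0,    np.nan, 2.0,    1.0   ],
    [5.0,    5.0,    1.0,    np.nan],
    [np.nan, 3.0,    np.nan, 2.0   ],
    [2.0,    1.0,    5.0,    5.0   ],
])

rank = 2                                   # assumption: the "true" matrix is low rank
known = ~np.isnan(R)
rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((R.shape[0], rank))
V = 0.1 * rng.standard_normal((R.shape[1], rank))

lr = 0.02
for _ in range(5000):                      # gradient descent on the known entries only
    residual = np.where(known, R - U @ V.T, 0.0)
    U, V = U + lr * residual @ V, V + lr * residual.T @ U

print(np.round(U @ V.T, 1))                # the question marks now have predicted values
```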
What is really going on here, for those of you who remember singular value decomposition, is this: if I have some arbitrary matrix, I can write it as a sum of what are called rank-one products, a column vector times a row vector. Each column vector times row vector gives me a little matrix, and I can decompose the full matrix into a sum of those low-rank products. If you want to do mathematically what I just did by the seat of my pants in filling out that matrix, if you want a computer to do it, then you take the matrix, imagine you can write it in this form, keep only a few of these rank-one products, and solve an optimization problem: find the best vectors, and the numbers in front of them, that reproduce all of the entries you do know, using the minimum number of terms needed to make the problem well posed. So you can do this mathematically, is all I am telling you, and that lets you do the completion on a computer. It tells you that there are cases where a matrix looks like it carries a lot of information but does not really carry as much as it appears to. You can play the same game, and here I will mention it only briefly, with the wave function itself. What I just showed you is general, and you can also do it for the integrals, as we will see in a minute, but first think about the wave function. The full CI wave function, the exact wave function, can be written in a form with a matrix multiplying certain objects; the exact details are not that important, but those objects are a set of orbitals with spin up and a set of orbitals with spin down. Fermions can be either up or down, so we take all the up electrons and put them into one product, all the down electrons into another product, put those together, and we have a list of all the electrons and which orbitals they are in. Then we need a coefficient in front of that; call it C with indices J and K, and it can be regarded as a matrix whose dimensions are the number of alpha strings times the number of beta strings. The point is simply that this is a matrix. I have taken all those coefficients, put them into a matrix, and now I can ask whether I can play exactly the trick I just played: take that matrix and write it as a low-rank construction, a sum of rank-one outer products of vectors. And the point is that you can; this actually works. I am only trying to give you the flavor of it, but it works, and the effort now scales as the square root of the effort of the original problem, and the memory requirement becomes the square root of the original memory requirement. I will show you in a minute what that means. But does it work in practice? How many terms do I need? Do I need the whole matrix or not? And the answer ends up being that you need almost nothing.
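As a purely synthetic illustration of that kind of rank truncation (this is not CI data, just a matrix constructed to have rapidly decaying singular values, which is the behavior being claimed for real coefficient matrices), you can watch how quickly a few rank-one products capture the whole thing:

```python
import numpy as np

n = 200
rng = np.random.default_rng(1)
# Synthetic stand-in for a coefficient matrix C[J, K] (alpha strings x beta strings),
# built so that a handful of rank-one products dominate.
C = sum(np.exp(-0.5 * i) * np.outer(rng.standard_normal(n), rng.standard_normal(n))
        for i in range(40))

U, s, Vt = np.linalg.svd(C, full_matrices=False)
for r in (1, 2, 5, 10, 20):
    C_r = (U[:, :r] * s[:r]) @ Vt[:r]          # keep only r rank-one products
    rel_err = np.linalg.norm(C - C_r) / np.linalg.norm(C)
    print(f"rank {r:2d}: relative error {rel_err:.1e}")
```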
Here we take the molecule naphthalene and look at the error you make in the singlet and triplet energies if you keep only one, two, three, four of those rank-one products in the expansion. What you see, with the error plotted in hartree, is that chemical accuracy, right around 10 to the minus 3, is reached with about 50 terms: roughly kcal/mol accuracy with 50 terms out of roughly 10,000, so you need almost nothing. If you then ask about the singlet-triplet gap, the difference in energy between the singlet and the triplet, in a set of similar molecules, the acenes, you see that by about 15 terms you already have enough; that converges even faster. So you do not need very much, and the takeaway is that there is far less information in the wave function than it appears: yes, it has billions of coefficients, but most of them do not mean anything. Now you can ask what this means for timing. Here is the time it takes as a function of the number of determinants in the wave function for a very, very good conventional full CI program, going up to something like 16 electrons in 16 orbitals, and the slope is one, as I could show you formally it should be: the time goes as the number of determinants, and the number of determinants grows factorially with the number of electrons. We can now look at the rank-reduced scheme, the one that takes that coefficient matrix, writes it in terms of low-rank products, and picks up those products as needed. That goes roughly as the number of determinants to the 0.5 power, as the square root of the number of determinants. Within the same time budget, a maximum of about 1,000 seconds, that means you can go up to 30 electrons in 30 orbitals, compared with 16 electrons in 16 orbitals for the conventional approach. I should point out that even the conventional timings here, the fact that you can do 16 electrons in 16 orbitals in less than a hundred seconds, are already due to the graphics cards and our full CI work, so this baseline is already really fast; and this is, as far as I am aware, the largest full CI that has ever been performed by anybody.
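Just to make the combinatorics concrete, here is a quick count of the number of determinants in a full CI wave function for an (n electron, n orbital) active space, assuming equal numbers of alpha and beta electrons; each spin sector contributes a binomial coefficient, and the two multiply.

```python
from math import comb

def n_determinants(n_electrons, n_orbitals):
    """Number of Ms = 0 determinants for full CI: choose which orbitals the
    alpha electrons occupy, times the same choice for the beta electrons."""
    n_alpha = n_electrons // 2
    n_beta = n_electrons - n_alpha
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)

for n in (16, 20, 24, 30):
    print(f"({n}e,{n}o): {float(n_determinants(n, n)):.2e} determinants")
```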
Just to give you a sense of what we are talking about: there are about 10 to the 16 determinants in that calculation, and I was trying to get a feel for what that means. The number of grains of sand on the entire earth is estimated at about 10 to the 18, so the number of coefficients in this calculation, which takes less than 20 minutes, is within a factor of a hundred of the number of grains of sand in the world. And if you extrapolate the conventional approach out to the 30-electron, 30-orbital case to ask how long it would take, the answer is about 10 to the 15 seconds. I was wondering how long 10 to the 15 seconds is, so I started Googling around and found that the age of the universe is about 10 to the 17 seconds. So this calculation, done in less than 20 minutes, would have taken within a factor of a hundred of the age of the universe if we had done it the old way, on the best implementation ever. So now you might ask: can I apply those ideas to the integrals? What we talked about so far was the wave function; I tried to show you that there is not really very much information in the wave function. The other piece was those integrals, the (ij|kl) objects that are sitting around, and the question is whether we can apply the same ideas to them. Some of you may know that there is a whole branch of art called shadow art, and every instance looks something like what I am going to show you here. You see a shadow, and you all know what it is the shadow of; and then it turns out, obviously, to be the shadow of this piece of junk. In fact that is the game: the game in shadow art is to take some piece of junk and give it a shadow that people think they recognize. The reason I show you this is that it helps you understand what is really going on in everything I am trying to give you a glimpse of. You think, because you wrote something down in a particular way, and you looked at some structure with many, many coefficients, that the problem is really hard; but actually it is a different problem, and you are looking at the shadow, in a sense in reverse. You see a shadow that looks very complicated, but the object casting it is not. There is another reason to tell you about this: the simplification comes from changing the dimensionality. If we think about the integrals, we could have taken them and simply viewed them as a matrix, with ij as one index and kl as the other. But that is not really the right thing to do, because it locks in the dimensionality in a way that gives you trouble later on. What we showed, and here I am just going to tell you the result, is that you can take this two-electron integral, this four-index tensor, and rewrite it in the following form, roughly (ij|kl) ≈ Σ_PQ X_iP X_jP Z_PQ X_kQ X_lQ, where five matrices are contracted together. You introduce two new indices: the first matrix carries the original index i and a new index P, the next matrix carries the original index j and that same index P, the next carries k and a new index Q, and so forth. So two new indices come in, but now every object is a matrix, a two-dimensional tensor, and these are summed together to rebuild the four-dimensional tensor. Why does this matter? You might look at it and think it is more complicated, but actually it is not.
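In code, that factored form looks something like the sketch below. The shapes and the random factor matrices are purely illustrative; a real implementation fits the factors to the actual integrals and never materializes the four-index tensor at all, contracting the factors one at a time, which is the point made next.

```python
import numpy as np

n_bf, n_aux = 20, 60        # basis functions and size of the two new indices P, Q (illustrative)
rng = np.random.default_rng(2)
X = rng.standard_normal((n_bf, n_aux))    # X[i, P]: one factor per original index i, j, k, l
Z = rng.standard_normal((n_aux, n_aux))   # Z[P, Q]: couples the two new indices

# (ij|kl) ~ sum_{P,Q} X[i,P] X[j,P] Z[P,Q] X[k,Q] X[l,Q]: five two-index objects.
# Formed explicitly here only to show the structure of the factorization.
eri = np.einsum('iP,jP,PQ,kQ,lQ->ijkl', X, X, Z, X, X, optimize=True)
print(eri.shape)            # (20, 20, 20, 20), rebuilt from matrices alone
```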
What it actually buys you is this. Remember that in electronic structure you are going to sum those integrals, over their indices, against wave function coefficients. With the factored form you can sum over one factor first, then the next, then the next; you unpeel the onion one layer at a time, and in that way the scaling of the whole algorithm goes way down. The upshot is that all methods that include double excitations, MP2, MP3, CCSD, CC2, the coupled cluster methods, all the methods people use in wave function electronic structure theory, have their scaling reduced to the fourth power of the size of the system. Now step back and recall that the formal scaling of density functional theory, which many of you are familiar with, is also the fourth power of the size of the system. So what I am telling you is that the scaling of all of these wave function methods, which are supposedly superior to density functional theory, is actually exactly the same as the scaling of density functional theory. I can prove that to you, but note that it does not mean it is cheaper; it just means it scales the same way as you increase the size of the system. So you can take proteins, this is ubiquitin, and do Møller-Plesset (MP2) calculations, including dynamic correlation, on proteins in a matter of hours. All right. All of this goes back to the beginning of the talk: what does it have to do with explaining or predicting anything? What it says is that it is going to be much, much easier to explain things, because you can do the calculations faster, so you should be able to explain things better. But what I told you at the beginning is that we really need to move beyond that, or at least add prediction and design as routine pieces of the workflow. That calculation that took two hours on 8,200 processor cores can today be done in five minutes on one workstation; this slide shows the workstation at a different scale, and if I draw it at the same scale it is this big. So you have some pretty serious speedups. But now you ask: what about prediction and design? I started out talking about prediction and design, so I should tell you something about that. This is where another idea comes in: if we can do calculations fast enough, then maybe we can start discovering new molecules, maybe we can discover new reactions. Because this method solves the electronic Schrödinger equation, bonds can rearrange and form, so it should be able to find reactions we did not know about, if we just drive it. And that is what we are going to try to do. The basic idea is that I have some potential energy surface and some particular molecule A, and here is my molecule skating around, and you ask what is going to happen. Eventually, if the temperature is high enough or if I wait long enough, it might go over this barrier from A to C and end up here.
And maybe if the temperature is really high, it will go from A to B. So this is what we expect to see, and we know that if the barrier is lower, as this barrier from A to C is lower than the barrier from A to B, then that reaction should be faster. So we think: we will just go find all the molecules, find all the other possible molecules and the barriers between them, and then we will be golden. But we won't be, and the problem is this. The picture makes it look like there are a few well-defined minima, but in fact there are many, many local minima all over the place. We need to find all of them; the minima are molecules, and all the paths between them are reactions. So what I am trying to get across is that the idea of taking a molecule AB and a molecule CD, finding some path between them, and in that way discovering all the molecules and all the paths, is going to be problematic, because there are many, many paths all over the place; every one of these waypoints is a molecule on the surface. The number of species could be enormous, we do not know in advance which of them we need to consider, and the number of paths grows even faster than the number of species. So we need some other way. The traditional way would have been to find the relevant species one at a time and the paths one at a time, and we are going to try to avoid that. The way we avoid it is essentially by just running dynamics, solving the electronic structure the whole time so that all the chemical bonds can break and form, putting the system in a sphere with reflecting boundary conditions (a minimal sketch of what such a boundary looks like is shown below), letting it run, and watching what happens. From what happens we catalog all the molecules that get built, then we catalog all the pathways, and then we build a kinetic model from that. Here is an early example of what the dynamics look like. You have, in this case, a bunch of acetylene molecules and hydrogen, and you just watch the thing go, and you can see that reactions are actually happening: here, after a reaction has happened, is ethane. As long as the temperature is high enough and the pressure is high enough, those reactions occur. When you find the reactions, you can then say: now I have the different molecules, and I can see what path led between them. Basically we have the computer do all the work of figuring out what is actually going on.
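Here is that minimal sketch: a reflecting spherical boundary that simply flips the outward radial component of the velocity for any atom that wanders outside the sphere, so the mixture stays confined and keeps colliding. This is only an illustration of the idea; the real nanoreactor uses a more carefully designed confining potential, and the confining radius can also be varied in time, which is where the next point comes in.

```python
import numpy as np

def reflect_into_sphere(coords, velocities, radius, center):
    """Reflecting spherical boundary: for atoms outside the sphere, flip the
    outgoing radial component of the velocity so they head back inside.
    Illustrative only; real nanoreactor runs use smoother confining forces."""
    rel = coords - center
    dist = np.linalg.norm(rel, axis=1)
    outside = dist > radius
    if np.any(outside):
        normal = rel[outside] / dist[outside, None]          # outward unit vectors
        v_rad = np.sum(velocities[outside] * normal, axis=1, keepdims=True)
        velocities[outside] -= 2.0 * np.maximum(v_rad, 0.0) * normal  # reflect only outgoing motion
    return velocities
```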
All right, our first application of this was to prebiotic chemistry, and I will say just a few words about it. The backdrop question is what happened in early earth chemistry. No one really knows what the early earth looked like, though some people think it looked like this; the real question is where amino acids, where biological molecules, came from. The Urey-Miller experiment was intended to get at that: they put water, methane, ammonia, H2, and carbon monoxide into a flask, put it under reflux, and had a little lightning simulator doing an arc discharge inside; the mixture would circulate, be heated here and cooled there, and then they would look at the residue and see what they found. That experiment was done many years ago, in the 1950s, and they did find amino acids. Remember what they started from: water, methane, ammonia, H2, and CO, basically nothing, small inorganic compounds, and out of that procedure you get amino acids. So we tried to do this, and the first thing we found was that the reactions did not happen fast enough, which goes back to what I was telling you before: the graphics cards make things faster, and the tensor hypercontraction I was telling you about makes things faster, but it still is not fast enough. So we added an artificial event generator: we periodically compress the system so that things run into each other. Why don't things happen fast enough on their own? The problem is that molecules are kind of sticky; they sit next to each other, and it takes them a while to find a way to react. But if you smash them together, they react faster. So you apply a sort of periodic piston, and that makes the reactions happen much faster. There are other things you can do to accelerate the reactions that I will not go into today. Then you ask: what do you end up discovering? We found molecules you might have expected, methanol, ethanol, various molecules that are also found in the Miller experiments, by the way, and then a number of biologically relevant ones, including some amino acid precursors, some really crazy molecules, and hundreds more; we found hundreds of these reactions. What is really going on here? This approach is not actually looking for molecules; it is looking for reactions, and it is a way to triangulate between two extremes. One extreme is that I take every possible molecule I can enumerate: take the periodic table, pick some atoms, say I will allow five carbons, six hydrogens, one nitrogen, make molecules by enumeration, and then start enumerating the reactions that could happen between them. That would be one way to generate a reaction network. The problem is that it would take forever before you did any calculations, because all you would be doing is enumerating; there are so many possibilities that you would never get anywhere. The other extreme is to do a very realistic, very detailed simulation of the process. The problem with that is that it takes a very long time, because I would have to simulate for a long time before I saw anything. The advantage would be that I could read rates directly off of frequencies.
So I could watch a reaction happen and say that reaction happened five times per minute, therefore the rate is five per minute. You should notice by now that I cannot do that with what I am showing you: here you have no idea of the rate, whereas in a fully realistic simulation you could read the rate right off. But both of those extremes are impossible because they take far too long. Instead, what we do is something in the middle: we run a simulation that has some dynamics in it, which is qualitatively doing the right thing, but we do not trust the time scales at all, so forget about reading rates off of frequencies. That is what the nanoreactor is doing. Now you ask how to connect that back to a real rate, and the way you do it is through something like transition state theory. You take an MD trajectory, here a piece of that Urey-Miller trajectory where two fragments come together and make a bond, and you say: I can take that as the initial guess for a minimum energy path from a reactant to a product. You refine that and get something like this, going from reactants to product, and then you look at the energy along that path, so you have a pathway from reactants to products over some barrier. From that you can apply transition state theory and get a reaction rate (a minimal version of that rate expression is sketched below). One of the nice things you can do is go backwards and see where everything came from: this reactant came from that reaction, the reactant for that one came from this other reaction, and so on, so you can trace back where everything comes from. Now, one of the things that is not always obvious is that reactions can happen that you would not expect. One of the reactions that comes out of the nanoreactor is a tautomerization, going from this species to this one, where the only thing that happens is that a hydrogen moves from the nitrogen to the oxygen. If you know any chemistry, you look at that and say it is not going to happen: for that hydrogen to jump off the nitrogen, where it is nice and happy, and walk across empty space to the oxygen, the barrier is going to be really high. And in fact it would be. But the nanoreactor finds this reaction, so the question is how, and this is what it finds: a proton relay. There happens to be a bunch of other stuff in there, including water molecules, and it turns out that having two water molecules in the right place is not that improbable. So you find catalytic reactions. And the point I am trying to make is that the nanoreactor knows, and will tell you, that those waters are involved: it will say this species plus two waters gives that species plus two waters. It knows the waters are in the reaction, and that means this is no longer the same problem.
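As a reference point for that last step, converting a refined barrier into a rate, here is the standard transition-state-theory (Eyring) expression; the 20 kcal/mol example barrier is just an illustrative number, not one of the nanoreactor results.

```python
import numpy as np

def eyring_rate(barrier_kcal_per_mol, temperature=298.15):
    """Transition state theory rate constant k = (kB*T/h) * exp(-dG/(R*T)),
    with the free-energy barrier given in kcal/mol."""
    kB = 1.380649e-23          # J/K
    h = 6.62607015e-34         # J*s
    R = 1.987204e-3            # kcal/(mol*K)
    return (kB * temperature / h) * np.exp(-barrier_kcal_per_mol / (R * temperature))

print(f"{eyring_rate(20.0):.2e} s^-1")   # a 20 kcal/mol barrier at room temperature
```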
If you go back to the combinatorics, it means that if you were going to enumerate all these reactions, you would really have to enumerate them with their catalytic partners, and here the nanoreactor is basically finding them all without too much trouble. All right, let me try to finish up. The nanoreactor is still going; we are doing lots of things with it. Let me finish with one last piece and say that the next question you think about is how to get molecules into this system in the first place. You want the system to be easy to use, you want it to be able to predict, you want it to be able to design; what I have tried to show is that we are getting far enough along that you can use this computational tool in a discovery mode and pipe it into prediction and design. So the question is how to make all of this easy, how to make a workflow out of it without having to sit around forever. All I want to show you here is that you can start using machine learning techniques to make that easy, and really I just wanted to show you this: you can download this on your iPhone. What we did was take machine learning techniques for image captioning and train them so that you can take a picture of a ChemDraw-style sketch of a molecule and it is turned into a molecule internally; then it does calculations on that molecule, finds molecular orbitals, and so forth. So you can take your notebook, where you have molecules you wrote down, start doing calculations on them, feed them into a nanoreactor, and start discovering reactions with those molecules. The app is called MolAR, M-O-L-A-R; if you look for it on the iPhone you can find it. It is super cool; I should have spent more time on it, but I did not have time. So let me finish. There are several ideas I have been trying to get across. The first is to realize that video games have really driven theoretical chemistry, which is a little unexpected. Ideas from machine learning and artificial intelligence, including these recommendation systems, are feeding into more efficient and more accurate ways to solve chemical problems. Routine prediction and design at the molecular scale is something we still need to push for, but I think it is on the horizon; I would not say we have done it yet, but I think it is coming. Automated discovery approaches like the nanoreactor are going to be an important part of that. And you can imagine that eventually, and some people are thinking about this more seriously than we are, you combine this with experiments, and then you really will be able to automate chemistry; that is where I think all of this is going.
So let me acknowledge the many people who have done this work over quite a long time; it has elements that go back 10 years. There has been a whole group of graduate students and postdocs involved, and many sources of funding, from the NSF to the Navy to the Department of Energy and a number of companies. Thank you for your attention. [Audience question about memory requirements.] Sure. Yes, you do get memory savings as well as time savings. It is called tensor hypercontraction, and the "hyper" is there because, as you may have noticed, there are indices that appear on three of the objects and you have to sum over all three, which is like a Hadamard product. But yes, you do get memory savings too. The best approach, which I did not tell you about because it is even more complicated, is to take that same idea and play it on itself. The way I presented the idea, it sounds as though you have to build the complicated object first and then decompose it. Oh, by the way, did I forget to repeat the question? I did; the question was about memory savings and whether you have to form the full tensors. I tried to suggest that you can take the object and find its decomposition without ever completing it: you build it up piecewise until you are close enough and then forget about the riffraff at the end. But you can get even greedier and say: I never want to know the full object at all; I want to sample it and construct the low-rank pieces directly. We showed how to do that for coupled cluster, where you essentially solve the equations in the reduced space directly, and that lowers the memory requirement even further, because you never have the full object at all. [Next question.] The question is whether there is an analog of Folding@home for this kind of technology, and the answer is that there should be. I did not talk about it, but one of the things we have been pushing on most recently is finding ways to make it very easy to do these calculations at scale. We would run thousands and thousands of nanoreactors, for example; there is no reason why you could not be doing many, many calculations at once. In fact you have to, but they are all independent: each of those little reaction vessels gets heated up and smashed around, you see what reactions happen, the information that comes out is the molecules and the reactions, and those are then subject to further refinement, which is itself a set of independent calculations. So you could do something like Folding@home. We are not trying to do it in the distributed-computing way, where people donate their graphics cards, although maybe we should think about that; we are just trying to do it in a way where, if we have enough graphics cards lying around, or on AWS or something, we can spin up however much we want, let it compute, and then spin things down again, which is a little easier.
Although if we could get people to donate their graphics cards, that would be good. [Question about the most interesting thing the nanoreactor has found.] The most interesting case was actually in the context of antioxidants: there is a class of molecules called radical-trapping antioxidants, which catalytically remove radicals; that is basically what they do. In that case we discovered a mechanism that, as far as we could tell, was new, and that was surprising: a mechanism that actually looked new. But then it turned out, and I am not sure whether I should be annoyed by this or not, that if you dig far enough into the literature, this mechanism was in fact suggested some 20 years ago, in a context that was almost the same but with a different substituent. So I did not know what to say: we did not know about it, so I guess it was a discovery for us, and it shows the mechanism is reasonable, but it also means it was not really a discovery. The reason I tell you this is that you get into the funny question of how you know when you are predicting or not, because maybe you predicted something, but it was already known to someone else, and I am not sure how to deal with that. So that is my answer to a slightly different question, which is how you know whether you are really predicting and designing or not. But we did find a new mechanism, and there is a range of reactions you can push on with this that computational chemists simply have not looked at, and if the computer can find those reactions, that is in itself a result. That's right; there is another piece, but I won't say any more. The problem with transition metals is that their electronic structure is so nasty. Sorry, you wanted to ask your question. [Question about whether these ideas could improve density functional theory.] Yes. I think what you are asking is whether some of these rank-reduction ideas would help us understand how to improve density functional theories, or something along those lines. The answer is yes, and that is one of the reasons why, although I did not show it, we used tensor hypercontraction in the context of multireference perturbation theory. There it makes things much, much cheaper: the scaling of multireference perturbation theory becomes essentially the same as the scaling of Hartree-Fock, just with the cost multiplied by about a hundred, at least for small active spaces; as the active spaces get larger, the prefactor grows.
Part of the reason we were doing that is that my impression is that multireference perturbation theory is the only thing you can really trust for transition metals in cases where bonds are changing and you are not in a nice closed-shell situation. But we are not yet at the stage, unfortunately, where we can do that routinely for large enough active spaces; it is still expensive, still more expensive than DFT. So what I think is the right thing to think about now, and other people are thinking about this problem too, is how to take some of the simplifications that DFT has and graft them onto a wave function method. If you look at some of what I showed you briefly, you can start to imagine that the reason DFT works is the same kind of rank reduction I have been showing you, except in a nonlinear form, so it could work even better. If you can already do this kind of rank reduction in the linear form, there might be a way to combine these things, and that would solve a lot of problems. Not yet, though. [Question about coupling this to experiment.] Not yet, but we are definitely thinking about it; that is definitely the next thing to do, although I am not ready to ask for experimental lab space yet, and my own experience in the lab has not always been good. One of the things I did not tell you about is coupling the nanoreactor to template-driven synthesis planning. We have been developing synthesis-planning systems that basically take templates for reactions, the expert-system approach to designing reactions. The reason we were interested in that was precisely to connect it with the nanoreactor: to suggest new reaction types, feed those into the planning system, and then, for instances with given R groups, for example, start predicting what the yield or the rate would be. That is what you would then want to connect to an experimental system, so that you have a feedback loop, effectively the AlphaGo strategy. In AlphaGo, the way the machine learns to play Go is by playing itself, and that is what you really want here: eventually you get to a system that learns chemical rules, predicts things, tests its predictions, finds out when it fails, and updates itself. Then you would not need chemists anymore; then you would really be doing design. But I don't know; maybe 10 years from now you will look back and laugh and ask how I could have thought any of this could possibly work. It might be that putting molecules together, and this is a little extreme, is not as different from screwing together two airplane parts as we think. We will see. [Question about photochemistry.] Yes, I had intended to say something about the photochemical nanoreactor; we have actually done this. Things are more complicated there, but we have done it.
You essentially have the same kind of discovery approach, except that what you are looking for now are intersections between electronic states, and you are basically trying to make maps between them. I would say it is in its early days, but it is working. Microwaves we have not tried; that should not be that different from some of the schemes we use, but I do not know enough about microwave chemistry; I thought it was mostly about temperature hotspots, though maybe I am wrong. So yes, we have done some of these things, some we have not, but they are definitely on the agenda. [Question about whether the scarcity of information in the wave function is system-specific.] The summary of the question is whether this paucity of information is system-specific. The answer is no, it is not system-specific, it is quite general, but you have to ask the question a little more rigorously. Could you design a molecule that breaks these approximations? The answer is yes, you could: I could write down a wave function that cannot be decomposed like this, and then back out some Hamiltonian that gives rise to that wave function. So when I say it is general, I am not saying there is no way to break the approximation. What I am saying is that for every single molecule we have ever looked at, it works, and we have tested a number of them. It seems plausible that molecules in normal states, near their ground states, where you have not broken every bond in the molecule, are essentially always describable in this way. But that is hard to prove. So my answer is that it is general as far as we can tell; we know you could break it, but the cases you would use to break it have properties that make them extremely strange as molecules; they would be more like very strongly correlated electron liquids, and that is not where molecules usually live. That's right, and that was the other piece of your question: unlike Netflix, here you actually know how well you are doing, because if you had the full object and decomposed it, you could always decompose it further, and you can tell how good your decomposition is. That is something I did not point out but should have. Machine learning is growing in interest, and if you step back, what was actually done with the Netflix problem is that machine learning ideas were used to learn something; here, we did not bother to learn at all. We said: we know the structure. We learned the structure once, and then we use that structure in a way that is self-correcting, so it does not make any difference in this context. You could do all of this with a neural net too, which I am playing around with.
In this context, it does not matter whether what you are learning from had enough data or not, beyond needing enough data to show you the pattern. In most applications you have to worry: am I just memorizing the data, am I overfitting? Here you do not have to worry about that, because the machine learning ideas are really just helping you see how to recast the problem. That is a different strategy, and I do not think it is easy to pull off or to repeat, but if you can play that game, you are much better off: you never have to worry about whether you have enough data to train your neural net, or whether your neural net is overfit. You do not have to worry about any of that.