All right, I think it's time to start. So hello everyone and welcome to this second talk in our physics and machine learning seminar. Today we welcome Dr. Tim Green from DeepMind. Tim studied natural sciences at the other place before moving here to Oxford to study for a DPhil in the Department of Materials with Professor Yates in the Materials Modelling Laboratory. For his thesis, entitled "Prediction of NMR J-coupling in condensed matter", he developed computational modelling with density functional theory. He was a postdoc with the same group before joining DeepMind as a research engineer. Since joining, he has worked on computer vision, deep reinforcement learning, and protein folding. He is now the lead research engineer for the AlphaFold team. AlphaFold is a novel machine learning approach developed at DeepMind to predict the three-dimensional structure that a protein will adopt based solely on its amino acid sequence. This has been an important open research problem for more than 50 years. DeepMind's novel approach demonstrated accuracy competitive with experimental structures in a majority of cases, and it greatly outperforms other methods in the majority of the remaining cases. It has caused a paradigm shift in the field of structural biology and has since been recognized as Method of the Year 2021 by Nature Methods. It ranks easily among the most important scientific achievements of the last few years. So it's a great honor to have Dr. Tim Green. Brilliant, thank you very much. Can you hear me? And can you see the slides? Thank you, Tim, for accepting to give us a talk on this interesting topic, and the virtual floor is yours.

Thank you. Can you hear me okay? Yes. Brilliant. Cool. So, thanks for your introduction. I'm here today to talk about AlphaFold: highly accurate protein structure prediction with AlphaFold. I'll do a bit of acknowledgement later on, but here are a range of the people who are on the paper and who all worked on it. So just to give an introduction. So yeah, again, about me — you've actually already covered most of this. If you want to email me, there's my email address at DeepMind. So I did Cambridge physics, then realized the error of my ways and came to Oxford. I stayed at Oxford a bit longer, and then finally I joined DeepMind in 2016. Since then I've worked on a range of things: computer vision, population-based training, deep RL — you can see in the corner a picture of some agents playing capture the flag with each other in a sort of Quake III type environment — and a bit of ML infrastructure. So I worked on Sonnet, the TensorFlow layer abstraction library. And then finally I started working on protein structure prediction around 2017 or so, and ever since I've been working on this really interesting problem.

So an essential part of DeepMind's mission is to solve fundamental scientific problems with AI. DeepMind's mission, I think, is to solve intelligence and apply it to scientific and social challenges. And predicting the 3D structure of a protein just from its amino acid sequence is one of these big fundamental scientific challenges, and AlphaFold is our model that aims to solve this problem. So what are proteins? Proteins are molecular machines that are essential to life. They have many functions: our hair is made of a protein called keratin; our immune system, which is this incredibly complex thing, practically a biological computer, defends us against all sorts of pathogens.
And it's been in the news a lot recently, obviously. So proteins consist of a chain of amino acids — they're polymers — and these polymers fold into a 3D structure. On the right, I show a little cartoon sketch of what the backbone of one of these proteins looks like: you have a carbon, you have another carbon, you have a nitrogen, and that continues for however many residues you have. And then you also have these little side chains hanging off, which give the protein its interesting chemical properties. And the exact 3D shape the protein folds into is important for its function. The shape usually acts as a sort of framework that allows the right bits of the side chains — the amino acids with the right chemical properties — to be in the right place at the right time, in order to, say, bind to another molecule, or bind to another protein, or for some sort of conformational change, or all sorts of things. So the 3D structure is very important, and understanding protein structure is a fundamental problem in biology, and this is why we're interested in doing it.

So why predict protein structures? Why not just work them out experimentally? Well, working out the structure of a protein experimentally can take months to years. One of my housemates spent her entire four-year PhD trying to get the structure of one protein, managed it in the last six months, and got a Nature paper. Some take longer, some are a bit faster, but it is generally a really big task — several researchers for several years, quite often. And structure prediction can provide structural information faster. It won't necessarily be as accurate as experiment, but often it is, and we can do it in seconds to minutes, maybe up to hours, rather than months or years. So you can skip all of these different steps, each of which can be very challenging — even interpreting the data. Once you've got the protein crystals and X-ray diffracted them, you then have the problem of actually turning that data into a 3D structure, which it turns out, as I recently learned, is not that easy, because you only get half the data: you're missing all the phase information. It's a problem called the phase problem, which you need to solve in order to get a 3D structure. So by doing structure prediction we try to circumvent this entire process and just go directly to predicting the structure. And actually, I'll talk a bit later about how predicting structure can help the experimental process as well.

So today I'm going to introduce myself, which I just have, then talk a bit about how AlphaFold works, a bit about how we think AlphaFold builds protein structures, a bit about its impact, give an update on a follow-on paper we did called AlphaFold-Multimer, and then briefly summarize at the end with some thoughts on AI for science. Okay, so how AlphaFold works. The main intuition that I think is useful to have, of why something like AlphaFold is possible at the moment, is this influence of a protein's structure on its function, and then how that interacts with evolution.
So a protein has some structure, and a protein has some function — it might catalyze a reaction, say. As a protein evolves — as an organism's DNA changes and it branches into new species — its genes pick up various mutations, and these mutations change the amino acids in the protein: they might insert, delete or change amino acids. These new proteins, which are slightly different from their ancestor proteins, will be a bit different: they might perform the function better, or function the same, or function worse. And evolution selects which ones are fit, so it will tend to filter out the bad mutations and you'll no longer see those present in nature. The intuition this gets us to is that the protein sequences we see in nature are the ones which have been deemed fit — probably because they conserve the function of the protein, and they probably conserve the function because they conserve its structure. So if we take a protein sequence and look for all the similar sequences in nature with some sort of sequence alignment tool, all the sequences which are similar to this one will probably have the same structure. And we can use this in a very important way: the pattern of which mutations have been preserved as fit tells us a lot about the protein structure. For example, you might often see that two residues mutate in a correlated fashion — one mutates, the other mutates. Why would that happen? Probably because those two residues are close to each other in 3D space. So this allows us to invert the causality: to go from which sequences have been conserved by evolution to what the structure is that's actually being conserved, in order to retain the function that ensures fitness.

So AlphaFold's inputs and outputs — I think that's a good place to start for a machine learning model. The input is an amino acid sequence: a 20-letter alphabet, a sequence of tens to hundreds to maybe a few thousand letters representing the order in which the amino acids are strung together in the polymer chain. And then we have the evolutionarily related sequences. These are sequences found in databases of protein sequences which are similar under some alignment metric. These tools will take a sequence, find similar sequences, and align them so that the positions which are the same all occur in the same column, and you also get to see what the insertions, deletions and changes are. So we have this thing called a multiple sequence alignment (MSA). Conventionally, what the more classical tools do is look for correlations directly in this multiple sequence alignment — say, this column varies when that column varies — but unfortunately there are some confounding effects and it's a bit complex. So what we do is just give this multiple sequence alignment directly to AlphaFold and let it work things out itself, rather than trying to handcraft anything.

And then the training data. There are about 170,000 known experimental protein structures. These have been determined by, as I explained, laborious experimental work by thousands of researchers over decades, and that represents a massive culmination of structural biochemistry. There are about 40,000 left after you deduplicate quite similar structures.
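To make the covariation intuition above concrete, here is a minimal sketch — not anything AlphaFold does internally, since AlphaFold consumes the raw MSA and learns this signal for itself — of the classical approach: scoring pairs of alignment columns by mutual information, so that high-scoring pairs are candidates for residues that co-evolve and are therefore plausibly in contact. The toy alignment and the function are purely illustrative.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Toy multiple sequence alignment: rows are homologous sequences,
# columns are aligned residue positions.
msa = [
    "MKVLA",
    "MRVIA",
    "MKVLG",
    "MRVIG",
    "MKVLA",
]

def mutual_information(col_i, col_j):
    """Mutual information between two alignment columns (in bits)."""
    n = len(col_i)
    pi = Counter(col_i)
    pj = Counter(col_j)
    pij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in pij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

columns = list(zip(*msa))  # transpose: one tuple of characters per column
scores = {
    (i, j): mutual_information(columns[i], columns[j])
    for i, j in combinations(range(len(columns)), 2)
}
# The highest-scoring column pairs are the ones that mutate together,
# i.e. candidate 3D contacts.
for (i, j), s in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(f"columns {i} and {j}: MI = {s:.2f} bits")
```

In practice this raw signal is confounded by phylogeny and indirect couplings, which is exactly the kind of hand-crafted post-processing AlphaFold avoids by ingesting the alignment directly.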
Another thing we do, which I'll talk about in a bit more detail later, is something called distillation. We actually predict the structure of an additional 350,000 sequences of unknown structure and then use those as though they were training data. And then the output of the model is a predicted 3D position for every atom in the protein — there are somewhere between a few hundred and tens of thousands of atoms in one of these proteins.

Another thought when building AlphaFold is about inductive bias. In deep learning, one way to think about things is how the information flows around. You might have seen something like this before: take a convolutional network. We have data on a regular grid, it's translation-equivariant, and information flows within a local neighbourhood — you have a patch of receptive field for every activation in a layer. This is what we used in AlphaFold 1. It is basically best at learning rules which involve local information within the receptive field. And basically, the better your inductive bias is suited to your problem, the easier it will be to use the data: you'll be able to use less data to learn a better model of the rules that generated that data. Another example of an inductive bias would be recurrent networks — for example for language — where the inductive bias is that things have a sequence order, things go in a recurrent sequence, and probably things in the more recent past have greater influence, in information terms, than the distant past. You have graph networks, where you have a fixed structure and information flows along the fixed edges, so you have an inductive bias to pay attention to your nearest neighbours more than to distant nodes in the graph. And then finally there's another sort of inductive bias in deep learning: attention, as used in language models, which is what we use now in AlphaFold 2. You could say that we have data as an unordered set, and then you think about edges between the items in the set, and these edges are weighted dynamically — they're controlled by this key and query mechanism. So you can think of this as information flow that is dynamically gated by the network itself.

So these are all inductive biases, and what we want to do in AlphaFold 2 is build physical and geometric insights into the network — not just by processing data, but into the actual architecture — so it learns the right thing. And to do this, we want to produce an end-to-end system which directly produces a protein structure, so it has all the gradients, all the right information flowing back through it, to get it to produce the best representation possible for this task. Some of the inductive biases we put into this network to reflect domain knowledge about protein physics and geometry: the position of the residues in the sequence is de-emphasized, because we want it to think more about spatial 3D geometry than about sequential geometry along the chain. What we want most of all is for residues which are close to each other in 3D space to communicate with each other more — they need to talk to each other more so they can find the right alignment to each other and pack into the right space. And the tricky thing is that we don't actually know what the graph of the protein is.
We don't know which residues are near each other at the start, because we don't know the structure of the protein. So what we want the network to do is effectively learn a graph of which residues are close to each other, while reasoning over this implicit graph at the same time as it's being built. Down in the corner, you can see a 3D structure, and those lines mark all the nearest neighbours of this residue. And on the right there's this kind of pair matrix over every residue — residue i and residue j have a sort of edge between them, and we want some representation on those edges.

So in more concrete terms, what the network actually does: we have the input sequence to the model — this is the protein we actually want the structure for. We throw it into genetic database search tools; we search over a variety of databases using standard third-party tools. This gives us a multiple sequence alignment, a set of sequences which are similar to the protein sequence you want to fold — as I said before, this is important for evolutionary reasons — and we also put the sequence you want to fold itself in there. Then, from the sequence, we create a pair representation: we make a tensor of features in which entry ij corresponds to residue i concatenated to residue j, so this all starts off as thinking about these residues being connected to each other. The next thing we do is additionally search for structures which are similar to the protein — sequences similar to the protein sequence you're interested in that have a known structure. Often those will be quite similar in structure, so we might want to use that information. AlphaFold, however, can ignore these structures if it wants to — if it thinks the multiple sequence alignment contains better information, it can decide to discount these similar structures, which a lot of the more classical homology modelling techniques can't do. That's one advantage of deep learning.

Once you've got these inputs set up, we have this abstract thing we call the MSA representation, which is number of residues by number of aligned sequences, and we have what we call the pair representation, which is a pair matrix of number of residues by number of residues. Then we run the Evoformer, which is our main block and probably where most of the magic happens — we run 48 blocks of this. Out of that we get a new, processed MSA representation and a new, processed pair representation. From that we take the first row, which now has all this rich information in it, and together with the pair matrix we put it into what we call the structure module. This is a smaller neural network block which creates the actual 3D structure and generates both the backbone coordinates and the side-chain coordinates. On top of the network we also put a per-residue confidence predictor — we actually ask the network how confident it is in its prediction: does it think it's any good? And we also ask it for a pairwise confidence, which is a bit harder to explain, but basically it's how confident the network is in the position of residue i, compared to the ground truth, from residue j's frame of reference.
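To illustrate that pairwise quantity, here is a minimal numpy sketch of what an "aligned error" between residues i and j means: the error in residue i's position when both the prediction and the ground truth are viewed from residue j's local backbone frame. The frame construction from N, CA and C below is a standard Gram-Schmidt construction and the coordinates are random placeholders; AlphaFold's confidence head is trained to predict this kind of quantity rather than compute it from a known ground truth, so treat this purely as an illustration of the definition.

```python
import numpy as np

def backbone_frames(n_xyz, ca_xyz, c_xyz):
    """Build a local rigid frame (rotation + origin) per residue from its
    N, CA and C atoms via Gram-Schmidt, with CA as the origin."""
    v1 = c_xyz - ca_xyz
    v2 = n_xyz - ca_xyz
    e1 = v1 / np.linalg.norm(v1, axis=-1, keepdims=True)
    u2 = v2 - (e1 * v2).sum(-1, keepdims=True) * e1
    e2 = u2 / np.linalg.norm(u2, axis=-1, keepdims=True)
    e3 = np.cross(e1, e2)
    rot = np.stack([e1, e2, e3], axis=-1)   # (L, 3, 3), columns are the frame axes
    return rot, ca_xyz                      # origin at CA

def aligned_error(pred_atoms, true_atoms):
    """Pairwise aligned error: distance between predicted and true CA of
    residue i, after expressing both in residue j's local frame."""
    def in_frames(atoms):
        rot, orig = backbone_frames(*atoms)
        ca = atoms[1]
        rel = ca[None, :, :] - orig[:, None, :]       # (j, i, 3): residue i seen from j
        return np.einsum('jab,jia->jib', rot, rel)    # rotate into frame j
    return np.linalg.norm(in_frames(pred_atoms) - in_frames(true_atoms), axis=-1)

# Placeholder coordinates for a 5-residue chain: tuples of (N, CA, C), each (L, 3).
rng = np.random.default_rng(0)
true = tuple(rng.normal(size=(5, 3)) for _ in range(3))
pred = tuple(t + 0.3 * rng.normal(size=t.shape) for t in true)
pae = aligned_error(pred, true)   # (5, 5) matrix, entry [j, i]
print(np.round(pae, 2))
```

Low values of pae[j, i] mean residue i is well placed relative to residue j's frame, which is the sense in which the next part of the talk speaks about rigid, mutually well-oriented blocks.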
And what this tells us — I'm probably not going to go into too much detail in this talk, but we have resources elsewhere — is which parts of the protein are predicted to be rigid, where we're confident in the relative orientations. There might be one bit of protein which is quite a rigid block, where we're pretty confident that all of it is in the right position, and then another bit of protein which is just wiggling around: we're not really sure what the relative orientation of those two blocks is, but we're quite confident in the relative orientation of residues within each block. So it allows us to get that information. And these confidence scores are really important for experimentalists and biologists to actually use our predictions, because without them you wouldn't be able to say — AlphaFold is sometimes right and sometimes wrong — when it's most likely to be right, when you probably shouldn't use the structure, and when you should be using the structure because it's quite confident.

So once we've got all these representations and produced the structure, we then do this thing we call recycling. We take all of the outputs from the Evoformer and the structure, send them right back to the beginning, and run the entire network again — we actually do this three times. And we do this in, I guess you could think of it as a memory-saving way, because at this point we put stop-gradients in and don't backpropagate. So we're kind of doing a stochastic, truncated backprop through this entire network: at training time you roll out up to three iterations, and the final iteration is based on the representations coming out of the previous iterations. This seems to significantly improve performance, possibly because it's increasing depth, and not backpropagating through it doesn't seem to harm it too much.

So, diving into the Evoformer itself, of which we have quite a few blocks: we have the MSA representation and the pair representation coming in. On the MSA representation we do row and column attention — axial attention, I think it's called; it goes by various names. So we do attention down the columns and then attention across the rows, and the row attention we bias with the pair matrix: we bias the attention logits by the pair matrix. The pair matrix feeds its information in by effectively telling the MSA representation which residues should be talking to each other — the pair matrix probably has a hypothesis about which residues are close to which, and it uses that to bias the information transfer in the MSA representation. Then, to get information coming back the other way, we take this updated MSA representation, which is now quite abstract, and do something like an outer product: we take a quantity which has a number-of-residues dimension and turn it into a number-of-residues by number-of-residues quantity, then add that onto the pair representation. That gives information flow in the opposite direction. And then we do an update on the pair representation with something we call triangular attention. So let's go into what triangular attention is. Thinking about this pair representation a bit more, we basically have a graph.
Each pixel in this matrix — tensor, multi-dimensional array, whatever — basically corresponds to an edge: we have a pair of residues, so ij corresponds to the edge between i and j. We don't know the graph, but you can think of this as a complete graph over all residues. And there's this triplet relation we want to be thinking about: length-three cycles in this graph are quite interesting. What we want to do is update the edge ik based on all triangles involving this edge — so the update to ik should depend on i to j and then j to k, summed over all j. What this does is create, as I mentioned before, an inductive bias in the network. One of the big important things is that if this pair matrix is going to represent something like a distance matrix in three dimensions — distances between all the residues — then we want properties like the triangle inequality and loop closure to be easy things for the network to reason about. And we found this is quite a good way of updating the pair matrix.

So then let's talk about the structure module. The structure module sits on the end of all these embeddings, and this is the bit where we do end-to-end folding. Previously, AlphaFold 1 and some other protein folding systems often did gradient descent on an energy function. Instead, what we do here is have a neural network which directly and iteratively produces the 3D coordinates. And one of the main tricks we do here, which surprised some people, is that we actually abandon the protein backbone — we stop thinking of it as a polymer. We chop it up into this gas of rigid bodies: taking chunks of three atoms — the nitrogen, the carbon alpha and the carbon — we turn each into a rigid three-atom body. The network will have to learn that the chain has to go back together later, but this has the advantage, we found, that forcing the network to always obey the backbone geometry while trying to predict the structure actually prevents it from making large manoeuvres. It's hard for it to jump to the correct answer when you have this big long chain with hundreds of angles — if you want to move this one bit, then everything else has to move with it in just the right way; you get lever effects, where a small change in an angle here moves the entire rest of the chain. We just found that reasoning about this as a 3D gas is easier. To determine the positions of these rigid bodies, we iteratively update the position of each one using an equivariant 3D transformer architecture. And then finally it also builds the side-chain geometry — the little bits coming off the backbone — by directly predicting torsion angles. So you can see there, for one of the proteins, over the eight iterations: it starts off with a pretty good idea of what the protein looks like, and then over the subsequent iterations it refines it — it breathes a bit and compacts a bit better, maybe wiggles a few of the little loops around, and then it finally settles down on a structure which is a bit more accurate. Actually, coming out of the main stack — out of the Evoformer — it already has a pretty good idea of what the protein structure is; you could probably linearly project it off. But this iteration helps: it helps to iteratively find and produce a slightly higher quality prediction.
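Going back to the triangle update for a moment, here is a minimal numpy sketch of that idea: the new value of edge (i, k) aggregates contributions from edges (i, j) and (j, k) over all intermediate residues j. This is a simplified multiplicative toy version, not the exact gated attention layers used in the Evoformer, and the shapes and weights are placeholder assumptions.

```python
import numpy as np

def triangle_update(pair, w_a, w_b, w_out):
    """Toy triangle-style update of the pair representation.

    pair: (L, L, C) edge features for every residue pair.
    The new value of edge (i, k) is built from sum_j a[i, j] * b[j, k],
    so every update 'closes the triangle' i -> j -> k."""
    a = pair @ w_a                          # (L, L, H): one projection per edge
    b = pair @ w_b                          # (L, L, H)
    tri = np.einsum('ijh,jkh->ikh', a, b)   # sum over the intermediate residue j
    return pair + tri @ w_out               # residual update back to C channels

L, C, H = 6, 8, 4
rng = np.random.default_rng(1)
pair = rng.normal(size=(L, L, C))
w_a, w_b = rng.normal(size=(C, H)), rng.normal(size=(C, H))
w_out = rng.normal(size=(H, C)) / L         # crude scaling for the sum over j
print(triangle_update(pair, w_a, w_b, w_out).shape)   # (6, 6, 8)
```

The sum over the intermediate residue j is what makes triangle-consistency constraints, like the triangle inequality on distances, easy for the network to express.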
So I mentioned that we use unlabelled sequences for distillation. We know about 200,000 protein structures in the Protein Data Bank, but we know billions of protein sequences without structures. We use this data in two ways. One way is a sort of BERT-type thing: a masked language model, where when we give the MSA to the model we actually mask bits of the MSA out, and then we have an additional head where we ask the model to predict the values at the masked-out locations in the MSA. This gives the model a training incentive to learn about the correlation structure of the MSA — which bits can be predicted from which other bits. I think we found that particularly helps early on in training. The other thing we do is use our own predictions for distillation. We train a model on the PDB — just from experimental structures we get a version of AlphaFold — and we then predict structures for a large sequence database, a few hundred thousand protein sequences. Then, using the confidence metrics that we've created, we filter these predictions by confidence, down to just the best predictions, which helps us avoid training on our own worst predictions and the model overfitting to its own errors. And then we throw that into the training set. So, in some proportion — I think 75% of the time we train on an example from the distillation set and 25% of the time on a real experimental structure; those might not be the actual proportions, I can't quite remember. And this seems to have the effect — it would be bad if the errors were always correlated with what the model wants to do, but by training a separate model it makes different errors, and overall distillation appears to have something like a regularization effect on the network, allowing it to fit better and not overfit the experimental data.

So at this point, you might be wondering which part of it matters. The unfortunate answer is that it seems everything matters. When we ablate various improvements, we find that no single improvement is really dominant. I'd say the main improvement is the methodology of building intuition into the model, trying to create these physical biases. And what we find is that if you ablate individual features, there isn't much effect, but if you start ablating pairs of features, there can be quite a large effect. So, for example, if we ablate the invariant point attention, you only lose a few percentage points of accuracy; if you ablate recycling, maybe you lose three or four percentage points; but if you ablate both of them, you actually lose something like 17 points of accuracy. And I think what that's saying is that this is a complex system — it's adaptive, it can compensate for missing things — but once you start to remove two of these things, because they're all propping each other up in some way, the effect is large. So nothing is dominant, but clearly there are strong interactions between many of the components.

Next, I'm going to try to describe how we think — or rather how we've investigated how — AlphaFold builds protein structures. I've talked a bit about the iteration already.
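Before getting to that, here is a minimal sketch of the masked-MSA, BERT-style objective mentioned above: randomly hide positions in the alignment and train the model to reconstruct them, which forces it to learn the correlation structure of the MSA. The masking fraction, token set and toy MSA below are placeholder assumptions rather than AlphaFold's exact recipe.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK_ID = len(AMINO_ACIDS)          # extra token index used for masked positions

def mask_msa(msa_ids, mask_fraction=0.15, rng=None):
    """BERT-style masking of an MSA given as an (n_seqs, n_res) array of
    amino-acid indices. Returns the corrupted MSA, the original values at
    the masked positions, and the boolean mask itself."""
    rng = rng or np.random.default_rng()
    mask = rng.random(msa_ids.shape) < mask_fraction
    corrupted = np.where(mask, MASK_ID, msa_ids)
    return corrupted, msa_ids[mask], mask

# Toy MSA of 4 aligned sequences, 10 residues each.
rng = np.random.default_rng(0)
msa_ids = rng.integers(0, len(AMINO_ACIDS), size=(4, 10))
corrupted, targets, mask = mask_msa(msa_ids, rng=rng)

# A model would be trained to predict `targets` at the positions in `mask`
# given `corrupted`; here we just report how much was hidden.
print(f"masked {mask.sum()} of {mask.size} positions")
```

In AlphaFold this reconstruction loss is an auxiliary head alongside the structure losses; only the data-corruption step is shown here.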
So one of the best bits of introspection we did — one of the interpretability investigations we did for the paper — was interrogating the network by freezing all its weights and then, for every layer — every one of those 48 Evoformer layers, times four recycling iterations — projecting off a structure. We attach a structure prediction module to the pair representation at that particular layer and try to get it to predict the best structure it can. Basically, this tells us, at that point in the network, what information is in the pair representation and how good a structure it allows us to predict — how much processing has happened and how much the network actually knows. At the bottom you can see a cartoon: at the first layer it might produce something that doesn't look much like a protein; at the last layer, something which looks very much like a protein, with three well-packed beta sheets; and in the middle, maybe it's not quite sure yet, but it's starting to produce something which looks a bit like a protein, which is interesting.

So if you look at this SARS-CoV-2 protein, ORF8 — one of the ones where we were pleasantly surprised to get quite a nice prediction — you can see that over the 192 layers of blocks it really does refine and produce the protein structure, reason about it, refine it. It starts out as a bit of a mess, but then over time it starts to stack and pack the protein together. Those big arrows are what are called beta sheets, and they get created and then packed in towards each other. In some ways it almost looks like a real molecular dynamics simulation, but of course this is just the contents of the neural network — it's just what it's doing as it reasons about how this protein is built. That's a particularly hard one, so it takes quite a long time — lots of iterations, or blocks, of the network — for it to come to its final conclusion.

And this one is a really big protein, T1044, about 2,300 residues. You can see that right at the start it begins as this kind of big black hole — we actually start all the residues out on top of each other, which we call the black hole initialization. Then it rapidly explodes apart and has a pretty good idea of where things are going. And once these broad areas are quite well placed, it's almost breathing a bit: you see these large-scale motions, and it jiggles the protein domains around, trying to find exactly the right places for them to go. You can see on the left-hand side, where the blue helices aren't quite sure where they go — it's quite uncertain, so it's trying various places for them to pack against the side of the protein. And yeah, it's very interesting seeing how it's actually thinking, and basically it confirms that it is doing this kind of iterative reasoning process: it does have an idea of what the protein looks like and this idea gets better over time. We look at this quantitatively by looking at the accuracy of the prediction projected off at every layer, up to 192 layers. You can see that with every layer, the prediction quality goes up basically monotonically. And for some proteins it goes up very quickly — T1024 was one protein in CASP that had quite a good prediction, where the structure was quite easy to predict accurately.
It really only needs about 48 iterations, or blocks, here — it doesn't really need the rest of the recycling iterations. But T1064, the one I just showed, the SARS-CoV-2 protein, needs all of these iterations. It takes quite a long time for it to gradually improve the prediction until it gets to its final answer — it basically just squeaks in at layer 192, and only at that point gets a good prediction. And of course you might then think: why don't we just keep on recycling? People have of course looked at this, and I can't confidently say that it always increases accuracy — I think it's quite hard to show that recycling for longer, something I'd call hypercycling, always gives better results — but it can give better results in some cases, especially where the protein needs more time, where the network basically needs more time to think about it. Another interesting thing about this graph is that there aren't really discontinuities at layers 48, 96 and 144: the network doesn't seem to have much problem transferring information across these points, even though there are no gradients flowing across them.

Okay, so we've talked a bit about the interpretation of how AlphaFold works, and now I'm going to go through a bit of the impact AlphaFold has had. It obviously helps, when working on these problems, to have a clear success metric, and fortunately for us the protein structure prediction community had created this thing called CASP — the Critical Assessment of protein Structure Prediction — which is a biennial competition; it happens every two years. It's a blind assessment of how good your structure prediction system is, which is especially important for machine learning models, because people can accidentally overfit and all that kind of stuff. It only tests people on protein structures which haven't been publicly released yet, so it's a very stringent test. For a long time there wasn't much progress, but as you can see, in 2018 there was quite a big boost in accuracy as AlphaFold 1 entered the scene — this was our first entry to CASP. And then in 2020 we got this gigantic improvement, which really pushed the accuracy over the line to the point where it's actually starting to be competitive with experiment. We were consistently the top-ranked method, achieving very high accuracy in most cases. And I guess that's why we're here talking to you about it. So AlphaFold 2 achieved a median accuracy of 92.4 GDT over all targets, and it has been recognized as a solution to the protein structure prediction problem.

Here are a couple of examples. We've already talked about T1064 — that's quite a nice one; ground truth in green and prediction in blue. Here's another one: this, again, is the 2,300-residue RNA polymerase we talked about, and I think one of the most impressive things about it is that in CASP we were actually asked to fold it as individual subsections, individual domains — you can see that in the bottom right — because it's usually too big. But we actually managed to fold it as one gigantic single chain. And the AlphaFold system, despite only being trained on proteins up to 512 residues in length, actually generalized okay up to 2,300 residues, with some tweaks.
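For reference, the GDT number quoted above is roughly the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the fraction of residues whose C-alpha atom falls within that cutoff of the experimental position after the structures have been superposed. The sketch below assumes the two structures are already superposed (the real GDT score searches over superpositions), so it illustrates the definition rather than reproducing the official CASP scoring program.

```python
import numpy as np

def gdt_ts(pred_ca, true_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Toy GDT_TS: mean over distance cutoffs of the fraction of residues
    whose predicted CA is within that cutoff of the true CA. Assumes the
    two coordinate sets are already superposed; scores range from 0 to 100."""
    dists = np.linalg.norm(pred_ca - true_ca, axis=-1)
    fractions = [(dists <= c).mean() for c in cutoffs]
    return 100.0 * float(np.mean(fractions))

rng = np.random.default_rng(0)
true_ca = rng.normal(scale=10.0, size=(100, 3))        # placeholder 100-residue structure
pred_ca = true_ca + rng.normal(scale=1.0, size=true_ca.shape)
print(f"GDT_TS = {gdt_ts(pred_ca, true_ca):.1f}")
```

A perfect prediction scores 100 on this scale.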
So one other part of the impact is that we actually open-sourced the code and the model weights to run AlphaFold. In our GitHub repo is a copy of all the code, plus scripts to download the genetic databases and the neural network weights, and that allows anyone to run and use AlphaFold on any protein they want and explore the model. I've seen some really amazing stuff in the community since it was released — I'll maybe show some of that in a bit. Another thing we did is create the AlphaFold Protein Structure Database. The thinking here is: not everyone wants to install and run AlphaFold on their own computer or in a Colab notebook, and AlphaFold is fast, so why not just predict the structure of loads of proteins — preferably every known protein? We did this in collaboration with EMBL-EBI, the European Bioinformatics Institute, part of the European Molecular Biology Laboratory. They specialize in hosting lots of protein data — they actually host the European side of the Protein Data Bank — so they're experts in this kind of stuff. They developed, and helped us to develop, this database, and it contains about 800,000 protein structures at the moment, corresponding to the human proteome — Homo sapiens — plus 20 model organisms, interesting ones like mice and E. coli, plus Swiss-Prot, which is a collection of manually curated proteins known to be interesting and probably quite important. So that's about 800,000 structures so far, and we intend to expand this to UniRef90, which is a clustered representation of basically all known protein sequences in some sense. That's going to be about 135 million structures, so it's going to be really huge. I think that's going to be really cool, because hopefully you won't even have to run AlphaFold — you'll be able to just go on this website, type in the name of your protein, and see the structure immediately. And I think it's really exciting to see how biologists are already using this in their day-to-day workflows.

On that, about the human proteome: we predicted every protein in humans, which is called the human proteome, and we looked at how much we improved things — how much did we actually learn by doing this? We found that even when you account for template modelling — which is where you find a protein with a similar sequence and a known structure and fill it in as if it were yours — AlphaFold significantly increases the high-accuracy coverage of the human proteome. What the impact of this will be is a bit hard to say, because biology takes a while and medical impacts take a while, but I think it's going to be amazing to see all the insights into biology and medicine that can come from this greater knowledge, and from not having to spend years doing structural biology just to get a hint of what's going on in a process. I should also note that we're good at membrane proteins, as people often ask about that.

Okay, so I mentioned earlier that protein structure prediction can actually help experiments. People talk about experimental data, but in reality you always have to interpret experimental data.
It comes in as, say, blurry or low-resolution cryo-EM images — you have an electron microscope image of a protein. These are really incredible experiments: they cool down the proteins, stop them jiggling around so much, image them, and then stack all these images. But you still have resolution barriers, and while the resolution is always improving, often the best image you have is the best image you have. And what AlphaFold can do — cryo-EM often knows what the bulk structure is, the gross topology of a protein blob, while the details are a bit blurry — is be very complementary to that. AlphaFold is often very good at the details, and where it makes mistakes is in not being quite sure what the relative orientation of two big floppy bits is. So if you then fit the prediction into the cryo-EM density, it can often really help you interpret the experimental data. You can see there the AlphaFold structure fits into the cryo-EM density almost perfectly.

Another way of doing this: I mentioned that with crystallography there's what's called the phase problem — you're actually missing half your information, and that's a significant barrier to getting the real structure out of your X-ray diffraction data. Ideally, what you want to solve the phase problem is a really good guess of what the structure looks like. What you do is take a set of 3D coordinates and then do gradient descent on, I think, a maximum likelihood loss, which helps you find the structure which is most likely to have generated the data. Previously you might have had to hope you had a really good template model — a similar protein whose structure was already known — but people found quite rapidly, actually during CASP itself, that our structures can quite often be really useful for what's called molecular replacement, where you use them as a starting guess for the refinement process, and that then helps you interpret the X-ray data. Some people have said they've gone back to data from ten years ago that they weren't able to analyze because they couldn't solve the phases, and they were then able to use the AlphaFold prediction to solve it, get a great structure, and publish it. That was one of the really incredible things — probably the fastest bit of experimental impact; I wasn't expecting it to come so quickly. Basically, we were told by the CASP organizers: we were so surprised by your results that we had to go check with the experimentalists — and they hadn't actually got the answers yet; they found they were able to use your prediction to solve their data. Which is really cool.

Another thing we've found is that when AlphaFold is not confident, this actually often corresponds to something real: often it's not confident because the protein doesn't have a fixed structure. Lots of proteins have what are called intrinsically disordered regions — regions which don't fold to form a proper structure. People often don't appreciate this very much, because these regions don't show up in experiments very well, but huge amounts of protein are actually intrinsically disordered — I think maybe a third of human proteins have intrinsically disordered residues.
And so we actually find that low confidence in AlphaFold is a very strong signal of disorder, on a par with — or maybe even better than, I'm not sure — existing disorder predictors. That's very interesting: something AlphaFold wasn't really trained to do turns out to be something it's quite good at. And generally there's been a huge reaction from the community — people picking it up and applying it to all their problems, solving their old data, finding ways to use it for hypothesis generation. Looking at, say: this protein binds to this protein, but we don't have structures — if you look at the AlphaFold structure, you can see there are these residues here, so if you mutate those, they'll affect the binding interface and might cause it not to bind, or something like that. So people are starting to design biological experiments using AlphaFold structures for hypothesis generation. People have also been playing with the code quite a bit, and one thing they found was that you can trick AlphaFold into folding multi-chain proteins. Most of what I've been talking about so far is single-chain proteins, but actually a lot of proteins are made of multiple protein chains which fold up together. It turns out that if you put in a big junk linker region, or hack the features a bit, you can trick AlphaFold into predicting multi-chain proteins, which is quite funny.

Which gets me on to the next section, about AlphaFold-Multimer — just keeping an eye on the time a bit. This is a quick update we did in October, where we actually train it properly. A bunch of people had been hacking AlphaFold to get it to predict these multi-chain proteins, but we felt the right thing to do was to go back and retrain AlphaFold properly on multi-chain data, with the right features, so that it handles multi-chain proteins from scratch, properly. We found this gave quite a big boost in accuracy. Some of the other methods are shown there — a previous, non-AlphaFold-based multi-chain method called ClusPro is there in green — and you can see that AlphaFold-Multimer, the pale blue bar on the left, is significantly better on this DockQ score, which measures how good your multi-chain protein prediction is. On the right I've given a bunch of cherry-picked examples from the paper, showing some really quite beautiful structures of these multi-chain proteins that we predict.

Strengths and limitations: one thing we noticed is that, compared to AlphaFold just hacked up, it gives significantly better predictions for proteins called heteromers, where you have two different protein sequences as two different chains — or more than two different chains. And it gives a smaller, but still real, improvement on what are called homomers — proteins made of several chains of the same sequence. That's probably because AlphaFold can impute a lot of the missing context by itself: even while folding one chain of a homomer, it can probably guess that the homomer had some partners that are missing and impute them, and so still produce quite an accurate structure, even though physically it shouldn't be able to. Some limitations of AlphaFold-Multimer are that it requires you to actually know how many chains there are — to know the stoichiometry — and it seems to perform poorly on antibody interactions.
And also, one thing we've found is that low confidence on a multimer prediction is not a great signal that a pair of proteins doesn't interact. One of the things people would really like to know is which proteins in a human interact with each other — you might think, okay, what we need to do is just fold every pair of proteins together. But it seems that low confidence isn't necessarily a good indicator there that two proteins don't interact. So this is all new work — I think the Multimer system is at an earlier stage than the single-chain system — but it's also to some extent the case that the multimer problem is a more ill-defined problem: knowing what the exact correct multimeric state of a protein is, is probably not as well-defined as the folded state, in a sense.

So, drawing towards the end of the talk, I just want to mention a bit about what's next. These aren't necessarily things we will work on, but broadly these are the things I think are next in the area of structure prediction and around it, as far as deep learning and machine learning applied to structure prediction go. We want to increase the multimer accuracy and, as I said, think about the human structural interactome: if you increase the accuracy, can you actually start to predict which proteins in an organism interact with each other? We could also extend the system to work on things which aren't proteins. Nucleic acids are just another type of polymer, in a way, and proteins often interact with and form complexes with nucleic acids — the ribosome, the thing in your body that actually creates proteins, is itself a complex of proteins and RNA. You also want to know what the structure of a protein is when it binds to a drug molecule. This is very important for drug discovery — it's called the docking problem — and where a small molecule docks, or binds, to a protein is interesting because it will often block or promote some interaction the protein is involved in, and knowing how that works will help you design molecules which will be better drugs. Another thing is that proteins aren't static, rigid structures. You've seen them jiggling around a bit as AlphaFold builds them, but in nature they also jiggle around — they're thermal, they're these big floppy things — and proteins have conformational diversity. Usually in an experiment you'll see something that corresponds roughly to a free energy minimum, but there will be a bunch of other states, and often the relative populations of these states will be affected by mutations — they might change when you change the protein sequence, or when it interacts with, say, a drug molecule. Another one is the structural meta-proteome, you might say: basically expanding the AlphaFold database to cover almost all known protein sequences, so that's going to be in the 100 to 135 million sort of range. And also thinking about how we can build tools to make this useful: bioinformatics is traditionally based on processing sequences of proteins, but imagine what we could do in bioinformatics if every single protein sequence you look at came with a structure — I think that's really interesting. And then finally, as I was mentioning, conformations and mutations are very important. Understanding disease often involves understanding what the effect of a mutation is: there's a mutation in a gene, which gives a change in the protein.
How does that mutation affect the structure and stability and function of the protein? That can help us understand disease.

And just to give a broader outlook on the future of AI for science: I think there's really great potential for building these very highly optimized ML systems for scientific problems, in the mould of AlphaFold, and really AI can be the ultimate tool to help scientists see further. There are a bunch of lenses through which you can see AI for science. You can maybe categorize them as: data-driven things, a bit like AlphaFold, uncovering patterns in datasets, like experimental datasets. There's accelerating simulations, where people use ML to try to speed up, say, fluid dynamics simulations. There's using ML as a function approximator — these deep neural networks are very good function approximators — so some of the work at DeepMind listed below includes things like using deep neural networks to represent many-body wavefunctions; in that case it's not even trained from data, it's used purely as a function approximator. There's learning from simulation data, so improving density functional theory: where we already have a heuristic, we can improve that heuristic using a neural network. There's the work on glassy systems — using neural networks to explore and understand simulation data, to understand phase transitions in glassy systems. There's the genomics work we've done on understanding gene expression. And then there's the mathematics work — a recent paper where AI is actually used to guide human intuition: discovering patterns in mathematical objects and then using those to guide humans to produce interesting theorems. So that's, in a sense, search — using AI to search better. So yeah, I think AlphaFold is, in a sense, just the start of the impact that AI is going to have on science, and of how we're going to leverage AI to enhance human intelligence across the discipline.

On acknowledgements: these are the many people at DeepMind and EMBL-EBI who worked on AlphaFold and the AlphaFold release. I'd also like to thank the CASP community for creating a really good benchmark that I think helped convince people that we've solved the problem. And of course the experimental data — the Protein Data Bank. The experimental community has been working for years to build up this really amazing resource of experimental structures and, most importantly, to freely release them so that people can analyze them and build on them. So that's the end of the talk. I think we now have some time for Q&A — I've got nothing after this, so I'm happy to chat for at least a few minutes.

Thank you very much for this very interesting talk. We already have two questions from people raising their hands, and I think another one in the chat is being typed. So, Jude, I think you were first — please do ask your question.

Yeah, thanks. That was a really, really interesting talk, I really enjoyed that. And I was particularly interested in this recycling mechanism. So I understand you've got these kind of multiple representations of the protein structure inside the network.
You've got the pair representation, the multiple sequence alignment representation, and in the structure module you also have the rotations and translations that actually encode the positions of the residue-gas fragments. Which of these representations are actually recycled? Or are you generating a protein structure and then recomputing features from it as a template? If you could elucidate, that would be great.

If I remember correctly, I think we actually recycle all of them — we put all of them back in. I think we add the abstract features back onto themselves at the start, and then the predicted structure is encoded back in, a bit like a template goes in — I think it goes in at some point. Honestly, these are the sort of details which are probably best found in either the open-source code or the paper; the code is probably the most accurate, but the paper is also very good — we have a 60-page SI which has all these details in pseudocode. But yeah, I think we put all of them back. Of course, we were then interested in variations: do you just put the structure back, do you just put the representations back, which one's better? I think the intuition is probably that you need the structure at least, for the moment, without some other compensating changes, because the structure has the side chains and some other information there which might be useful — the structure is probably a little bit better than what's in the pair representation. But overall, I think the most useful way to think of recycling is as a trick to increase the depth of the network.

Brilliant, thanks. That's really helpful.

Right, I think you can go, Will.

Yeah, I was just interested in a little more of the process of how you designed AlphaFold. Were a lot of these updates kind of iterative — were you testing a chunk of it, finding a flaw in it, and then adding in some kind of domain knowledge — or did you have a big picture in your head of what you wanted to go for?

Oh yeah, really interesting question. I think, yeah, you're right — we developed it iteratively; you can kind of look at the end result and see that it's been developed iteratively. The main way we worked for a lot of the AlphaFold development was leaderboard-based development: we knew we were aiming towards CASP, the protein folding competition, and we think that's a good benchmark. So as you develop it, you train on all the structures, and we held out a test set of the CASP proteins — actually, all proteins after a certain date. We fold that test set continuously, and we basically have that as team infrastructure, and the team goal is to increase the accuracy on that leaderboard. A lot of the time we'd just have a hypothesis — oh, we need more inductive bias on this thing, or covariant attention will make this better, or something — and you put it in, train it, and then ultimately see whether it improves on the leaderboard.
And so a lot of what we did was leaderboard-based development, and sometimes we'd step back and actually start looking at structures and say, oh, this thing isn't doing this for this reason — that might give you a hypothesis for what you need to look at next: oh, it's missing this because this information channel doesn't exist, so we should add a better information channel there. Another thing you can actually do is analyze what the network is trying to do — like, after 64 layers or whatever, what representation has it built up, and can we find a way for it to do that earlier on, faster? So one trick you can do while developing this network is to look at the attention maps inside the network: you can see which residues are trying to talk to each other, and ask whether we can make it easier for them to talk to each other — like improve the signal-to-noise ratio in that communication.

That's a brilliant answer. I just want to know, from a machine learning perspective as well, whether you feel we want to maintain humans in the loop here — do you think that will always be a necessity, or do you think that part of this kind of drive is phasing that out?

Are you talking in terms of AutoML-type ideas? So you're continually looking at it and improving it from a human standpoint — do you see that as a process that will always be there?

I think it's an interesting question. My view on AutoML is that AutoML has historically been better at refining and improving existing architectures — finding some funny way to make an update better, or something. But AutoML that is working on convnets and then suddenly steps back and invents transformers — it's still quite hard for AutoML to do that, and that might be, you might call it, an AGI-hard problem: that sort of leap of faith, or generalization, or new pattern of thinking. On the other hand, the thing we've seen with really huge language models is that inductive biases seem to become less important the more data you have. So it might be that if you had a really huge network, possibly not trained just on protein data — imagine doing something ridiculous like training a language model on both Wikipedia and proteins at the same time, and a billion other things — maybe that one network could actually learn to do all of these things at once, and the information flow and the way it learns to reason means it needs very little human guidance to do all those things at once. Of course, the main place humans will still come in is that humans are needed to define your objective. As you know, in any optimization problem you will eventually start to exploit the score and no longer actually be doing something useful, and what you really want to do is make sure the score you're trying to optimize aligns with the task — the business case, what you're actually trying to do with this. So I think that's another reason you're generally always going to need humans: to keep things aligned with what you actually want to do.

That's exactly what I was looking for. Thank you very much. Thank you.

Thank you, Edith. We have another question from Alde Louis. Can you hear me? Yes, we can hear you. Great, super. So just a quick technical question that I missed in your talk.
So presumably you use some kind of force field, presumably at the end, to refine your structures. Is that right? Or is it bioinformatics all the way through? There are two things we do to improve the structure. At the end we put on a force-field-like loss which penalises the network when bond lengths are incorrect or atoms are overlapping; that's a loss on the network, and it generally removes most of what we call structural violations, violations of the stereochemical properties. Then the other thing we do after that is run a bit of gradient descent on the final structure with OpenMM and a force field. That produces very small changes; we restrain it to the predicted structure, and it cleans up a few of the slightly wrong bond lengths or a few overlapping atoms. We generally just do that so it will render properly in PyMOL or something; it doesn't give any difference in accuracy. So I guess the point is that optimising against these force fields isn't what predicts the protein structure in general. And there's no water in this final step? No, I think it's Amber99SB, which is an implicit-solvent force field. That's all you need.
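To illustrate that final relaxation step, here is a minimal sketch of restrained energy minimisation with OpenMM and Amber99SB, in the spirit of what is described above. It is not DeepMind's relax code; the file names, restraint strength and iteration count are illustrative assumptions.

```python
import openmm
from openmm import app, unit

pdb = app.PDBFile('predicted.pdb')            # hypothetical predicted-structure file
forcefield = app.ForceField('amber99sb.xml')  # no explicit water

# Predicted models usually lack hydrogens, so add them before building the system.
modeller = app.Modeller(pdb.topology, pdb.positions)
modeller.addHydrogens(forcefield)
system = forcefield.createSystem(modeller.topology, constraints=app.HBonds)

# Harmonic restraints pulling heavy atoms back towards their predicted positions,
# so minimisation only cleans up clashes and bad bond lengths.
restraint = openmm.CustomExternalForce('0.5*k*((x-x0)^2 + (y-y0)^2 + (z-z0)^2)')
restraint.addGlobalParameter('k', 10.0 * unit.kilocalories_per_mole / unit.angstrom**2)
for name in ('x0', 'y0', 'z0'):
    restraint.addPerParticleParameter(name)
positions = modeller.positions.value_in_unit(unit.nanometer)
for atom, pos in zip(modeller.topology.atoms(), positions):
    if atom.element is not None and atom.element.symbol != 'H':
        restraint.addParticle(atom.index, [pos.x, pos.y, pos.z])
system.addForce(restraint)

integrator = openmm.LangevinIntegrator(300 * unit.kelvin, 1 / unit.picosecond,
                                       2 * unit.femtoseconds)
simulation = app.Simulation(modeller.topology, system, integrator)
simulation.context.setPositions(modeller.positions)
simulation.minimizeEnergy(maxIterations=100)

state = simulation.context.getState(getPositions=True)
with open('relaxed.pdb', 'w') as out:
    app.PDBFile.writeFile(modeller.topology, state.getPositions(), out)
```

Because the restraint keeps every heavy atom near its predicted position, the minimisation only makes the small, local corrections described in the answer rather than re-folding anything.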
Peter Bram has a question for you. First of all, Tim, I want to thank you for an absolutely fantastic talk. It seems to me, looking at several of the results from DeepMind, and we've had the pleasure of hosting you a few times over the last three years, that you generally work with a substantial team, and also that you take a very deep and deliberate way of thinking about problems, as you described very well at the beginning of this talk. Do you have some thoughts about how the more traditional graduate student in a university could also get optimal results in this area? And can universities actually learn some important lessons from DeepMind's structure, as a sort of academically inspired new kind of company, to join you in blazing the trail? I think it's a very interesting question; I'm sure the MSR CEO has lots of thoughts on this. In terms of advice for graduate students: similar to what DeepMind does, if protein structure prediction wasn't going to be solvable in some sense by deep learning, we probably wouldn't have done it, because our strengths are in pure science and deep learning, and we also bring some domain knowledge from, say, physics. So one piece of advice would be to stick to your strengths. DeepMind tends to be quite focused and do these big things; there's less, say, totally blue-skies, crazy investigation, and it's much more problem-driven, with maybe less time to tackle more routine problems. But as a graduate student you have quite a lot of freedom to investigate very different things, drill down, maybe go after problems which don't have to be immediately useful. As a graduate student, the area I worked in, NMR, wasn't very competitive: there weren't a huge number of other students competing against me on it, but I was able to make quite a nice niche that was actually very useful for lots of experimentalists. The main problem, and this probably gets into things that graduate students themselves can't fix but maybe universities can, is this: one example for me is the existence of DeepMind's research engineer position. Scientific software engineering has for a long time not been a very well-developed career track in academia. I understand it's getting a bit better recently; I've seen things like the Society of Research Software Engineering and that sort of thing. A lot of modern scientific problems need strong computer science and software engineering ability to solve, and research groups can benefit from having those skills present. Another one, probably the hardest to solve in academia, is the incentives problem. DeepMind can form teams of people who are motivated and aligned to all work together on one big project, and I remember at the CASP conference, even with the first AlphaFold, the PIs we spoke to were kind of amazed that we managed to get six people to work together for a couple of years on one problem. That's actually quite hard in a lot of academic situations, because people want to graduate, they want to write their thesis, everyone wants a first-author paper. We often find that, as a team, writing papers takes a lot of time and you'd quite like to just be moving on, whereas academia incentivises you to carve things up and decide when you're going to publish a paper. So one thing is working out how to align incentives, possibly with some more institute-type structure that allows people to collaborate on bigger projects in a way that respects their contributions, or something like that. That's probably the sort of way I'd be thinking, though I do still think there are different strengths and benefits to both ways of doing things. Thank you, that's a very clear answer, particularly your mentioning these longer-lived, bigger teams, the impact of computer scientists, and then the selection of problems. That's very clear, thanks very much. Thank you. Right, I think Patrick Brennan has a question and then we have two more questions in the chat. Hi, thank you very much for a great talk. I was just curious whether DeepMind has had any thoughts about training AlphaFold using nucleic acid sequences as opposed to amino acid sequences, for example looking at the choice of codons and whether how that affects the actual translation of the protein could be used in structure prediction. Okay, so not predicting nucleic acid structures, but actually inputting the codons rather than the amino acid sequence. This came up quite early on in the AlphaFold project; we wondered about it because we didn't really know much about bioinformatics then. But it turns out, and other people may know this better, that a lot of these databases of protein sequences don't come with a codon sequence. My understanding is that the sequence is often just what someone has found in nature; at best you could reverse-translate it, so you don't actually have the original coding sequence the protein came from. That means these MSAs don't come with the DNA; the codons don't come with them. So I think the information often just isn't available.
But we'd be very interested, probably as research, in looking at this, because you would certainly think that correlated mutations might be more obvious if you were able to look at the actual bases that are mutating rather than just the eventual effect on the amino acid. I suppose the one thing there is that evolution is ultimately selecting for the amino acid, not the codon sequence, so you can imagine it as a Markov chain where those two steps really just collapse into one overall effect on the mutation rates. I think I've read that the actual codons can affect some things biologically, like transcription rates and how fast the ribosome works, but those might be quite subtle effects, I'm not really sure. For example, if I remember rightly, part of the optimisation of these mRNA vaccines that have been produced was codon optimisation. But those are probably details that don't matter if you just want to get the structure, rather than work out how it's being produced by the ribosome. Thanks very much. So I'm going to read the two questions in the chat so that they are part of the recording. The first one is from Greg Collier, who asks: can you compare the output against a local physics model, where large discrepancies may indicate that the environment has a significant effect? For example, the in vivo structure worked on by evolution may differ significantly from the isolated structure or crystal structure. I think the problem here, and this is what I was alluding to earlier when talking about minimising with the Amber force field, is that local physics models are less accurate than AlphaFold. We have actually found that where there are disagreements, and this even happens at the experimental level, sometimes when there are disagreements between AlphaFold and an experimental structure, it's at the level of "this could be a crystal artifact in the experimental structure". For example, in a crystal you have a lattice of proteins, and these proteins are artificially packed up against each other, so a loop in one protein might be folded down to avoid contact with another protein. That would be an artifact due purely to how this particular protein crystallised. So I think the answer is that we can't use local physics models, because physics models are generally worse than AlphaFold, but discrepancies do happen with experiments, and they are sometimes in AlphaFold's favour because of these in vivo sorts of differences, like solution state versus crystal state and that kind of thing. And if you look at, say, NMR, which is usually solution state, even there you can get differences. Actually, recently there have been some papers finding that AlphaFold structures are often better fits to the NMR data than the solved NMR structure. So we're getting into the range where we're so accurate that sometimes the experiment is wrong, which I think is a great place to be as a theorist. Yeah, I guess I was wondering whether, if you can eliminate the discrepancies that are not crystal artifacts, that might tell you what the active part of the protein is?
Because if you know there's something else there, and you know that it's not another part of the same protein coming from the crystal, then it must be something else that's there in vivo, maybe. Yeah, I can see what you mean. One example here would be that, as I mentioned, AlphaFold can often impute extra missing context. For example, you often get these transporter proteins which have a pore in them, and it often turns out these pores have a drug molecule jammed into them, and that will force the protein to be in a certain conformation. There was a protein in CASP that could be either like this or like that, and the question was which one it was; it turned out it was in this position, and there was a big molecule down the centre of it that we hadn't been told about. Similarly there will be other things, like a homodimer or several other proteins: if you look at the structure, it's clearly non-physical, because you have these large greasy parts of the protein which are apparently just left open to the water, but actually AlphaFold realised it's a dimer and is pairing up with itself so as not to be exposed. At that sort of level, I can imagine you could use physical models to try to see if there's a discrepancy, but there's still the problem that molecular dynamics is too expensive to run. You'd basically have to run molecular dynamics on the protein long enough to show that it predicts something different, and at that point you might think that if you could afford that, you should just have run molecular dynamics in the first place. Could you just do an equilibrium calculation? You don't need the dynamics, because you've already got to the end point, does that work? Equilibrium with respect to what? You don't know the free energy function of these proteins, so it's hard to just jump straight to the answer. That's kind of what AlphaFold is trying to do: jump straight to the equilibrium answer, whereas dynamics is trying to sample equilibrium with a Markov chain. Yeah, thanks, it's an interesting question. I think there is potential in looking for where the discrepancy is, because AlphaFold isn't fully physical, since it's learning from the statistics of the data. Cheers. Right, thank you. Kyle Duffy has three related questions for you. The first one is: how did you settle on the number of Evoformer layers? The second: did you observe any trends in accuracy with input sequence length? And the last: following the high-level understanding of stacking Evoformers as reasoning about structure, do you think deepening the model on more complex inputs, effectively allowing it to reason for longer, would significantly affect performance? On the number of Evoformer layers, I think it ended up being a trade-off: when you're training machine learning models, what you want is a turnaround time where it doesn't take forever to train, and there are also memory limits since we're training on TPU, so even with rematerialisation you're trying to fit these things into memory. I think we found that going up to 64 layers didn't give any noticeable increase in accuracy, in exchange for much slower training. So it was roughly along those lines, and recycling gave a better improvement, an effective increase in depth, while being faster.
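As a rough illustration of the rematerialisation trick mentioned in that answer, here is a minimal sketch assuming a JAX-style setup; `evoformer_block` is a placeholder, not the real implementation, and the real training code will differ.

```python
import jax

def evoformer_stack(msa_act, pair_act, params):
    # One parameter set per Evoformer block; evoformer_block is a placeholder.
    for block_params in params:
        # jax.checkpoint (a.k.a. jax.remat) discards this block's intermediate
        # activations in the forward pass and recomputes them during backprop,
        # trading extra compute for the memory needed to fit a deep stack on TPU.
        block = jax.checkpoint(evoformer_block)
        msa_act, pair_act = block(block_params, msa_act, pair_act)
    return msa_act, pair_act
```

The memory saved this way is what lets a deep trunk fit on a single accelerator, which is the constraint the answer alludes to.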
On trends in accuracy with input sequence length, the problem here is that the accuracy measures themselves are a bit sequence-length dependent. You can imagine that longer proteins are more likely to be a bit flexible, and so the accuracy over longer distances is going to be lower. But there are measures which are more like local estimates of accuracy, such as what's called lDDT, and I don't think we really see a huge difference in accuracy for longer sequences; maybe it goes down a bit. Our papers probably have a plot somewhere of accuracy versus length. Probably the biggest thing that affects accuracy is the number of aligned sequences: I think we found that so long as you've got about 30 aligned sequences, we can generally give you a very good prediction. Probably the other big thing with long sequences is that, above a certain length, proteins tend to be built out of what are called domains. These domains form compact structures themselves; one of the definitions is that they fold independently, which means they don't have much effect on each other's structure. So if you fold each one accurately, you can be accurate no matter the length of the protein. The main challenge is in terms of memory: our attention scales cubically with length, which isn't so great for speed on really huge proteins, but we find we can fold proteins of up to about 2,700 residues. Previously, during CASP, and we mention it in our CASP paper, we found that a really long protein, a 2,300-residue protein, had something we call long-chain collapse, where it seemed that for longer proteins it got less accurate. But we fixed that just by fine-tuning AlphaFold with slightly longer crops of the protein at training time, and that seemed to correct for whatever misapprehension it had; you might imagine it hadn't seen the positional encoding that far out, or it was a bit out of distribution. I wouldn't be surprised if, at even longer protein lengths, there were evidence of long-range collapse where it's a bit out of distribution for what it's been trained on. On the other question, deepening the model on more complex inputs: I'm not sure exactly what's meant by more complex inputs; deepening networks, I think, should generally just work on the raw data. But yes, better architecture, still trying to improve and refine the information passing within it, possibly sparsity or something like that, could help. And we generally know in deep learning that deeper is better, stack more layers and all that kind of thing, so I would think giving the model more ability to think about it and reason for longer would possibly increase performance. Though, as I said, at least for single-chain prediction we're starting to bump up against experimental accuracy, so there might not be a huge amount of extra performance to gain. Who knows, I'll probably regret those words at some point.
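For reference, the lDDT-style local accuracy measure mentioned above can be sketched roughly as follows for C-alpha atoms only; this is a simplified version written from the metric's standard definition, not AlphaFold's evaluation code.

```python
import numpy as np

def lddt_ca(pred, true, inclusion_radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """pred, true: (N, 3) arrays of C-alpha coordinates in angstroms."""
    d_true = np.linalg.norm(true[:, None, :] - true[None, :, :], axis=-1)
    d_pred = np.linalg.norm(pred[:, None, :] - pred[None, :, :], axis=-1)
    n = len(true)
    # Only score pairs that are close in the reference structure (and not self-pairs),
    # which is what makes the measure local and largely length-insensitive.
    mask = (d_true < inclusion_radius) & ~np.eye(n, dtype=bool)
    diff = np.abs(d_true - d_pred)
    # Each pair scores the fraction of tolerance thresholds it satisfies.
    per_pair = np.mean([diff < t for t in thresholds], axis=0)
    per_residue = (per_pair * mask).sum(axis=1) / np.maximum(mask.sum(axis=1), 1)
    return float(per_residue.mean())  # score in [0, 1]
```

Because only short reference distances are scored, a flexible hinge between two well-predicted domains barely affects the score, which is the point being made in the answer.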
All right, very interesting, thank you for your detailed answers. We still have two other questions, and I was wondering if you have a bit more time to answer them; you do, okay, perfect, thank you. It seems like your talk has generated a lot of interest. One of the questions, from Maya, is: what are some other scientific problems that you think could be solved with AI using methods similar to AlphaFold? And I'd like to add: do you think any advancement in our field of astrophysics could be something you'd study? Yeah, I think so. For methods similar to AlphaFold, I suppose one way of thinking about it is things with lots of data. Protein folding admittedly has a very rich set of experimental data which has been built up over a long time and which we know is useful, although hundreds of thousands of samples isn't actually that huge in the grand scheme of things. So certainly astrophysics is an interesting one. I know there are some papers from the Flatiron Institute, is it Shirley Ho's group? where they run large-scale universe simulations with different cosmological Lambda-CDM parameters, simulate observations, and then try to go back from those observations to predict the Lambda-CDM parameters, which is very interesting. Obviously there are lots of things in biology where just better and faster image processing and pattern recognition would let you handle bigger and bigger data sets. And there's one quite big thing which I feel hasn't really been cracked yet: simulations, or how to speed up simulations. Quite often when people talk about neural networks speeding up simulations, you find that they can take, say, 64 times fewer steps but at the cost of 256 times more compute. So you think, okay, we haven't really managed to amortise the cost of the simulation into the neural network. But that's one place where, if you work it out right, maybe some of these big simulation costs could be amortised by a neural network in some way. I think Jennifer mentioned climate: it could help speed up climate simulation or something like that, though I suspect the real climate issue is probably people and politicians rather than simulations. And yeah, I think anything which involves lots of data and lots of compute can ultimately benefit, because another way I think about neural networks is that they help us compress data and often find shortcuts for how to extract and process information. Interesting, thank you. Our last question is rather technical, from Nelly Kost. She starts by thanking you for the interesting talk, and I think we can all join in that. She asks: if the distribution of the equilibrium structure given the MSA is now encoded in the trained AlphaFold network, do you see a possibility to invert a structure to obtain a distribution over sequences, in which specific amino acids and their positions are weighted by importance for that structure? Let me think about it. One interesting point first: sometimes when you fold one protein several times with AlphaFold, it does produce slightly different structures, and sometimes it can actually flip into a different mode. So it does seem that AlphaFold sometimes knows about different modes of the distribution; we just haven't really worked out the right way to ask it to give us all the modes at once. But in terms of inverting the structure, I guess this is getting towards protein design, and people have done work which is effectively this.
You take a protein structure and then make a neural network, say an autoregressive language model or something, which generates an amino acid sequence; you then sample from it to get a generative model over sequences which would perhaps fold up to give that protein. Obviously you have to make sure it's not a trivial problem, because if I knew exactly where every atom in a protein is, I could just read off exactly what the amino acids are. You also get additional issues: even if you, say, censor the side chains so you can only see the backbone, I can still measure the volume around every residue and use that to work out what the amino acid is. So there are lots of ways to cheat the problem of going from structure to sequence. Probably the most useful framing is that people generalise it to something more like topology: you say, I want three helices with this topology, and then search over all the other details of the actual structure. But generally that's called protein design, and probably the main way people actually do it is to randomly change the input sequence to a model like AlphaFold, look at the output structure, compute some metric comparing the output structure to the structure they want, and then do something like simulated annealing, where you keep mutating the input sequence until you get a structure that folds to something similar to the one you want. There are lots of ways to do that, and I think protein design is an unsolved problem right now, probably because protein design requires a neural network which knows more about protein physics than AlphaFold does: one that is more sensitive to the actual conformational distribution and more sensitive to mutations than AlphaFold is right now. Thanks. Thank you, indeed, and thank you for your talk. It was a fascinating experience to dive into AlphaFold, and thank you for the time you took to answer so many questions. Brilliant, thank you very much. I would add that I'm probably going to be in Oxford in a month's time, on the 25th of February, so if anyone's interested in meeting, I don't know about timings yet but I think I might be able to. Email tfgg@deepmind.com if you'd like to, and I'll get back to you about where I can fit you in. Very interesting, I'm sure you're going to have a few emails. Well, thank you everyone for joining, and thank you again, Tim, for presenting this very interesting work. Thank you very much. Thank you very much. Have a good afternoon, everyone. Bye, everyone.
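As an illustration of the mutate-fold-compare loop described in that last answer, here is a minimal sketch; `fold_structure` (standing in for an AlphaFold-like predictor) and `structure_similarity` (for example a TM-score or lDDT comparison) are placeholder functions, not real APIs, and the step count and temperature schedule are arbitrary.

```python
import math
import random

AMINO_ACIDS = 'ACDEFGHIKLMNPQRSTVWY'

def design_sequence(target_structure, length, steps=10_000, temperature=1.0):
    """Anneal a random sequence towards one whose predicted fold matches the target."""
    seq = [random.choice(AMINO_ACIDS) for _ in range(length)]
    score = structure_similarity(fold_structure(''.join(seq)), target_structure)
    for step in range(steps):
        i = random.randrange(length)
        old = seq[i]
        seq[i] = random.choice(AMINO_ACIDS)          # propose a point mutation
        new_score = structure_similarity(fold_structure(''.join(seq)), target_structure)
        t = temperature * (1 - step / steps) + 1e-6  # cool the temperature over time
        # Always accept improvements; occasionally accept worse moves early on.
        if new_score >= score or random.random() < math.exp((new_score - score) / t):
            score = new_score
        else:
            seq[i] = old                             # reject: revert the mutation
    return ''.join(seq), score
```

In practice each `fold_structure` call is expensive, which is one reason, as noted in the answer, that this kind of search-based design remains hard.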