Okay, everyone, settle down. Good afternoon, everyone. Today we welcome Sjors Scheres again, for the second of his Steenbock lectures, and today Sjors is going to talk about the technology behind the exciting work he presented yesterday. I don't have to introduce Sjors a second time, but we all know him really well from the software package RELION, which we use a lot in our cryo-EM processing, and I'm sure he will present a lot of stunning work in today's lecture. So without further ado, please welcome Sjors for his second lecture.

Thank you. Thanks, CJ. Yeah, so today I thought I'd speak about some of the software developments that have happened in RELION, and give an overview of what has happened in the field in recent years: kind of why everybody wants to do cryo-EM nowadays. I thought I'd start with this picture, which was taken on our Polara microscope. This is a purified solution of ribosomes from the Ramakrishnan lab. For those who aren't that familiar with cryo-EM: we put a small drop of purified protein solution on an EM grid, we take away the excess liquid with a piece of tissue paper, basically, and then rapidly plunge it into liquid ethane. The grid is a carbon support film with punched holes in it, about one or two microns in diameter, and hopefully the surface tension of the water will have spanned a very thin film of solution across each hole, in which the ribosomes were tumbling around in random orientations. They get frozen so rapidly in the liquid ethane that the water doesn't have time to crystallize into hexagonal ice; instead you get this vitreous, solid state of water, which does not diffract in the electron microscope, so we can do transmission electron microscopy imaging. Each of the rather grainy little black dots here is now a two-dimensional projection image of the scattering potential, the Coulomb potential, of the macromolecular complex of interest, and you have many of them in each field of view.

Now, what we want to do is calculate a three-dimensional reconstruction of this scattering potential of the object. In the first instance, we're going to assume that all of these ribosomes are in an identical structural state, and all that differs is their projection direction. Three-dimensional reconstruction is, I think, best understood in Fourier space. If we have a three-dimensional object, then in the computer we can discretize it on a 3D grid and take a discrete Fourier transform, which is represented by this blobby thing here. That's just a mathematical operation; we can use standard libraries to go back and forth very quickly. It's the same information expressed in a different way. Now, inside the electron microscope, and that's depicted here with these red arrows, the electrons go through the three-dimensional object and yield a two-dimensional projection image. The optics of the microscope aren't perfect, which gives rise to certain point-spread functions, et cetera, which I'll ignore here; for now we'll just assume it's a normal two-dimensional projection. And a two-dimensional discrete image collected on a detector I can then put through a similar two-dimensional Fourier transform, and again go back and forward.
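[Editor's note: a minimal numpy sketch of that back-and-forth, for readers following along; the random volume is just a toy stand-in for a discretized 3D object, not data from the talk.]

```python
import numpy as np

# A 3D object discretized on a grid: its discrete Fourier transform is the
# same information in a different representation, and standard FFT libraries
# make the round trip fast and numerically lossless.
vol = np.random.default_rng(0).random((32, 32, 32))
ft = np.fft.fftn(vol)           # forward 3D transform ("the blobby thing")
back = np.fft.ifftn(ft).real    # inverse transform recovers the object
print(np.allclose(vol, back))   # True
```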
Now, there is this projection-slice theorem that I'm going to need for three-dimensional reconstruction, which says that this two-dimensional transform is actually a central section through the three-dimensional transform of the original 3D object. The slice goes through the origin of the Fourier transform, and its orientation is orthogonal to the projection direction in which the initial image was formed. What that means is that if I can somehow orient all my two-dimensional projection images relative to a common three-dimensional framework, then I'll be sampling 3D Fourier space with many differently oriented 2D slices, which slowly fill up the entirety of 3D Fourier space, and then doing a three-dimensional reconstruction is nothing more than doing the inverse transform in 3D, and I have my three-dimensional reconstruction.
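[Editor's note: the theorem is easy to check numerically one dimension down, where a projection of a 2D object is a 1D line integral and its Fourier transform should equal a central line of the 2D transform. A toy sketch, not code from the talk:]

```python
import numpy as np

rng = np.random.default_rng(0)
obj = rng.random((64, 64))               # stands in for a 2D "object"

projection = obj.sum(axis=1)             # line integrals along y
ft_projection = np.fft.fft(projection)   # 1D transform of the projection

ft_obj = np.fft.fft2(obj)                # 2D transform of the object
central_slice = ft_obj[:, 0]             # the central line through the origin

print(np.allclose(ft_projection, central_slice))  # True: they agree
```

The same relation, one dimension up, is what lets each 2D particle image fill in one central slice of the 3D transform.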
So that raises the question: how do I get to know the relative orientations of each of these individual 2D projection images, when all I have to start from is this collection of many, perhaps thousands, of these grey images with multiple copies of the molecule in them? To explain that, I'll briefly describe the projection matching algorithm. Let's assume we start with some three-dimensional estimate, a little bit of information about the 3D reconstruction we're after. The algorithms have now gotten so good that in practice you don't even need a low-resolution, blobby impression of what the 3D reconstruction looks like; you can just start from a featureless spherical ball, and the whole thing still has quite good convergence properties. We'll start from this initial three-dimensional guess, and in the computer we generate what we call a reference projection library: in all possible 3D orientations, I calculate line integrals along those red arrows from the first slide and make reference projections, top views, side views, front views, et cetera, everything as finely sampled as I think necessary. Then, for each of the boxed-out individual noisy particle images from the large micrographs I showed on the very first slide, I compare that little boxed-out particle image with all of the possible reference projections and choose which one it resembles most. Now, "resembles most" is an intuitive concept, but all the program does is some calculation: for example, you can subtract one from the other and see for which reference projection the difference from the actual experimental image is smallest. If I do that for all the tens or hundreds of thousands, perhaps even millions, of individual particle images, then some of them will fit best as a top view and some as a front view or a side view. And besides trying all the different rotations, perhaps I didn't pick a particle exactly in its center, so I can also translate the images up and down a bit and find their exact centers.

Doing that for all the hundreds of thousands of images assigns a three-dimensional rotation and perhaps an in-plane shift to each individual particle, and now I just calculate their 2D Fourier transforms, put them all back into this three-dimensional Fourier space, do the inverse transform, and that gives me a reconstruction of the molecule I'm after. Now, you can show mathematically that if you do this, you're guaranteed to make the map better. So starting from this relatively featureless thing, and in practice, I told you, you could even start from a featureless sphere, the initial orientations, all these relative translations and rotations, are perhaps not so good yet, and we've made some mistakes, but the new map will be better than the old one, so we can iterate our way through, the map gets ever better, and this will hopefully, and this part is not guaranteed, converge to the best possible reconstruction from the 2D image data. So that's the basics of three-dimensional reconstruction from cryo-EM images.

Now, my colleague at the LMB, Richard Henderson, already predicted back in 1995 that you would be able to do this kind of analysis, based on just the physics of radiation damage, which is actually the limiting factor in all of this. The first image was a very grey, featureless-looking image, and the main limitation there is that the electrons used for imaging destroy the very structure I'm looking at through radiation damage: bonds break and so on, and if you keep the beam on too long, as some of you will have seen, you just evaporate the whole sample into the vacuum of your column. So the physics of radiation damage limits this: we need to use very few electrons, to prevent, or at least reduce, radiation damage to an extent that we still have a structure to look at and reconstruct. It was Richard who looked at this from a theoretical point of view; he made calculations and predicted that for a 100,000-dalton protein complex you should be able to get to atomic resolution, and by this Richard meant a reconstruction with enough detail that you could do de novo atomic modelling, probably around three ångströms resolution (I'll speak about true atomic resolution a bit later on), and he said you would need approximately 10,000 particles. Now, I joined the field of cryo-EM in 2003 and worked for quite a while in Madrid, and during all those years people thought Richard was really quite the optimist, because the resolutions people got on complexes that were way bigger than 100 kilodaltons, people would look at megadalton complexes, big ribosomes, were perhaps 20 ångströms in 2003, and then they slowly got better; by the time I left they were perhaps eight or nine ångströms or so, but it was nowhere near de novo atomic modelling for 100,000-dalton proteins from 10,000 particle images. And that all changed quite suddenly in 2012, 2013, in what people like Werner Kühlbrandt then called the resolution revolution in cryo-electron microscopy.
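[Editor's note: to make the matching step concrete, a minimal sketch of the "resembles most" comparison; this is an illustration with made-up function names, and real implementations also search in-plane rotations and translations and account for the CTF.]

```python
import numpy as np

def best_orientation(particle, reference_projections):
    """Index of the reference projection that a boxed-out particle image
    resembles most, by smallest sum of squared differences."""
    ssd = [np.sum((particle - ref) ** 2) for ref in reference_projections]
    return int(np.argmin(ssd))

# One refinement iteration, schematically:
#   1. project the current 3D map in many orientations -> reference library
#   2. best_orientation(...) for every experimental particle image
#   3. insert each particle's 2D transform as a central slice in 3D
#   4. inverse 3D FFT -> a better map; repeat from step 1
```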
Just to illustrate this: all these black points were 21st-century structures, deposited between 2000 and 2010 I think, with molecular weight here in megadaltons against the resolutions you would find in the EMDB. You can see that as you go towards larger complexes, resolutions improve, and rather than just having blobs for domains, perhaps for some of the larger ones you could see alpha helices as tubular densities, for which you need about nine ångströms. But, for example, the separation of two beta strands in a beta sheet, as I told you yesterday, happens at 4.7 ångströms, and there were very few reconstructions in the EMDB where you could actually get to that level. So doing actual main-chain tracing in an EMDB map was virtually impossible for most of the proteins out there. And then in red are some of the structures that came out in 2013, 2014 or so: over a whole range of sizes you suddenly got to resolutions where you do see clear main-chain connectivity, clear side chains, et cetera, and people could build de novo atomic models.

So what happened? What underlay this resolution revolution? A very important part was work, also driven by Richard Henderson, on the development of new electron detectors. Until then, photographic film was the best detector, but of course it's not very convenient to develop films and scan them; it was intrinsically hard work and low throughput. Then people developed CCD detectors, which some people loved: you could automate them, and the early developments in automation started with these, for example Leginon, by the Carragher and Potter groups at Scripps. But CCD and film were still both not such good detectors. Film was the better of the two. And this is now the DQE: think of it as the fraction of the signal that I keep in the detection process of the electrons. It's typically expressed as a function of spatial frequency, which makes it somewhat counterintuitive, but for film, roughly one out of three electrons, if you like, would get detected correctly up to about half Nyquist, and by the time you get to Nyquist, your highest resolution, you would only detect one out of ten electrons correctly. Then in green, red and blue are the same curves for three commercially available direct electron detectors by around 2013. We got prototypes of the Falcon 2 around 2012, with Richard very much involved in the development collaboration with Thermo Fisher; and here in the US, for example, David Agard worked with the Gatan company, and they did the K2 detector, which counted individual electron events, which has quite a big effect at the lowest resolutions, while at the higher ones it was a bit of a question which detector was better. But suddenly we went from being able to detect one out of three electrons to at least half of all the electrons, and at the lowest resolutions even better than that with the counting detectors. So signal-to-noise ratios in the individual images took a big jump forward. And the other part was the introduction of better image-processing methods, and I'll discuss some of those in a bit. I like to explain the effect of the detectors, and what you could do with them, with pictures of these two, my favorite particles, when they were still a lot smaller than they are nowadays.
This picture was taken on a very rare bright, sunny day in Cambridge, England, and taking pictures of molecules in the electron microscope is very much unlike this, because, as I've told you, molecules are extremely sensitive to the radiation we use to image them, and to prevent them burning away we have to image under low-dose conditions. That would be the same as taking a picture of Jan and Matt under very low light: the signal-to-noise goes down tremendously and we get these very grainy pictures. Using a better detector, a camera that is slightly more sensitive, is a bit like going from those first digital cameras, which had pretty poor CCD detectors, to your iPhone, which has a beautiful, very sensitive CMOS detector, so that taking pictures at night isn't actually that hard anymore, and you can take a better picture in which you can better see the particles.

But what really happens inside the microscope is even worse than this, and to explain it better I'll switch the light back on so we can see what happens. When the electrons start to hit the particle, you get radiolysis: lots of chemical bonds get broken in the protein and in the solvent around it, and probably things like local formation of little gas bubbles, et cetera. And what happens inside the sample is that motion starts to occur. This is the by-now-famous beam-induced motion: as a consequence of radiation damage, all the particles start to move, and then we're taking pictures of moving objects, which necessarily get blurred. Much like your iPhone can be put in burst mode, these direct electron detectors came with the ability to write out frames very fast, so we could record movies rather than individual long exposures. So rather than taking one blurred picture, you could take multiple sharper pictures which sample the motion.

How does that look for our ribosome image? I'm just taking out one of these ribosomes. This is the average over a movie consisting of 16 frames, and these are the individual movie frames. They are even worse images than the average, because the dose that went into the average image is now spread over 16 individual frames, so the signal-to-noise of each of those is even lower. But somewhere hidden in those 16 noisy movie frames is a ribosome particle that was moving as I took that little movie in burst mode. So there is the potential, if I could align the individual ribosome particles in these 16 movie frames on top of each other, to undo some of the motion and make a much sharper image, which then leads to a higher-resolution reconstruction. So that's what we implemented. And what we then saw, and others, for example Niko Grigorieff, and I think Tim was involved in some of the early work as well, had already seen this, was that some pictures we took in the microscope were actually very good and some others were very bad, and we didn't really understand why; there's been some progress since, but it's still not completely clear. So here I include two of the images that my postdoc Xiaochen Bai recorded.
He's now at UT Southwestern in Dallas. He was very proud of this image, because you can actually see the ribosomes quite well, all kinds of little details; he would say, you know, that's a Xiaochen Bai image. And then I was quite keen to also put in the paper an image that Xiaochen was not so proud of, because here, you see, the ribosomes are much harder to see; the image is more fuzzy. The little red and green circles connected by white lines are the motions of individual ribosome particles that we were able to detect and correct for, exaggerated by, I think, a factor of 25; otherwise you wouldn't be able to see them, as the motions aren't that big at the scale of the real image. And you can see that on this image, which Xiaochen didn't really like, the motions were much larger than on the left image. That one is such a good image because somehow Xiaochen managed to take a picture where the sample was not moving, while in this one the particles move around quite a bit more, leading to blurring and resolution loss. So in the early days, when you could only collect individual still images, you could still get potentially high-resolution data; you just had to throw away all the bad images and keep the good ones, but that would be wasteful. By being able to follow the motion, we can get back to sharp images by realigning the ribosomes as they move around.

Another perhaps surprising observation is that here the particles move from green to red, from left to right, while at the bottom they move the other way around, as if they're rotating around some point in the middle where there's nothing special to see. At least, we don't really understand why that is. Being able to correct for it saves this image, but what exactly is going on inside we still don't really know. Being able to monitor this, though, gave us a tool to see when it happens, and that became the basis for all kinds of developments of experimental grid supports, for example by Chris Russo, a colleague of mine at the LMB, to try and stop these motions by making more stable supports, so that all your images look like that good one; being able to follow what you're doing experimentally by looking at these movies was an important tool that he needed for that. Cool. So that's one part of the better image processing: rather than still images with blurred particles, we could record movies, realign them, and then average them.
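[Editor's note: the core of that frame-alignment idea fits in a few lines. A toy sketch of whole-frame translational alignment via FFT cross-correlation; real motion correction also handles per-particle trajectories, dose weighting and smoothness constraints.]

```python
import numpy as np

def estimate_shift(frame, reference):
    """How far `frame` is displaced relative to `reference`, as (dy, dx),
    from the peak of their FFT-based circular cross-correlation."""
    cc = np.fft.ifft2(np.fft.fft2(reference) * np.conj(np.fft.fft2(frame))).real
    peak = np.unravel_index(np.argmax(cc), cc.shape)
    # convert wrapped peak positions to signed shifts
    return tuple(p - s if p > s // 2 else p for p, s in zip(peak, cc.shape))

def align_and_average(frames):
    """Shift every frame back onto the first one and average, to undo
    (translational) beam-induced motion."""
    ref = frames[0]
    aligned = []
    for f in frames:
        dy, dx = estimate_shift(f, ref)
        aligned.append(np.roll(f, (-dy, -dx), axis=(0, 1)))  # undo the motion
    return np.mean(aligned, axis=0)
```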
Now, all of that we did in our own software package RELION, which I mentioned yesterday and CJ just mentioned again today, and you can download it from here. It's all open-source software, so you can go and look at the code. I'll come back at the end of my talk with a few more comments about open software; if you follow me on Twitter you'll know I'm quite passionate about this topic, and if you've seen me rant about how open software accelerates science, then I hope by the end of the talk you'll understand a bit why.

Now, the introduction of RELION happened around the same time as the first prototypes of the direct electron detectors became available, which has made it not entirely clear, in some cases, where the advances came from. But I thought I'd explain two fundamental aspects in which RELION was different from the software available back then. One of them is this concept of maximum-likelihood refinement, or, as the mathematicians call it, marginalization. Marginalization deals with what mathematicians call hidden variables: parameters that make the problem difficult to solve but are not really part of the observed data themselves, and in this case those are the relative orientations of each of the individual particles. If Thermo Fisher would sell us a microscope for 10 million dollars rather than 7 million dollars, and each particle came with a little label, "this is a top view" and "this is a side view", then I would be out of a job, because you could just reconstruct straight away, right? But the orientations of the particles are not known, and we have to go through this projection-matching cycle to find them. And what I told you is that we compare each experimental image with all of these reference projections, assign the best possible orientation to each particle, and do the three-dimensional reconstruction. Now, in RELION, and also in software I wrote earlier in Madrid, in XMIPP, with this concept of maximum-likelihood refinement we do not only assign the best possible orientation to each particle: we use a statistical model of the data, of the signal and of the noise, to calculate probabilities that it is a top view, a side view, et cetera. If it looks a lot more like a top view than a side view, then the probability for the top view will be a lot higher than for the side view, but they're not necessarily 0% for the side view and 100% for the top view. That was the difference between all the methods available until then and first XMIPP and then RELION: this concept of, if you like, fuzzy orientational assignment based on a statistical model. But the whole cyclical algorithm is still exactly the same, only each particle goes into the reconstruction in all possible orientations, weighted according to these probabilities. That sounds expensive, and it definitely is more expensive than doing it in only one orientation.
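[Editor's note: in pseudo-numpy, the fuzzy assignment could look like the sketch below. This is an illustration, not RELION's actual code; a single noise variance is used where the real model works in Fourier space with a frequency-dependent one.]

```python
import numpy as np

def orientation_weights(particle, reference_projections, sigma2):
    """Probability weight for every candidate orientation under an
    independent Gaussian noise model with variance sigma2 (simplified)."""
    log_p = np.array([-np.sum((particle - ref) ** 2) / (2.0 * sigma2)
                      for ref in reference_projections])
    log_p -= log_p.max()          # subtract the max for numerical stability
    w = np.exp(log_p)
    return w / w.sum()            # weights over all orientations sum to 1
```

Each particle then contributes to the reconstruction in every orientation, multiplied by its weight, instead of only in the single best one.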
So for that we're going to need a statistical description of the images: we have to define a likelihood function. So I thought today I'd show you a little bit of maths; I didn't show any yesterday. We're going to assume, in Fourier space, that the noise on each of the Fourier pixels is Gaussian, and independent between different pixels in Fourier space, which is probably not too bad an assumption: the Gaussianity of the noise holds pretty well, and the independence we could have discussions about over a beer, but I think overall it's a reasonable description. And because I can let the width of the Gaussian, so the power of the noise, vary with spatial frequency in Fourier space, that gives me the option to model what is called coloured noise, or pink noise, or whatever: the noise doesn't need to be white, which would mean complete independence in real space as well. I can let go of that and have a resolution-dependent model for the power of the noise, and I'll use sigma for that. Oh, that went too quick. So, based on these assumptions, for each pixel j in Fourier space you can write down a Gaussian, where this is the value of the Fourier pixel in the experimental image, the CTF has to do with that point-spread function I'm otherwise ignoring, and this is the projection of the three-dimensional volume in Fourier space, just a two-dimensional slice taken out of that three-dimensional Fourier transform; and the Gaussian is as wide as the amount of noise I think I have at that spatial frequency.

Cool. Now, a second thing, and this is where RELION was different at a fundamental level from my previous software from Madrid, is explicit regularization. Regularization is based on the observation that the problem we're trying to solve is ill-posed: there is so much noise in the experimental images that there is an infinite number of three-dimensional reconstructions, each of which is equally likely with the experimental data as evidence. You can imagine that with all this very high-resolution grainy noise in the experimental images, you could have many high-resolution grainy reconstructions that are all roughly equally likely in terms of the data. And if you have an ill-posed problem, what you would like to do, as a mathematician, is to regularize it: to include prior information such that, together with the experimental data, there is only one unique solution. You can take a Bayesian view on this, where the posterior, the probability of my model being the correct one given the data, is the likelihood, the probability of observing the data given the model, times the prior, the probability of the model itself being correct, divided by the probability of taking the data in the first place. Now, the probability of taking the data nowadays depends mostly on how many dollars you still have left on your grant, and once you've taken the data it's a given, so that term drops out, and you can just optimize what is called a regularized likelihood function. That is what RELION stands for: REgularised LIkelihood OptimisatioN. We multiply the likelihood function I've just shown you, marginalized over all the angles, all the orientations, by a prior on the model.
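[Editor's note: written out, this is my reconstruction of the formulas that were on the slides; the notation follows the published RELION description, with $X_{ij}$ the Fourier pixels of image $i$, $\phi$ the hidden orientation, $\mathbf{P}^{\phi}V$ the corresponding 2D slice of the map $V$, and $\sigma_{ij}^2$ the frequency-dependent noise power.]

```latex
% Gaussian noise model per Fourier pixel, marginalized over orientations:
P(X_i \mid V) \;=\; \int_{\phi} P(\phi)\,
  \prod_{j} \frac{1}{2\pi\sigma_{ij}^{2}}
  \exp\!\left(-\frac{\bigl|X_{ij} - \mathrm{CTF}_{ij}\,(\mathbf{P}^{\phi}V)_{j}\bigr|^{2}}
                    {2\sigma_{ij}^{2}}\right)\mathrm{d}\phi

% Bayes' rule; the evidence P(X) is fixed once the data are taken:
P(V \mid X) \;=\; \frac{P(X \mid V)\,P(V)}{P(X)} \;\propto\; P(X \mid V)\,P(V)
```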
Now, the million-dollar question becomes: what information do I have about the model? The prior was P(Θ), the probability of the model; the data X is not part of it. So what do I know about my reconstruction if I don't take the data into account, and can it please be something I can also optimize over nicely in a computer, with some nice algorithm? What we came up with, and this had been done in parts of mathematics already, is this: if we assume that not only the noise in the images is Gaussian and independent, but also that the signal is Gaussian and independent. Now, we know the signal is definitely not independent in Fourier space, because it lives on a limited domain in real space, so there will have to be dependencies in Fourier space; but let's ignore those for now and assume the signal is Gaussian, which could be okay-ish, and independent. We then also have a resolution-dependent estimate of the power of the signal, which I'll use tau for, and if that falls off with resolution, what I'm actually saying is that, in real space, my reconstruction needs to be smooth: reconstructions with very rapidly changing white and black values, very high-resolution noisy maps, are unlikely even before I've measured any data. That's the type of prior that goes in. And if you do it this way, you can express the prior as a similar Gaussian function, and you can then build this regularized likelihood function combining everything together, which gives this algorithm. I won't go very much into the details, but it can estimate the resolution-dependent power of the noise from the images themselves, iteratively; it will estimate the power of the signal from the reconstruction, iteratively; and it combines these estimates into what is called an optimal filter in maths, or a Wiener filter if you like, which accounts for the correction of all those point-spread-function artifacts I meant with the CTF, but also does the best possible three-dimensional low-pass-filtered weighting of the reconstruction, given the assumptions I've told you about: independent Gaussian signal and noise. And the great thing, which many of these Bayesian algorithms have in common, is that there is no need for an expert user to tune all kinds of knobs, because they all get estimated from the data. So you have an algorithm that just learns by itself to do the best possible filtering, and that became a very powerful tool, because in 2014-15 lots of people came flocking into the field who were not used to tuning all the parameters that existing programs needed expert users to adjust, to keep overfitting and the build-up of noise in your reference at bay. RELION did that for you automatically.
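[Editor's note: that "best possible filtering" has, schematically, the form of a Wiener filter. Again a paraphrase of the published algorithm, simplified to drop the orientation weights; $\tau_k^2$ is the estimated signal power and $\sigma_k^2$ the noise power at frequency $k$.]

```latex
% Wiener-filtered reconstruction: CTF correction with frequency-dependent
% regularization. Where the signal-to-noise ratio tau^2/sigma^2 is low,
% the estimate is damped towards zero: a low-pass filter learned from data.
\hat{V}_{k} \;=\;
  \frac{\sum_{i} \mathrm{CTF}_{ik}\, X_{ik}}
       {\sum_{i} \mathrm{CTF}_{ik}^{2} \;+\; \sigma_{k}^{2}/\tau_{k}^{2}}
```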
Now, that came at a computational cost: with RELION 1.0 to 1.4 you would basically need quite a big CPU cluster; people would run it on 300 to 400 CPU nodes, and some of the cryo-EM labs had that, but many of the X-ray crystallographers flocking into the field were definitely not used to high-performance computing systems. So it was really thanks to a collaboration with Erik Lindahl, who's at SciLifeLab in Stockholm, and two brilliant people, his student Björn Forsberg and Dari Kimanius, who's currently a postdoc in my lab, that they took our code and did an implementation that used NVIDIA GPUs, based on the CUDA software, to do all these calculations in a GPU-accelerated manner. And times have gone down much further even after that; basically we came to something where you could use a desktop box, more or less, to do the processing in more or less the time it took you to collect the data. Now, currently we're probably collecting data around 20 or 30 times faster than we did back then, so perhaps the problem is getting slightly acute again, but there was suddenly a whole jump in the type of calculations you could do, and not only on high-performance computing systems, but on computers you could just buy for 10,000 dollars and set up in your lab.

So what became possible with all of that? One important thing that RELION has traditionally been very good at is the separation of mixtures of structures into subsets of particles of the same structure. Many of the protein complexes we study are molecular machines: they use relative motions of distinct parts of the machinery as an intrinsic part of their functioning. So if you purify these complexes, then very often in solution they will be a big mixture of multiple functional states, and you end up with images where you don't know which particle is which. Now, this concept of marginalization you can extend to class assignments, where I do projection matching not against one reference but against a user-specified number of references, for example three, and then I assign probabilities that each particle belongs to this orientation in this class, or that orientation in that class, and again do probability-weighted reconstructions, but now three at a time. You can show that this tends to converge to solutions where all the particles from the same structure hopefully end up in the same class, and you can then do three reconstructions simultaneously from the mixed data set. This is just an example, for an ATP synthase kind of thing, which rotates around: you can separate out these three different rotational states from mixtures of the images.
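[Editor's note: extending the fuzzy assignment from orientations to classes is mechanical. Another toy sketch, reusing the simplified Gaussian model from before; each class is assumed to come with the same number of reference projections.]

```python
import numpy as np

def class_orientation_weights(particle, class_refs, sigma2):
    """Joint probability weights over (class, orientation) pairs, given a
    list of reference-projection stacks, one stack per 3D class."""
    log_p = np.array([[-np.sum((particle - ref) ** 2) / (2.0 * sigma2)
                       for ref in refs]         # orientations within a class
                      for refs in class_refs])  # one row per 3D class
    log_p -= log_p.max()
    w = np.exp(log_p)
    w /= w.sum()
    return w   # w.sum(axis=1) gives the marginal class probabilities
```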
Now, of course, many protein complexes are not described by this type of heterogeneity, with a user-specified number of discrete classes. Many complexes are flexible in a more continuous way, where one domain, or several, flexes relative to another. To describe those, we introduced an algorithm called multi-body refinement, and this is an example from a spliceosome particle. A reconstruction from a whole data set of a few hundred thousand spliceosome particles gave this map, where the core region has pretty good detail, and the foot is kind of okay too, but the helicase is getting rather rough here, and most of the density for this factor, SF3b, is absent, or only visible at very low thresholds. That is because the SF3b and the helicase move independently from each other, and often with respect to the core, and even the foot wobbles around a bit with respect to the core. You could try 3D classification into a discrete number of states, but if the foot moves left while the arm moves right for one particle, and these movements are completely independent of each other, you quickly end up with an explosion in the number of classes you would need to describe them all.

To deal with that, we introduced multi-body refinement. What the user does is provide, based on their expertise of the system, and this is where user input comes into play, masks, indicated here with these coloured outlines, around the individual domains that the user assumes move as individual rigid bodies relative to each other. That's an assumption that needs to hold for your complex for this method to work: if they are not rigid-body movements, multi-body refinement will not work very well. Doing this for the spliceosome, we have masks around the core, the foot, the helicase, and the SF3b domain. These are slices through the 3D reconstructions, and at the top is the consensus map: you can see the core was okay, but here at the top the helicase and SF3b domains were rather fuzzy. Now, by masking out these individual domains, with iterative methods that subtract the other domains and refine only the relative orientation of each of the four bodies, you get densities for each of the four domains that look better than in the consensus map. The difference for the core is not very big, because that was stable anyway, but the foot does get better, especially the helicase domain here, and to the largest extent this SF3b factor gets a lot better. It would have been impossible to build any model into the original density, but this now gets to a resolution where, if you have a crystal structure of the individual parts, which I think was available, docking those in would give you quite a good model. Oops, I moved too far ahead. Okay, so this is just a quantification of how much better the map becomes: you go from very low local-resolution estimates to pretty good ones, and these are the FSC curves.

Cool. Now, not only can you make the density of the individual domains better; for each particle you now have not just one orientation with respect to the consensus model, but four orientations, one for each of the individual bodies that you refined more or less independently. So you have particles where the foot is up and the arm is to the left, and so on. What you can do is a principal component analysis on just these orientations, to see what the most abundant types of motion present in these complexes are. You can then make movies where you take the better-reconstructed densities of the individual domains and move them along these principal components, and you get these movies where the biologists can say, oh, I like this type of movement, I think it means whatever, and I can only sit and listen. For example, the helicase domain and the SF3b domain move together in this principal component, which is the second one, so these domains do tend to move together; there's another component where they move independently, and both things happen in the data. But this is a way of visualizing what kinds of motion are present in these big complexes.
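[Editor's note: the principal component analysis itself is standard; a minimal sketch over made-up multi-body output, six pose parameters per body per particle, might look like this.]

```python
import numpy as np

# Toy stand-in for multi-body output: per particle, each of 4 bodies has
# 3 Euler angles and 3 translations relative to the consensus refinement.
params = np.random.default_rng(1).normal(size=(100000, 4 * 6))

centered = params - params.mean(axis=0)
cov = centered.T @ centered / len(centered)          # covariance of the poses
eigvals, eigvecs = np.linalg.eigh(cov)               # eigh: ascending order
components = eigvecs[:, np.argsort(eigvals)[::-1]]   # largest variance first

# Amplitude of every particle along the first principal motion; a bimodal
# histogram here would hint at discrete rather than continuous heterogeneity.
amplitudes = centered @ components[:, 0]
```

Moving the separately refined body densities along a component, from low to high amplitude, gives the kind of movies shown in the talk.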
Cool. How are we doing on time? Yeah, I have to hurry up.

So then in 2020 we started a collaboration with Abhay Kotecha, who's at the Thermo Fisher factory in Eindhoven, and his colleagues at Thermo Fisher, who had made some new hardware, and I think you already have some of it here: a cold field emission gun, which I'll come back to in a second; a new, more stable energy filter, and I'll show you some data of that as well; and the next generation of the direct electron detector, the Falcon 4, with even better DQE and faster frame rates, et cetera. So we worked with Abhay, we sent him some samples, apoferritin and a GABA-A receptor, and then Takanori Nakane, a postdoc in my lab, processed all the movies that came off this microscope.

Now, one thing I haven't told you about yet is that the objective lens of our electron microscopes is actually not that good if you compare it to an optical microscope, and it suffers quite badly from chromatic aberration. Chromatic aberration also happens in optical microscopy, or any normal light lens: it means that light of different wavelengths gets focused at different points; for example, the red light gets focused further away than the blue light, so rather than a beautiful picture of a building, like this one taken with a chromatic-aberration-corrected lens, you get a blurred one. The same happens in the microscope: electrons with slightly higher energy will be focused at a different point than electrons generated with slightly lower energy. That wouldn't matter if the electron source, the field emission gun that generates the electrons, made only one wavelength, laser-like electrons, but that's not the kind of source we have. What was present in all the cryo-microscopes, and still is in all the ones we have at the LMB, is called an X-FEG, and it has an energy spread. It's not that big, actually, if you think about it: 0.7 electron volts, compared to the 300,000 electron volts at which the electrons are generated, is very narrow. But the chromatic aberration of the objective lens, combined with this spread, leads to an envelope, a curve of how much signal survives as a function of spatial frequency, that looks like this one in blue: at one ångström resolution, with an X-FEG, just the chromatic aberration of the objective lens means that less than 10% of the original signal is left. Now, the cold FEG is a different kind of source; it operates at much colder temperatures than the X-FEG, and they managed to reduce the energy spread of the beam to 0.3 electron volts. That gives an envelope on the CTF that looks like this: even at one ångström, more than 60% of the signal is still there, so the chromatic aberration of the objective lens becomes much less important.
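[Editor's note: for reference, one common parameterization of that temporal-coherence envelope, taken from standard CTF theory rather than from the talk; $\lambda$ is the electron wavelength, $g$ the spatial frequency, $C_c$ the chromatic aberration coefficient, and $\Delta E / E$ the relative energy spread of the source.]

```latex
% Temporal-coherence envelope: the focus spread delta, driven by the
% source's energy spread, damps the CTF as the fourth power of frequency.
E_t(g) \;=\; \exp\!\left(-\frac{\left(\pi\,\lambda\,g^{2}\,\delta\right)^{2}}{16\,\ln 2}\right),
\qquad
\delta \;=\; C_c\,\frac{\Delta E}{E}
```

Under this form, reducing $\Delta E$ from 0.7 to 0.3 eV shrinks the exponent by a factor of about five at fixed $g$, roughly consistent with the quoted jump from under 10% to over 60% surviving signal at one ångström.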
Together with that came the new energy filter, which filters away inelastically scattered electrons and is much more stable than the filters available until then: over many days you get just a few eV of drift of the slit, which means you can use a much narrower slit and do the filtering of inelastically scattered electrons better. And all this was combined with the software, which I already briefly referred to yesterday. An important part here was optical aberration correction: if the scope isn't perfectly aligned, and we now want to push towards that one-ångström regime, then even very small errors in the alignment of the microscope can completely kill the signal; lots of these effects go with the square or the cube of the spatial frequency, so they get ever worse. Takanori wrote software to detect antisymmetric and symmetric aberrations, basically resolution-dependent phase shifts of the Fourier components in the images: you can measure them and then shift them back post hoc, and so correct, in the image processing, the optical aberrations that were present in your data.

All of that combined then led to a reconstruction of apoferritin at 1.2 ångströms resolution, on data collected in, I think, a day and a half on one of these prototype microscopes. And 1.2 ångströms is about the shortest distance between heavy atoms in a protein molecule, so you can see individual blobs of reconstructed density for almost all the heavy atoms in the apoferritin structure. Hydrogens are of course much lighter than the carbon, nitrogen and oxygen atoms, but what we found, and this was work together with Garib Murshudov, and there's now software, Servalcat, in which you can do this easily, is that if you build an atomic model that consists of just the heavy atoms, you can calculate a difference map, much like the X-ray crystallographers have done most of their lives, and then visualize beautiful difference peaks for the hydrogen atoms as well. You could say: why do I need to see an individual blob of density for each atom, when I know what a phenylalanine looks like, so I only need two and a half ångströms resolution? That's fine, but being able to visualize individual hydrogen atoms makes it possible to visualize hydrogen-bonding networks, which, if you do drug design and so on, could be extremely important. And, my favorite part of the whole reconstruction, we were especially pleased that we could see H2O here: this is one molecule of water, with the two hydrogens sticking off the side of it. We also put on some samples from Radu Aricescu, in the Neurobiology Division at the LMB, of the GABA-A receptor, a homopentameric kind of model system, and that went all the way to 1.7 ångströms. You can see beautiful density for ligands and some sugars here, and even at 1.7 ångströms we think you can see hydrogen atoms in difference maps; you can see this beautiful beta sheet with loads of the hydrogens showing in the difference map.

Cool. Before I wrap up, some quick words on trends in the field. Machine learning is picking up: deep convolutional neural networks not only have an impact on AlphaFold structure prediction, but also in our field, so particle picking, heterogeneity analysis, cryoDRGN, et cetera, and there's lots of other things coming up. In this light, I think it would be really useful if you could all submit your data sets to EMPIAR: not only your PDB models and your EMDB reconstructions, but the raw movies, so that people like us can download them, train big neural networks, and make your life even easier in the future. Automation, throughput, on-the-fly processing: I understand some of that is already going on here as well, with lots of different solutions. And there is a shift: a lot of the excitement is now moving from single-particle analysis to tomography and sub-tomogram averaging, and a lot of progress is being made there; it's great to see you already have your national facility for electron tomography here in Madison.

I promised you a few comments on open software. RELION did not come out of the blue: I did my postdoc with José María Carazo, where I learned to program for cryo-EM; we did all our work in XMIPP, and RELION relied heavily on code written in XMIPP.
José María did his postdoc with Joachim Frank and got lots of input from SPIDER, so XMIPP has a lot of influence from SPIDER, et cetera. I think RELION used code from Bsoft, all kinds of libraries from Bernard Heymann. I didn't use code from Frealign, but I did use some of its concepts, Fourier-space reconstruction, et cetera, so RELION was definitely influenced a lot by Frealign. And for me it was satisfying to see that, by keeping our software open source, which is what all these packages have in common, RELION in turn led to a new program in China, by Xueming Li I think, called THUNDER, and then hopefully, because THUNDER is open source, other people can build on that. And it's great also to see: Frealign was of course made by Richard Henderson and then Niko Grigorieff, and now there's again another branch in the family tree of cryo-EM software, with cisTEM, and it's great you guys have been able to attract Tim here, because having the ability to do software development in-house makes you so much more flexible when new problems come up, to respond and build more flexible things than just the standard out-of-the-box solutions. So open software and the sharing of all these ideas have, I think, underlain much of the rapid progress the field has made in the past decades: free flow of ideas leads to efficient scientific progress.

Opposed to that is closed-source software, of which there are two examples in the field. IMAGIC was contemporary with SPIDER, and because it's closed source, nothing develops further on it. And now recently there's also cryoSPARC, commercial software, and it's probably fair to say that cryoSPARC was heavily influenced by RELION: it uses our regularized likelihood function, with a different optimization algorithm perhaps, but the target is exactly the same as we introduced for RELION, and they had a very good look at our GPU-acceleration code, which helped them make theirs fast too. But again, this is closed source, so further development on top of it is more difficult. So, commercialization: what you can do as a software developer there is have restrictive licenses, and many academics do that too; you can keep your source closed, and that also happens in academia, and that prevents people from building on your software. But a recent new thing is patents on cryo-EM image processing. That is new, and so far only one company has done it. It's a very familiar concept for technological developments, but for mathematical concepts, for algorithms, being able to patent them is relatively new. And that starts to hurt, because algorithm development is what we do, and now there are lawyers telling us we can't do certain things; there are lawyers telling me I have to take parts out of RELION, which is not so good. Commercialization is not always a bad thing, and I don't want to rant too much about it; perhaps it's the way the world works, and it may even allow more money into software development for cryo-EM, which makes for very nice point-and-click GUIs and the ability to accelerate software a lot more, because they can hire engineers that we can't, for example. Ultimately, I think the community thrives on a diversity of software, so a mix of commercial and academic is a good thing too. But I think scientific progress should be protected, and patents I see as a threat
there, because in the end I don't want the power over what academics can do in their research to be in the hands of lawyers, and I feel we're in danger of sliding in that direction. So that brings me to my conclusion slide, and let's keep it happy: new detectors and software led to the resolution revolution. Being able to distinguish multiple structures from mixtures is really a key advantage of cryo-EM, and going into the future I think we can push that much further: more flexible kinds of heterogeneity separation, learning about protein dynamics. Achieving true atomic resolution is now possible for apoferritin, but we all work on things that are much more interesting than apoferritin: how far can we push that? And then, ultimately, I think we do have to think about protecting scientific progress, and funders as well as users both have a role to play. Thank you very much.

Thank you, Sjors, for giving us such an informative lecture. So, don't rush off to queue up in front of a Krios yet; we have questions for Sjors. Does anyone have any questions? Erin?

It was a wonderful lecture. I would say there may be some lawyers at WARF on campus whose ears caught fire during that last part. But the question I had for you is about the noise modelling, and I'm a little bit naive on cryo-EM: do you guys use EMCCDs or CMOS cameras? I'm kind of wondering what the future is there, because in the type of imaging I do, people traditionally use EMCCDs, but the noise modelling is completely different for a CMOS camera, and that's led to some division in the software, because you can't use some software with some cameras and vice versa.

Yeah, so as you go down in exposure, you no longer have Gaussian distributions; you get to real Poissonian distributions from real counting statistics. So if you go into the very low dose ranges per image frame, and with some of the new detectors you definitely are in those regimes, then Gaussian models might not be the best. But Poissonian models are extremely difficult to handle computationally, and I think that's probably not where our main limitations will live for quite a long time, because when you add multiple frames together, the sum of multiple Poissonians quickly becomes very much like a Gaussian. So the detectors' influence on the noise models I'm not too worried about at this point. Do they use EMCCDs, or are they using CMOS?
The CMOS-type detectors, yeah.

I'm sort of interested in your 1.2 ångström structure. Why do you think you didn't get better than 1.2 ångströms? Do you think it's the microscope, the apoferritin, or the processing?

Oh, I think we were probably quite limited by particle numbers. For example, Dimitri Tegunov presented a one-ångström structure at the Gordon Conference last autumn, and he had two data sets to play with: one was ours, and the other was from Holger Stark, who used a different type of microscope, with a new kind of Cs corrector and a monochromator, which hardware-wise is perhaps a lot more difficult to handle. But what Holger did do is image for many, many days at the microscope and get a much bigger data set, and he then published a resolution that was slightly lower than ours. Dimitri of course made a very good choice by looking at that data set, because there was a lot more raw data, and Holger probably hadn't pushed it as far; Takanori is an absolute king of squeezing the most out of the data, and I think Holger's processing did not do that, so Dimitri probably just squeezed that data set pretty well dry. But to answer your question: the number of particles probably plays a role. Our B-factor is quite good, so I don't think apoferritin itself, the structure, is very limiting; it's quite a rock in that respect.

Does anyone else have a question for Sjors?

Hello. Well, this is amazing: in 1981 I saw a talk in Heidelberg by Jacques Dubochet, and there's been quite a lot of progress. My question is, I'm not in the field, but I remember talk about handedness, you know, right hand, left hand. Is that not an issue anymore?

Oh, it is. From a 2D projection image, just like an X-ray image of your hand taken in the hospital, you cannot tell whether it's your left or your right hand, because you can just flip the image around. So you can reconstruct your molecule either way, in the left-handed or the right-handed form, and both are equally likely in light of the data, until you go to very high resolution, where small effects like Ewald sphere curvature start to play up. What happens normally is that you solve it in one hand, and then you realize that all your alpha helices go the wrong way, and you just flip the whole thing around. But you can't do that at too low resolution: if you're at eight or nine ångströms, you might have your structure in the wrong hand, and then you'll need external data to make sure you're in the right hand.

That makes sense, thank you very much.

Hey, I had a question. You mentioned two instances where you had human input for model parameters: the number of classes in a mixed structure you're trying to pull apart, and the number of positionally independent modules of a single structure. I was wondering if you had explored automated algorithms for model selection on either of those, or on a mixture of the two input parameters?

Right, yeah. The short answer is: not really. What happens in practice is that we like to use quite a few more classes than we think there are different structures, and then quite a few of them will become quite small, with some particles that don't like to be with the rest because they're probably not such good particles, and then it's still the user who has to interpret the three-dimensional reconstructions of the more populated classes, to see what they mean, and perhaps some of them are actually the
same structure and should be combined, for example, or really are distinct states. For 2D class averaging, which I haven't really spoken about, we did now do some automated selection of 2D class averages, which is often used as an initial analysis tool, to see how good the data is and to select the very best particles out of the data set. There we've built automated procedures, also based on convolutional neural networks trained on the big data sets we've collected over the past few years, to aid novice users in making those kinds of decisions, or to be part of a completely automated pipeline. But those do not yet exist for 3D; there we still have users interpreting what's going on.

Thanks. Can I ask a quick second question, sorry? You talked about reconstructing vectors of motion in the molecules and described it as a principal component analysis, where you're taking out components of the positional information. When you're doing a principal component analysis in its simplest form, some of those vectors aren't necessarily all occupied at the same time; the data can be correlated in where it lies along them. Is that an issue?

Oh, so that's what we visualize in our analysis software: you can look at how all the particles are distributed along each of those components. You can then even decide, you know, if that's not a monomodal distribution but, for example, a bimodal one, that would be an indication that there's really discrete heterogeneity happening, and you can then separate your particles based on their eigenvalues along that eigenvector.

Thanks.

Right, I think we have one more, final question, from the online audience, so I'm going to speak for John. He's asking: how can we relate the PCA motions that you showed earlier to, for example, molecular dynamics simulations? How would you compare them to modelling?

Let's consider the PCA motions: these are representations of what the distribution of all the particles in the data set looks like. Now, that probably means that an individual particle could pass through those motions; that's perhaps not entirely guaranteed, but it's what one would think happens. Currently, I think the two are a bit apart: you have these motions which happen in your cryo sample, and you have predictions made by molecular dynamics simulations. What I hope will happen in the next five to ten years is that the analysis tools to look at protein dynamics by cryo-EM improve, because it is a single-molecule imaging technique: although the individual images are very noisy, it is single-molecule imaging, so there is information about a lot of different conformational states. And what I hope is that, with better analysis tools, combined with deep-learning methods, AlphaFold2, et cetera: deep convolutional networks are now very good at predicting protein structure, but they're probably not very good at understanding the dynamics of protein structure, because they haven't had any data to train on. Perhaps cryo-EM in the next decade can provide a wealth of data on much finer-grained individual states of protein molecules, and we can then train deep neural networks to actually predict what happens if I have a protein structure and perturb it in some way: how will it react, if I bind something, what will happen? Currently that's only the realm of experimental structural biology, but one would
hope that at some point you gain enough data, and hopefully understanding, to be able to do that in a computer as well, and then we can all go on a long holiday.

Okay, thank you, Sjors. Everyone, give Sjors a round of applause. Thank you.