So we'll continue on the topic of SANS data analysis, and I will talk about the software package called ATSAS, and also about other methods. ATSAS is software primarily developed for analyzing small-angle scattering data from biological samples, so there will be quite a strong bio component in this presentation. But I think it is also useful to hear a different perspective, because some of these tools can be applied directly to other systems, and the overall methods may be interesting from a theoretical or methodological point of view. So let's move on with this. What is the primary motivation for giving this talk? We'll be using SasView, as some of you already have, and we'll use it for the practical session tomorrow. But SasView obviously doesn't cover everything we need for small-angle scattering data analysis. For example, there is the concept of ab initio modeling to explain the data; I will talk more about this later. There is also the possibility of using building blocks that are assumed to be rigid bodies but can, for example, move freely in space; that is also not covered by SasView. Last time we discussed size polydispersity, and I showed you briefly how one can account for it in SasView, but shape and conformational polydispersity are not covered there. That is something we will also briefly mention today. SasView does have the possibility to calculate the intensity from a PDB file. PDB stands for the Protein Data Bank, the archive of proteins (mostly proteins, but also nucleic acids and other, not necessarily bio-related, molecules) written in this format, and if you remember from the previous presentation, one can calculate the intensity from such coordinates using the Debye formula or some variant of it.
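To make the Debye formula concrete, here is a minimal sketch of the double sum I(q) = Σᵢⱼ fᵢfⱼ sin(qr_ij)/(qr_ij). This is my own toy code with made-up function names and a two-atom test case, not ATSAS or SasView code:

```python
import numpy as np

def debye_intensity(coords, f, q):
    """Debye formula: I(q) = sum_ij f_i * f_j * sin(q r_ij) / (q r_ij).
    coords: (N, 3) scatterer positions, f: (N,) form factors, q: (M,) values."""
    # pairwise distances between all scatterers
    diff = coords[:, None, :] - coords[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))        # (N, N)
    qr = q[:, None, None] * r[None, :, :]        # (M, N, N)
    # np.sinc(x) = sin(pi x)/(pi x), so sinc(qr/pi) = sin(qr)/qr;
    # it also handles the r_ii = 0 diagonal (limit -> 1) safely
    sinc = np.sinc(qr / np.pi)
    ff = f[:, None] * f[None, :]
    return (ff[None, :, :] * sinc).sum(axis=(1, 2))

# toy input: two unit scatterers 10 Angstrom apart
coords = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
f = np.ones(2)
q = np.array([1e-6, 0.1])
I = debye_intensity(coords, f, q)
# as q -> 0 the intensity approaches (sum f_i)^2 = 4
```

The cost is O(N²) per q value, which is exactly why a generic Debye sum becomes slow for large PDB files and why dedicated tools use faster approximations such as spherical-harmonics expansions.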
And while this is possible with SasView, it's not very efficient, so there are other software tools to be used in this respect. Molecular dynamics simulations are also beyond the scope of SasView: molecular dynamics gives us a trajectory, but the question is how to analyze it with respect to small-angle scattering data, for example if you want to fit a dynamic system to this kind of data. There is also software for inferring the 3D electron density directly from small-angle scattering data, to represent what the distribution of the sample looks like, and that is not covered either. And many more, of course; this is just a short list. If you are interested in an overview of the software for small-angle scattering data analysis, and also for the process called data reduction, the step where you reduce the detector data into I(q) curves, the software is very nicely collected on one web page, referenced here, maintained by Adrian Rennie. The first four functionalities not covered by SasView are available from the software package called ATSAS, which I will mostly talk about today. The link between molecular dynamics and small-angle scattering is covered by SASSIE, among others, and for 3D particle densities from small-angle scattering data there is a dedicated program called DENSS. So if you are interested in those other aspects rather than ATSAS, you can quite easily find them on the web.
So, ATSAS. There have been a few versions of this software package; the current one is version 3.0. It is essentially a collection of different programs, now over 90 of them, it works on the major operating systems, it is free for academic users, and you can download it from this web page. On the download page you should also see links to the documentation and other descriptions; there is a very nice tutorial by Al Kikhney, if I pronounce his last name correctly, where he runs through all the ATSAS functionality in 24 minutes, as he claims. It's not very elaborate when it comes to the individual tools, but it can give you an overview of what is available in the package. What I will try to do in this presentation is go into a little more detail on these different software tools. The overall overview of the ATSAS environment is presented here, and we have three or four different steps that can be used in the data analysis. First there is something that ATSAS refers to as primary processing: taking the experimental data and applying basic operations directly to it, like averaging frames and subtracting buffers, in order to get what they call regularized or reduced data, with the goal of obtaining initial structural parameters like the radius of gyration or the molecular mass.
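As a sketch of how such a radius-of-gyration estimate works, here is a minimal Guinier fit, ln I(q) ≈ ln I(0) − Rg²q²/3, valid roughly for q·Rg ≲ 1.3. The function name and the synthetic, noise-free data are my own; this is not an ATSAS implementation:

```python
import numpy as np

def guinier_rg(q, I, qmax_rg=1.3):
    """Estimate Rg and I(0) from a straight-line fit of ln I vs q^2,
    iteratively restricting the fit range to q * Rg < qmax_rg."""
    mask = np.ones(q.size, dtype=bool)
    for _ in range(5):
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(I[mask]), 1)
        rg = np.sqrt(-3.0 * slope)        # slope = -Rg^2 / 3
        mask = q * rg < qmax_rg           # refine the Guinier range
    return rg, np.exp(intercept)

# synthetic ideal Guinier-region data for Rg = 20 A, I(0) = 1 (arb. units)
q = np.linspace(0.005, 0.06, 50)
I = np.exp(-(q ** 2) * 20.0 ** 2 / 3.0)
rg, i0 = guinier_rg(q, I)
```

Real data also need error weighting and a check that the chosen range actually stays within the Guinier regime, which is what the dedicated tools automate.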
Then there is a whole repertoire of tools for performing ab initio modeling, also with the possibility of comparing against electron microscopy density maps; there are tools for rigid-body modeling, as I mentioned, and for accounting for polydispersity via normal-mode analysis and the flexibility theme; and there are some utilities for gluing all of this together. I will go through these different steps, starting with the primary processing. The idea, as I said, is that we have data sets for the sample and for the buffer; we average the frames and then subtract the background, or buffer, from the sample. What is typically done next is an analysis to get the basic structural parameters; I believe Adrian explained the Guinier analysis in one of the previous lectures. Then, depending a little on the sample, one can apply some additional corrections to the data, but generally speaking the goal of the Guinier analysis is to provide input data for the next step, labelled here IFT, the indirect (inverse) Fourier transform. That allows the calculation of the pair distance distribution function, which in turn gives information about the maximum dimension of the sample; I will explain this in more detail later. The next step, if you have a structure available, is comparing it with the experimental data, and for that there is a program called CRYSOL, or CRYSON for neutrons; again, I will talk about it in more detail later.
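Since the IFT step is central here, a toy forward calculation may help: the intensity follows from the pair distance distribution as I(q) = 4π ∫₀^Dmax p(r) sin(qr)/(qr) dr. Below is a minimal sketch of that transform, my own code rather than GNOM, using the analytic p(r) of a homogeneous sphere as input; computing I(q) back from a candidate p(r) like this is also the natural consistency check:

```python
import numpy as np

def intensity_from_pr(r, pr, q):
    """Forward transform I(q) = 4*pi * integral p(r) * sin(qr)/(qr) dr,
    evaluated with trapezoidal weights on the uniform grid r."""
    kernel = np.sinc(np.outer(q, r) / np.pi)   # sin(qr)/(qr), safe at qr = 0
    w = np.full(r.size, r[1] - r[0])           # trapezoidal-rule weights
    w[0] = w[-1] = 0.5 * (r[1] - r[0])
    return 4.0 * np.pi * (pr * kernel * w).sum(axis=1)

# analytic p(r) of a homogeneous sphere of radius R (so Dmax = 2R):
# p(r) = r^2 * (1 - 3r/(4R) + r^3/(16 R^3)) for 0 <= r <= 2R
R = 40.0
r = np.linspace(0.0, 2.0 * R, 401)
pr = r ** 2 * (1.0 - 3.0 * r / (4.0 * R) + r ** 3 / (16.0 * R ** 3))

I = intensity_from_pr(r, pr, np.array([1e-4, 0.05]))
# with this normalization, I(0) equals the sphere volume 4*pi*R^3/3
```

The inverse problem, inferring p(r) from noisy I(q), is ill-posed, which is why the dedicated tools need a user-supplied maximum dimension and a regularization parameter.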
So this Guinier approximation can be performed in the application called PRIMUS, and you can see how its graphical user interface looks here. Again, the goal is to get an estimate of Rg; there are different plots one can obtain, one can account for different particle shapes, and it also gives you some information about the overall quality of the data.

The other parameter we want to extract from the data early on is the molecular weight. As I said, this is primarily important for biological samples, and it also gives you an indication of whether you have a monodisperse or a polydisperse sample. If you have a mixture of, say, monomers and dimers, these are different so-called oligomeric states that can co-exist in the sample, and the molecular mass should tell you whether you have, for example, a monomer, which would correspond to a single molecular weight, or whether more species are present. There are, I think, four different methods that PRIMUS runs automatically for you; the one presented here uses Bayesian inference, exploiting prior information from other structures to obtain an estimate of the molecular weight together with a so-called credibility interval, which essentially gives you the error estimate of this mass.

Once we are happy with our initial estimates of the structural parameters, the next step is the p(r) inversion, i.e. the calculation of the pair distance distribution function. From this function we can learn the maximum dimension of the object under investigation, and the shape of p(r) also differs depending on the object we measure: for a cylinder it will look different than for a sphere. The basic principle behind this is that the scattering intensity can be expressed through p(r), and we also have the inverse relationship, so we can go back and forth: we can calculate the intensity from p(r), but also p(r) from the intensity. The reason I mention this is that the calculation of the intensity serves as a check of the p(r); I will show in the next slide what exactly I mean.

There are essentially two methods typically used for this, one developed by Glatter, the other by Moore. That is maybe not the most important point, but both of them take some parameters in order to infer the p(r) function. SasView also enables the p(r) calculation, using the IFT method developed by Moore. The way it is calculated, one assumes the shape of a set of basis functions, and a weighted sum of them gives the estimate of p(r); from there, with these coefficients, one can calculate the intensity, and this is minimized together with some regularization term. How that works in practice: SasView has an interface where one supplies the number of these terms, i.e. the number of basis functions, and some value for the regularization constant, and then one can generate the p(r) function directly from the data; I will try to demo it live after this slide. Just for reference, ATSAS uses something conceptually similar: it also builds an estimate of p(r) with a regularization term, or regularization parameter. So it is really in the details that these methods differ.

Just to give you an illustration of how this works with the program called GNOM, which is used for the p(r) calculation in ATSAS: here we have the prediction of p(r) and the fit to the I(q) data. If one sets Dmax not sufficiently large, the shape we get is quite bumpy, and that is usually an indication that something has gone wrong with this function. What you usually expect is a smooth decay towards large r, and you usually achieve that by playing with the maximum-dimension value, which in both methods, SasView or GNOM, has to be estimated or input by the user. That is what I mean.

Just before moving to the ab initio models, let me start up SasView, and if you feel like it, please follow along, so we also check that you are well set up for tomorrow's session. If you have SasView running you should see something like this; it will differ slightly on your system. I see that I have some other data loaded from before; you should be looking directly at the test folders, so sorry, I will quickly navigate there, where you should also be, or maybe one level up. If you pick, for example, this file, which stands for a sphere with a radius of 80 Å, you can then go to what we call the inversion perspective in SasView. The default is fitting, which is by far the most commonly used functionality of SasView, but we go to inversion and then do "Send data to". As you can see, our data file was loaded, and now we have this number of terms; SasView gives you a suggestion for what this can be, so let's go with that. To be honest, it sometimes goes a bit crazy with these values: sometimes they are good, sometimes not, but it is usually quite easy to spot if there is a problem.

So let's calculate this for the sphere with a maximum dimension of 140 Å. As I said, the radius is 80 Å, so what we should expect as the maximum dimension is 160 Å. As you can see here, and it may not be the best illustration because the plot is quite shrunk on this screen, you can see the bump and then the rapid decay, not in the high-q region, but actually near the Dmax end. So what I will do now is recalculate with the actual value, and as you should be able to see, the function decreases much more slowly there, which indicates that this is probably the correct maximum dimension of the object we are studying. So much for SasView; now I'll come back to the discussion of ab initio models.

As I said, the other important concept in ATSAS is the use of ab initio models. What exactly do we mean here by ab initio? For those of you coming from other fields, or familiar with other kinds of modeling, this might be a slightly misleading term: here, ab initio means obtaining an estimate, directly from the experimental data, of the volume occupied by the molecule. It doesn't really have anything to do with modeling from first physical or chemical principles, as in some methods you may be more familiar with.

How do these ab initio models work? First of all we have to choose a suitable data range for this kind of model; there is a program in the ATSAS package for this, called SHANUM. Then the pair distance distribution function, which can be calculated using GNOM, is the starting point for generating the ab initio models, sometimes also called dummy-bead models. The way it works, we start a quite aggressive, I would say, simulated-annealing procedure in order to fit the data, and we get an ensemble of models; in the next step one clusters them together and then selects the one model that explains the experimental data best. There are also some statistical checks for these models, because one of the problems here is that you really need good-quality data, but you must also be careful not to overfit the experimental data: the models generated at this stage with the simulated-annealing procedure use quite a lot of parameters and always fit well to the data, so to say, so one needs to be careful when performing these calculations.

There are a few different possibilities for generating these models. One of the programs in the ATSAS package, called GASBOR, uses representations of the amino acid residues that proteins are typically built from; it uses this concept of dummy residues, and in this way the occupied volume can be modeled. But again, this has nothing to do with actual protein chain tracing; it is just a collection of dummy beads that occupy space. It uses the same principle of simulated annealing, finding coordinates for a given number of dummy residues; for each configuration the intensity is calculated, compared with the data, and fed into the simulated-annealing algorithm. Each of these beads corresponds to an amino acid: each of the 20 different amino acids is represented by just a single sphere, so it is a very coarse-grained representation of the protein.

There is a variant of GASBOR, called GASBOR MX, which extends it to model equilibrium mixtures. That is along the lines of polydispersity, for example when the sample contains data for an equilibrium between a monomer and a tetramer, as demonstrated here, and GASBOR MX also takes the interconnectivity between the different parts of the oligomer into account.

The next variant of these ab initio models uses the concept of dummy atoms. Again, these are not real atoms but placeholders in space from which we can calculate a scattering pattern, and they can be assigned to either the solvent or the particle. There is essentially no limit on the number of these atoms: whereas with GASBOR we modeled the exact number of amino acid residues in the protein, because we know how many there are, here that is not the case; we let the algorithm derive the representation. This is how it looks at the beginning of the simulated annealing: we have dummy atoms for both the solvent and the particle, the blue curve is the experimental data that we want to fit, and the red one is calculated for the overall space; as you can see, the fit is not very good. By running the simulated annealing, the goal is to obtain the shape of the particle with respect to the solvent while at the same time fitting the data. Just as a technical note, this software is called either DAMMIN or DAMMIF.
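To illustrate the simulated-annealing idea behind these bead-modeling programs, here is a deliberately tiny 1D toy of my own (the real DAMMIN/DAMMIF move beads on a 3D search volume and add compactness penalties): propose a random change, always accept improvements, and accept worsening moves with probability exp(−Δ/T) while the temperature T is slowly lowered.

```python
import numpy as np

def anneal(target, steps=5000, t0=1.0, cooling=0.95, seed=0):
    """Toy simulated annealing: fit a binary bead-occupancy vector to a
    target by random single-bead flips with Metropolis acceptance."""
    rng = np.random.default_rng(seed)
    model = rng.integers(0, 2, size=target.size)
    def score(m):                        # stand-in for the chi^2 to the data
        return int(((m - target) ** 2).sum())
    s, T = score(model), t0
    for _ in range(steps):
        i = rng.integers(target.size)
        model[i] ^= 1                    # propose: flip one bead
        delta = score(model) - s
        if delta <= 0 or rng.random() < np.exp(-delta / T):
            s += delta                   # accept the move
        else:
            model[i] ^= 1                # reject: flip back
        T *= cooling                     # cool down slowly
    return model, s

target = np.array([0] * 5 + [1] * 10 + [0] * 5)   # the "shape" to recover
model, final_score = anneal(target)
```

In the real programs the score is the discrepancy with the measured curve plus penalty terms, and the early high-temperature phase is what lets the search escape local minima before the model freezes.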
There are slight variations between these two that I will explain later, but the scattering intensity for the dummy atoms is computed using the concept of spherical harmonics, which is a simplified yet faster implementation of the calculation, so the scattering pattern can be evaluated efficiently. There are also penalties introduced to make sure that the model is compact, not too loose and not disconnected; a few such terms are applied within the simulated-annealing algorithm to ensure that we get a shape of the molecule as expected for biomolecules.

There is also a variant of this dummy-atom modeling that is particularly useful for SANS data, because it allows contrast variation to be taken into account: the multi-phase version, called MONSA. Essentially, it uses the curves from the different contrasts to build one ab initio model, under the assumption that each of the parts of the model stays the same in space, i.e. they do not move over the course of the contrast-variation experiment. In this sense we cannot account for dynamics here; it is more for static samples. Nevertheless, it can give an idea of the composition of the complex that we study using contrast variation.

So, to sum up this dummy-bead or ab initio modeling: there are three main programs, DAMMIN, DAMMIF and MONSA. DAMMIN and DAMMIF work for a single phase, so they cannot really be applied to, for example, contrast variation, whereas MONSA can model complexes of up to four different components. They all produce models of low resolution, they have different search-volume procedures, the constraints applied are essentially the same for all of them, and there is a difference in performance: DAMMIF is by far the fastest. But again, the results do not represent atomic models; it is just the filled volume.

And as I have been referring to SANS: what can be applied there, with the different packages? GASBOR does not contain the atomic form factors for neutrons, so it cannot be used. DAMMIN and DAMMIF, for the case where we just want to model from a single curve rather than contrast variation, can in principle be applied, but they require good data, so usually SAXS data is more appropriate for these algorithms. MONSA can in principle be applied to both SAXS and SANS, with SANS having the advantage of contrast variation.

Just to give you an idea of how this may work in practice: these are models for RNA molecules obtained using DAMMIN. As you can see, we can get a really large library of such models; in the next step one can use clustering techniques to obtain a centroid and also the spread of the ensemble. Eventually, what people typically do is take the obtained model and compare it with an existing PDB structure, an atomic structure available from a different experiment, e.g. a crystallographic one. As I said, the important point here is that you really need good input data to run this efficiently and not end up modeling the noise; it is very easy to fall into garbage in, garbage out with this, so I really recommend being careful whenever you run these ab initio modeling applications.

We'll change topics a bit now, and I will talk about fitting high-resolution, atomistic models to SAS data. The main concept here is to use the Debye formula, or one of its variants using for example spherical harmonics. At the atomic level, what we take into account is the atomic scattering in vacuum, the scattering from the excluded volume, but also the scattering from something called the hydration shell. Obviously, when molecules, in particular proteins, are measured by small-angle scattering, they are not in vacuum; they are in a solvent, and there is a certain interaction with whatever directly surrounds the protein, so it is important to take this hydration shell into account. There are two particular programs in ATSAS for this: CRYSOL, for calculating the intensity for X-ray experiments, and CRYSON for neutrons. The overall goal is to compare the atomistic, high-resolution structure with the experimental data by minimizing this term: essentially, CRYSOL or CRYSON takes the atomic composition, calculates the intensity using the previous equation, and also optimizes the scaling factor between the experimental data and the calculated intensity; it then reports the chi-square statistic, which also takes into account the number of points used in the calculation.

The next functionality, also available in ATSAS and also using high-resolution models, is the concept of rigid and flexible modeling. It essentially works like this: if we have, for example, the structures of two subunits, we can place them in relative positions to each other and then, by rotating and shifting them with the different degrees of freedom, defined by Euler angles for the rotation and Cartesian coordinates for the translation, we can move this complex around in order to fit the data. This can be done with the program called SASREF, which is also particularly useful for contrast-variation data: it allows you to take multiple curves and fit the complexes simultaneously to these data. In this way we can get the overall arrangement of, for example, different domains or different protein complexes, fitted simultaneously to the data. But again, this requires models of the subunits; we need to know the exact composition. We can also introduce some additional constraints: for example, if we know that the complex is symmetric we can account for that, or if we know, for example from a cross-linking experiment, that there is a certain distance between the molecules, that can also be given to SASREF.

The requirement of a complete structure is alleviated by the program called BUNCH, which combines the SASREF concept with ab initio modeling. It allows the domains to be positioned with respect to each other while the missing parts are modeled ab initio, and again we can use multiple experimental data sets for simultaneous fitting; in terms of constraints it is similar to what can be done with SASREF. The ultimate combination of these different methods is to merge the BUNCH and SASREF approaches in order to build complexes: here we can freely assemble the different components into large assemblies, accounting for parts that are modeled ab initio as well as for missing parts, so that is particularly useful for building large complexes.

That brings me to the summary of ATSAS. ATSAS is a powerful toolbox for analyzing small-angle scattering data, primarily from biological macromolecules, but in principle it can be applied to other molecules. The dummy models, i.e. the ab initio modeling, can be used with some careful understanding of what the input data is and how far we can push it in terms of obtaining the overall shape of the molecule. There are different ways of comparing high-resolution, atomic structures with small-angle scattering data, either by using them as building blocks or by adding ab initio parts through the flexible modeling. The big advantage of using ATSAS is that it is very user friendly and it is quite easy to get a result, but there is always a bit of danger here in that the answer does not necessarily have to be correct. So, as I said, it is a very powerful package, but one sometimes needs to be quite careful when using it.

That's what I wanted to present about ATSAS. Are there any questions? That doesn't really seem to be the case. And how do you feel before tomorrow's session: does SasView work for you, have you all managed to download it and start it up? I see some thumbs up, but not from everyone. Tomorrow we'll have a session that will be using PaN-learning, so both SasView and access to PaN-learning will be required for tomorrow's sessions. We will go through the PaN-learning material, in particular sections one and two, in the morning, but in the afternoon, if you already have some data collected that you want help with analyzing, or that you simply want to discuss, please feel free to bring it along. The other option is to play a little with ATSAS; if you are interested, just drop me a line and I can direct you to the installation guide. You should be able to download it without any problem by setting up an account. And there is a question in the chat, whether it is all right to use SAXS data: that is totally fine, it is applicable to, or can be used for, both SAXS and SANS data, and I presume the problems we will be solving might be similar in SANS, so yes, please come with SAXS data too.