Can I start? Yes, please. Okay. So welcome, everybody; it's a pleasure to have you all here. My name is Angelo Rosa, and I'm one of the co-organizers of this digital event. Welcome, everyone. Just a few words about this event and its organization. I'm part of the SISSA group that organized this event, and I'm also the coordinator of the local CECAM node here at SISSA. This event is organized in collaboration with ICTP, and tomorrow Ali Hassanali will tell you a few words about ICTP. Just a few words about CECAM, for those who don't know it: CECAM is a big European consortium that brings together many institutions interested in computation and molecular simulation, and it is probably one of the biggest such consortia in the world. Normally, as a node, we organize an event like this every year, as a school mostly for undergraduate students. We had planned to do the same this year, in collaboration with ICTP. Unfortunately, because of the COVID crisis, that was not possible, so we decided to change the event a little and organize it as a series of online talks. It's actually the first time I've done this, so please be patient if not everything is perfect. So again, welcome, everyone. We have a very nice list of very, very good speakers and very good talks. On this first day, the talks are given by the organizers. The idea is to give seminars that are a bit introductory to the more topical talks of the next days, so that people who are not familiar with many of the topics (you have seen that the program is quite broad) can get used to the language and then follow the more specific talks more easily. I have two announcements. First, please switch off your video in order to save bandwidth; that helps avoid technical problems. Second, a more technical announcement: the talks are recorded, so if there is someone who does not want to be recorded, just switch off your video and keep it switched off during the session. You are allowed to ask questions by raising your hand, and then I will pick you and tell you when you can ask. So, for the time being, that's all from my side. I don't know if Ali wants to add something.

Yeah, maybe I'll also take this opportunity to welcome everyone on behalf of ICTP. I guess all of you know that ICTP is the Abdus Salam International Centre for Theoretical Physics. We would have loved to have you all here in Trieste, but unfortunately that is not going to happen, so we are trying to make do with this virtual medium. ICTP has a whole set of programs and a specific mission to develop scientific capacity in the developing world. I'm not going to go into details here at the beginning of this event, but for those of you who are interested, definitely get in touch with me and I'll be happy to talk to you. I think we're already running late, so I'll just hand over to Angelo as the host.

Thanks, Ali. So I think we can just move to Giovanni and his talk.

Okay, so we share the screen, right? Okay, so can you see my screen? Okay, good. So good morning to everyone. My name is Giovanni Bussi, and welcome to everyone; I'm also one of the organizers. And so, yes, as Angelo said, this is an experiment that we are doing. It's the first time that I co-organize a virtual event like this one, and actually it's also the first time that I give a talk at a virtual event like this one.
So I hope everything will work smoothly. The idea of my talk is to give you a flavor of what you can do to reconstruct biomolecular dynamics using molecular modeling tools, namely molecular dynamics, together with experimental data. In particular, I will focus on RNA dynamics, because that is what my group is devoted to. So at the very beginning of my talk, I will try to convince you, I mean, to tell you, why we decided to work on RNA, so why RNA is cool.

If you look at the central dogma of molecular biology, what you learn is that your genome is made of DNA, which stores the information, and then it is copied, it is transcribed, into RNA. The instructions written on RNA are then used to build proteins, which are the real actors in the cell, doing all the work. But actually, if you look at real numbers, this is the fate of only a very, very small fraction of the genome of complex organisms. For mammals in particular, only around 2% of the genome codes for proteins. The majority of the genome codes for RNAs that then have other roles, other functions, in the cell, and typically they have regulatory roles. So there are a lot of RNA molecules whose role is not just to temporarily store genetic information; they can do much more, and so it is very important to study them.

Let me also give you a very short primer on RNA structure, just to set up the language. RNA structure can be analyzed at three different levels. You have the primary structure, or 1D structure; people call it 1D because you can represent it in one dimension, as a string of characters. It tells you the sequence of nucleotides composing the RNA polymer. Then you have the secondary structure, which is the list of helices and loops and how they are arranged, and also which nucleotides are Watson-Crick paired, and so on. This is also called the two-dimensional structure, because it is easy to put on a piece of paper. But then, you know, we are not living in Flatland, and in reality molecules can be arranged in complex three-dimensional structures; that is what the tertiary structure of RNA is.

Okay, so that's the structure. But what about the function? Clearly, the function is related to the structure, but depending on the specific RNA molecule, it could be related to the primary, the secondary, or the tertiary structure. For instance, if you look at coding RNAs, those whose goal is just to temporarily store genetic information, the function is clearly linked primarily to the 1D structure, so to the message stored on the RNA molecule. But if you look at non-coding RNAs, the function could depend on the sequence alone, as it does in some cases; or on the secondary structure, which basically tells you which nucleotides are paired and which are not, so which of them are available for further pairing; or on the tertiary structure, which is the three-dimensional arrangement of the molecule. As is easy to guess, the three-dimensional shape is particularly important when the function is exerted by interacting with partners, because typically, when two molecules interact, they need some kind of complementarity in shape. And in this case dynamics is also fundamental, because typically, when molecules interact with each other, they have to rearrange a bit. So clearly, for RNAs that are involved in functions like binding to proteins or ligands, or doing catalysis and so on, the three-dimensional structure and the three-dimensional dynamics are fundamental.
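As a concrete aside (a hypothetical hairpin of my own invention, not a molecule from the talk), the 1D and 2D levels can be written down in a couple of lines:

```python
# Hypothetical hairpin, for illustration only.
sequence  = "GGGCGCAAGCCC"  # primary (1D) structure: the nucleotide string
secondary = "((((....))))"  # secondary (2D) structure, dot-bracket notation:
                            # matching brackets are Watson-Crick paired
                            # nucleotides, dots are unpaired loop nucleotides
```

The tertiary (3D) structure would instead be a full set of atomic coordinates, which is the level at which molecular dynamics operates.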
And that is where molecular dynamics comes in. Molecular dynamics is a really powerful tool that can be used to model the system of interest at an atomistic scale. Every atom is represented as a point in space. Then you have some function that tells you the energy of the system and how it depends on the positions of all the atoms. Typically, this energy partly comes from chemical bonds, from angles, from torsional potentials, and so on. This is a very mature field, where people have tried a lot of different functional forms. To give you an idea, the kind of applications that we do are typically done using these empirical force fields, where the interactions are chemically motivated. You describe the system in atomistic detail, so every atom is a point, and we typically include explicit water and ions, even if in many movies I don't show them, for clarity. Unfortunately, these models do not take into account any polarization effect. That means that the partial charge of each atom is decided a priori and has no way to adjust based on the context, and this is an important limitation. If you want, the extreme case of polarization is charge transfer, or chemical reactivity, and that is also out of the question with this kind of approach. Okay, so that's the bad thing. The good thing is that these models are pretty fast, and you can easily run several nanoseconds per day; I would say it depends on the machine that you have and how big the molecule is that you are studying. As tools, we use GROMACS, which is a very popular and free molecular dynamics engine, and PLUMED, which is a tool for enhanced sampling simulations that I am also co-developing, and which I will advertise a bit more later.

So why do I like molecular dynamics? I like it because it gives you access to your system at very high resolution, and it is virtually unlimited: we can access timescales as short as we like, and also very short length scales, down to the atomic scale. In addition, we have access to dynamics, and it is relatively cheap, in the sense that these simulations typically take a few days or a few weeks, depending on what you have to do. There are also drawbacks. In particular, the total timescales that you can access are, in many cases, not as long as you would like, and I will give you more details on this later. And then the accuracy of your results rests on the model that you use being reliable. I already discussed some of the limitations of these so-called force fields, and clearly, the better the force field, the more faithful the result will be. And then you have to remember that it is still just a model, so it is a way to model your system and learn a lot of stuff, but the most important thing is to compare your results with reality. In this sense, I like to quote this sentence by Vijay Pande: in science, as in life, it is very dangerous to fall in love with beautiful models. So we always have to validate our MD simulations against some experiment, to get an idea of whether they really make sense or not.
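As an aside, the empirical energy function described above typically has a schematic form like the following (a generic summary in my own notation, not necessarily the exact force field used in the talk):

$$
E(\mathbf{r}) = \sum_{\text{bonds}} k_b (b-b_0)^2 + \sum_{\text{angles}} k_\theta (\theta-\theta_0)^2 + \sum_{\text{torsions}} \frac{V_n}{2}\big[1+\cos(n\phi-\gamma)\big] + \sum_{i<j} \left[ 4\epsilon_{ij}\left(\frac{\sigma_{ij}^{12}}{r_{ij}^{12}} - \frac{\sigma_{ij}^{6}}{r_{ij}^{6}}\right) + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} \right].
$$

The fixed partial charges $q_i$ in the last term are exactly the quantities that, as noted above, cannot adjust to their environment in a non-polarizable force field.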
Okay, so the first problem is timescales. Which timescales are relevant for RNA dynamics? You will find pictures like this one in several reviews and papers. This one is customized for RNA dynamics, and you can see that if you look at very simple dynamics, like rotation around bonds, that is easily accessible on the nanosecond timescale. If you want to look at base-pair formation and opening, that is more challenging, because it reaches the microsecond timescale. And there are other processes, like the binding of different cations, or even switching between different tertiary structures, that are totally out of reach for atomistic molecular dynamics, as you can see in the bar above. That is why, in order to study this kind of process, you cannot just use plain molecular dynamics; you have to complement it with enhanced sampling methods. And as I told you, for implementing enhanced sampling methods what we use is the tool called PLUMED.

Okay, then there was also the issue that I mentioned with force fields. So what is the state of the art; how accurate are the force fields that are used to describe RNA molecules? This is just a bunch of pictures from the literature. I will not go into the details, but you have the references if you want. These are studies from the groups of Jiří Šponer and Tom Cheatham on tetranucleotides, and also from our group on tetraloops. I will just summarize this slide by saying that the community is not very satisfied with the current state of the art of RNA force fields, and there are many things that can be improved and should be improved. There have been recent improvements; I would mention a couple of relevant works, one from the David Shaw group and another from the Šponer group, published last year. These are improvements, but I would still say that it is not easy to start a project using molecular dynamics on RNA and be confident that you will be able to reproduce experiments. And so, over the years, as soon as I realized this, I slowly switched from just comparing with experiments to using experiments as restraints in my simulations, which is also a nice way to understand more about how the experiments are done.

So which types of experiments will I discuss now? For RNA, particularly interesting are experiments that give you access to dynamics, that take into account the fact that molecules are not frozen in a single structure. In particular, I would like to mention NMR. I am not an expert in NMR, but to summarize very quickly, what you can access with nuclear magnetic resonance are things like distances between atoms, between atoms labeled in some way, or torsional angles, or other geometric features. There is really a zoo of different techniques, and people who are very specialized in each of them. The formulas, the equations that connect the three-dimensional structure of a molecule to an experimental signal, are fairly well established, so it is relatively easy to compare your simulation with NMR, or even to use NMR data as restraints. The only tricky part is that NMR experiments are done in solution. This is technically a problem, but it is actually a great feature, because it is exactly what allows you to access dynamics. But the issue that comes in is that what you are measuring is not the property of a single copy of your molecule; you are measuring some average over an ensemble of copies of your molecule. And you have to take that into account whenever you want to use NMR experiments as restraints for molecular dynamics simulations.
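As a concrete aside on why this matters: NOE-type signals decay roughly as the inverse sixth power of the interproton distance, so the ensemble average is dominated by the closest conformations. A minimal sketch with made-up numbers:

```python
import numpy as np

# Hypothetical two-state ensemble: a proton-proton distance that is
# 3 A in half of the conformations and 6 A in the other half.
distances = np.array([3.0, 6.0])   # angstrom
weights   = np.array([0.5, 0.5])   # ensemble populations

linear_avg = np.sum(weights * distances)                    # 4.5 A
# An NOE effectively reports the inverse sixth root of <r^-6>:
noe_avg = np.sum(weights * distances**-6) ** (-1.0 / 6.0)   # ~3.4 A

print(linear_avg, noe_avg)
```

The two averages differ by about one angstrom for the same ensemble, which is why interpreting a solution measurement as the property of a single structure can be misleading.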
Okay, so after this pretty long introduction, this is my optimistic agenda for today. I am really happy to be interrupted. I am not sure I will be able to see if people interrupt me, but I guess Angelo will be able to see raised hands and stop me if necessary, and I will be happy not to arrive at the end if I receive too many questions. So just stop me if there is something that is not clear. In the first part of my talk, I want to show you how we combine experimental data, in particular NMR, with molecular dynamics simulations. I will give you a bit of the theory first, and then I will show you how we used this to study a very important non-coding RNA system. Then, in the second part, if I manage to arrive there, I will show you a different way of combining molecular dynamics and experiments. There, I would say, molecular dynamics is used in a semi-quantitative way, just to give hints about which experiments to do, and the real work is actually doing the experiment. This is something that we published recently, and it is a nice work where we managed to understand a lot of things about RNA and DNA unzipping dynamics. So let's see if I can arrive at that part.

Okay, so the first issue is with the averaged restraints. This is something the molecular dynamics simulation community has worked on a lot in the past 10 or 15 years. The issue here is that you have a molecule that can explore a number of different configurations, and whenever you observe something, what you will see in your experiment, in your signal, is an average over all these conformations, weighted by their probability of being observed. That is what you are expected to compare with. Now, the problem is that this is a highly underdetermined problem. What do I mean? If you have one observable, say you measure the average distance between two atoms, there is a huge number of different possible probability distributions for that distance, all of them compatible with the same experimental value. Trivially: if the average distance between two atoms is five angstrom, it could be that the distance is always five angstrom, or that it is four half of the time and six the other half. So how can you choose the most reasonable probability distribution among all of those that are compatible with the experiment? There is a kind of consensus in the community that the cleanest way to do this is to pick, among all these distributions, the one with the largest possible entropy; and, to be more precise, with the largest possible relative entropy, relative to a prior distribution, which is the Boltzmann distribution associated with the force field that you are using for your simulation. In other words, what you say is that your posterior probability distribution is, among all of those that agree with the experiment, that is, that would give you an average equal to the one reported by the experimentalists, the one that is as close as possible to the distribution you would obtain by just running a straightforward MD simulation with your force field. Okay, so how can you do this formally? If you maximize the entropy, compute its derivative with respect to the probability, and do some math, it is actually easy to show that the posterior should be proportional to the prior multiplied by an exponential factor, e to the minus lambda times f, where f is the quantity that you are measuring.
So let's say you are measuring the average distance between two atoms; the factor would be e to the minus lambda times the distance between those two atoms. That is the same as saying that the potential energy should be augmented by a term that is linear in the quantity that you are observing, with a prefactor lambda, and you have to choose this prefactor in such a way that the average over the posterior is equal to the experimental average. This can be done iteratively: you can start with some value of lambda, compute the average, compare it with the experiment, and then adjust it a bit, until you have agreement between simulation and experiment.

Some more comments on this maximum entropy method. Here in this plot I have a histogram of some observed quantity. Let's say that my force field gives me the red distribution, with two peaks, one here and one here, with two different weights, and the average from the simulation is some value here. Then I go to the experimentalists, and they tell me: look, the average is here. If you just add to your system a harmonic spring keeping the distance at that value, you will have just a single peak in the posterior distribution, here. If you use maximum entropy instead, what you do is rely on the peaks produced by the force field, by the molecular dynamics simulation, and just modulate them: increase the weight of one and decrease the weight of the other, until the average becomes compatible with the experiment. From this simple example you can see that the amount of information that you retain from the original simulation is much, much bigger than if you just enforce restraints in the simplest possible way. So you need a very good starting point, a very good prior distribution, and then you also need a lot of data; typically, the more data you have, the more reliable your result will be. What I am discussing here is maximum entropy, but there are many closely related approaches in the literature. The group of Michele Vendruscolo has for many years used an approach based on replicas, which has been shown to be equivalent to this one; there is a method from the group of Parrinello that was born in a different context but can also be cast in a maximum entropy framework; and there is experiment-directed simulation, from White and Voth. There are also Bayesian methods, with very similar formulas and very similar properties.
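To make the lambda-fitting step concrete, here is a minimal a posteriori reweighting sketch with synthetic data and a single restraint (the setup and variable names are mine, not from the talk):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
# Pretend these are values of an observable f (e.g. a distance, in angstrom)
# computed on the frames of an unbiased MD trajectory (the prior ensemble):
f = rng.normal(5.0, 1.0, size=10000)
f_exp = 4.5   # experimental ensemble average to be matched

def reweighted_average(lam):
    # Posterior weights w_i ~ exp(-lam * f_i), normalized;
    # subtracting the mean only improves numerical stability.
    w = np.exp(-lam * (f - f.mean()))
    return np.sum(w * f) / w.sum()

# Choose lambda so the reweighted average matches the experiment:
lam = brentq(lambda l: reweighted_average(l) - f_exp, -10.0, 10.0)
print(lam, reweighted_average(lam))   # lam ~ 0.5 for this Gaussian prior
```

With several observables, lambda becomes a vector and the same matching condition is imposed for each component, which is the iterative adjustment mentioned above.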
Okay, good. So if you look at these maximum entropy based methods, you basically have two ways to use them. One is to enforce the restraints during the simulation, and here you have a problem: you might have noticed that you have to choose these lambda factors to run your simulation, but you do not know their values a priori, so you have to adjust the lambdas during the simulation. This is typically done with an iterative procedure that is not straightforward, but it allows you to sample the correct conformations during your MD simulation. Or you can do it a posteriori, with reweighting schemes, where what you do is take your trajectory, produced with a possibly incorrect force field, and then reassign a weight to each snapshot, so that you give more weight to the snapshots that are more correct, let's say, but within this maximum entropy framework, in such a way that the weighted average over your ensemble agrees with the experiment. Together with Max Bonomi and Michele Vendruscolo, we have also done a technical work where we compared the efficiency of these two possible approaches. To make it short: reweighting is really nice because you do it a posteriori, so you can take your ten-year-old simulation, or take your fresh simulation and reweight it with datasets from ten different experimental groups, and every time it is very cheap, because you just have to reanalyze your simulation. Whereas, if you want to enforce the restraints, you need to have the experiments before you run the simulation, and clearly, if the experimental data change, you have to run your simulation again.

Okay, I don't know how much time I have left. Nine minutes? Okay, so I think I will only show the first part; I tried, though, to be kind of didactic in this part. So I will show you this application now. This is an application of this maximum entropy based method to a non-coding RNA, a SINE B2 element that was identified in the group of Stefano Gustincich at SISSA some years ago. This is an antisense RNA that is used by the cell to enhance gene expression: it basically finds a complementary messenger RNA, binds to it, and then enhances its capability to produce proteins by a significant factor. The nice thing is that the experimental group working on this first managed to identify this element here, which is one hundred and something, almost 200, nucleotides long, with this function; but now they are coming out with the idea that this hairpin alone is able to trigger this function. So we decided to try to study the dynamics of this hairpin, to get an idea of which structures it could adopt. What the experimentalists did was NMR experiments, first of all NOEs, which means that you measure pairwise distances between protons, so you know which pairs of protons are close and which pairs are far, on average, let's say. Then what experimentalists typically do is try to build a single structure that satisfies all these restraints, all these constraints: if two protons should be close to each other, then they must be close, full stop. But if you try to enforce all these signals at the same time, you can guess that there could be some trouble, because the signals could come from a mixture of different structures. Now, if you look at the PDB entry that was solved using these experimental data, it contains 10 structures, shown here, but
basically they can be grouped into structures equivalent to this one, and structures equivalent to this other one, with a small tilt of this nucleobase. Okay, so the first thing that we tried to do: we took these structures, which are experimentally refined, and tried to run MD. And the first thing that we see is that the first structure agrees with all the experiments, we run MD for a while, and suddenly we do not have agreement with the experiments anymore. The first thing you might think is: okay, that's because your MD simulation is total crap; you have a wrong force field, and the system starts from the correct configuration but goes into an incorrect one. That is actually not the case, and if you look at the structures, it is not too difficult to realize this. In particular, in this plot what I am showing is the distance measured between a specific pair of protons, and the glycosidic bond angle, which is the rotation around the bond that connects the nucleobase to the sugar, and which has some typical distribution that should be around here. Now, what if we take our MD simulations and use reweighting to assign a high weight to the frames that agree with the experiment, let's say? The experiment tells me that the distance should be smaller than this horizontal line, so I will put a lot of weight on points which are here. But these points are in the tail of the distribution. If you look at these structures visually, you will see immediately that they are incorrect: they are really strained. In order to bring this distance into agreement with the experiment, you make the nucleobase very, very strained; you put it in the tail of the Gaussian. This already suggests that something else should happen. What you can actually expect is that in reality there is a mixture of this structure and of another structure that is completely different, but which, once averaged correctly with this one, would give you the experimental observation.

Okay, so in order to see this rotation of torsional angles, you need enhanced sampling. Here we used a variant of metadynamics and replica exchange; if you want, you can find the details in the paper. During the simulation we also monitor the agreement with these experiments, and we see dynamics. When we analyze our simulation with the same metric as before, we see that from time to time some of the structures go to a different rotamer, where this torsional angle is flipped to a syn conformation, which is expected to be rare but not impossible. And indeed, that is able to explain the experiment, without the need to resort to a very uncomfortable position for the nucleobase.

Okay, so how can we then rationalize our results? At the end of a maximum entropy procedure, what you have is an ensemble: you have a lot of structures, but you would like them not to be too many, not thousands, because you don't want to look at a thousand structures. So the first thing we did was a clustering analysis; Sabine Reisser, who was the postdoc working on this project, also worked a lot on making these nice pictures. This tells you basically how heterogeneous the ensemble is that you expect to exist and to be in agreement with the experiment. But okay, there is one important comment about this clustering, because we struggled a bit to do the clustering correctly here. What we wanted are clusters that are really homogeneous structurally, so with basically the same NMR signals and
with structures that are similar to each other. Standard clustering methods do not have this feature, so what we did was use a homemade, very expensive method based on maximum cliques, which worked pretty well. But there is a very nice paper published this year by González-Alemán et al. where they show that quality threshold clustering, a method published in the 90s, can be used very effectively to obtain exactly this: clusters that are structurally homogeneous. And that is not true of some of the methods that are used very often in the literature, as is explained in this really nice paper, cited here.

Okay, so we have clusters, but there are still too many of them, and I really don't want to look at all of them. So Sabine Reisser had the idea of combining maximum entropy with maximum parsimony. Instead of using all the structures that we have, let's try to use only structures coming from one cluster: can we reproduce the experiment? The answer is no. So let's try to use structures from two clusters, three clusters, and so on. What she discovered is that, for this specific system, what you need is four clusters, which, due to the way we defined them, are different from each other, so they contain structures with different annotations, different base pairs and stacking patterns, and with these specific populations. The answer is not unique; there are different possible sets of four structures. But the nice thing is that you can really find out which structural features are required to reproduce the experiments. You can see that you need some of this guanine to be in a syn conformation, and you need some of these AU base pairs to be broken. And that is very interesting, because for this we have a kind of independent validation: we have other experiments, which we did not use in this calculation, suggesting that these AU pairings here are at least transiently not formed. So I would say this is a kind of independent validation of our result.
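The parsimony search itself can be sketched as a small combinatorial fit: grow the number of clusters until their weighted averages reproduce the data. A toy version with synthetic numbers (and without the weight-normalization and error handling of the real procedure):

```python
import numpy as np
from itertools import combinations
from scipy.optimize import nnls

rng = np.random.default_rng(1)
# F[i, j] = average of observable i over the frames of cluster j (made up):
F = rng.normal(size=(8, 6))                            # 8 observables, 6 clusters
f_exp = F @ np.array([0.4, 0.3, 0.2, 0.1, 0.0, 0.0])   # synthetic "experiment"

for k in range(1, F.shape[1] + 1):       # try 1 cluster, then 2, then 3, ...
    residual, subset = min(
        (nnls(F[:, list(c)], f_exp)[1], c)   # residual of a non-negative fit
        for c in combinations(range(F.shape[1]), k)
    )
    if residual < 1e-8:                  # experiment reproduced
        print(f"{k} clusters are enough:", subset)
        break
```

Here the synthetic data were built from four clusters, so the search stops at k = 4, mirroring the result described above.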
Okay, so I will not summarize; let me just make a small advertisement instead. If you are interested in learning more about enhanced sampling methods, we are organizing PLUMED masterclasses, and I think Max Bonomi will tell you more about this in his talk. There will be Zoom lectures with a limited number of participants, the participants will interact through a Slack workspace, and the deadline for applications is soon, in a couple of months. And okay, now let me jump to the very end of my talk, just to show you the pictures of the people who did the work. This is an old group picture, but I needed it because the work that I showed was done by a postdoc who has now left my group, Sabine Reisser, who is here. And then I want to thank the experimental collaborators on the work that I showed you, Stefano Gustincich and Silvia Zucchelli, who left last year. And thank you all for your attention.

Thank you very much, Giovanni; I send you my reaction as well. So, yes, questions. As I said, if you want to ask a question, just raise your hand and then I will enable your audio. There is a "thank you very much" from Davide Bassani. Don't be shy.

May I speak? Sure, yes. This is Shravani, from India. Hi, nice to be at ICTP after a long time. Very nice talk; I have a question, I just need to understand: I see in the literature not only maximum entropy but also maximum caliber models. So what is the difference between the two?

Okay, so maximum entropy is for ensemble averages; maximum caliber is for dynamical trajectories. In a nutshell, it is the same thing, but instead of being applied to averaging over conformations, it is applied to averaging over trajectories.

So, again, if I have a non-ergodic system, only then will there be a difference between the two predictions? For ergodic systems it should not matter, right, if I have long enough sampling?

I'm not sure this is the correct interpretation. The point is that with maximum entropy what you obtain is ensemble averages, but you do not obtain time correlations. Now, let's say that you have an experiment that measures the time correlation function of an observable; then you can use maximum caliber to enforce that time correlation function in your simulation. So it is like the difference between enforcing an ensemble average and enforcing some time-dependent ensemble average, such as a time correlation function. I would say it is really related to the kind of experiment that you use, not to the ergodicity of the system.

Okay, and I have a second question. If you know the rate constant associated with some kind of conformational fluctuation, in the kind of systems that you are talking about, which enforcement is going to be more appropriate?

If you know rate constants, I would say you cannot use maximum entropy; you have to use maximum caliber. Or, in other words, maximum caliber is, from my understanding, nothing less than maximum entropy applied to trajectories, but you have to use maximum caliber to enforce rates. Okay, thank you very much.

Giovanni, there is a question by Sharon Arti: is it possible to use the experimental restraints to then refine the force field?

Yes, it is possible, but this maximum entropy framework is not designed to do that, because the correction to the force field is going to be, by construction, the one that changes the distribution the least. As I said, if you are measuring a distance, you will correct that distance directly; whereas, when you want to refine a force field, you typically choose the functional form a priori, based on some chemical intuition. Let's say you want a force field made of torsional angles; then you refine those torsional angles in order to get agreement with experiment. But that is not within the maximum entropy framework, because you are no longer maximizing the entropy, you are maximizing agreement with experiment directly. I don't know if that answers your question.

Other questions? Are there more questions? I can't see any. Don't be shy, ask questions. Okay, if there are none, I think we can thank Giovanni again for the talk. Thanks to you, and I will stop sharing now. Then I think we can move to Edgar.

Yes, can you hear me? Hello. Yeah, we can hear you, Edgar, yes. Okay, very nice, one second, let me try to share my screen. Right, do you see my screen? Yes, we do. One second more. Do you see my pointer? No. Ah, yes, yes. The red dot? Yeah, yeah. Yes, the red dot, exactly. Okay, go on.

Okay, so welcome, everybody. I'm one of the co-organizers of this online activity, and I want to thank the other co-organizers, especially Angelo, who took care of a lot of the work needed for this to happen.
As Ali said before, I also work at ICTP, so it is a pity that we cannot meet in person, but okay, we do what we can at this moment. And I would like to invite everyone who has questions to ask them during my talk, especially young people, because we are going to discuss stochastic thermodynamics, which is not the main topic of this event, so there will be a lot of new concepts for many of you. In particular, I will discuss something we call a martingale, which is a tool used in finance that we have been using in recent years to explain fluctuations of thermodynamic systems like the ones...

Sorry to interrupt: there is a small window in the middle of your slide. A small what? There is a small gray window overlapping your slide; I don't know, it says "no big defects" or something. Do you see it now? Or maybe just move it down. Okay, okay.

Well, let me start with a review of thermodynamics; this is what is in many textbooks of classical thermodynamics. The first law says that you can't get something for nothing. Energy, illustrated here with this stone, is transformed into two different forms: one is heat, which is dissipated into the surroundings, and the other is called work, which is used to move things, like this bucket you have here. The second law, on the other side, discusses the irreversibility of thermodynamic processes, and it says, in a few words, that you can't even break even. Meaning, as I show here with this glass: processes tend to increase the disorder of the system or of its environment, following this law. So, in general, the entropy of a system increases, meaning it becomes more disordered, and so does the entropy of the environment. You can make a system more organized, so that the system entropy is reduced, but there is a price you have to pay: heat dissipated to the environment, which increases the entropy of the surroundings.

Right. So in recent years there has been a lot of progress in doing thermodynamics of small systems, like, for example, a DNA hairpin, a colloidal particle, or a nanoelectronic system. Here I show an illustration of a colloidal particle that is driven out of equilibrium, because there is an external force pushing the particle; but it is a small particle, so it shows a lot of fluctuations.

(At this point the audio of another co-organizer, Alex, briefly came through: "I'm also an organizer of this activity and I want to thank Angelo, of course; he did a lot of work to make this happen, and thank you all for being here. I'm going to talk about a topic that is somehow transversal to all the talks... clustering... clustering is an unsupervised machine-learning technique... without using an a priori classification.") Maybe we have to change the speaker. Okay, can someone mute Alex? Okay, all fine. Can you hear me? Yes. So sorry, there was probably a misunderstanding. All right, let me go on.

So I was talking about this particle moving right and left with fluctuations, which is a typical example in stochastic thermodynamics. We do thermodynamics with few degrees of freedom. For example, we aim to understand what is the heat dissipated by the particle when it traces a particular trajectory. This type of formalism was introduced in 1998 by Ken Sekimoto who, by the way, will be giving a talk in this activity. He introduced a formalism to describe the heat along a single trajectory, or the work along a single trajectory of the particle. So these are new concepts, developed in the last two decades.
And there is also the concept of stochastic entropy production, which I will discuss in detail in this talk. And there is a second law, which is now for ensembles of trajectories. You can have a trajectory of the particle going against the force, and this is interpreted as negative stochastic entropy production; but on average, if you look at many trajectories, the particle moves more towards the right than towards the left, because there is a bias. We will go into this in more detail later.

This is a typical example of a model that we study in stochastic thermodynamics: a chemical reaction, reactants turning into products, with different chemical potentials. Thermodynamics tells you that you will have a flux in the direction of the gradient of the chemical potential, so you will have, for example, this reaction happening in this direction. In stochastic thermodynamics, we say everything can happen: we assume that all reactions and their time reversals can happen, with different rates. A key assumption in our field is that the ratio of the rate of a process to the rate of its time reversal is given by what is called the environment entropy produced along that transition. So this is for one transition. But as I was telling you, you can look at a particle that traces a long trajectory, so that you have many transitions. When you collect these many transitions, you can build the probability of a trajectory given an initial state, and the probability of the time-reversed trajectory given the final state. If you take the assumption of the equation I gave you before, these two are related by the environment entropy change along the trajectory. Similarly, you can do the same with the system entropy. This is not the Shannon entropy but its stochastic version: just minus the logarithm of the probability to be at site x at time t.

There is still this technical issue with your slide. On the screen, there is a small gray window on top of your presentation, in the top right corner. Yeah, let me try... I am closing everything I have open on my computer. Yeah, because otherwise it goes on top of the formulas, and then we don't see things. Is it fine now? No, it's still there; it says "no build effects". I don't know; what are you using? I'm using Keynote, as always. Edgar, right now there is something on the right called "build order"; can you close that? Okay, give me one second. This is the animation panel. All right. I guess now it's fine, no? Okay, thank you. Sorry.

All right, so this is the definition of the fluctuating system entropy, and if you sum this entropy plus this entropy, you get what we call the stochastic entropy production, which is the Boltzmann constant times the logarithm of the probability of a full trajectory divided by the probability of the time-reversed trajectory. So this is a measure of the time irreversibility of a process. And we often write it like this: the total entropy associated with a given trajectory is the logarithm of the probability to see the trajectory, divided by the probability to see the time-reversed trajectory. This is the formula that we use in stochastic thermodynamics to study the irreversibility of different processes.
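Putting the relations just described into symbols (standard notation, my transcription rather than the exact slides):

$$
\frac{k_{x\to y}}{k_{y\to x}} = e^{\Delta S_{\mathrm{env}}(x\to y)/k_B},
\qquad
s_{\mathrm{sys}}(t) = -k_B \ln p(x_t, t),
\qquad
S_{\mathrm{tot}}(t) = k_B \ln \frac{\mathcal{P}\big[x_{[0,t]}\big]}{\tilde{\mathcal{P}}\big[\tilde{x}_{[0,t]}\big]}.
$$

The first relation (local detailed balance) holds transition by transition; the total entropy production on the right is the system plus environment contributions accumulated along the whole trajectory.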
Here are examples of paradigmatic systems. For example, we have a periodic potential with an external force. We say this is out of equilibrium, but in a steady state: the external force is time independent, so we are not changing the Hamiltonian in time, but there is an external force pushing the particle, such that it has a net current. There are more complicated models, for example Feynman ratchets with two thermal baths, or active matter, which has been attracting a lot of attention in the last years.

So, stochastic entropy production has some universal properties. One is that if your system is in equilibrium, it does not fluctuate; it is always equal to zero, so its average is equal to zero. But the most interesting part comes out of equilibrium. Out of equilibrium, this is a fluctuating quantity, and it can even be negative: you can have an experiment where the particle climbs the potential, and we interpret this as negative stochastic entropy production. But on average, when you average over many experiments, you will always have a growing curve. This has been seen in experiments too; you can see negative entropy production in the electronic experiments that we will discuss later.

In the last years there has been a lot of research on finding universal results for entropy production, work, heat, and so on. First of all, let me say that this looks like a purely statistical object, and in principle it is, because you just have to measure probabilities of trajectories; you don't need to know anything about the physics underneath. But if you take a physical system like an electrical circuit, you can measure the net growth of this stochastic entropy, and you see it is related to the heat dissipated in the resistor in this experiment, for example. And then there are the fluctuation theorems. As I said, these are theorems, relations that are universal; they are common to many physical systems. A very important one is the detailed fluctuation relation shown here, which says that at a given time t it is more likely to see positive entropy production than negative entropy production, by this relation, which implies the so-called Jarzynski equality.
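In formulas, the detailed and integral fluctuation relations referred to here read, in their standard form,

$$
\frac{P\big(S_{\mathrm{tot}}(t)=S\big)}{P\big(S_{\mathrm{tot}}(t)=-S\big)} = e^{S/k_B}
\;\;\Longrightarrow\;\;
\big\langle e^{-S_{\mathrm{tot}}(t)/k_B}\big\rangle = 1
\;\;\Longrightarrow\;\;
\big\langle S_{\mathrm{tot}}(t)\big\rangle \ge 0,
$$

where the last implication follows from Jensen's inequality, recovering the second law on average.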
All these are fixed-time properties: you look at your system at a given time. But in the last years I was interested in finding other types of new properties, for which not much was known. For example, random times: what is the first-passage time of entropy production, how long does it take to produce one k_B T? Or, what is the extreme value of entropy production? We said it can be negative; how negative can it be? These have obviously important consequences for, for example, chemical reactions, which we know happen at fluctuating times, or in polymerization processes, where you can have backtracking events, and this extreme of the entropy could be related to the extreme excursion of a motor. I will discuss this in a while. So this is also to motivate things, and now maybe let me go to the key concept in this talk, which is martingales. These are processes used in probability theory, and also in gambling and finance, and they have interesting mathematical properties, because they provide a shortcut to tackle extreme values and stopping times; so we have tricks to calculate extreme values in an easy way. They had been surprisingly unexplored in stochastic thermodynamics until this paper, and also our work, which started a new research line on thermodynamics with martingales. It is very surprising, because they appear in all the books on probability theory, gambling, and finance, but there was no such strong link to physics until recently.

What is a martingale? A process that is a martingale with respect to x_t is a real-valued process that is bounded, and the most important thing is this condition here: if you know x from time zero to time s, then what you expect of the martingale in the future is its last observed value. So it is a process that has no drift. For example, Brownian motion is a martingale, and fair financial markets, fair games, are martingales. To give you an idea, this is an illustration: you look at the process, and you know the history up to time s. If the process is a martingale, then at any time in the future its expected value is exactly the last observed value. That is what this condition means: it is a process without drift.

All right. And this has been applied very much in finance, so let me ask the following question: can a gambler make a fortune in a fair game by quitting at an intelligently chosen moment? You have a process and you say, okay, for example, this is the money I have in the stock market, and I say: I want to win, so all I need to do is wait until this process reaches a good profit, I don't know, 10,000 euros, for example. You just wait, and you would make a profit out of this action. However, the problem is that we have finite money: with Brownian motion, the process can go infinitely low, so you can incur unbounded losses along the way. So you would make a profit in principle, but this is not possible in gambling or in finance. Then you say: okay, I want to stop if I gain 10 euros of revenue, or if I am losing 5 euros; so I set up a two-boundary first-passage problem, and I win. It turns out that if you have a martingale, there is no combination of thresholds by which you will win: at the stopping time, the average of the process is the same as at the initial time, so on average the gambler makes no profit. You cannot make a profit with martingales; this is the main message. A gambler cannot make a fortune if he cannot foresee the future and has access to a finite budget. Using a stopping time means you cannot see the future; having a finite budget means, technically, that the process is uniformly integrable. So this is a bit technical, but the main point is this theorem by Doob, which says that for a martingale, the expectation at the stopping time equals the initial value. This is called Doob's optional stopping theorem, and it is valid for any stopping time, not only for two-boundary first-passage problems. This is a very powerful result, true for all stopping times, which are a big class that includes first-passage times.

And we found in 2017 that there is a martingale property of entropy production. First of all, we found that entropy production itself is a so-called submartingale: you have this condition that, given any previous knowledge, you expect it to grow in the future. This implies the second law, because if you set tau to zero, when you know nothing, it gives the second law. This came from finding that the exponential of minus the entropy production is a martingale: it has no drift, it satisfies this condition that I am showing you here. And if you set tau to zero, this implies the integral fluctuation theorem. So this is very nice. It looks a bit mathematical, but it opens things up, because you can take a book on martingales, look at the mathematical properties, and apply them to physics.
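As a quick numerical illustration of that last statement (my own toy example, not from the talk): in a biased random walk, each step of plus or minus one produces entropy of plus or minus ln(p/q) in units of k_B, and one can check directly that exp(-S_tot/k_B) keeps a unit average while S_tot itself drifts upward:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.55                         # probability of a +1 step; q = 1 - p
lnr = np.log(p / (1 - p))        # entropy produced per +1 step (kB units)

steps = rng.choice([1, -1], p=[p, 1 - p], size=(200000, 40))
S = np.cumsum(steps * lnr, axis=1)      # entropy production per trajectory

print(np.exp(-S).mean(axis=0)[::10])    # stays ~1 at all times: martingale
print(S.mean(axis=0)[::10])             # grows steadily: second law on average
```

The per-step identity p(q/p) + q(p/q) = q + p = 1 is exactly why the exponentiated entropy has no drift.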
And that is what we did next: we applied mathematical properties of martingales to get at the stopping times and the extreme values of entropy production. For the stopping times, applying martingale theory, we found very surprising results. One was that if you put two absorbing boundaries and you measure the time it takes to reach the top and the bottom boundary, you will have more hits at the top than at the bottom boundary, but the distributions of the times look the same: the histogram has more counts, but the shape is the same. So there is a symmetry in how long it takes to produce entropy. This symmetry of the times was, for us, extremely surprising: you go in this direction more often than in the other, but given that you reach a boundary, it takes the same time to reach the top boundary, with the drift, as to reach the negative one. This was a very surprising symmetry that we could test in an experiment: you have the particle here, you put a boundary here and a boundary here, and of course the particle will fall down more often, because you are pushing it. But if you measure the times to reach the bottom and the top boundaries, they are the same; they have the same distributions. This is a new result that we found using martingales.

Moreover, we applied the theorem I told you about, that you cannot gain with martingales, to entropy production, and we now have a fluctuation theorem at random times: for any gambling strategy that you use, there is no gain for e to the minus S. For S itself, you apply Jensen's inequality, and you get a second law at stopping times. And this holds for any gambling strategy: you can put two boundaries, and you can put the boundaries at any positions you want. It is a very, very general result.

Right. And this is an illustration. In a standard fluctuation theorem, you have systems in a non-equilibrium steady state, you let them evolve dynamically up to a given time, and you collect the entropy production at that final time; when you average, it is positive. Now we say something else: we impose a criterion. For example, we put a ring, and we track the systems until they cross this ring. These are random times; each system crosses at a different time. We measure the entropy production at this crossing time, and there is a second law even then. The nice thing is, sorry, that the fixed-time result is a particular case of this one, because one specific stopping time is: you take your clock and wait until it ticks three seconds. So it generalizes the previous results.

Right. And now, I think this is even more exciting: the extreme values. We look at the negative record of entropy production. You look at entropy production up to a given time, and you would like to know: what is the negative record, the minimum value, in a finite time? With martingales we can calculate the statistics of what we call the infimum, because there is a theorem by Doob, well, essentially due to Ville, which bounds the distribution of the supremum of a non-negative (super)martingale by its average. You can apply this to entropy production and get these very nice results. The first is a bound on the full distribution of the minimum, and the second is that the average minimum cannot be below minus one, in units of the Boltzmann constant.
So there is a fundamental lower bound for the average infimum of entropy production, and this is very, very general. For these mathematical results, you can take a theorem and see what its assumptions are; those assumptions become the conditions on the physical process. It is valid for Markov chains, for Langevin dynamics, for continuous-time Markov processes in a steady state. It is a very general result; so general that we went to a lab and tried to test it in an experiment. This was a condensed matter experiment, an electronic double dot: you have two islands, at very low temperature, and electrons that can tunnel into these islands. The electrons see each other; they have a Coulomb repulsion. All in all, this behaves like a four-state system: either you have no electrons, or one electron on the right, or one electron on the left, or two electrons populating the islands. In this system, you can measure stochastic entropy production for different biases. If you bias the system very strongly, it grows very fast, and from these trajectories you can measure the extreme values. And it turns out, measuring the extreme values and building the distributions, that our theory is really good: it really provides a bound on the distribution of the extreme excursions. And then, if you take the average minimum over very long trajectories, it cannot be below the bound we found, the minus one bound for the infimum. You get closer to it when you are closer to equilibrium, at 25 microvolts; when you drive the system strongly, you do not saturate the minus one bound. And this is something we applied later in biophysics.

So now, let me go a bit more towards the topic of the conference, which is an application to biophysics. We used this theory of extreme values to relate molecular motor fluctuations to random matrix theory. This is a surprising result derived by my student, Alexandre Guillet. We were looking at a simple model of a molecular motor: just a walker that steps to the right with jump rate k+ and backwards with jump rate k-, with periodic boundary conditions. Time is continuous, but space is discrete. We define A, the cycle affinity, which is the bias parameter; it is related to the net force applied to the motor, the step length, and k_B T. There is also a symmetric parameter, a bare rate, that appears as a prefactor in the two rates. This is a simple model, so in principle one should be able to calculate everything analytically here. The nice thing is that on a ring all states have the same free energy: this one, this one, this one; you have a uniform distribution on the ring. So the entropy production becomes just the dissipated work, which is F times l times X, simple. Therefore, the entropy production in this model is proportional to the position of the motor at time t. This is not general, but it is a simple example to illustrate our results. All right. So this means that when you jump with the drift, you dissipate heat, so you produce entropy, and when you jump backwards, you reduce it. It is like having a particle rolling in a linear potential: when you roll down, you dissipate heat; when you roll up, it is because you are taking energy from somewhere.
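A toy simulation of exactly this kind of walker (my own parameter choices) shows both the proportionality between entropy production and position, and the infimum bound from the previous slides:

```python
import numpy as np

rng = np.random.default_rng(3)
A = 1.0                                   # cycle affinity (bias), kB T units
p_right = 1.0 / (1.0 + np.exp(-A))        # prob. that a jump is forward,
                                          # from rates k+, k- ~ exp(+-A/2)
jumps = np.where(rng.random((20000, 200)) < p_right, 1, -1)
S = A * np.cumsum(jumps, axis=1)          # entropy production = A * X(t)
S_inf = np.minimum(S.min(axis=1), 0.0)    # running infimum (S starts at 0)

print(S_inf.mean())   # ~ -A/(exp(A)-1) ~ -0.58 here, always >= -1 (i.e. -kB)
```

The closed form quoted in the comment is what one expects from the geometric distribution of the deepest negative excursion of such a walk; the general statement is just the bound of minus one k_B.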
The question we wanted to address concerns the trajectory of the motor, the single trajectory. We look at the running minimum, the minimum up to time t, and the running maximum. We would like to understand the statistics of the running maximum and the running minimum of the position, and also of the running maximum and the running minimum of the entropy production. These are important in biology in general, for understanding robustness and resilience in biological systems, which are subject to failure. For example, here is a microtubule: there is an extreme event in which the polymer enters a catastrophe; or sperm navigation, which involves extreme events as well. So we think that understanding resilience and robustness, not only averages, is important in biology.

The first thing we could derive in this model is the long-time extrema: what is the long-time minimum of the position of the walker? First of all, we find the distribution of the minimum: it is a geometric distribution. And second, the average minimum has this expression, so it depends only on the bias parameter. When A is zero, this goes to minus infinity: at the stall force, the extreme value diverges; the average extreme value, which is this line, diverges. However, this does not happen with the entropy production. The entropy process is A times the position, so what happens is the following: this is bounded, not by minus infinity, but by minus the Boltzmann constant, which is what we showed in the previous work, actually. So when you make the bias stronger and stronger, the long-time minimum gets closer and closer to zero; when you are close to equilibrium, you are here, near minus one.

More importantly, not only can we obtain the average long-time values, which are here and here, but my student could derive a very nice expression for the full curve, the finite-time average minimum, which is given by this formula. It looks a bit scary, but I will show you a very nice way of expressing it. The main point is that we found a symmetry, a min-max symmetry, between the minimum of the entropy production and the maximum: the value at time t minus the maximum has the same statistics as the minimum, and they are related by this formula. Interestingly, we found that this formula can be rewritten as an integral over times, like a relaxation spectrum, where the relaxation spectrum is given by the so-called Marchenko-Pastur distribution. This distribution has a finite support and this asymmetric shape; it is called Marchenko-Pastur, and it appears, interestingly, in random matrix theory.

So it turns out the following: if you want to calculate the extreme value, there is a very nice shortcut. You draw a random matrix of size m times n, whose entries are random numbers; they can be Gaussian random numbers, for example, with a given rectangularity: the asymmetry in the size is related to the asymmetry, the bias, of the walker. You plug random numbers into this matrix and you construct a Wishart matrix: you multiply the matrix by its transpose and divide by the number of columns. Then you diagonalize the matrix to get the eigenvalues, and it turns out that if you make this sum of exponentials of minus t over the eigenvalues, this gives you the average minimum. (Yes, you have three more minutes.) Okay, fine, fine. I have more material, but I will summarize.
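The random-matrix shortcut can be sketched as follows; the matrix shape, the normalization, and the exact way the eigenvalue sum maps onto the finite-time average extremum follow the paper, so this is only schematic:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 4, 8                      # the aspect ratio m/n encodes the walker's bias
G = rng.normal(size=(m, n))      # rectangular matrix of Gaussian random numbers
W = G @ G.T / n                  # Wishart matrix
lam = np.linalg.eigvalsh(W)      # eigenvalues ~ Marchenko-Pastur distributed

for t in (0.5, 1.0, 2.0, 5.0):
    # Relaxation-spectrum sum of the form described in the talk:
    print(t, np.exp(-t / lam).sum())
```

The key point is the cheapness: even a single small matrix already samples the Marchenko-Pastur spectrum well enough to evaluate such relaxation sums.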
All right. So it turns out that you can compute extreme values of the molecular motor very efficiently by drawing very small random matrices, and this is what we show here. The black line is the exact calculation, and the points come, for example, from taking a four-by-four matrix, just one matrix, computing the eigenvalues and making the sum. This gives you this very nice approximation to the extreme value. We got very excited about this result, and we are trying to fully understand it, because it's a surprising connection to us. I don't have much more time, just to tell you very briefly: these ideas can be applied to more complicated models, where you have a spatial coordinate and a chemical coordinate, for example ATP consumption and the position of the motor. And, okay, I don't have much time, but very recently we introduced what are called gambling demons. They are different from the Maxwell demon; it's illustrated here. It's a demon that is gambling: you can either stop the process at a fixed time, or stop it when it has gotten a very good revenue. This work is done with Gonzalo Manzano here at ICTP. You can stop, for example, at a given time only if the work didn't cross a given threshold. So this is similar to what I was telling you before about stopping times, but now with driving: before it was a steady state, now it is driven. And, okay, I'll finish soon. We derive this second law at stopping times: work minus free energy at the stopping time is not greater than zero, but greater than something else (if you have questions, we can discuss it in the discussion session tomorrow), which is related to the asymmetry of the process. And we could find in an experiment that you can go below the free energy by using these gambling strategies. This we could see very well recently in an experiment: the more irreversible you are, the more you can gamble, and the further you can go beyond the standard second law. This is very interesting; you can find it on the arXiv. Okay, just to finish: these ideas are in the air, and there are different authors following this approach; here are good examples. Kerskeimoto will give a talk about it in this activity. We are writing a review; I put here "to be finished in 2050" because it's taking us a long time to finish. And I want to thank you for your attention, and also the different collaborators and my current team for the great work. So thanks a lot. Thank you very much, Edgar. So, as before, we are open to questions; as before, don't be shy, even if the question is conceptual or anything, please go ahead. So maybe I'll just start, to break the ice. On this last part, Edgar, sorry, I didn't understand what this delta is; I just missed it. So, okay, one second. The delta is an information-theoretic quantity, which is the logarithm of a ratio: this is the density at time t in the forward process, and this is the density in the backward process, because here it is not a steady state, it's a driven process. You drive the process from time zero to time t and you look at a snapshot at a given time; this one is the same snapshot in the backward process. So you are looking at the density of the process at the same time, but in the forward and in the backward process. For example, take a process that runs from time zero to time ten: this would be the density at time two in the forward process, and this the density at time eight in the backward process. Okay.
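In symbols, with notation assumed here for clarity (τ the total duration of the protocol, ρ_F and ρ_B the time-dependent densities of the forward and backward processes), the quantity being described is

```latex
\delta(x,t) \;=\; \ln\!\frac{\rho_F(x,\,t)}{\rho_B(x,\,\tau - t)}
```

so in the example above, with τ = 10 and t = 2, the backward density is evaluated at time 8.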
So it's a measure of how much the two diverge. Yeah. So this is the forward process. And, sorry, one question: you start from zero and you finish at time ten? You start from zero and you finish at time ten, and you're looking at a given time in the forward process. Sorry, my drawing: the forward process is flowing in this direction, and the backward process will flow like this, in this direction. So you are looking here, at this time, in the backward process: forward and backward. When the process is very irreversible, the statistics of the process in the forward process will be different from the statistics of the process in the backward process at the same time instant. So this is measuring the density of x at time t: this one is here, and this one (I'm just messing up the colors) is here. So this measures the probability of being at a given x at a given instant in the forward and in the backward process. When you are in equilibrium, they are the same; but when you are out of equilibrium, the driving generates asymmetries in the distributions. That's why we call it stochastic distinguishability: distinguishability between the forward and the backward process. Thank you. So I see that we have a hand raised by Marcelo Albuquerque. Please. Okay, thank you. Thanks for the talk, Professor, nice talk. My question is about the bias in the system. I don't know if I understood what you said previously, but you said that the system would be biased, or could be biased; I would like to understand. So this could be, for example, a motor pumped by ATP: there is a chemical reaction and there is an asymmetry; for example, here there is a potential that is asymmetric, and ATP is pushing the motor more in one direction than in the other. In the way we describe it, this can be anything; it is a net force on the motor. For example, there could be someone pulling on the motor: you can have a motor in an optical tweezer and pull on it with the tweezer. This means there is a bias, so the motor is walking more in one direction than the other; it has transport. Okay, thanks. So we will have to take into account the force that you are applying. Yes. Okay, yes, thanks very much. All the models I discussed have a bias; they are non-equilibrium. Okay, thanks. Excuse me, I have a question. You mentioned microtubules in a bonus slide, I don't remember which. Could you please explain more? My work is about microtubules, so please explain more. Yes, of course, thank you. So here the model I take is a motor that moves forward and backwards step-wise: this is a forward step, this is a backward step. And the motor is biased, so it makes more forward steps than backward ones. And I'm interested in looking, in this process, the black one, at the maximum of the process, the running maximum, or the running minimum. So the blue process is the minimum up to this time: if the process goes below the current minimum, the minimum changes. This is like records, I don't know, records in athletics: this was the shortest time, but now there's a new athlete with a shorter time, so this goes down. Right? So we are looking at records in a stochastic process; this one is stochastic. In microtubules these types of events, these downward trends, are very important, because you can identify the trajectory with the length of the microtubule. So now I'm not looking at the motor;
I'm looking at the tip of the microtubule: when it grows, this goes up, and when it shrinks, or has a catastrophe, the length decreases. So, all in all, you have a process that fluctuates like the black process here. And here the extreme values are very important, because negative extreme values are catastrophic events: if the extreme goes very, very low, it is catastrophic, because it means the microtubule will depolymerize completely. So I think there is a lot of room for applying our results, which are general, to these models as well. Thank you very much. Are there other questions? Yes, sorry, I have one more question myself. Regarding this model, you said that it is discrete in space but continuous in time. Why is it continuous? Is it not just a random walk? This is a biased random walk on a lattice in continuous time. Yes. Yeah, but why is it continuous in time? Well, I assume time is always continuous, and I take discrete space because it is a coarse-graining. What we are doing to generate these time series is a Gillespie algorithm: we draw a random time until the next jump. But this is a good description, I think, for single-molecule experiments, when you can distinguish states in the model. Let me just say one thing more: of course, it would be more precise to have something like this. You have your motor here, and you can either make a displacement via a mechanical force, or consume ATP without moving, which would be this, or consume ATP and move, which would be this event. To me, the best model is continuous space, continuous time; but this type of discrete approach is useful when in an experiment you can only see part of the dynamics, only coarse-grained states of the system. But I would keep continuous time, because I think it makes more sense. So we have another question, from Catherine Azizi. Please. You showed some graphs of the entropy evolving in time, and it was decreasing. So I was curious: is this for isolated systems? Or is there any entanglement effect, or is it simply not isolated? Yes, this is a good point. The description that we use in stochastic thermodynamics is something similar to what you see here, in this figure, in one second. This is the paradigmatic model: we have a colloidal particle, the gray particle, in a potential, this black line. There is an external force, and there is a bath. I'm not drawing the bath molecules here, but this is full of other particles, green particles, all around, interacting with this particle. So I'm not talking about an isolated system. Of course, everything together would be an isolated system, but we look at the dynamics of only this guy, which you can describe by this type of model: a Langevin dynamics, where the bath enters through this noisy term. In other words, I think this slide was better: we are doing thermodynamics on this colloid. The entropy we look at is the entropy exchanged between this colloid and the environment, not that of the full system. It's a classical system, it is canonical, it is in a thermal bath; it is not an isolated system. Of course, to do this simulation you need to isolate the box, but in stochastic thermodynamics we just see this red particle; we don't see the others.
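A minimal sketch of the paradigmatic model just described: an overdamped Langevin (Euler-Maruyama) simulation of a colloidal particle in a potential with an external force, where the Gaussian noise stands in for the bath. The potential, parameter values and names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def overdamped_langevin(force, x0, dt, n_steps, gamma=1.0, kBT=1.0):
    """Euler-Maruyama integration of gamma * dx/dt = F(x) + noise.
    The noise term is the effective (mesoscopic) description of the bath."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    sigma = np.sqrt(2.0 * kBT * dt / gamma)
    for i in range(n_steps):
        x[i + 1] = x[i] + force(x[i]) * dt / gamma + sigma * rng.standard_normal()
    return x

# Tilted periodic potential V(x) = cos(x) - f*x, so the force is F(x) = sin(x) + f
f_ext = 0.5
traj = overdamped_langevin(lambda x: np.sin(x) + f_ext, x0=0.0, dt=1e-3, n_steps=100_000)
```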
We want to do thermodynamics on the trajectory of the red particle. So we look at the entropy of individual particles, and then we may see it decreasing? Yes: it decreases because the particle is moving backwards. We live in the world of Langevin dynamics; for us, this is our world. In this world you can have events where the particle climbs up, and this happens because it is absorbing heat from the environment. We are not tracking the heat exchanged with each individual bath particle; we live in this description, where the noise accounts for the average force of the bath molecules. It's an effective description, a mesoscopic description, what we do. Thank you very much. I think we stop here. Thank you, Edgar, for the very nice talk. And now I think we move to Alessandro. Hi. Okay, Alessandro, you can just go on; you can share the screen. All right, fine. Okay, great: please tell me if you can see my presentation. Yes, we do. All right, great. So my name is Alessandro Laio; I'm also working at SISSA. I have been working for quite a while on the development of enhanced sampling techniques, even if more recently I am more focused on data analysis and data-science tools, also for analyzing biomolecular systems. Today I will give you an overview, well, not of enhanced sampling in general, because that would require five days at least, but at least of the reason why enhanced sampling is necessary. Okay. So, our dream as scientists, well, let's say my dream as a scientist (everybody can have their own dream) has always been to understand, from a physicist's point of view, how biomolecules work. These biomolecules are amazingly interesting machines, which are extremely efficient. They perform incredibly precise tasks based on what? Based on the laws of nature, of course. They basically convert thermal fluctuations into ordered work in a manner which is absolutely amazing, and which we would like to be able to copy in order to, for example, design nanomachines. So really understanding how biomolecules work is a big challenge for today's science, and molecular dynamics has been, and is going to be, one of the key tools for addressing this extremely important challenge. Okay, so what is molecular dynamics? We have already heard a lot about it in the talk by Giovanni: given a potential energy surface, one solves Newton's equations of motion. Very, very easy in principle; but in order to make it efficient, you need a lot of work and a lot of ideas. Modern molecular dynamics codes are extremely efficient, parallel and scalable, and this required more than three decades of work by, well, not really hundreds, I would rather say thousands of people. So nowadays the molecular dynamics tools we have in our hands are extremely powerful; the first thing we are going to see, in the first part of my talk, is how powerful they are. Why is it necessary to have powerful codes? Because if I want to simulate a realistic system, for example a realistic biomolecule, I have to cope with three different problems. One is that I would like to simulate large and inhomogeneous systems; this is the image of a membrane channel, for example.
This is like 300,000 atoms, easily. Then I have a problem with the time scale; this is the one to which I will dedicate a little bit of time in this presentation. And the other issue is accuracy. One would like to get a quantitative answer from these simulations, not only a nice movie or a qualitative description, but really something that helps you understand how, for example, a biomolecule is working, and eventually adapt it in such a way that you can, for instance, control its behavior. In order to do this, you need accuracy; you have to choose the correct level of accuracy for performing your simulations. In this overview I will simply start from there, because this is actually the key point which then determines how much time we can simulate. So which level of accuracy of description should one choose? This clearly depends strongly on what you want to simulate. The cheap option is using classical potentials. If you use a classical potential, of course you will not be able to describe a chemical reaction, but you can really do a lot: you can already study how, for example, a protein works, how a typical protein (not an enzyme) is able to perform its task. Here there are no quantum effects on atomic motion, no electrons, no chemistry. What we call a force field is a potential energy function together with the set of parameters which enter in its definition; it is designed to reproduce the molecular geometry and some selected properties of the molecule. Developing these force fields also required a lot of work and a lot of insight; it's by now an almost four-decades-old story. Just to give you an idea, this is the functional form of a potential energy function for performing biomolecular simulations. You have a bond term and an angle term, this one; then a dihedral term, which describes the interaction between four successive atoms; and then the non-bonded part, which is actually the rate-limiting part. Computing the first part is somehow cheap, because it involves only the few atoms which are close by; computing the second part is what makes the computation expensive. Then you have another option. Let's say you want to describe a chemical reaction: then you have to deal with the electrons, and if you want to deal with the electrons, you have to go and chat a little bit with Mr. Schrödinger and use his well-known equation. In principle you would like not only to solve the electronic structure for a given configuration, but also to do dynamics using the forces derived from your electronic structure, that is, to use the Schrödinger equation and Newton's equations at the same time. The manner of using these two things together was first invented here at SISSA by Car and Parrinello in their famous 1985 work. This approach is great because, at variance with a classical force field like the one I described in my previous slides, it does not require any fitting, no free parameters; everything comes from first principles. But of course you pay a price, and the price is that it is orders and orders of magnitude more expensive than the classical force field. Okay, so this next slide is important, because it gives you a bit of an idea of what we can simulate nowadays.
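For reference, a generic functional form of the kind just described (bonded terms plus the expensive non-bonded part) reads, in a standard notation; individual force fields differ in the details:

```latex
V = \sum_{\text{bonds}} k_b\,(r - r_0)^2
  + \sum_{\text{angles}} k_\theta\,(\theta - \theta_0)^2
  + \sum_{\text{dihedrals}} \frac{V_n}{2}\,\bigl[1 + \cos(n\phi - \gamma)\bigr]
  + \sum_{i<j} \left\{ 4\epsilon_{ij}\!\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{\!12}
  - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{\!6}\right]
  + \frac{q_i q_j}{4\pi\varepsilon_0\, r_{ij}} \right\}
```

The first three sums involve only a few atoms that are close by along the chain and are cheap; the last sum runs over all pairs and is the rate-limiting part.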
So let's start with the classical force field, and let's make an example: a simulation of HIV protease with a classical potential. This system has approximately 50,000 atoms, and each atom interacts with approximately 100 atoms, let's say its neighbors. So, in order to compute the forces in a given configuration, I need to compute forces between 50,000 times 100 pairs of atoms. To perform this number of operations, a single ordinary CPU needs approximately 0.2 seconds. So in 0.2 seconds I compute all the forces on this system, and then I am able to move my system by one step. This means that in one day (the calculation is very simple) I can move the system about half a million times, each time by one time step. This time step, if you know a little bit about molecular dynamics, is a couple of femtoseconds, which means that in one day I will be able to simulate roughly one nanosecond. One nanosecond in one day: this is already an enormous technological success if you compare it to what we were able to simulate decades ago, but it's only one nanosecond per day. This is what we can do with a classical potential; if I want to use a Car-Parrinello, quantum potential, in which electrons are treated explicitly, what I can simulate is much smaller, and instead of 10^-9 seconds I can simulate only around 10^-11 seconds. Well, of course I can use something much better than one single CPU, and people across the world have built large computers. In particular, David Shaw in New York City has built a computer which is able to do only molecular dynamics of proteins; this gives you an idea of how important this task is. He has invested an enormous amount of money, 100 million dollars of his own money: this guy decided to invest his own money to make this thing work, and he has built hardware which can do only this. Of course it is much faster than our computers. What can you do with that? With that you can simulate 17 microseconds per day on a system more or less of the size I mentioned in my previous slide, so 50,000 atoms. This is an enormous technological advantage, and in fact these guys are obtaining amazingly good results using these fantastic computers. But how far can we go with these numbers? Let's say that I am rich and I have this computer, so I can simulate 10^-3 seconds, one millisecond, in one month. This number, 10 years ago, would have looked like a miracle, like science fiction. And let's say that with my own resources I am able to simulate instead 10^-5 seconds in a month: two orders of magnitude less, but still quite a lot. Now, the typical time scales of interesting transitions are typically orders of magnitude longer. Even protein folding: one millisecond, and there are many, many proteins which take almost a second to fold. This means that even if I have this enormously powerful computer, I will not be able to see even a single folding event in many, many proteins of great interest.
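Spelling out the back-of-the-envelope estimate above (one force evaluation of roughly 0.2 s on a single CPU; the time step of a couple of femtoseconds is the standard value for biomolecular MD):

```latex
\frac{86\,400~\text{s/day}}{0.2~\text{s/step}} \approx 4.3\times 10^{5}~\text{steps/day},
\qquad
4.3\times 10^{5}~\text{steps} \times 2~\text{fs} \approx 1~\text{ns/day}.
```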
The situation is even worse if I want to simulate phase transitions. Phase transitions happen on mesoscopic time scales: we are used to seeing phase transitions happen on our real-world time scales, which means seconds, hours. So how can I simulate a phase transition if I am only able to simulate one millisecond? I have a problem. Ab initio is even worse. And chemical reactions: there are many chemical reactions which happen on time scales of minutes, hours, days. These things are clearly out of reach for direct molecular dynamics simulations. However, even with moderate computational resources we can do something. I'm going to show you two examples, both not from my lab. This is a simulation of the folding of a knotted protein of 86 amino acids, which in real life folds in approximately 0.1 seconds, so 100 times longer than what it is possible to simulate with the big computer I mentioned. This folding trajectory was obtained in the lab of Pietro Faccioli in Trento; it's possible to obtain this, and to obtain many folding trajectories like it. First example. The second example is a phase transition in a zeolite. Observing a phase transition like this is something which in real life happens on the time scale of minutes, if I remember correctly. And already more than 10 years ago it was possible to observe and study systematically many phase transitions of this system with moderate computational resources; here you see a trajectory where the system really jumps from one phase to another. How is this possible? The key for making this possible are enhanced sampling techniques. Of course, in the 20 minutes left in my presentation I don't really have a chance to give you even an overview of everything that is around, but at least I would like to leave you with the gist, the main thing that makes this miracle possible. How is it possible to pass from a simulation of 10^-8 seconds to getting information on a process which occurs on the scale of seconds? All right, let's take a close look at the systems we already mentioned; the idea is to understand a little bit what's going on. We should first of all find a function of the coordinates which is likely, based on our chemical and physical intuition, to take very different values in the different states. We call this function S(X). For example, in the case of a phase transition, a good function is the box shape: you see that in this transition the simulation box changes shape. In a chemical reaction, the bonding pattern: you see that these atoms form different bonds in the different states. In the case of protein folding, I can choose many: the fraction of alpha helix, the number of contacts, the gyration radius. So I have to find a function which takes different values in my different states. Now, how does this function behave as a function of time? This is an example on a very simple system: here I have the simulation time and here I have my collective variable. You see that I have fluctuations in one state and then a jump to another state. So this is the time before observing the reaction, the time which is of the order, for example, of seconds or hours or days, and this is the reaction time.
Here you almost cannot even see it, but it is actually much, much shorter than the time before the reaction. When we say that the folding of this knotted protein happens on the time scale of 0.1 seconds, we are actually talking about the time before the reaction, not about the time the reaction takes to happen. This subtle difference between these two times is actually our handle, what we use in order to develop powerful enhanced sampling techniques. So how do we do it? In a molecular dynamics simulation of a chemical reaction, or of a transition, the system actually spends most of the time oscillating in a local minimum; this is the key point. Only rarely does it perform a jump to another state, and the time required to perform the jump is short, very, very short. For example, in a chemical reaction you can have a waiting time of one hour, but the time the reaction takes to really happen, as soon as it starts happening, can be one picosecond, 10^-12 seconds. So one hour of waiting time with nothing happening, and then in 10^-12 seconds everything happens all of a sudden. The key trick at the basis of all the enhanced sampling techniques is finding a manner to simulate only the jump. That is the key trick of the business; all the different methods then work in different manners. How do I do this? I can do it in many different manners; as I said, to really give you even a hint of all of them would take hours. One idea is to pull the collective variable s: once you have found that your system is nicely described by a specific order parameter, for example something which describes the order of your structure, then you can simply add a spring to this specific variable and pull it, exactly as you would pull a spring, forcing this collective variable to change. This is the idea at the basis of thermodynamic integration, which is a very well known method, and also of steered molecular dynamics (a minimal sketch follows below). Then you can do something more rigorous, which I like to call importance sampling on reactive trajectories: exactly as you do importance sampling on configurations in Monte Carlo, you can devise a sampling scheme which preferentially samples only reactive trajectories. The most famous approach in the field is transition path sampling, but there are other methods related to this idea of simulating only the reactive part of the trajectory, for example the finite-temperature string method, or the nudged elastic band in the framework of chemistry. Then, another idea, which I will describe in a little more detail, is the idea of flattening the free energy; I will spend the next few slides on that, so I'm not going to say anything about it at the moment. And then another idea is to raise the temperature: of course, if you raise the temperature, your rare events become less rare, but you cannot simply raise the temperature of everything, otherwise you melt the system and transform it into an ideal gas, which is not very interesting. So you have to raise the temperature in a smart manner, and you can do it in many different ways. One is temperature-accelerated molecular dynamics, where you raise the temperature only of the collective variable, only of S; or you can raise only the temperature of the solute, which is called solute tempering.
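Here is a toy sketch of the "pulling" idea on a 1D double well, where the collective variable is the coordinate itself and a harmonic restraint with a moving center drags it over the barrier. The spring constant, pulling speed and temperature are illustrative choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(3)

# Double-well potential V(s) = (s^2 - 1)^2 and its force -dV/ds
force = lambda s: -4.0 * s * (s**2 - 1.0)

def steered_md(k_spring=50.0, v_pull=0.01, dt=1e-3, n_steps=200_000, kBT=0.2):
    """Overdamped dynamics with a harmonic restraint whose center moves at
    constant speed, dragging the collective variable across the barrier."""
    s = -1.0                                  # start in the left well
    sigma = np.sqrt(2.0 * kBT * dt)
    traj = np.empty(n_steps)
    for i in range(n_steps):
        center = -1.0 + v_pull * i * dt       # moving restraint center (-1 -> +1)
        f_bias = -k_spring * (s - center)     # spring force on the variable
        s += (force(s) + f_bias) * dt + sigma * rng.standard_normal()
        traj[i] = s
    return traj

traj = steered_md()
```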
All of these approaches have their own history, their own advantages and disadvantages. Now I'm going to spend the last part of my presentation describing in a little more detail what flattening the free energy means. What is the free energy, first of all? This is the trajectory we were looking at before. From this trajectory you can estimate the probability density as a function of your collective variable S, and it will look like this: you have a maximum here and another maximum here; this maximum corresponds to this state, and this maximum corresponds to this state. From the probability density you define your free energy: the free energy F as a function of S is just minus k_B T times the logarithm of the probability density. So if this is the probability density, this is the free energy: a minimum in the free energy corresponds to a maximum in the probability density, and in this specific system you will have a free energy which looks exactly like this, with two separate minima. So what does flattening the free energy mean? Let's say I start with this free energy here. If I run my simulation on a system with this free energy, I observe a trajectory like this: a long waiting time and then only a few transitions in the time I am able to simulate. Then let's say that I add to this free energy an external bias; in this specific example, this weird form which is a sum of two exponentials. The sum of the free energy plus my external bias is this red line here: you see, it is flat. Now, if I simulate my system under the action of my normal potential, which has this free energy, plus this external bias, what I get is this trajectory here. It is the same system, but now you see that the system is going up and down again and again. This does exactly what I was telling you before: adding this bias allows me to simulate only the jumps. Instead of wasting all my simulation time waiting for the system to vibrate before a very rare event which happens only from time to time, I force the system to go back and forth again and again, and this allows me to observe the relevant transition many, many times. Okay, so this is great; it was actually invented in the 70s, and it works great. But we have a problem, and the problem is: who gives us this fantastic bias potential which is able to flatten, at least approximately, my free energy? Because, of course, when I start my simulation, I don't know what the free energy looks like. So, Alessandro, you have four more minutes. Pardon? Four more minutes. Yes, I am almost done. Okay: so how can one find a bias potential B(s) which is approximately equal to minus the free energy, and which therefore, when added to the free energy, gives me something which is approximately flat?
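As an aside, the definition F(s) = -k_B T ln p(s) used above, in a few lines of code; a sketch assuming an unbiased, well-sampled trajectory of the collective variable:

```python
import numpy as np

def free_energy_profile(s_traj, bins=50, kBT=1.0):
    """Estimate F(s) = -kBT * ln p(s) from an unbiased, well-sampled trajectory."""
    p, edges = np.histogram(s_traj, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    F = np.full_like(p, np.inf)            # empty bins get F = +infinity
    F[p > 0] = -kBT * np.log(p[p > 0])
    return centers, F - F[p > 0].min()     # shift so the lowest minimum is at zero
```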
So here I have many, many options, which were all used again and again across the years. Option number one is using my chemical or physical knowledge to invent a correct bias potential. Option number two is simulations with simpler potentials: I simulate with a cheaper potential, I compute the free energy, and then I define my bias using this previous simulation. The other option is an automatic procedure, and in the last three minutes I will give you some hints about a possible automatic procedure, which is called metadynamics. In metadynamics, the bias potential flattening the free energy is built iteratively by an approach which consists in a loop, described in these two lines: you put a small Gaussian, and then you evolve the system for a short while under the dynamics of the small Gaussians plus the free energy. So first you add one Gaussian and you go there; then you add another one and you go there, and there, and there. This idea of adding a small Gaussian from time to time is a procedure which allows you to iteratively flatten your free energy, and to iteratively build a bias which compensates your underlying free energy. More rigorously: you choose a collective variable s(x) and you bias the dynamics with a history-dependent potential which has this form. Here we have an example: the free energy, which at the beginning is unknown, plus the sum of these Gaussians iteratively becomes more and more flat. So the sum of my free energy plus the bias potential defined by this equation is flat, which means that for large times the bias potential defined by this equation is an approximation of minus the free energy (a minimal sketch of this loop is given after the questions below). Here is an example of normal molecular dynamics versus metadynamics, applied to a system undergoing a chemical reaction which, in a normal Car-Parrinello molecular dynamics run, would be absolutely impossible to simulate. And here, well, I am really finishing, maybe with this movie: for the moment you still see nothing happening, but now, all of a sudden, you see that the guy is finally able to do some stuff, performing a sequence of transitions in which the system jumps from one free energy minimum to the next, exploring the conformational space in a very efficient manner. You see that now the system is performing a transition to a molecule which is called naphthalene; it's almost done, and now it goes to naphthalene; we wait another second... yes, now it's done, now it's naphthalene. And with this, okay: of course I could show you other examples of more interesting systems to which we have applied these methods. We have studied many, many systems with these approaches, for example the folding free energy landscapes of a protein, or the free energy landscape associated with the nucleation of fibrils. I could show you many, many examples, but I would like to close with a slide where I show what, in my opinion, are the open problems which are still at the center of research, not only for this specific method but for many, many methods. The first important problem is how to find the best collective variables automatically: if your system is very complicated, it is very difficult to know in advance the best variable on which you should add your bias, or pull your simulation, or raise your temperature. So this is
the first important problem. A second important problem is that, using these powerful enhanced sampling approaches, you actually observe many transitions: you are able to explore very complex conformational landscapes, and an open research topic at the moment is how to analyze these complex conformational landscapes automatically. Both of these problems call for a combined approach using machine learning, artificial intelligence and data-science tools, and this is basically the reason why I personally am more and more trying to use approaches from these fields: they give you tools which allow you to address problems which would otherwise be more difficult to address with classical methods. With this I'm finishing my presentation, and thank you very much for your kind attention. Thank you very much, Alessandro, for the very nice talk; as usual, we are now open to questions. Let's see... Hi, can I ask a question? Yeah, sure. Who is it? I'm Uriel Morson. Yeah, sorry: for questions please raise your hand, otherwise I cannot see. Now go on, please. Okay, thank you, Alessandro, it was very interesting. This is a very naive question, I don't know much about these methods, and it's like two questions in one. One is: does metadynamics preserve the statistics, like the ensemble statistics, like a canonical ensemble? Or, I mean, are you violating that? And is it important to preserve it or not? If I can answer immediately: this is a very important question. It does not, in the sense that it is a dynamics which adds a history-dependent potential to your system; therefore the probability distribution of the system is, of course, not canonical anymore. But over the years many people have done a lot of work, and this probability distribution is now very well understood. Even if it is not canonical in the original variables of the system, people are now able to extract the original canonical distribution by doing, basically, a post-processing of the biased probability distribution. So the answer is: yes, this dynamics perturbs the probability distribution, but there are results, which under some conditions are exact results, which really allow you to compute back the correct probability distribution from your metadynamics trajectory. Very nice, okay. This answers my second question, which was how to obtain the free energy from this method. Well, from this method, naively speaking, the free energy is minus the bias; more rigorously speaking, it is minus the time average of the bias. Basically, in metadynamics the bias becomes a dynamic variable itself; it's like having another dynamic variable, and this dynamic variable is a field, because it's a function. In order to obtain an estimate of the free energy, you take the time average of this bias. This is not a rigorous answer either; the rigorous answer would require a blackboard and much more time, but just to give an idea: you take the time average of the bias. Okay, so this is true even if the statistics are not preserved? Even if the statistics are not preserved, this is rigorously true if the dynamics of the collective variable is adiabatically separated from the dynamics of the other variables, or in the limit of infinitely slow deposition. There are technical details. Thank you very much.
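A minimal sketch of the deposition loop described above, on a 1D overdamped toy model: every few steps a small Gaussian is added at the current value of the collective variable, and the biased dynamics then runs on the potential plus the accumulated Gaussians. Heights, widths, deposition stride and temperature are illustrative and would need tuning in practice; this is not the PLUMED implementation.

```python
import numpy as np

rng = np.random.default_rng(4)

# Double-well potential V(s) = (s^2 - 1)^2 and its force -dV/ds
force = lambda s: -4.0 * s * (s**2 - 1.0)

def metadynamics(n_steps=200_000, stride=500, w=0.05, sigma_g=0.15,
                 dt=1e-3, kBT=0.2):
    """History-dependent bias built from Gaussians deposited along the trajectory.
    For long times, -V_bias(s) approximates the underlying free energy F(s)."""
    centers = []                              # Gaussian centers deposited so far
    noise = np.sqrt(2.0 * kBT * dt)
    s = -1.0
    for i in range(n_steps):
        if i % stride == 0:
            centers.append(s)                 # deposit a small Gaussian here
        c = np.asarray(centers)
        # force from the bias, -dV_bias/ds, with
        # V_bias(s) = sum_k w * exp(-(s - c_k)^2 / (2 * sigma_g^2));
        # brute-force sum over all deposited Gaussians (fine for a sketch)
        f_bias = np.sum(w * (s - c) / sigma_g**2
                        * np.exp(-(s - c)**2 / (2 * sigma_g**2)))
        s += (force(s) + f_bias) * dt + noise * rng.standard_normal()
    return np.asarray(centers)

centers = metadynamics()   # the deposited centers trace the visited states
```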
Sorry, Alessandro, we have a few questions. One from Denis: what collective variables did you use for your study? Well, an enormous number of collective variables; the thing is that this is an art. If you go to PLUMED, the plugin which implements all the rare-event methods including metadynamics, you will find a list of, I think, 50 collective variables. Just to give you an idea: you can use coordination numbers, which describe the bonding pattern; you can use the energy (the total energy is a very good collective variable); you can use the radial distribution function; you can use the stress; you can use whatever you want. Any explicit function of your coordinates can be a good collective variable for studying your system. This is where part of the fun of our work is: choosing the correct collective variable for each specific problem. Yes, there is another question, from Lorenzo Parlante, who is asking: in your opinion, what is the best method to define a realistic transition path between two states, and could you recommend some tools or software for biological systems? Okay, if you really want to define the transition pathway in a rigorous manner, in my opinion the correct tool is transition path sampling, the approach developed in the group of David Chandler: Chandler, Dellago, and so on. This method is totally agnostic: it doesn't require choosing a collective variable; it simply does importance sampling in trajectory space, with no prejudice, and then you analyze these reactive trajectories a posteriori. This group developed their own tools: if you browse the web for their names, you will find the tools which allow you to do this cleanly. On the other hand, this method doesn't really allow you to explore your configuration space like we do in metadynamics; it is more a tool which, if you know the reactants and the products, where the system is coming from and where it is going, allows you to study rigorously how the system performs the transition. Okay, so maybe one last question, from Alice Romeo. She is asking: what's the difference with Gaussian accelerated MD, in which you automatically add the boost potential without pre-defining collective variables? Well, I don't think that what you say is really possible, in the sense that what you can do is to add a Gaussian acting on all the coordinates, for example on all the dihedrals, like it is done, for example, in adaptive biasing force or in flooding approaches (I don't remember the exact name of the approach). But this does not allow you to compute the free energy, because a bias acting on too many variables does not really converge: the size of the space you are exploring depends on the number of variables on which you put the bias. If you are simply putting Gaussians on all the coordinates, then you are biasing a space which has dimension 30,000, and you will never be able to fill a 30,000-dimensional space with Gaussians. Okay, sorry, I think we stop here. Thank you. Well, of course, let me say that if you have other questions you are more than welcome to write to me, by private email, or I don't know if we have a public chat associated with this; in any case, my email was on my first slide, you can see it on the recording if you want. Okay, thank you very much again, Alessandro. So we move to the next speaker, and I have to ask you to accept the invitation to the breakout room. Hello, can you hear me? Professor Rodriguez, can you hear me? Yes.
Okay, you can start the test, please. Okay, shall I start sharing my video? Can you see me? And my screen, can you see that? Yes. Can you activate the full screen? Yes, okay, right. Okay, so we are already streaming, so we are ready to start; I'm going to start recording. I kindly ask the moderator to tell me if we can start in seven minutes. Hi, can you hear me? Yes, I think it's fine. The only thing is that the first talk will be in a separate room; so, if the third and fourth speakers are already here... Arie, I think, will join later. If Angelo is already here, maybe we can try his screen sharing, because Professor Micheletti does not want to be recorded. Okay, right. But we expect some more participants, right? We have 30 participants now. Yeah, but, I mean, now we are not streaming, right? Yes, we are streaming, because we are streaming the whole day in a single block on YouTube. I see. Okay, so if Angelo is here, maybe we can test his connection. Let me check; no, he's not in yet. Okay, so I think we can wait a few minutes, and then around two, if Angelo and Adi haven't shown up yet, we will move to the other room and start with the talk. Okay, I'll be back in a couple of minutes; I have to help another assistant, sorry. Okay. Professor Rosa is in, so, Cristian... Angelo, we can test your screen sharing, so that we are sure there will be no problems afterwards. You are muted. Can you see it? Yes. Doesn't it work, Giovanni? Yes, yes. Can I stop sharing? Yes, of course. Later it will not work... Professor Bussi? Yes, I'll answer them right away as soon as people arrive, because even before there are a lot of delays; I will tell them in the breakout room. Okay, perfect. So at this point, since, if I understood well, Angelo and Adi will arrive later, maybe we can directly open the breakout room and do the projection test with Cristian Micheletti. Okay. I would have expected, if there were some problems, to see the participants, but if it is automatic we can already open it. No, no, I do it directly, manually, as they arrive. Okay, go; please all join the breakout room. Here, can you hear me? Can you hear me now? Is it me? Can you hear me now? Okay, yes. Sorry, guys, I was struggling with my audio settings. Okay, good, perfect. Now, can most of you hear me? Yes, perfectly. Okay, sorry for that; that's great. Excuse me, can you hear me? ICTP IT services. Yes, I can hear you. Are you the host? No, no; I got a host password, in case, because I think we are supposed to record. Just a second, let me claim the host and promote you. Okay, sorry, just a second; I apologize, we have several activities at the same time. Oh, I really understand. So, just, if I may... ah, okay, I can claim host. Yes, but you don't have the host key. Yes, I do, actually. You have it? Yes, I have it. Okay, try to do it by yourself; perhaps you received some instructions with the host key about the recording too. Actually not; that's just the problem. So now I am host and I can record, but I just want to know: what shall I do now? If you have to record, then you'll have to put the files on a Google Drive; we will provide you a shared link and kindly ask you to upload the files there, and after this we will upload them on YouTube. Okay. And for the speakers who didn't want to be recorded? The recording will be
kept locally, on the local computer; please don't record until later. So you confirm that we did not receive any Google Drive share link yet? It will be provided to you as soon as possible. Okay. Anyway, can you hear me? A first question, quite simple: does any one of you have a chemistry background? For me it's totally new... For most of you it's totally new; you have physics backgrounds, I suppose, and almost no chemistry, is that correct? Then I'll make a full adjustment, because I see how things stand; I like it, it's very curious. I'm assuming that this is the situation. Okay, good. You can register him; we have to send the Google Drive link and put the files inside. Let me check the last thing you were saying. Sorry, you know, this COVID is killing us; since last year we have done everything remotely. Other questions? Is it working, is everybody happy? Okay. So, there are two breakout rooms, I don't know why. Okay, let me do this: I'm going to close all the rooms. Okay, now I guess everyone is in the main room. Not yet, not yet, there are still some people. I think there were two rooms by mistake; it would be important to have a single breakout room, otherwise... I'm going to recreate the room and open it. Okay, you should receive the invitation. Alex, your microphone is off. Okay, great; I think the transition was very smooth, now everyone is in the room, so I think you can continue. Can you see my slides? Yes, yes. Okay, perfect. So, I was saying that what we want is for the computer to do this task for us: to identify the groups in our data. And we want to do it even if the data live in high dimensions, and even if the clusters, our groups, have odd shapes; we will see later what that means. But before going there, I want to say why clustering. We are at an atomistic-simulation event, let's say, so why are we talking about clusters? Let me show you three cases; they are not all of them, but maybe quite general ones. One case is the analysis of a molecular simulation. In this case our data are conformations: this is a protein, a converged sampling of this protein, which has something like 10,000-100,000 coordinates. From this simulation what we usually want is to extract the kind of information that is in there, which is exactly the conformers, and obtaining the conformers can be done by clustering, with little prior knowledge of what is going on in the protein. Another case is chemical databases: here I show four compounds, but usually they are databases of millions of compounds, and when I apply clustering I apply it inside a pipeline that allows me to reduce the number of compounds that I want to test against a given target, for instance in the case of drug databases. Another case, which also has biological applications, is the analysis of sequences. This is a protein, but it can also be applied to RNA and DNA sequences: from just the sequences I can group them, and these groups can be useful. On the right I have shown an example from a paper in which they performed clustering on protein sequences in such a way that they could identify which clusters are more likely to contain proteins with a given property; in this case they were mould killers, something like that, I don't remember exactly. The idea is that you cluster all the proteins under the underlying hypothesis that similar proteins, proteins that belong to the same cluster, have a high probability of
having the same properties. So, when clustering (I took this slide from my lectures on clustering), you first start from data, and then you usually perform some kind of feature transformation that allows your data to adapt better to your target; then you perform the clustering; then you usually validate the clusters and interpret them, and it is usually from this interpretation that the knowledge appears. But this pipeline has some feedback loops, because when validating you can go back and perform a different clustering, or even go back to a different feature transformation. But let's start from the beginning, with the data samples. One thing that you have to keep in mind is how many objects are in your data set. Why? One reason is computational power: if you have millions or billions of data points, there are some kinds of analysis that you cannot afford, while if you have hundreds it's much easier. On the other hand, one has to take into account that some methods need a kind of statistical sampling; think about converged molecular dynamics: you need many points for a converged dynamics simulation. So that is also something you have to take into account at the moment you choose your clustering method. Even more important is how your data are described. For instance, in the case of the atomic coordinates that I showed at the beginning, your features are the atomic coordinates, real numbers. In the case of the chemical library, your features are the structures, some kind of descriptors, both 2D and 3D descriptors. In the case of sequences, you usually just have the letters. Once you have this clear, you have to deal with the feature transformation, which usually implies, or may imply, three steps: one is feature selection, one is dimensionality reduction, and the other is distance computation. Let's start with feature selection. What do I mean by feature selection? There are many automatic techniques, but that goes far beyond the purpose of this talk. The point is that, for instance, in the case of atomic coordinates, I can ask whether all of them are relevant for my purpose: if I'm interested in the folding of a protein, probably I shouldn't include the water molecules (maybe yes, but probably not), and the same goes for the side chains. For instance, people working on folding usually include just the main-chain atoms; but if you are interested in other kinds of analysis, you may need to include the side chains, or you may even need to include the water molecules; Ali will explain later why. In the case of chemical databases, the question is which descriptors are relevant for my problem, and this is much more difficult to get right; there is a lot of expertise involved. You can think about the logP, a measure of solubility; the pharmacophore profile, which is a kind of 3D descriptor; connectivity indexes, which depend on the topology of your molecules; and so on. There are so many that you usually need automatic tools for choosing them. In the case of sequences, what may happen is that you have to clean your sequences somehow, to avoid the introduction of non-significant segments; mostly, let's say, when you have monomers against dimers or something like that, probably you just need to take into account the monomers.
Regarding dimensionality reduction: it's a field by itself. I'm going to give some hints, but I don't want to turn this into a talk on dimensionality reduction. The idea is that, independently of the data I have, I can try to reduce their features to a few features (in this case I have just two features), and this reduction can be meaningful or not, depending on the method and on the system. There are many methods that allow us to perform dimensionality reduction; this scheme comes from an already somewhat old review from 2009, and there are many more methods nowadays. I will not explain them, but just say that it is still an open problem. Why? Because it can be really easy, or it can be almost impossible. In the case you see on the screen, this line is rather easily transformed into 1D, and this can be done by principal component analysis, a technique based on the covariance matrix; it is a technique that works in linear settings, let's say on hyperplanes. However, if you have something like this 3D point distribution, with PCA you cannot reduce the dimensionality from 3 to 2. It will not work, and it will not work because this surface is twisted; it is topologically complex. People have been working on that, and there are methods that work pretty well for non-linear projections. But the situation can be even more complex: look at this distribution. It is just a 2D distribution of points, and this distribution can be summarized by a line if you want, but while it is intrinsically one-dimensional, putting it on a single line is topologically impossible. So sometimes the projections are not easy to do, and this can be seen in two cases. This is a result from an old simulation that I made nine years ago, and in this case PCA works fantastically: projecting the coordinates of the backbone in 2D with PCA allows us to identify two free-energy minima, which correspond to two conformers of this mini-protein; it's a peptide that folds. Instead, in the case of villin headpiece, we also tried to project with Isomap the coordinates of this bigger protein, 32 residues, and the point is that, projecting with Isomap, you can see (the ground truth is the color code, which tells us the degree of folding) that the folded and unfolded configurations are mixed. This happens because Isomap is not able to project into two dimensions all the complexity of the configurations of the folding trajectory. So, in dimensionality reduction we try to project into two or three dimensions, and if it works, perfect: you can almost finish there. And even if it doesn't work in 2D or 3D, projecting into, say, eight dimensions will probably simplify the analysis a posteriori. However, if we don't do it properly we can have an important loss of information, and if the dataset lives on a complex manifold it can be difficult, or even impossible, to perform meaningful projections. The last point of feature transformation that I want to comment on is distance computation. The point is that most clustering methods rely on some kind of quantification of similarity: for clustering, we need to quantify how similar or dissimilar two points are. Two data points, in the case of proteins, would be configurations; in the
case of the chemical database they would be compounds; in the case of sequences, sequences. Dissimilarities, for the purposes of clustering, can be interpreted as distances, and that's important because it allows us to reason in geometric terms. The idea is that the lower the distance, the more similar two points are: two points on top of each other would be identical, and two points that are really far away would be really, really different. But our definition of distance also depends on the data type we are dealing with. There are some general distances that are widely used, like the Minkowski distance, which is a generalization of the Euclidean distance (you can see that with p equal to two it recovers the Euclidean distance), or the cosine distance, which just takes into account the angle between the vectors describing our data. There are distances defined for molecular systems, like the RMSD, or the dihedral distance, which is a Euclidean distance taking into account the periodicity of the dihedral angles. For binary molecular descriptors we have the Jaccard distance. And we have the Hamming distance, which allows us to compare two protein sequences once they are aligned: the Hamming distance increases by one at each position where the amino acids are not the same. So there are many, many distances; I just put here some of the most used in our field.
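A few of the distances listed above, written out in code to make the definitions concrete; plain NumPy, and the examples at the bottom are illustrative:

```python
import numpy as np

def minkowski(a, b, p=2):
    """Minkowski distance; p = 2 recovers the Euclidean distance."""
    return np.sum(np.abs(np.asarray(a) - np.asarray(b)) ** p) ** (1.0 / p)

def cosine_distance(a, b):
    """1 - cos(angle) between the two vectors describing the data points."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def hamming(seq_a, seq_b):
    """Number of mismatching positions between two aligned sequences."""
    assert len(seq_a) == len(seq_b), "sequences must be aligned"
    return sum(x != y for x, y in zip(seq_a, seq_b))

print(minkowski([0, 0], [3, 4]))        # 5.0
print(hamming("MKTAYIA", "MKSAYIA"))    # 1
```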
Hierarchical clustering does something more complicated: by generating a hierarchy, you can recover the clustering structure. But this clustering structure may not be easy to transform into a flat clustering; for instance, in this case the two clusters are easily recovered, but to recover the five clusters I have to make a rather strange cut. There are other distinctions in clustering too. Inside the flat clustering category, I would like to highlight two kinds of methods: partition methods, which put together points that are similar, and density-based methods, which follow the probability density profile of the points in your data. There is no single correct clustering; it depends on what you are going to use the clustering for, and according to your target you may want a partition method or a density-based method.

A typical example of a partition method is k-means clustering; it's probably the most used clustering method, and it works in a pretty simple way. Let me explain it quickly. The first thing you do, for instance on this dataset, is to randomly pick k centers (in this method you have to decide how many clusters you want). Then you assign each point in your data to its nearest center. Once you have this partition, you recompute the centers, which will move somewhat, and then you iterate; by iterating, you find a near-optimal partition of your data. The only caveat is that this method depends on the initialization: in this case you can see that, because my initialization was not perfect, one cluster ends up split in two while two other clusters end up merged.

A different kind of method is based on density; the one I want to explain is density peaks clustering. In this case, as in all density-based clustering, the first thing you do is compute the density for each point, and the density is nothing but a count of the number of points in the neighborhood of a given data point. Once you have computed the density, which is fairly trivial to compute, you compute for each point the distance to the nearest point with higher density, which we call delta. Then you plot this delta as a function of the density, and the points that are outliers in this graph turn out to be the centers of putative clusters. Once you have picked these centers, you assign every other point, for instance this red point, by following the density profile uphill until you arrive at a given center, and you assign all the points along the way to the same cluster. If you do this for all the points, you end up with a full partition.

I don't want to explain many more clustering methods, also because I don't know how much time I have. Giovanni? A couple of minutes, okay. So I'll just finish with some applications. Let's go back to the applications I mentioned before. Imagine that you want to cluster the villin headpiece by using a mix of density-based methods with hierarchical clustering: you can obtain something like this, which translates into a free energy profile.
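Since the procedure is simple, here is a compact toy implementation of density peaks clustering along the lines just described. This is a sketch under stated assumptions (a distance cutoff dc for the density, and centers chosen as the largest density-times-delta outliers), not the authors' production code; for k-means itself one would typically just call sklearn.cluster.KMeans.

```python
import numpy as np

def density_peaks(X, dc, n_clusters):
    # Pairwise distances between all data points
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Density: number of neighbours within the cutoff dc
    rho = (D < dc).sum(axis=1) - 1
    n = len(X)
    delta = np.zeros(n)
    nneigh = np.zeros(n, dtype=int)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        if len(higher) == 0:          # the global density maximum
            delta[i] = D[i].max()
            nneigh[i] = i
        else:                         # nearest point with higher density
            j = higher[np.argmin(D[i, higher])]
            delta[i], nneigh[i] = D[i, j], j
    # Cluster centers: outliers with both high density and high delta
    centers = np.argsort(rho * delta)[-n_clusters:]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Assign the rest in order of decreasing density, inheriting the
    # label of the nearest higher-density neighbour (uphill assignment)
    for i in np.argsort(-rho):
        if labels[i] == -1:
            labels[i] = labels[nneigh[i]]
    return labels
```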
This free energy profile can be obtained because you have the densities. Then, feeding a Markov state model with this output, you obtain a five-state Markov model for the villin headpiece. Another application is in the field of chemical diversity: these people made a hierarchical clustering based on the descriptors of a set of compounds (they are fungal compounds, they come from mushrooms), combined with the genetic analysis of the mushrooms, and they managed to obtain this nice hierarchy that explains how the diversity of these products arose. Finally, in the case of protein families, we worked with one protein clan, and we found that this clan can be divided into many families: the clustering results are on the left, while the ground truth is at the bottom, and you see that, even if not exactly, the clusters reproduce the families. Let me show it: this part of the dendrogram coming from the clustering reproduces this family, this other part reproduces this other family almost perfectly, and so on. The color code is the purity of a given cluster in terms of architectures coming from these families. So, having shown that clustering can be employed for many things in our field, I just want to thank you, and I will be happy to answer any questions.

Okay, thank you Alex, and thank you also for being on time. Now we have time for questions; there are questions in the chat, and I will read them. One question is: can hierarchical clustering be associated with decision trees in machine learning? No, I would say it's a different thing. Decision trees are based on single features: for each feature you take a decision, say, if a feature is above a threshold you are in one class, and if it is below you are in another class. Hierarchical clustering, instead, deals with all the features at the same time, through the distance. So it's a bit different.

Uriel was raising his hand, but I cannot see him anymore, so let me look at the chat. Isabel Luisa wanted to ask a question; yes, go ahead. "Thank you for the interesting talk. I was wondering about feature selection: for example, if we want to check a peptide conformation from a trajectory, is it useful to use many features, like RMSD, hydrogen bonding and also dihedral angles, or is it better to stick to one?" Well, what we usually do is use many of them and then compare the results of the procedure. If, according to our intuition, these features should be equivalent, they should give comparable results; otherwise, you are having some problem. So this is a kind of sanity check for our methods: we take many different metrics and we see whether they give the same result. "Okay, that makes sense, thanks."

Then we have a question from Daniel, who is also raising his hand. I will read the question, and Daniel, stop me if I don't read it correctly: why do the center points happen to be the outliers? I think this refers to density peaks clustering. The key point is the delta definition, in which you find the minimal distance to a point with higher density. The point is that even points with high density have a low delta if they are near another maximum, so in the end you are finding the maxima of the density distribution. I don't know if that's clear.
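Going back for a moment to the villin application mentioned above: the step from a clustering of trajectory frames to a Markov state model is, at its simplest, counting transitions between cluster labels. A minimal sketch with hypothetical labels and a lag of one frame (real MSM construction involves more care with lag-time selection and reversibility):

```python
import numpy as np

def transition_matrix(labels, n_states, lag=1):
    """Row-stochastic transition matrix estimated from a discrete
    trajectory of cluster labels, at a given lag time."""
    C = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-lag], labels[lag:]):
        C[a, b] += 1.0
    return C / C.sum(axis=1, keepdims=True)

# Toy discrete trajectory over 5 states (e.g. cluster labels of MD frames)
rng = np.random.default_rng(3)
traj = rng.integers(0, 5, size=10000)
T = transition_matrix(traj, n_states=5)
print(T.sum(axis=1))   # each row sums to 1
```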
Yeah, thank you, thank you Alex. Then a question from Andrea: can you explain how to perform a validation for clustering? Yes, but it's a bit long; that's why I decided not to include it. Let me make a short introduction anyway. There are two kinds of validations that you can perform: internal validations, which depend on the structure of your clustering, and external validations, where you compare against a ground truth, with metrics like the normalized mutual information. In internal validations you check some characteristics of your clusters, but the problem is that, so far, they heavily depend on what you think your clusters should look like, which is not always the case. In many cases, when you try internal validations on data where you do have a ground truth, it turns out that the best clustering is not the one with the best internal validation index. So this is a bit delicate.

Okay, then there are two questions from Suman. The first is: how to find or choose an optimal metric to classify a given set of data, because the efficiency of clustering the data with this metric can depend on various factors. And the second question is: how to pick an optimal machine learning technique, like a support vector machine or a Markov state model, to explore a particular problem; should it depend on the complexity of the system? Well, for the first question, my experience is that the best feature selection is done by humans: human experts are the best at choosing features, and then, once you have these features, you adapt your metric to them. However, there are many techniques that allow you to choose features automatically, and all of them rely somehow on a kind of ground truth. What you usually do is try to reproduce a ground truth, which is not necessarily available for your system but maybe for a similar one, and then you pick the features that best reproduce that ground truth. Once you have that, you can assume the same will hold in your system and take the same features; then, of course, the distance will depend on these features. The second question, how to pick an optimal machine learning technique for a particular problem, is a really, really general question. There is not one method; I would suggest you really try different methods on the same system. I would start with unsupervised machine learning, and then, if you have a ground truth, also try some supervised machine learning techniques. It really depends on your data, on how big they are and how complex they are. In general, for quite big datasets I would go for neural networks, for instance, but for other datasets that may not be the case; it is really case-dependent.
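On the external-validation side mentioned in that answer, the normalized mutual information between a clustering and a ground-truth labeling is a one-liner with scikit-learn. A sketch with made-up labels:

```python
from sklearn.metrics import normalized_mutual_info_score

ground_truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
clustering   = [0, 0, 1, 1, 1, 1, 2, 2, 2]

# 1.0 would mean a perfect match (up to a relabeling of the clusters)
print(normalized_mutual_info_score(ground_truth, clustering))
```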
Then there is another question, from Srabahani: what kind of clustering would be useful to predict whether we are on an unstable manifold of the phase space? An unstable manifold of the phase space... okay. A density-based clustering would give you an estimate of the free energy, and with this free energy you can check, as long as you have also sampled the more stable states. If not, there is no way: your data need to carry some trace that another, more stable state exists; otherwise you cannot learn it from your data.

Good, I don't see any more questions in the chat, so I think we can continue. Let's thank Alex again and move to the next talk. Alex, if you can stop sharing your screen... great, and then Ali should be able to share his. "All right, can you see my screen?" Yes, we can see it, but we cannot see your face. "Hold on... well, it's not working now." Okay, we can hear you clearly. The next talk will be given by Ali Hassanali, who is also from ICTP. Please, Ali.

All right, thank you very much, Giovanni, and I apologize for my video not working. Let me thank the co-organizers of this activity, especially Angelo, for putting this together, and let me also apologize for some of the technical hurdles we've had today; we're all on a learning curve with respect to virtual activities. What I want to tell you about today is a problem that I've been interested in for many years, and that is the role of water. The goal of my talk is to try to convince you that water, the solvent around biomolecules in particular, plays a very important role in many things. Just to get you warmed up and to whet your appetite (no pun intended); and by the way, if you have any questions at any point in the talk, please feel free to interrupt, I'm happy not to finish my slides today. So, aqueous solutions are present in all our daily lives. I live in Italy, and in Italy I'm inspired by things like pasta and cappuccino: there are some very basic, fundamental processes in physical chemistry involved in boiling your pasta or making a good cup of coffee, things like the thermodynamics of surfactants, how they interact with soap bubbles, and how water interfaces with these processes. To build your intuition a bit, and to calibrate you with some numbers, we can ask the following question: how many water molecules are there in a cell? If you take E. coli, the bacterium shown in this picture, you can count the number of different types of molecules you would find there: roughly, on the order of 10^6 proteins and on the order of 10^7 ions, all of it surrounded by a big, big bucket of water. The number of water molecules in an object like an E. coli is on the order of 10^10.
So the one thing I would like you to take away from this talk is that, in all the models and simulations that we show and discuss, and whenever you think of processes in biology, you always have to remember that these objects are very well hydrated, and the water plays an intimate role in coupling to, and tuning, the thermodynamics and the dynamics of the biomolecule.

Let's get into something more specific and ask ourselves: what's inside a brain? These are images of the brain of a healthy person on the left and, on the right, the brain of someone who suffers from Alzheimer's. This is a neurodegenerative disease which leads to brain atrophy, and neurodegenerative diseases have been mapped, at the molecular scale, to the formation of things known as amyloid fibrils, depicted here as this spaghetti-like mesh of proteins. If you dig deeper and look even closer, you find that these amyloid fibrils are made up of beta sheets: a secondary structure of the protein which is held together by hydrogen bonds. Now, all these beta sheets, all these amyloid proteins, are surrounded by water molecules, and one of the questions that I've been interested in is how water plays a role in stabilizing objects like this amyloid beta sheet.

It's actually nice to go back a bit in history and see what people have been thinking about this. If you go back three decades, to one of the first papers by Michael Levitt, who, as you heard today, got the Nobel Prize in Chemistry several years ago: his group performed the first simulations of a protein in water and compared them to simulations that had been done in the gas phase. It's very interesting to see how the time scales of MD simulations have evolved from three decades ago to now; back then, doing tens of picoseconds was an enormous achievement. One of the messages that came out of this paper, which was published in Scientific American, was that water plays a very important role in tuning the structure of the protein, even though it makes the simulation much more complicated. There has been a lot of work over the last three decades, from the simulation and theory side and also from experiments, trying to understand the role of water, but there are still a lot of interesting open questions in the field.

This leads me to a bit of my motivation. I'm interested in creating aqueous solutions on the computer: I like to take different types of molecules (things like the amyloid protein that I showed you earlier, ions like the proton, which I'll tell you a bit about, and hydrophobic molecules like methane), insert them into water, and try to understand how these molecules respond to the field of the water and vice versa, how water responds to the presence of these molecules. And secondly: does water just play the role of a spectator in all these processes? This probably doesn't need much introduction, since you already saw a good introduction from Alessandro Laio this morning, so let me say that I'm interested in designing computational experiments where we try to understand how the complexity of water affects a given process.
We use different flavors of molecular dynamics: classical when we don't have to deal with quantum mechanics, for example things like bond breaking and bond formation. And the real goal, this is what I want to show you today, is to come up with smarter water-based collective variables and reaction coordinates.

To motivate this, I want to take you back several years, to a movie that never gets old for me. This is a very interesting problem in chemistry, which forms the basis of pH: there is a dynamic equilibrium between neutral water and its ionized products, the proton and the hydroxide ion, and this dynamic equilibrium determines the pH of water. These were ab initio MD simulations that we ran many years ago, and I'm playing the movie now, where you're going to see some interesting things happen. You have the excess proton here on the left, the hydronium ion, which is positively charged; on the right side you have the hydroxide, OH-, which is negatively charged, and all of this is surrounded by a bucket of water. There are some fluctuations going on here, where you see the proton exchanging with its partner in a strong hydrogen bond, and something interesting happens at the end of the movie: there is a fluctuation in the liquid which leads to these three protons jumping over at the same time and neutralizing the two ions. The way we studied this problem many years ago is the typical thing that chemists do: you take out VMD, you load the movie, you look with your eyes and try to identify where the fluctuations are happening and what the relevant coordinates are. I'm still a big fan of that, but what I want to convince you of today is that there is a whole range of length scales and time scales of fluctuations in liquids like water which are very difficult to see just by using your eyes, and the combination of chemistry-based approaches with the type of approaches that Alex introduced in the previous talk has, I think, a lot of potential for understanding new physics and new chemistry.

The reaction I just showed you is not necessarily biological per se, but there are many processes in biology where proton exchange plays a very important role. Here is an example: it is well known that the amide proton of the amide bonds of proteins, when exposed to solvent, can exchange its proton with the solution, and this is thought to be catalyzed by the presence of hydroxide ions. This is the basis of a very important experimental technique for identifying which parts of the peptide backbone are exposed to solvent and which are not. Again, for this type of process, involving large fluctuations of both your protein and your water, it becomes very challenging to proceed just by using your eyes and typical chemical-intuition-based approaches.

So what I want to do today is tell you two stories, if I get to both. The first is about how water acts as a lubricant for biopolymers, and some very recent efforts at using data-science approaches to study water networks. If I have time, in the second part of my talk I want to tell you a bit about the problem I introduced at the beginning, the optical properties of amyloid proteins, and some very interesting questions, which you will hear about in one of the talks tomorrow, on the importance of electronic excited states in biology.
Okay, let me get started on the problem. You've seen a lot of simulations of different proteins today, and the typical thing to do when you have a protein, say a protein in its folded state and in its unfolded state, is to come up with some order parameter that allows you to distinguish between the folded and the unfolded state. One of the assumptions you make when you do this is that you are looking at some reduced dimensionality of protein coordinates, which I denote p, and you are essentially integrating out the solvent degrees of freedom. I am interested in understanding how the solvent modulates the free energy landscape of these polymers or proteins. There are lots of ways one can try to interrogate how the solvent affects the protein and, as mentioned in some of the talks, one of the first things you need is a relevant reaction coordinate, an order parameter or collective variable, that allows you to identify, in your solvent space, the changes that are going on. I don't have time to get into all the details, but one thing we've found to be extremely useful, and that appears to give us much more information about the underlying free energy landscape of the solvent, is looking at topological properties of the water network around the protein. The idea is extremely simple, and it has been used previously in other contexts: water is made up of directed interactions, because of the hydrogen bonds, and you can map this onto a graph where the molecules are nodes with outgoing and incoming edges, and the directionality of an edge corresponds to the direction of the hydrogen bond. If you imagine a protein (here, this is a segment of the amyloid protein), you can think of lots of different water wires, or paths, that connect different donors to different acceptors. Every donor (for example, here the donor is an NH group and the acceptor is a CO group) can act as a source, and every acceptor as a sink, of hydrogen bonds and therefore of different water wires. We have been studying the role of these water wires in changing the free energy landscape of proteins, and I want to give you an example of how this can change the landscape.

These are some MD simulations of a six-amino-acid segment of an amyloid protein. The free energy landscape is shown as a function of the distance between the two termini on the x-axis and the radius of gyration on the y-axis. The different minima on this free energy landscape correspond to different states, or conformations, of the protein, involving different amounts of hydrophobic contacts: in state A and state C you have more hydrophobic contacts, whereas in state D and state B the hydrophobic groups tend to be more exposed to solvent. Now the question is: where did the solvent go? How does the solvent affect the free energy landscape? To answer that, you need to figure out which degrees of freedom you should look along in the solvent coordinate. To give you an example of how the solvent can affect the free energy landscape, let me show you another projection.
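Before moving on: the directed-graph picture of the hydrogen-bond network described a moment ago is easy to make concrete. Here is a sketch with a hypothetical, hand-written hydrogen-bond list (in a real analysis the edges would come from geometric hydrogen-bond criteria applied to each MD frame), using networkx:

```python
import networkx as nx

# Hypothetical hydrogen-bond list, as (donor, acceptor) pairs; 'NH' and
# 'CO' stand for protein groups and 'w1'..'w4' for water molecules.
hbonds = [("NH", "w1"), ("w1", "w2"), ("w2", "CO"),
          ("w1", "w3"), ("w3", "CO"), ("NH", "w4"), ("w4", "w2")]

G = nx.DiGraph(hbonds)

# Water wires = directed paths from the donor group to the acceptor group
wires = list(nx.all_simple_paths(G, source="NH", target="CO", cutoff=5))
print(len(wires), "water wires, e.g.", wires[0])
```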
What I'm showing you here is another reconstructed free energy landscape: the end-to-end distance between the two ends of this amyloid protein is on the x-axis, and on the y-axis is a reaction coordinate that basically counts the number of paths you can have between every donor and every acceptor in the protein, normalized by a certain factor. What you see (I want you to focus on this oval region) is that there are several minima along this solvent coordinate, corresponding to fluctuations in the solvent, while the salt bridge stays basically constant. If I go to the previous slide, all of this is collapsed along the solvent coordinate in that figure; but when you project it out, you see clearly that there are some other small minima that appear along the solvent degree of freedom. Now, in this system you have six amino acids, and there are already over 50 different water wires that can form between every donor and every acceptor. I say this just to give you a sense of where the problem of dealing with a lot of data comes into the picture.

In order to dig deeper into this, and to understand better how the solvent affects the free energy landscape of proteins, we had to break the problem down and go to an even simpler system. This is a system that has been studied a lot in the literature: trialanine. Trialanine conformations can be classified into structures that are more alpha-helical versus ones that are more beta-sheet-like, and in this system there are a total of 16 water wires, that is, 16 different paths that you can form between every hydrogen-bond donor and every acceptor. The question we were interested in was: can I look at the water network around the protein, in this case a small peptide, and see a difference between the secondary structures, alpha helix versus beta sheet? For this system, I guess it's not too much of a surprise: if you look at the distribution of the water wires that connect, or thread, the alpha-helix conformation versus the beta sheet, you see that there is in fact a gap between short paths and long paths for the alpha helix (this has to do with the specific constraints of the alpha helix), whereas in the beta sheet, because it's more open, you get a shoulder in the distribution that is not there for the alpha helix.

So this is just looking at the water-wire distribution around the secondary structure, but we can do a bit better. I started talking to Alessandro Laio, who, together with Alex Rodriguez, whom you heard in the previous lecture, has developed some very interesting methods for handling and clustering high-dimensional data. One of the techniques they developed is an intrinsic-dimensionality estimator based on the two nearest neighbors of every data point. I don't have time to get into the details, but basically, by using the two nearest neighbors of every data point, you can show that the cumulative distribution of their distance ratio allows you to extract something known as the intrinsic dimensionality. What this physically corresponds to, in the case of our water network, is how many independent directions the water wires fluctuate in.
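Here is a minimal sketch of that two-nearest-neighbours (TWO-NN) idea, assuming the standard form of the estimator in which the cumulative distribution of the ratio mu = r2/r1 follows F(mu) = 1 - mu^(-d) for intrinsic dimension d. This is an illustration, not the authors' own implementation:

```python
import numpy as np

def twonn_dimension(X):
    """TWO-NN intrinsic dimension estimate from the two nearest
    neighbours of every data point."""
    # Distances from every point to all others (self-distance is zero)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D.sort(axis=1)
    r1, r2 = D[:, 1], D[:, 2]        # first and second neighbour distances
    mu = np.sort(r2 / r1)
    # Empirical CDF; the model predicts F(mu) = 1 - mu**(-d)
    F = np.arange(1, len(mu) + 1) / len(mu)
    # Linear fit through the origin of log(mu) vs -log(1 - F),
    # dropping the last point where F = 1
    x, y = np.log(mu[:-1]), -np.log(1.0 - F[:-1])
    return (x * y).sum() / (x * x).sum()

# Quick check: points on a 2D plane embedded in 10 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 10))
print(twonn_dimension(X))   # should come out close to 2
```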
So we applied this to the trialanine system: we have 16 wires, we extract the intrinsic dimensionality for the alpha helix versus the beta sheet, and you already see that there is a difference between the two. This is already interesting, because the intrinsic dimensionality is telling you, in the water space, what the difference between beta sheet and alpha helix is. And it sort of makes sense, right? Beta sheets are more open, whereas the alpha helix is more coiled, so you might expect something like that. Then you can also construct free energy landscapes and dendrograms, like the ones Alex showed you, and what you see, looking along the solvent coordinate only, is that the free energy landscape of the alpha helix is much rougher: it has a lot of ripples, which are very faint, but it is markedly different from the free energy landscape of the water around the beta sheet, which is essentially flat, with only one minimum. What this is telling us is that the solvent, at least at room temperature, introduces, in a very faint way, some roughness into the free energy landscape of alpha helices.

One more thing we did was to compare how our beloved water wires do against standard ways of looking at the solvent around biomolecules, things like coordination numbers, which are essentially densities, or g(r)'s. What you see is that the coordination number is rather dull and boring: it doesn't really tell you about the underlying complexity of the free energy landscape. This again makes sense, because what contributes to this roughness of the free energy landscape is the orientational correlations of the water dipoles around the protein, and this is averaged out in the coordination number, while it is captured by the water wires. You can now go in (and this is why I think this combination of chemistry-based approaches and data science is extremely interesting) and try to understand what the different clusters correspond to, and what you learn is much more nuanced. What you learn is that the origins of the roughness of the free energy landscape are not set by strict rules like "this is purely hydrophobic", "this is purely polar contacts", "this is purely water". Instead, there are very subtle differences in the water-wire distributions, in the number of backbone contacts, and in the number of side-chain contacts, which in this case correspond essentially to the hydrophobic interactions, and all these things together contribute to the coupled free energy landscape of the protein and its surrounding solvent.

Okay, Giovanni, how am I doing on time? Five minutes, fantastic. In the five minutes I will tell you something very brief, but something that I find very interesting. I just mentioned a couple of indicators of what may cause the roughness along the solvent coordinate; there are other factors that contribute to the roughness, and one of them is understanding what empty regions, empty space, look like in liquids. Think of a thought experiment where you put a hydrophobic molecule or an ion in water: if you are interested in the thermodynamics of that, what you do is sit in a box of water and wait for some time until a cavity forms.
This cavity is an excluded volume region, essentially a hard sphere with no water molecules inside, and by studying the statistics of the formation of these empty spaces you can learn something about the solvation thermodynamics of inserting hydrophobic molecules into water. This is something that has been studied to death by many people in the field, but typically people have mostly focused on nice spherical objects, and we all know that biomolecules are not spherical cows. So for a while now I've been working on, and thinking about, what realistic regions of empty space look like in liquids. I won't get into the details, but this is a sample, obtained from a molecular dynamics simulation, of what realistic regions of empty space look like in water. These distributions (don't worry about the math, just focus on the water curves) tell you the probability of finding a given normalized void volume, normalized by the volume of a water molecule. What you see is that there is a peak at small values and a tail toward larger values, which corresponds to voids that look like this: very branched, dendritic-shaped voids. If you stare carefully, you might be inspired to see interesting resemblances, this one to a basketball player, and this one on the left to a small dinosaur. And if you look even harder, what's curious is that these dendritic-shaped voids look like small polymers, small chemical molecules. This work was done in collaboration with a former postdoc, Nergis Ansari, who is in the audience, and one of the things we asked was: can we find a shape of empty space in the neat liquid, without a polymer, that resembles the void you find around a polymer? In fact you can, and you can translate this into thermodynamics, as follows: what is the solvation free energy of solvating a hydrophobic polymer like this one, and, at the same time, what is the free energy required to create an empty space like this in water? What's interesting is that these two numbers are essentially consistent with each other. So there are rare fluctuations in water itself that create shapes consistent with the thermodynamics of solvating small hydrophobic solutes.

I actually don't have time to go through all the rest, so I'm going to skip it and take questions instead. Oops, I'm afraid my computer is frozen... okay, let's go to the end. I didn't get to the second story, but what I tried to tell you today is how water plays an extremely important role in coupling to the fluctuations of biological matter: it affects the thermodynamics and the dynamics, there are still a lot of interesting open questions in the field, and I think there are a lot of interesting synergies in combining chemical-intuition-based approaches with data-science approaches to learn new chemistry and new physics. With that, thank you for your attention.
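The cavity statistics just described connect to a classic estimate of hydrophobic solvation: the work to open a cavity follows from the probability that a probe volume is spontaneously empty, dG = -kT ln P(N = 0). A rough sketch of how one might measure this for spherical probes from a set of water-oxygen coordinates (all inputs hypothetical; a real analysis would average over many trajectory frames):

```python
import numpy as np

kT = 2.494  # kJ/mol at 300 K

def cavity_free_energy(oxygens, box, radius, n_probes=20000, seed=0):
    """Estimate the free energy of opening a spherical cavity via
    dG = -kT * ln P(N=0), where P(N=0) is the probability that a
    randomly placed probe sphere contains no water oxygen.
    'oxygens' is an (N, 3) array of positions in an orthorhombic box."""
    rng = np.random.default_rng(seed)
    centers = rng.uniform(0.0, box, size=(n_probes, 3))
    empty = 0
    for c in centers:
        d = oxygens - c
        d -= box * np.round(d / box)      # minimum-image convention
        if (np.linalg.norm(d, axis=1) > radius).all():
            empty += 1
    p0 = empty / n_probes
    return -kT * np.log(p0) if p0 > 0 else float("inf")
```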
Thanks, Ali, and thanks a lot for being on time. Now we have time for questions: just write in the chat if you want to ask something; you can either write the question and I will read it, or you can ask to speak. Okay, there is a question from Davide: what could be the outcome of the fact that little polymers can occupy the void left by water? Well, there are two things; one is a question of just thermodynamics, right? Let me answer that first part. The way people in the physical chemistry community have typically thought of the solvation of hydrophobic molecules is that there needs to be a huge reorganization of the solvent, of the water, in order to accommodate the hydrophobic molecule. What we are saying is that it's actually not quite like that for small molecules: the intrinsic fluctuations of the liquid can create shapes that facilitate the presence of those molecules without paying a significant energetic cost. So that's the first implication.

There's another question from Davide: could it have an impact in the drug discovery field? This is a difficult question. In the drug discovery context, what would be interesting to think about is the following: when you have a drug interacting with a protein, how are the dendritic voids and empty spaces around those two objects talking to each other? I think this is not understood. There has been quite a bit of work trying to understand how dielectric properties change, for example between two proteins, or between a protein and a drug, when they come into close proximity; but where I think this has possible relevance is in really trying to understand how these fluctuations around the drug and around the protein may couple to each other.

Then there's a question from Lorentz: with the aim of reducing the computational cost, what do you think about hybrid solvation, for example a shell of atomistic water around the solute, surrounded by coarse-grained water molecules? I think it really depends on what your questions are and what you are interested in. I'm generally not a fan of coarse-grained models, although I understand the need for them. What I would say is the following: the building evidence from experiments and theory is that water around biomolecules can feel orientational correlations up to about 15 angstroms. So if you think that these types of long-range orientational correlations will affect the process you are interested in, then I would say you have to be very careful about how you choose the thickness of the hydration shell. But, as I said, I can equally buy the argument that if you want to simulate large, complex systems, it obviously makes sense to do some type of coarse-graining. It really depends on your question.

Then there's a question from Andrea: in atomistic simulations, how do different water models, for instance TIP3P or SPC, affect these properties and analyses? That's a fantastic question. The bad news is that TIP3P, which is the water model used in many biosimulations, is quite poor for water alone. Beyond that, similarly to my previous answer, it depends a lot on whether one is interested in thermodynamics or dynamics. For thermodynamics, choosing the water force field together with the appropriate protein force field, you are more or less able to converge to similar results when you look at the properties of the protein.
On the other hand, if you are interested in understanding the properties of the solvent around the protein, then the results can be very sensitive, and they can vary from one water model to another. So yes, results can be sensitive to the choice of water model, particularly the dynamics. For the things we looked at, which are these topological properties, they are much less sensitive, because most water models get the mostly tetrahedral network more or less right, with some defects.

Then Sudarshan would like to know: what tool did you use to get the hydrogen-bond networks around the alpha helix and beta sheet? Ah, I once had a very good PhD student, who has since left for Korea, and he wrote all sorts of things for me in C++. But there are lots of tools available, much better and more efficient ones, by people who are proper developers. Then there is another question, but it's basically a duplicate: what is the basis for the selection of water potential models? Yes, I think that's a repeat.

Good, any other questions or comments from the audience? There is another question, from Marcelo: is there a linkage between the voids of water molecules and their polarization? If so, how would it possibly be connected with the molecules you've shown? That is a great question. In fact, we have some unpublished, ongoing work right now that examines the extent of charge transfer between water molecules close to a void versus far away. You can think of a void as a sort of enclosed, short-length-scale analogue of an interface, and there is indeed a polarization at these interfaces and around these voids. Where I think it will play a much bigger role is when you start to introduce charged molecules: if you have a molecule with, well, it doesn't actually have to be charged, a dipole moment or a quadrupole moment, it may couple in a non-trivial way to the void and its corresponding charge polarization. But that's a good question, yes.

Then there's another question, from Jamal: what kind of interaction will it have in empty spaces? Sorry, which kind of interaction will it have in empty space? I don't know; Jamal, could you clarify what you mean? Okay. So, good, maybe we can thank Ali again and proceed with the next talk. Angelo, can you share your screen? Can you see it? Yes, clearly. So the last talk of today will be given by Angelo Rosa, who is also from the Physics and Chemistry of Biological Systems sector here at SISSA.

Thank you, Giovanni, and thank you all for listening, and especially for asking very nice questions; I'm very happy about that. As I said, this event is a sort of an experiment for me. Actually, what Ali said at the very end was very convenient: he said he doesn't like coarse-grained models very much, and I, instead, do. So that's a nice way of connecting to the previous talk, and also to the talk that Cristian Micheletti has given. In fact, the first part of my talk, which will be very short, will illustrate a few things about polymer physics in general, why you can actually use coarse-grained models to learn a lot about the physics of polymers, and why they make sense.
This is probably trivial for many of you, but since this event was originally meant to be a school, I preferred to keep some pedagogical flavor. So, briefly: I'm sure everyone here knows what a polymer is. It's a macromolecule, a big molecule, made out of a polymerization process, where you put repeating units one after the other; conventionally, we just call them monomers. Polymers exist in different architectures and, very importantly, in different states. If polymers are quite dilute in solution, you basically have a gas, and the properties are very similar to those of a gas; if you start increasing the concentration, you go to a liquid-like state; and if you increase the concentration even more, you get something which is pretty much like a solid state. Rubber, for instance, is basically made of polymers joined together, and it looks like a solid, at least on some time scales and length scales. Then we have biopolymers, which we have seen a lot of this morning and this afternoon: proteins, nucleic acids (nucleic acids will be part of the second half of my talk), and chromatin. Chromatin, more than DNA itself, will be the main actor of the second part of this talk; chromatin is basically DNA plus some proteins, but to all intents and purposes it is a polymer itself.

Now, what is polymer physics? Polymer physics is basically the study of polymer conformations. If you take a very simple polymer like polyethylene, the bond between two consecutive monomers can exist in three different states: two gauche states, according to the orientation of the bond, and one trans state, which has a lower energy. Depending on how these bonds are arranged, a long polymer chain can attain a huge number of conformations; it is an ensemble of conformations, not one unique conformation, and the object of polymer physics is the statistical mechanics of these conformations. As you can imagine, if N is the degree of polymerization of the chain, the number of polymer conformations grows rapidly, increasing exponentially fast, and so you can use the tools of statistical physics to study polymer problems. The basic ideas were first formulated between the 1930s and the 1960s by people like Kuhn, Guth, Stockmayer, James and others, while the more statistical topics, the study of polymer solutions, polymer melts and the role of entanglement (which we heard a lot about in Cristian's talk), were formulated by people like Edwards, des Cloizeaux and especially de Gennes, who actually got the Nobel Prize for his work on this kind of problem.

So, if you want to formulate a physical model for polymers, you start from something very simple, and there are basically two simple polymer models.
One is the freely jointed chain, where the joint between two consecutive monomers is completely free to rotate in space. The other is the freely rotating chain, where the angle theta between one bond and the next is fixed (it is not allowed to take arbitrary values between zero and pi), while you still have orientational freedom in the azimuthal angle phi, which can move between zero and two pi. Now you can ask: what is the typical size of such a polymer? You can do the calculation in both cases. In the first case, the freely jointed chain is just a random walk in space, so the mean square end-to-end distance increases linearly with the number of monomers, with a prefactor given by the squared bond length: ⟨R²⟩ = N b². In the second case the chain is, depending on the value of the angle theta, more or less stiff, but its size still grows linearly with the number of monomers: ⟨R²⟩ = N b² (1 − cos θ)/(1 + cos θ).

These two models actually belong to the same universality class, in the sense that for each of them you can define a typical length scale, called the Kuhn length, which is a measure of the local rigidity of the polymer; then the typical size of the polymer can be recast in the universal form ⟨R²⟩ = N_K l_K², where N_K is the total number of Kuhn lengths, Kuhn segments if you want, of your polymer. This tells us something very important: these laws are universal, in the sense that the same model describes polymer chains regardless of the microscopic details. And it tells us that polymers are fractal objects, which is very nice, because we can then employ, to study them, all the ideas that come out of fractality and self-similarity.

Now, to real chains. By real chains I mean the following: in the previous models, any two monomers can occupy the same portion of space; there are no correlations, so two monomers can overlap. This is of course not very realistic, and we need to exclude the conformations in which any two monomers share the same portion of space. At this point things become much more difficult, because these models can no longer be solved exactly. So, to get an idea of what is going on, we have to employ either mean-field theories, like the ones formulated by Paul Flory, another Nobel Prize winner for his work on the statistical mechanics of polymers. If you apply this tool, you get that the typical polymer size no longer grows like the square root of the number of monomers but with a different power law, R ~ N^(3/5), meaning that the polymer is more swollen. Or, since polymers are fractal objects, you can employ something much more sophisticated, namely the renormalization group, using the tools that were derived in the 1960s by people like, again, de Gennes and des Cloizeaux.
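The random-walk scaling of the freely jointed chain is easy to check numerically. A minimal sketch (generic parameters, nothing specific to the talk): generate many chains of random unit bond vectors and measure the mean square end-to-end distance, which should grow as N b².

```python
import numpy as np

rng = np.random.default_rng(1)

def fjc_end_to_end_sq(n_monomers, n_chains=2000, b=1.0):
    """Mean square end-to-end distance of freely jointed chains built
    from random bond vectors of fixed length b, summed head-to-tail."""
    # Directions uniform on the sphere, via normalized Gaussian vectors
    v = rng.normal(size=(n_chains, n_monomers, 3))
    v *= b / np.linalg.norm(v, axis=-1, keepdims=True)
    R = v.sum(axis=1)                     # end-to-end vectors
    return (R ** 2).sum(axis=-1).mean()

for n in (10, 100, 1000):
    print(n, fjc_end_to_end_sq(n))        # close to n * b**2
```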
What you do, basically, is the following: you take your polymer chain, then you construct super-monomers, in such a way that you make a coarse-graining of your polymer chain; you coarse-grain the monomers and you also coarse-grain the interactions. If you apply this procedure iteratively, you arrive at an estimate of how the chain size grows with the number of monomers, and the result becomes pretty accurate: you get an exponent which essentially confirms the mean-field result but corrects it, because now the calculation is rigorous. At the same time it tells you an important thing: you don't need much information about the local interactions; what you need is some sort of simple model, because only the interactions are renormalized, and you still get the right answer. That is why you can translate all these ideas into computation: especially for the large-scale behavior, you can use a coarse-grained model and study big systems.

Let me go rapidly to an application of this idea, where I actually need to employ a coarse-grained description, because the scales are so big that a fine-grained or fully atomistic simulation is impossible. The system I have in mind is the problem of chromosome folding. What is the problem of chromosome folding? As Cristian also mentioned this morning, you have DNA, but in cells DNA is quite complicated: you don't have bare DNA, you have DNA plus some proteins. These proteins are called histones; the DNA is wrapped around the histones, and together they form a sort of necklace. This necklace is called the 10-nanometer fiber, because the average diameter of this DNA-plus-protein fiber is about 10 nanometers. It is then observed that, under some specific conditions, this 10-nanometer fiber is folded into a super-fiber, which is called the 30-nanometer fiber because, again, its diameter is around 30 nanometers; and under some conditions these things take the form of chromosomes, the X-shaped chromosomes that you can find in textbooks. So that's the basic biology of chromosomes.

Now, what is interesting is the following. It was observed at the end of the 19th century, by two biologists, Carl Rabl and Theodor Boveri, that chromosomes are not randomly organized inside the nuclei of cells, but take very specific positions. At the time this was a really generic statement: they were studying this by optical microscopy, so they could observe some things, but it was not very quantitative. To get a more quantitative picture we need to go to the 1970s, when two German scientists, two brothers actually, invented a technique which was very clever, in my opinion. They used a laser beam on the nuclei of some cells (I don't remember which cells, but they were eukaryotes, cells with a nucleus), and they beamed the nuclei of these cells. Their reasoning was the following.
If the hypothesis of Rabl and Boveri was correct, namely if chromosomes are not just randomly organized but have a definite structure inside the nucleus, then we should observe the following. Imagine that the chromosomes were just a tangle of DNA fibers, all mixed like a dish of spaghetti. The laser is so strong that it destroys DNA, it makes breaks, so if you beam some random spot, you would sometimes hit one chromosome, sometimes another, and overall you would damage many of them. If, instead, the chromosomes are organized, each in its own region, then by beaming a spot you should destroy only a few of them, just a very small number. And that was indeed the outcome of the experiment, back in the 1970s.

This picture of non-random chromosome organization was then confirmed by another, more sophisticated technique called fluorescence in situ hybridization (FISH), where you can design specific probes for selected DNA sequences on chromosomes, probes with different colors according to the sequence you want to target, and then, using fluorescence microscopy, you can basically see the chromosomes inside the nuclei of the cells. The picture here in the bottom-right corner is a re-elaboration of an image observed by fluorescence microscopy through FISH, and each color corresponds to a different chromosome. It shows that chromosomes are not spread throughout the nucleus: each is organized within a specific portion of the nucleus, which people call a chromosome territory. The presence of territories is quite important, and it is not random: there are correlations between big and small chromosomes (in most cases, big chromosomes are found towards the periphery of the nucleus and small chromosomes towards the interior), and territories are functional, in the sense that if you destroy or damage this kind of organization, this is associated with potential diseases and with the death of the cell. Another observation they made: mammals which are very close from the evolutionary point of view, but which have developed different habits (for instance, one mammal is diurnal and another is nocturnal), show a different chromosome organization in certain cells, for instance retina cells, and this responds to how these mammals have evolved.

Then there is another technique, much more recent than FISH, called chromosome conformation capture. These techniques allow you to detect how frequently DNA segments interact in the nuclei of cells, and it is done in this way: you use formaldehyde to cross-link the DNA inside the cells, then enzymes called restriction enzymes cut it into fragments, and then you do some post-processing of these fragments; basically you amplify them, and you detect how frequently these fragments appear together.
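The output of such an experiment is usually summarized as a contact matrix, and the model counterpart is straightforward: from a simulated 3D conformation, mark every pair of beads closer than a cutoff as "in contact". A toy sketch (all parameters illustrative); averaging such maps over an ensemble of conformations gives contact frequencies that can be compared with chromosome conformation capture data.

```python
import numpy as np

def contact_map(positions, cutoff=1.5):
    """Binary contact matrix of a polymer conformation: beads i and j
    are in contact when their spatial distance is below the cutoff."""
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :],
                       axis=-1)
    return d < cutoff

# Hypothetical toy conformation: a random walk of 200 beads
rng = np.random.default_rng(2)
pos = np.cumsum(rng.normal(scale=0.6, size=(200, 3)), axis=0)
C = contact_map(pos)
print(C.sum(), "contacts; the region near the diagonal dominates")
```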
The idea is that fragments which interact more often in space should appear more frequently in your statistics. Of course you have a lot of contacts between fragments which are close in sequence; DNA fragments which are nearby along the chromosome interact frequently, and this shows up in the contact matrix as a heavily populated diagonal. But there are also non-trivial contacts: contacts between DNA fragments which are very far apart in sequence but close in space. These contacts tell us that there exists a non-random organization of chromosomes; this complements what FISH has told us.

So, can we understand, and this was basically the purpose of the investigation I will present now, the presence of a non-random organization of chromosomes inside the nuclei of cells from simple physical mechanisms? The problem is interesting from many points of view, because of the following numbers: an ordinary human cell contains about six billion base pairs of DNA, which is about two meters of DNA inside each cell, and this is contained inside a nucleus which is only about 10 microns in diameter. To make a proportion, it is like stuffing a 100-kilometer rope inside an ordinary backpack; and, as Cristian also pointed out before, this creates a lot of problems, because you generate a lot of entanglement. That is why the problem is so interesting from the physical point of view. If you want another comparison, think of the distance between the center of the Sun and Uranus: there is a really huge gap in length scales.

To cast the problem in physical terms, let's go back to what happens during the cell cycle. Chromosomes are very dynamic, and there are two phases in the life of the cell: one is mitosis, where chromosomes take their peculiar X shape, and the other is interphase, and it is during interphase that chromosomes behave, as I said, territorially, forming this territorial organization. What happens in between, when chromosomes leave mitosis and enter interphase, is that they start to open up from the X shape they have during mitosis, undergo some dynamics inside the nucleus, and then form territories. So, to get a clue about what is going on, and this is what we did at that time with Ralf Everaers, who is also going to give a talk at this event, we simulated the transition from mitosis to interphase using a polymer physics model, to see whether we could learn something about why territories form. To do that we used, as I said, a very coarse-grained model, because each chromosome, for a typical mammalian cell, is about 100 million base pairs: an enormous system, with no way to treat it by standard fine-grained simulations, explicit water and so on.
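For concreteness, the bookkeeping of the coarse-grained mapping described next is simple enough to write down (the numbers are the ones quoted in the talk; the rounding to roughly 30,000 beads per chain and 120,000 particles in total is the speaker's):

```python
# Bookkeeping for the coarse-grained mapping described in the talk
bp_per_bead = 3_000          # one bead = 3 kbp of 30-nm chromatin fiber
bp_per_chromosome = 100e6    # a typical mammalian chromosome
n_chains = 4

beads_per_chain = bp_per_chromosome / bp_per_bead
print(beads_per_chain)               # ~33,000 beads, quoted as ~30,000
print(n_chains * beads_per_chain)    # ~130,000, quoted as ~120,000
```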
There is no way to do that with the standard simulations shown before, with explicit water and things like that, so we needed something very coarse-grained. The level of coarse-graining we used was that of the 30-nanometer fiber. If you do the mapping, it means that we used a chain of beads to model a single chromosome, with each bead mapping to about 3,000 base pairs. If you then work out the proportions, each chromosome, each chain, is about 30,000 monomers. At the time we used four of these objects, all identical: it is a generic polymer, with no information from the sequence; it is very basic. In the end, the whole simulation contained about 120,000 monomer particles. The chains were placed inside a simulation box with periodic boundary conditions, arranged according to a very simplified model of the mitotic organization, and then we let the simulation go.

What we used was a standard Brownian dynamics simulation of a generic polymer model, the Kremer-Grest polymer model. As I said, it is a Brownian dynamics simulation, so you have a Langevin equation with some potential between the monomers, plus a noise with zero average, delta-correlated in time and between monomers. The force field is very simple, something very standard in polymer physics. You have, of course, chain connectivity, which is very important, and for that you use what is called the FENE potential. If you are unfamiliar with it, it looks a bit strange, but this potential is constructed in such a way that two bonded monomers cannot go too far apart. We do not use a harmonic potential, because the logarithmic term in FENE makes the potential diverge when the bond distance approaches its maximum extension, and that is a crucial part of the story. Then we have a standard bending stiffness for the chromatin; we simply tuned this parameter to match some experimental observations from the literature. We imposed the number of monomers and the size of the periodic box so as to mimic the average density of DNA, or chromatin, inside the cell. And then we imposed hard-core repulsion between monomers, so that when two monomers fluctuate and bump into each other they cannot cross. This is exactly why it is so important to use FENE: if you use a harmonic potential for the bonding interaction, you can break the topology, because the monomers can in principle go quite far apart if the potential is not too strong, and then two chains can pass through each other. We do not want that.

So those were our minimal ingredients. If you let the simulation run under these conditions, starting from this initial conformation, you end up with something that looks like this: from the visual point of view, each chain maintains its identity, and, as you see, it is very similar to what you can observe inside the cell. You see territories. But can we be more quantitative than just pictures? Yes. One important point, which I think is sometimes underestimated but is very important, is the typical time scale you can reach in these simulations.
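To make the bonded and excluded-volume ingredients concrete, here is a minimal Python sketch of the two potentials of the Kremer-Grest model. The parameter values are the standard textbook choices in reduced Lennard-Jones units, not necessarily the ones used in this work:

```python
import numpy as np

EPS = 1.0                        # Lennard-Jones energy scale
SIGMA = 1.0                      # bead diameter
K = 30.0 * EPS / SIGMA**2        # FENE spring constant (textbook value)
R0 = 1.5 * SIGMA                 # maximum bond extension (textbook value)

def fene_energy(r):
    """FENE bond: the logarithm diverges as r -> R0, so a bond can
    never overstretch and two chains can never pass through each other."""
    if r >= R0:
        return np.inf
    return -0.5 * K * R0**2 * np.log(1.0 - (r / R0)**2)

def wca_energy(r):
    """Purely repulsive (WCA) pair potential: the hard-core repulsion
    between any two beads described above."""
    rc = 2.0**(1.0 / 6.0) * SIGMA    # cutoff at the Lennard-Jones minimum
    if r >= rc:
        return 0.0
    sr6 = (SIGMA / r)**6
    return 4.0 * EPS * (sr6**2 - sr6) + EPS

# Compare FENE with a harmonic bond of the same small-stretch curvature:
# the harmonic energy stays finite at large stretch, which is what would
# allow rare chain crossings; the FENE divergence forbids them.
for r in (0.9, 1.1, 1.3, 1.49):
    print(f"r = {r:4.2f}  FENE = {fene_energy(r):8.2f}  "
          f"harmonic = {0.5 * K * r**2:7.2f}  WCA = {wca_energy(r):5.2f}")
```

In the actual simulations these terms (plus the bending stiffness) enter the Langevin equation of motion for every bead; the sketch only shows the energies.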
This is, of course, a coarse-grained simulation, so you need to fix somehow the diffusion coefficient of your monomers. To do that we did the following. In the literature we found data where people had monitored, by microscopy, a single gene locus, I think on yeast chromosomes, and measured its mean square displacement in time. We then simulated the same system using our model and fixed the time scale so as to match the experimental observation. There is an important point here: what we did was only a shift along the x axis, the time axis, with no shift along the y axis. That means we had complete control of the spatial scale: there is no fit of the spatial scale, only a fixing of the time scale. If you want, I can give you more details later. By this protocol we were able to show that with this setup you can reach, at the time, about three days of real time. That is how we fixed the time scale of our simulations, and it was quite important, because it shows that you can reach macroscopic time scales, so all of this makes sense and we can compare to experimental data.

So these two pictures are the following; let me explain them in a bit more detail. The lines are the results of our simulations, and the symbols are the results of experiments. In the top panel, the experiments measure the mean square spatial distances between two loci on the same chromosome. The different colors (sorry, it is not indicated in this figure) mean different species: red is yeast, Saccharomyces cerevisiae; blue, or cyan, is human, Homo sapiens; the green one is also Homo sapiens; and the brown one is Drosophila melanogaster. The lines are the results of the simulations for the same quantity, namely the mean square spatial distance between two loci on the same chromosome as a function of the genomic distance between them. This is a standard quantity you can measure in polymers to read off the statistics of your polymer chains. As you can see, the results of the simulations, given by the lines, match the results of the experiments, so our model is able to summarize in a quantitative way the outcome of these experiments on chromosome structure.

Of course, one set of experiments is not enough. So, from the same simulations, we looked at how the same pairs of loci interact in space, comparing to the results of the other technique, chromosome conformation capture, that I mentioned at the very beginning. Here, too, the result is in agreement with the experiments.

(I guess Giovanni is appearing; how much time do I have? One minute? Okay, no, I am done. Two or three minutes is okay? No, no, it is okay, I can finish here; this was just the part showing the results of these simulations.)
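A minimal sketch of the time-scale calibration described above, assuming a least-squares comparison of the two mean-square-displacement curves on log-log axes; the function names, toy data, and fitting criterion are illustrative, not the original analysis:

```python
import numpy as np

def calibrate_tau(t_sim, msd_sim, t_exp, msd_exp):
    """Find tau (seconds per simulation time unit) such that the
    simulated MSD curve, shifted in time only, overlaps the
    experimental one; the spatial (y) scale is never touched."""
    def log_msd_at(log_t):
        # interpolate the simulated curve on log-log axes
        return np.interp(log_t, np.log(t_sim), np.log(msd_sim))
    taus = np.logspace(-6, 6, 2001)       # candidate conversion factors
    cost = [np.sum((log_msd_at(np.log(t_exp / tau)) - np.log(msd_exp))**2)
            for tau in taus]
    return taus[int(np.argmin(cost))]

# Toy check: the "experiment" is the same subdiffusive law with a
# hidden conversion factor of 1e3 seconds per simulation time unit.
t_sim = np.logspace(0, 6, 50)             # simulation time units
msd_sim = 0.01 * t_sim**0.5               # e.g. microns^2, subdiffusive
t_exp = np.logspace(4, 8, 20)             # seconds
msd_exp = 0.01 * (t_exp / 1e3)**0.5
print(calibrate_tau(t_sim, msd_sim, t_exp, msd_exp))   # prints ~1000.0
```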
What is also important here is that the kind of behavior you observe is not there by chance: we have a quantitative explanation for it. I do not have the time to go into the details; you can contact me in case you are interested, or we can discuss it later during question time. But the kind of exponents you observe here, namely that the mean square spatial distance grows like the genomic distance to the power 2/3 and that the contact frequency decays like one over L, tell you that this is not a standard polymer system: it is not Gaussian, it is not a self-avoiding walk, it is something completely different. We have a quantitative model, basically from first principles, of why things are like this; if you want, I can tell you later or on another day. This is very interesting, because it then allows us to do a specific mapping and construct an even more coarse-grained version of this system, in order to simulate bigger systems. Or we can do the opposite and do some fine-graining, starting from the same model and going to smaller scales: now that we have a sort of intermediate scale, from the 30-nanometer fiber to something larger, you can also go below it with a very straightforward fine-graining, which is what we did a few years ago. Sorry, I do not have the time to go into the details, but the fine-graining basically allows you to study smaller scales in a very systematic way, and also to explain, in a very simple way, how certain genes behave; basically it is all a mechanistic effect. At this point (this slide is the explanation I was not going through) I just want to conclude by acknowledging the two people who were really involved in this work: first of all Ralf, who is going to talk in a few days, I think on Wednesday, and Ana Maria, who was a postdoc here at SISSA, left a few years ago and is now working at Molecular Partners AG in Zurich. And of course I want to acknowledge the many computational resources I have used over these years. So thanks for your attention, and thanks Giovanni for chairing the session.

Thanks, Angelo. So now we have time for questions; as usual, just write your question in the chat. Just one thing first: I do not know if we were able to see the arrow... oh yes, okay, I can see it now; I just did not know. So, questions or comments from the students? Curiosities are very important. Maybe I will start with a question: how expensive are these simulations?

Well, they can be very expensive, actually. The first simulations I did (I am quite old now), when I was a postdoc in Dresden, took I think about a month of computing on a cluster. You can still see the screen here, right? Yes. So that was pretty expensive. And first of all, if you want to relax this system completely, it is impossible; you can only relax it up to some length scale. That was actually the point of why these things work: the system is not relaxed, although there is no time to explain that in detail. But then, as I said, you can construct a protocol, a sort of multi-scale modeling, which is much less expensive and gives you the same results: you get the complete conformation with a few days of computation.
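The two observables behind the comparisons above, mean square internal distances and contact frequencies as functions of genomic distance, are straightforward to compute from bead coordinates. A minimal sketch follows; the contact radius and the random-walk toy input are illustrative choices, not the parameters of the actual study:

```python
import numpy as np

def internal_distances_and_contacts(xyz, r_contact=2.0):
    """xyz: (N, 3) bead positions along one chain.
    For each genomic separation s = 1 .. N-1, return the mean square
    spatial distance <R^2(s)> and the contact probability P(s), i.e.
    the fraction of pairs (j, j+s) closer than r_contact."""
    n = len(xyz)
    seps = np.arange(1, n)
    msd = np.empty(n - 1)
    pc = np.empty(n - 1)
    for i, s in enumerate(seps):
        d2 = np.sum((xyz[s:] - xyz[:-s])**2, axis=1)  # all pairs (j, j+s)
        msd[i] = d2.mean()
        pc[i] = np.mean(d2 < r_contact**2)
    return seps, msd, pc

# Toy input: an ideal random walk of 1000 beads, which should show the
# ordinary exponents <R^2(s)> ~ s and P(s) ~ s^(-3/2), NOT the crumpled
# 2/3 and -1 exponents discussed in the talk.
rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=(1000, 3)), axis=0)
s, msd, pc = internal_distances_and_contacts(walk)
print(msd[:3], pc[:3])
```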
But then you are using Monte Carlo? You are building another model which maps onto this one? Yes, and no, it is not molecular dynamics but Monte Carlo, so you do not have access to the dynamics. Of course, from the structures you have built by Monte Carlo you can then run the dynamics if you want. I see.

Then there is a question in the chat from Fahad, asking if you could explain the L to the 2/3 law. Yes. Actually, as I said, I was not really careful here, because I did not have the time, but the idea is the following. You start from these conformations, which obviously are not in equilibrium, you let the system relax to some point, whatever this point is, and you end up with this. But, as I said before, these conformations are not at equilibrium, in the sense that they still keep memory of the original conformation: what you see here is, somehow, what you put into the initial state. Then you might say that this is a bit disappointing, right? So what? But the fact is the following, and it also explains these exponents. Suppose you do this kind of thought experiment. You start from some conformation which is not at equilibrium, these kinds of rods, if you want. You know that, just by entropy, sooner or later you should end up in a conformation where everything is completely mixed; this is perfectly understood, it is where the polymers have to go. But before reaching that state, you have to pass through something that looks more like this, where you have some partial relaxation of the original structure but no relaxation of the topology: these objects, which at the very beginning are not entangled, not linked with each other, maintain this unlinked state for a long time. If that is the case, you can estimate how long it takes to relax this topology, and it is a very long time. Then I can say: I do not care about the fact that these chains have ends; I can just forget about the ends and imagine that my system is not a system of unlinked, unentangled linear chains, but a system of ring polymers, where the rings are unlinked and unknotted, because, if you want, it is really the same thing. Then I let these rings relax. If you do that, using the same setup and the same model, this system of unlinked and unknotted rings turns out to have a lot to do with the simulation I did before for the non-equilibrated linear chains: you end up with something like this. Basically, equilibrated rings behave like non-equilibrated, unlinked linear chains. This is the same measurement I showed you before, the mean square internal distance between two loci as a function of the genomic distance between them. So now we have a reference system which is at equilibrium, made not of linear chains but of ring polymers.
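For reference, the scaling relations invoked in this answer, written compactly. These are standard polymer-physics results for the two reference systems, not numbers specific to these simulations; L is the genomic distance between two loci, R^2(L) their mean square spatial distance, and P_c(L) their contact probability:

```latex
\begin{aligned}
\text{ideal (random-walk) linear chain:}\quad &
\langle R^2(L)\rangle \sim L, & P_c(L) &\sim L^{-3/2},\\
\text{compact, unlinked and unknotted rings:}\quad &
\langle R^2(L)\rangle \sim L^{2/3}, & P_c(L) &\sim L^{-1}.
\end{aligned}
```

For a self-avoiding linear chain the contact-probability decay is even steeper than L^{-3/2}, as noted below.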
There is a lot of literature on such systems now. A system like this has to be completely different from the equivalent system of linear chains: the rings are more compact, and they display precisely this exponent. They must be compact; if you want, they are space-filling, because this exponent tells you that ring polymers are space-filling. And the contact frequency between loci has to decay like one over L, and not like, for instance, one over L to the 1.5, which is the typical behavior you would expect for contacts on a linear chain if the chain were a random walk; for a self-avoiding walk the decay would be even worse, even steeper than that. So, if you use a model of rings, which are naturally territorial because they are naturally compact, you get a quantitative explanation for this kind of experiment. And since this is a decent equilibrium system, you can then do thermodynamics and statistical mechanics on it. There is now a lot of literature on this; probably Ralf will mention it, I do not know. I do not know if I have answered the question.

There is another question from the same person: do your simulations agree with loop extrusion theory? Good question. There is no loop extrusion imposed here, and I think my answer would be no, in the sense that loop extrusion pertains to a non-equilibrium mechanism, so I do not expect this model to be directly compatible with it. But of course you can impose it; it is pretty straightforward, because what we have now is, if you want, a sort of base model, and on top of it you can impose loop extrusion. Then another question is whether the structure you get by applying loop extrusion to some initial structure for some time looks like the one I am constructing here. Maybe; I do not know, but it is an interesting question, and we are actually thinking of moving in this direction. Loop extrusion is quite a popular subject now, so I agree it is an important point.

Okay, good. I do not see any more questions in the chat, so if there are no more questions, I think we can thank Angelo and all the other speakers again. And I saw Ali appearing; maybe you want to say something? No? Okay, good. So thanks to everyone, and the school will continue tomorrow with the more advanced lectures. See you tomorrow morning; thanks everyone for participating. Ciao, bye.