process. Okay. Now you see my screen?

Yeah, I see your screen. Perfect. So the next speaker is Uriel Morzan from ICTP. The title of his talk is "A Matter of Correlation". So the floor is yours. Please go ahead.

Thanks a lot, Raul, for the introduction. And thanks a lot to the organizers, both Alice and Edgar, for giving me the opportunity to share with you some of the work I have been doing recently. Today I want to talk about a topic that I have been very interested in for the last couple of years.

Typically in atomistic simulations, what we find ourselves doing a lot of the time is trying to make sense of, and get a qualitative picture from, an ensemble exploration done with MD or Monte Carlo simulation. But beyond having a thermodynamic or quantitative characterization of our problem, what chemists usually seek is a qualitative picture they call a molecular mechanism: a step-by-step sequence of fundamental events that give rise to a certain process they are interested in. This process can be a chemical reaction, it can be a phase transition, it can be a conformational exchange, many, many things.

Just as an example of what I mean, on the screen we have liquid water, and there is a very important process happening here. I won't tell you what it is, but it is a process that is essential for life. I just want to show you that just by inspecting an ensemble or a trajectory from molecular dynamics, it's really difficult to extract a qualitative picture of what's going on. One typical approach that chemists use is chemical intuition, which is really powerful.
What they typically do is decompose the problem into well-known, previously established interactions, and then, instead of studying the fate of all of the particles in the system, they study the behavior, the trajectory, or the evolution of these interactions. This is an interesting reduction of the dimensionality of the problem, and very often it allows chemists to come up very quickly with a qualitative picture of what's going on. But there is no guarantee that this type of intuitive procedure will work. This is particularly true in the case of liquids or macromolecules, where the global processes are a consequence of a holistic, statistical combination of these interactions, and just by looking at the interactions individually it is quite difficult to get the picture that we want.

I find it incredibly fascinating how lately data science approaches are percolating into the atomistic simulation realm, very often providing pictures that are incredibly good in terms of interpretative and predictive power, and many times replacing the intuition-based methods that I just mentioned. This is what I would like to talk about today.

The system that you have on the screen is called CRISPR-Cas9. This is a protein that was discovered recently and that catalyzes a very important reaction: it recognizes a sequence of DNA and then it cuts another part of the DNA. This type of process makes it possible to edit the DNA of living beings. This is a very interesting technology that lately has had a lot of attention, but it is quite a complex system, with many, many atoms, as you can see. In order to understand it, to get a qualitative picture, and to tackle a couple of open questions in the field that I will talk about a little bit, what we did was borrow a concept from social networks.
The people who make money with social networks, the people who are interested in our private lives, like to cluster us into groups. In that way they can, for example, find groups of people that have different political opinions, or groups of people that share some idea or work environment. We borrowed this idea to study this protein and decompose it into communities. The physical reasoning behind these communities is that they are groups of atoms whose correlation is much larger within the community than between communities. With this type of analysis, we reduced the dimensionality of our problem a lot. Now it's much easier to get a qualitative picture of how this protein works.

In particular, one open question is this: the protein has a problem, which is that sometimes it makes errors. It cuts the DNA when it shouldn't. An open line of study is how we can tune or mutate this protein in order to prevent these errors from happening. Of course, if we use this to treat a patient, we cannot afford this type of error. We tried to tackle this question. What we know is that these three communities are the ones in charge of the recognition of DNA and the activation of the catalytic process of the protein. Something I didn't tell you is that the black lines between communities are the dynamical cross-talk between the communities. The thicker these black lines, the stronger the interaction between the communities. We can now introduce mutations in the system in order to tune this cross-talk between communities, and this will affect the catalytic activity of the protein.

Sorry, a question. How do you calculate the interactions to find the thickness of the lines?

Thank you for the question.
What we do is compute the two-body mutual information for every pair of particles in the system, and then these lines between communities are the total two-body mutual information between the two communities. It's like a dynamical correlation between the motion of the communities. Does that answer your question?

Thanks, yes.

Perfect. Another thing that we did: there are plenty of algorithms for finding shortest paths, coming from graph theory or computer science, and we were also interested in that. In particular, we were interested in finding the paths, or the amino acids involved in the paths, that connect two distant sites of the protein. Well, this we did with a big algorithm, but there are plenty of algorithms to do this.

The reason we were interested in this is that the protein has two important sites: the catalytic site, where it cuts the DNA, and the site where it recognizes the DNA. These are the sites in green and red, respectively. Something must happen in order for this protein to work, and that is communication between these two sites. We wanted to tackle the question of how these sites exchange information. To do that, we used this kind of algorithm for finding optimal paths. In our case, the optimal paths were basically paths that maximize the correlation of the atoms involved in the path.

In particular, we studied the optimal path through one domain of the protein, called HNH, which is one of the most important domains of the protein. This we did in collaboration with Giulia Palermo, with Victor Batista, and with an experimental group from Brown, that of George Lisi. We found a pathway that connects these two sites and that, very interestingly, correlates very well with the experimental pathway determined with NMR. These are the solid balls, and the transparent balls are the computational pathway.
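The idea of an optimal path that maximizes correlation can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the talk: the choice of Dijkstra's algorithm, the edge weight -log(c) (a common convention that turns "maximize the product of pairwise correlations" into a shortest-path problem), the function name `max_correlation_path`, and the toy 4x4 matrix `corr` of correlations normalized to (0, 1].

```python
import heapq
import math


def max_correlation_path(corr, source, target, cutoff=0.0):
    """Return (path, score) for the path maximizing the product of correlations.

    corr   : symmetric matrix (list of lists) with entries in (0, 1]
    cutoff : pairs with correlation <= cutoff are treated as unconnected
    """
    n = len(corr)
    dist = [math.inf] * n      # accumulated -log(correlation) from source
    prev = [None] * n          # predecessor of each node on the best path
    dist[source] = 0.0
    heap = [(0.0, source)]

    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue           # stale heap entry
        if u == target:
            break
        for v in range(n):
            c = corr[u][v]
            if v == u or c <= cutoff:
                continue
            # -log(c) >= 0 for c in (0, 1], so Dijkstra's algorithm applies
            nd = d - math.log(min(c, 1.0))
            if nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if math.isinf(dist[target]):
        return None, 0.0       # target not reachable above the cutoff

    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    # exp(-distance) recovers the product of correlations along the path
    return path, math.exp(-dist[target])


# Hypothetical toy data: four "residues" with made-up pairwise correlations.
corr = [
    [1.00, 0.90, 0.10, 0.05],
    [0.90, 1.00, 0.80, 0.10],
    [0.10, 0.80, 1.00, 0.70],
    [0.05, 0.10, 0.70, 1.00],
]
path, score = max_correlation_path(corr, 0, 3)
print(path, score)
```

In this toy case the direct link between nodes 0 and 3 is weak, so the search prefers the longer route through strongly correlated intermediates, which is the behavior one would want when looking for a communication pathway between two distant sites.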
We found the pathway, and now that we know this pathway that connects these two sites, we can propose different mutations of the amino acids that belong to it in order to tune the communication between the two sites and, of course, tune the activity of the protein. One example of this is...

Sorry, you have four minutes.

Perfect. I'm allowed to finish. These are two examples of mutations where we know that the activity of the protein gets disrupted. What we show here is that, with these mutations, the pathway that connects these two important sites of the protein is completely disrupted. Now we can understand why these mutations were making the protein inactive, or disrupting its activity: they disrupt the signaling pathway between these two important sites.

I will wrap up here. I have a little bit more to tell you, but let me go to the conclusions. In the conclusions, I would like to show you very quickly a lot of the things that we did, and a lot of the people that participated in this. This is another case, where we developed a method to understand in a very simple way another type of problem in biochemistry. This was done with Victor Batista, Ivan Rivalta, and Christian Negre. This is the project I just mentioned, which was done with Victor Batista, Giulia Palermo, and George Lisi. We also used these types of methods to find water channels in a very important protein, photosystem II. This was done with Victor Batista and Crystal Rice. More recently, we are working with Ali Hassanali and Ricardo Franklin on what Ricardo already mentioned, soap-based artificial photosynthesis. We are using the same methods to understand those kinds of systems. We are also working with Hatareh Sisi and Ali Hassanali on understanding the optical properties of BSA. I think Hatareh will talk about this.
There we also used similar techniques. Finally, we used these for a very different problem, understanding electron transfer in microbial nanowires, with a very different physical principle but exactly the same data science methods. This was done with Nikhil Malvankar and Peter Dahl. Thanks a lot to everyone. If you have any questions, I will be able to answer them.

Thank you very much, Uriel, for your nice presentation. Leili has a question, so please go ahead.

I had a question about mutation of the protein.

Sure.

How do you know that when you mutate some specific part of a protein, folding in other parts is not important? Did I ask my question clearly?

Folding, you mean folding of the protein in other parts?

Yes.

Everything is included. We run molecular dynamics of the new, mutated system. We see what happens with the allosteric pathway, with this pathway that we determined. So, in principle, if there is some process, like a conformational process or something like that, everything is included.

Thank you, thanks.

You're welcome.

Okay, we have time for another question. No?

Can I ask a question? I was wondering about the way you generated these correlation maps using mutual information.

Yeah.

You said that the higher the mutual information between two units, the higher the interaction strength. But mutual information carries the statistics of the entire dynamics. Is it just the interaction? Can we decompose it to the level of just the interactions? Can we conclude that?

Of course. I mean, we start at the level of just interaction. But wait, one thing. We shouldn't confuse interactions in the physical sense, as potentials, with this mutual information, which is a statistical correlation.

Yeah, that's all my question.

So, these are not interactions. Let me show you what it is. I actually have a slide here. This is the two-body mutual information.
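The slide itself is not part of this transcript. As a sketch of what it presumably shows, the standard Shannon form of the two-body mutual information can be written as below. The symbols are notation chosen here, not the speaker's: $p_i$ and $p_{ij}$ are the marginal and joint probability distributions of the (discretized) fluctuations of particles $i$ and $j$, and $A$ and $B$ are two communities. The last line restates the community-level total mentioned earlier in the Q&A.

```latex
\begin{align}
  H_i    &= -\sum_{x} p_i(x)\,\ln p_i(x), \\
  H_{ij} &= -\sum_{x,y} p_{ij}(x,y)\,\ln p_{ij}(x,y), \\
  I_{ij} &= H_i + H_j - H_{ij}, \\
  I_{AB} &= \sum_{i \in A}\,\sum_{j \in B} I_{ij}.
\end{align}
```

With this definition $I_{ij} \ge 0$, and $I_{ij} = 0$ exactly when the fluctuations of the two particles are statistically independent.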
It is the difference between the marginal Shannon entropies and the joint Shannon entropy. Basically, what this does is compute the dynamical correlation between the fluctuations of particle i and particle j. We do this for all the particles in the system. Then we can come up with a total mutual information between two communities, basically as the sum of the pairwise mutual information between the two communities. Right?

Yeah. So, this is like a strength of correlation between the two units.

Exactly.

It's not the wrong interactions between the two units.

Yes. Yes.

He's there. Thank you.

Sorry. May I ask a question?

Sure.

I would just like to know, is there any relation between the links that you have and the parameters in the elastic network models that people usually use to find the elastic behavior of the system? You know the elastic network models, used to find the correlation between the motions of the particles?

I actually don't know these elastic network models, but basically this is the correlation between the atomic fluctuations throughout the molecular dynamics. So this doesn't have any harmonic assumptions. This is the standard way of computing the correlation between any two particles in the system. So there is no model in the way we compute the correlation. The model comes after. The model is there in order to analyze that data.

Okay. So, we should stop here. Thank you very much again, Uriel, for your nice