And our speaker today is Victor Bapst. Victor is a theoretical physicist by training, having obtained his PhD from the École Normale in Paris, where he looked at quantum disordered systems and their relation to quantum computing. After a postdoc in Frankfurt on the mathematics of phase transitions, he joined Google DeepMind five years ago, where he has since worked on data-efficient reinforcement learning, transfer learning, and graph networks. His talk today combines the physics aspects and the machine learning aspects: he's going to tell us what graph networks can teach us about phase transitions and glasses. So Victor, please, whenever you're ready.

Thank you, thank you, Philippe, for the invitation and for the introduction. So this talk is about work that I did over the last two years, in particular with my colleagues Thomas Keck, Agnieszka Grabska-Barwinska and Craig Donner, and with help from a number of other people at DeepMind and Google. This work was about understanding how much neural networks can tell us about glassy systems.

First I want to start this presentation by explaining what the glass transition is, because some of you may know what it is while others might be less familiar with it, and I want to show you how ubiquitous it is. So what is a glass? A glass, put simply, is just a liquid that has stopped flowing. If you look at the structure in these three schematics, you have on the left a crystal and on the right a liquid, and it's very clear what the difference is: the liquid is disordered. And if you look at a glass — again a schematic view — it looks just like a liquid. You can't see the difference if you only look at the structure, but obviously the dynamics is very different: a glass, if you think about a piece of structural glass in your house, is something that doesn't move and that has a rigid response to pressure, for instance. The glass transition is not something which is sharply defined like a usual phase transition; it's defined experimentally as the point where the viscosity has increased by 14 orders of magnitude, the viscosity being a measure of how fast the liquid flows. And this happens over a small change of temperature: you change the temperature by 30% or less and you get this huge change in a response parameter of the system.

So what are the questions that we are asking here, and why is it an interesting problem? One question is: why does the dynamics slow down so dramatically? In particular the transition is non-Arrhenius, in the sense that the slowdown is faster than exponential in the inverse temperature. And what is going on at the microscopic level that causes this slowdown? As I said before, if you take a naive view you don't really see a difference between the glass and the liquid at the microscopic level — so can you find structural markers of this transition? Another way to look at this problem is to think more in terms of microstates: if you're used to the physics of phase transitions, you would ask whether something is happening at the level of the states, maybe at the local minima of the free energy, that explains this change of behavior in the glass. And finally, one of the interesting questions is how the dynamics unfolds. We've said that the dynamics gets much slower, but can we understand how the glass starts moving, when it starts moving?
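To make "faster than exponential in the inverse temperature" concrete (this gloss is the editor's, not part of the speaker's slides): an Arrhenius slowdown would follow the first form below, whereas glass-forming liquids are commonly fitted with super-Arrhenius forms such as the Vogel–Fulcher–Tammann law, which diverges at a finite temperature T0:

```latex
% Arrhenius (simple activated) growth of the viscosity / relaxation time:
\eta_{\mathrm{Arr}}(T) \;\propto\; \exp\!\left(\frac{A}{T}\right)
% versus a common super-Arrhenius (Vogel--Fulcher--Tammann) fit:
\eta_{\mathrm{VFT}}(T) \;\propto\; \exp\!\left(\frac{A}{T - T_0}\right), \qquad T \to T_0^{+}
```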
And on this picture, for instance — this is a picture, I think of a 2D glass, of the heterogeneities in the dynamics — you can see the red particles that are moving more than the blue ones, and you can see that you have these domains, regions that start to move together in a coordinated manner. Can we characterize these heterogeneities, and can they help us understand the slowdown in the glass dynamics?

So why do we care about this problem? First, it's a fundamental theoretical question: Anderson, a Nobel prize winner, said in a famous statement that the deepest and most interesting unsolved problem in solid state theory is the nature of glass and the glass transition. And it occurs in a wide range of systems, of which I'll try to give a quick overview now. The first example, the most natural one and the one I already mentioned, is molecular, structural glass. This is something that humans have been making for a long time — since around 2000 years before Christ — and it's what you normally have in mind with glass: silica glass. It's also a system which occurs naturally on Earth: glass can form under special conditions, like when you have a very high pressure on the material, and there are several sources of natural glass; it can form with many materials. One fun fact, maybe, is that if you look at water in the universe, the most common structure of water is actually a glass, amorphous ice, rather than crystalline ice — that is mostly what the ice on comets and other icy bodies is made of, for instance.

Another example of a glass transition is ironing. Some of these things I only learned when preparing this presentation and trying to explain to people what the glass transition is about, but I think they are interesting to give an idea of how common this transition is. So ironing involves the glass transition of a polymer: when you heat the polymer you get above the glass transition, so the fabric becomes soft — okay, it's not a liquid, but it has this property of having fast dynamics — and so you can impose pressure on it with the iron, and then when it cools down it goes through the glass transition, it stops moving, and the motion of the polymer chains is arrested.

Another example is jamming. In jamming you don't really have a notion of temperature anymore; this describes systems of hard particles in interaction — you can think of sand, for instance — and there the control parameter, as I said, is not the temperature anymore but the density, or the pressure. As you increase the density, the viscosity increases extremely quickly, and that's what happens, for instance, when you build a sandcastle and pack it together: you see that the sand stops flowing. Another domain where people have tried to connect the phenomenology of glasses or of jamming is traffic jams, for instance — this kind of collective behavior where you have a transition between a phase which is very fluid and a phase which is completely arrested when you change the density by a small amount. And I think this was a nice video illustrating jamming — this is from the Cornell Creative Machines Lab, where they built a robotic gripper with coffee beans — so this is illustrating a jamming phase transition.
So you put the beans in this balloon, and then the robot is able to change the density just by tiny bits, and that makes the hand completely rigid — you see it grab objects. That's an illustration of how drastic this is: the robot changes the volume of the balloon by five percent or less, but the response is completely drastic.

Let's move to another illustration. People have looked for a glass transition in biology, and in particular in cell migration. I'm not going to pretend to understand these things very deeply, but basically you observe different states of biological tissues. I think one appealing aspect of the glass transition here is that you have a control parameter that changes by a small amount and a response which changes by a large amount, which is pretty appealing if you think about modeling this kind of system: you could imagine that in the body you have some control parameter that you don't want to change by orders of magnitude — you want to change it by a small amount and still have the behavior change a lot — which is something that could be relevant for development or for metastasizing cancer. So this is basically about the organization of the cells, and whether the cells are in a phase which is fluid or arrested, solid-like.

Another example, moving closer to computer science, is optimization problems. There the idea is that you look at binary variables with constraints between them, and if you do that, under some conditions on how these problems are defined, you get a statistical system which is analogous to a physical system — with the main difference being that you've swapped the continuous variables that describe positions in a glass for binary variables. If you do it in the right way you kind of go from glasses to spin glasses, which were also an interesting area of physics in the 80s. So here the idea is basically that you have these variables, you add constraints between them at random, and you can control their density. The control parameter is the density of constraints, which is related to how hard the problem is: the more constraints you add, the fewer solutions you find. What people found is that there is a very similar phenomenology in these problems: between a regime with few constraints where the problem is easy, and a regime where, as you keep adding constraints, at some point you don't have a solution anymore, there is in between a glass transition, and this is where the hardness of the problem also appears if you try to solve it with an algorithm. This has proven to be relevant for constraint satisfaction problems, for instance random satisfiability. On these problems more theoretical progress has been made as well, because they are simpler to tackle than the glass transition in three dimensions — they are essentially infinite-dimensional problems. Not everything is understood there, but the existence of a glass transition is better understood.

And finally, as an extension to the previous point, there has been interest in seeing whether this could also apply to some machine learning questions; in particular this paper was looking at the training of neural networks in the regime where you don't have many parameters, where they are under-parameterized.
So the neural network is trying to fit some objective, but it doesn't have enough parameters, so it's kind of struggling to satisfy the constraints, if you want. If you look at these two plots — on the left a glassy system, and on the right the weights of a neural network — they measure basically how much the system is moving, some measure of that, and the colors show the slowdown: you have this plateau where things stop moving. Basically what this was showing is that you potentially have some similar phenomenology between the two.

Okay, so while the last slide was talking about glasses and machine learning, this talk has nothing to do with that — this is about machine learning applied to glasses rather than the opposite, and in particular about graph networks for propensity prediction. So I first need to tell you what propensity is and what we're doing. I'm not going to spend a lot of time on the physical system; I'm going to describe it on this slide. We take basically the standard system used to describe a glass in a computer simulation or for a theoretical study. The other thing I should say is that, because the glass transition is so ubiquitous, there is this belief from statistical physics that the precise details of the system don't matter too much as long as you are within the same universality class: as long as you have short-range interactions between the particles and some weak conditions hold, it doesn't really matter what you do. So you can just pick this canonical system, which is particles in a box with a Lennard-Jones potential. One of the details is that you need to have two types of particles — this is why it's a mixture — because if you have only one type, then there is a crystal which is easy to find and the system might crystallize, which is not what you want; so you put two types of particles with radii that don't pack very well together.

Then there is a notion of temperature: you can simulate the system at different temperatures, you can equilibrate it, and then you can run the dynamics of the system starting from an equilibrated configuration. When you run the dynamics with molecular dynamics you have some randomness, and so the thinking here is that it doesn't necessarily make sense to predict exactly what happens: we want to make predictions about the future given the starting state, but because there is randomness, that might be difficult. So what we do instead is use this propensity, which is the mean displacement of each particle under different random velocities. You start from a fixed configuration, you sample 30 initial conditions for the velocities, you unroll your simulation 30 times up to a given time in the future, and then you average how much each particle has moved. That's a way to smooth out the random fluctuations and just retain the important information, the information that can be more easily predicted — namely, that this particle is mobile and typically will move a lot, or that this particle is really stuck in this kind of initial configuration and won't move a lot, whatever the random velocities.
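As a minimal sketch of the propensity computation just described (the `run_md` routine and the velocity sampling below are hypothetical placeholders, not the authors' code; a real setup would draw Maxwell–Boltzmann velocities at the target temperature and handle periodic boundaries when measuring displacements):

```python
import numpy as np

def propensity(run_md, x0, t_future, n_seeds=30, seed=0):
    """Mean displacement of each particle at time t_future, averaged over
    random initial velocities (the 'propensity').

    `run_md(x0, v0, t)` is a hypothetical molecular-dynamics routine that
    returns particle positions of shape [n_particles, 3] at time t.
    """
    rng = np.random.default_rng(seed)
    displacements = []
    for _ in range(n_seeds):
        v0 = rng.normal(size=x0.shape)           # schematic stand-in for Maxwell-Boltzmann sampling
        xt = run_md(x0, v0, t_future)            # unroll one simulation with these velocities
        displacements.append(np.linalg.norm(xt - x0, axis=1))
    return np.mean(displacements, axis=0)        # one scalar label per particle
```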
What you see on these plots: the top-left panel shows the average propensity as a function of time, and the different colors are different temperatures. The red curve is the highest temperature, and there, for short times, you have the ballistic regime where things move linearly — the displacement is proportional to time — then you start having collisions, you have this inflection, and then you find the diffusive regime where the slope goes from one to one-half, because particles are colliding with each other. That's the liquid, the high-temperature regime, which is not the one relevant for the glass transition. As you lower the temperature and go to the blue curves — still looking at the top left — what you see is this plateau which appears, where the propensity stops increasing; at the end, for long times, you always recover the diffusive regime, and it has to be like that because the system diffuses over long times, but you have this intermediate regime between the ballistic and the diffusive regimes where the system is apparently not moving. This is the regime that we are interested in and where we want to make predictions. The bottom-left figure is a measure of the length of this plateau, if you want, and what it shows is that as you decrease the temperature, the length of this plateau increases. You can fit this curve with various laws — exactly which law you should use is an open question — but it grows faster than Arrhenius, so it increases very fast as the temperature decreases. Which also means you can't simulate systems at very low temperature, because they just become impossible to equilibrate, at least in this setup.

So, to summarize the results we got: our problem was, starting from this configuration, can we predict how the system will move in the future? What we'll show is that we outperform the previous methods, and also that we could analyze the workings of our networks and try to extract a growing correlation length from them. I think that's what is interesting when you think about applying machine learning to physics problems, because this tries to go beyond a regression or classification problem and asks: can we interpret what the networks are doing, and can we draw some insights from them?

In terms of related work, people had been working on this question, and most of the work done so far was trying to correlate the starting configuration with how much each particle would move, but using simple techniques based on heuristics: you define features for each particle and then you use some classifier. Later I'm also going to compare to the traditional approaches for solving the same problem — obviously people in physics had looked at whether they could find good predictors of the dynamics, whether you can say if a particle will move or not based on some physics-based quantities. One natural idea is to use the potential energy: you could look at whether a particle has a high energy, in which case maybe it wants to relax more, so maybe it's going to move more — that's the rough idea of the potential-energy method.
Then you have more involved methods that still rely on the energy function of the system, trying to understand what the modes of relaxation of the system are and whether this says something about how the system relaxes. One thing to say before going to the results and our method: when we compare to these traditional methods, I think we have to bear in mind that they bring something more than just predictions — they bring some insight into the system. If you say "I propose to use this particular form for the relaxation" and you can show it correlates with the ground truth, then that can give way to a theory, whereas a machine learning black box, under certain conditions, you can expect to be more predictive. So it's not only about who has the best predictive power; it's also about the trade-off between having good predictive power and understanding what the method tells you.

So let me talk about graph networks and, more generally, relational networks. I don't know how much of the audience is familiar with those, so I'll try to remain at a relatively high level. Relational networks are networks that use a relational inductive bias to perform their computation, which means that they perform computation over an explicit or implicit graph of objects. This is to be compared, for instance, with convolutional networks that operate on images, or RNNs and LSTMs that operate on sequences, or perceptrons that operate on unstructured inputs. So relational networks operate on objects, and potentially on graphs of objects; they try to model the relationships between these objects, and they should in particular be invariant to permutations of the nodes and of the edges. There is a variety of models in this family, and I'm not going to describe them here, but there are trade-offs, basically, in how much graph structure you put into the model. You can have models that are fairly implicit, where the model should discover the edges by itself — you just say "these are the objects" and require invariance under permutation, but you don't specify a graph — and you have methods that are more expressive, where you can really specify which interactions should be present in the system and force the model to precisely model these interactions. The trade-offs also come in terms of which one is easier to train, which one uses more memory, and so on; and the more bias you put in the model, the harder it might also get to train. The model I'm going to present in this talk is on the right of this panel: it's a more structured model, and we will see that the graph structure we put in is explicit — we are enforcing a graph structure based on our bias about the system.

In terms of previous applications of these models to physics and chemistry, I just want to give a few examples. One of them was to apply them to molecules — graph networks applied to molecules — and this is a paper from three years ago where the authors showed that you could get DFT-level predictions for a class of molecules by doing message passing on the molecular graph.
So you would use the molecular graph, as defined by the formula of the molecule, and pass it to a network; the network does computations on the nodes and edges of this molecule, and it recovers the predictions from DFT but much faster. You can also use these models for simulation: this is a recent example where the authors were simulating the behavior of water. You start from an out-of-equilibrium initial condition and just let the water relax, and the model they are using is again a graph network. So this is somewhat similar to what I'm going to present, in the sense that the model is relatively similar and it's also about simulation, but the similarity mostly stops there. You can see on this video that the results are relatively impressive — this is comparing the ground truth with the simulation; I don't know which one is which, actually. And then, I mentioned at the beginning of this talk that glasses — starting from structural glasses — can be put in analogy with constraint satisfaction problems and systems like that, and these papers are examples of applying machine learning directly to those problems: I think one is about CSPs and the other about satisfiability problems. This was also some inspiration for this work, in terms of thinking: okay, can we do the same thing, but for a physical system in three dimensions rather than for these satisfiability problems?

So what is our architecture, and how do we predict propensity? This schema is a rough summary of what we do. We start from a three-dimensional input, which is the positions of the particles in an equilibrated system at a given temperature: we have this box with about 4,000 particles in it, we equilibrate these particles, and we take one equilibrated configuration. This gives us a 3D input, from which we create a graph input by connecting particles within a threshold: we connect particles that are within two sigma of each other, sigma being the typical length of the interaction between particles. This gives us a graph in which each particle has roughly 20 neighbors, I think. On this graph, what are the features? On the nodes you have the type of the particle — remember, I said you need two types of particles for this to work and not get a crystal — so on the nodes we just have the type, and on the edges we have the relative vector between the two particles, the difference in x, y, z between particle i and particle j. The edges are bidirectional, so you have one edge in each direction. Then this is fed to a graph network. I'm going to explain on the next slide a bit more how this works, but this is a network which, as I said, takes this graph as input, performs a computation that is aware of the graph, and updates the features of this graph — the features on the edges and on the nodes.
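A minimal sketch of this graph construction (illustrative only: function names are the editor's, periodic boundary conditions are ignored, and a real implementation would use a neighbor list rather than the O(n²) double loop):

```python
import numpy as np

def build_graph(positions, types, cutoff):
    """Connect every pair of particles closer than `cutoff` (~2 sigma in the talk).

    Returns node features (particle type), a directed edge list (one edge in each
    direction), and edge features (the relative 3D vector between the particles).
    """
    senders, receivers, edge_feats = [], [], []
    n = len(positions)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rel = positions[j] - positions[i]          # relative vector (no periodic wrap here)
            if np.linalg.norm(rel) < cutoff:
                senders.append(i)
                receivers.append(j)
                edge_feats.append(rel)
    node_feats = np.asarray(types, dtype=float)[:, None]   # particle type as the only node feature
    return node_feats, np.array(senders), np.array(receivers), np.array(edge_feats)
```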
In particular, the output of this network is a number on each node, and then what we do is regress these numbers to the true propensities. Starting from this 3D input, if you remember, we had also run the simulations for a given amount of time: we run the simulations 30 times, we stop them, we average how much each particle has moved, and that gives us a label; then we try to regress the output of the network, which is shown on the right there, to this quantity.

How does the graph network operate? One step of graph network propagation is made up of two steps. First, for each edge, you look at the adjacent nodes and you update the state of the edge based on the two nodes it connects and the previous state of the edge. So for a given edge I take its previous state — it starts as this 3D vector, but it gets updated to something learned — I take the two nodes at its start and at its end, I concatenate all these vectors, I feed them to a network, an MLP, and that gives me a new vector for this edge. Then I can update a node: for each node, I collect the previous node state and I collect all the edges which are incoming into this node. I sum these embeddings — there might be a different number of edges per node, and the operation needs to be invariant under permutation, so I have to reduce them, and I do a sum — and I feed that to an MLP, which again gives me a new node state. After these two updates, each edge has a new state based on the previous node states, and each node has a new state based on the previous edge states. If you think about it, you can see that one step of this propagates information at distance one: each node, after one step of what I just described, knows about its neighbors and the state of its neighbors. Then you can apply this network several times — you could apply the same network or different networks — and in any case the information propagates in the graph; if you apply the network three times, you can show that the state of each node potentially depends on the states of nodes at distance up to three from it. And that's what we do in practice: we have this graph network that we apply several times, and the more we apply it, the further information can propagate in the graph.
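Before moving to the results, here is a minimal sketch of the propagation step just described — the edge update followed by the permutation-invariant node update. The function names, and the use of plain callables in place of learned MLPs, are the editor's illustration rather than the authors' implementation:

```python
import numpy as np

def gn_step(node_feats, edge_feats, senders, receivers, edge_mlp, node_mlp):
    """One message-passing step: edge update, then sum-aggregated node update.

    node_feats: [n_nodes, Fn], edge_feats: [n_edges, Fe],
    senders/receivers: [n_edges] integer arrays,
    edge_mlp/node_mlp: callables standing in for small learned MLPs.
    """
    # 1. Edge update: new edge state from [old edge state, sender state, receiver state].
    edge_inputs = np.concatenate(
        [edge_feats, node_feats[senders], node_feats[receivers]], axis=1)
    new_edges = edge_mlp(edge_inputs)

    # 2. Aggregate incoming edge states per node with a permutation-invariant sum.
    aggregated = np.zeros((node_feats.shape[0], new_edges.shape[1]))
    np.add.at(aggregated, receivers, new_edges)

    # 3. Node update: new node state from [old node state, summed incoming messages].
    new_nodes = node_mlp(np.concatenate([node_feats, aggregated], axis=1))
    return new_nodes, new_edges
```

Applying `gn_step` several times (with the same or different MLPs) is exactly what lets a node's state depend on nodes up to k edges away after k applications, as argued above.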
Now I'm going to move towards showing the results, and first I want to define the timescales that we look at. As I alluded to before, this is the curve of the displacement, or the propensity, as a function of time — this slide is maybe not in the best place — and you first have the ballistic regime, at the end the diffusive regime, and we are interested in the glassy regime in between, where things apparently don't move.

Okay, so now let me show you some results. This is showing, as a function of time, the quality of the prediction: we look at the correlation between the predicted propensities for each particle and the ground truth. If you predict exactly the right propensity you get a score of one, if you make a random prediction you get a score of zero, and you could even have a negative correlation. What you see at the top of this plot is a gray area: you can't actually reach one exactly, because you still have some randomness in your labels — you only run a certain number of simulations, so you can't get beyond the fluctuations of that estimate. That's why the best you can do is this gray line. The dashed line is the typical timescale of the glass relaxation, the timescale that you care about, which sits in the middle of the plateau.

On this plot, the first three curves are standard physics methods — potential energy, Debye-Waller factor, and soft modes — and you can see that they get some positive correlation with the ground truth, but it's pretty weak. The green curve is the method previously proposed in the machine learning literature, an SVM based on handcrafted features, and you can see it performs better than the previous methods. One advantage of the SVM is that you can do some analysis on it, because it's based on handcrafted features, so it's relatively easy to figure out which features matter and you can learn something from it — so it's not deep learning, but at the same time it's not extremely powerful: the results are not that much better than the best previous physics method. And now these are the methods we tried: in orange a standard convolutional network, which I'm not going to talk about, and in blue the graph network, and you can see that you get quite a substantial gain in performance compared to the previous methods.

A few things to note on this curve. At very short times you can see that the graph network captures the dynamics almost completely. The very short times are not interesting for the physics, and they are not supposed to be difficult either, because each particle has only moved a little bit within its cage, so that's something you expect should be computable; but it's nice to see that the method can do it, and it's an indication that this method is well suited to describe physical systems specifically, and that the dynamics is well described by the pairwise interactions that the graph network represents. Then you see this drop in performance, and then the performance increases again as the time increases, which is something that maybe was not completely expected and which I think is an interesting observation: as you increase the timescale, the predictive power does not diminish but actually increases. You can make some post-hoc explanations, but I don't think we have a strong argument for why that is the case; it could be saying something about a reduction in the fluctuations as the time increases, or it could be that when you get closer to the diffusive regime the prediction becomes a bit easier, because there the dynamics should be better described in terms of densities. So that could be a possible explanation.
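For reference, the score plotted on these curves is just a correlation coefficient between predicted and true per-particle propensities; a minimal version (assuming a Pearson correlation, which matches the description above) would be:

```python
import numpy as np

def propensity_score(predicted, true):
    """Correlation between predicted and ground-truth per-particle propensities:
    1 = perfect prediction, 0 = uninformative, negative = anti-correlated."""
    return np.corrcoef(predicted, true)[0, 1]
```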
These are the propensity maps that we obtain with our method. On the first row, from left to right you go from short times to long times: the rightmost column is the diffusive regime, the leftmost column is very short times, the interesting glassy regime is the third column, and the second column is, by the way, the hardest one to predict. You can see that the glassy regime is intermediate between a regime which is diffusive — where you see the kind of patterns you might be used to if you have looked at the heat equation before — and a regime which is more heterogeneous. The first row is the ground truth, and you can see that the second row, our method, is visually very close to the ground truth; this is compared with what was the best physics method before, soft modes, which only really works for the second column, so for the shorter times.

Okay, so now in the third part of my talk I want to explain what kind of analysis we can do on the network and what this can tell us — how we can go beyond a quantitative analysis of our network. We did several of these things in our paper, and I'm just going to present some of them now. One thing you can do is train minimal models on the data to understand the baseline performance you get from the input data. I'm not going to say every machine learning paper should do that, but it's something I think is very useful in general and really gives a good sense of how much performance you are actually gaining. In this case, that could mean training a simple linear regression on just the number of neighbors within some distance, and you can go somewhat far with that — it actually gives some decent correlation. Another thing you can do is modify and simplify the inputs and see whether the performance changes: if you see that removing some part of the input makes the performance drop a lot, you can say that this part is important for the model's prediction, and you could conclude that it is relevant for understanding the physics of the system. You can train variations of the model with different architectures and hyperparameters, and again try to identify which parts are critical. And the last thing you can do is take a pre-trained model, fix it, then modify the inputs and see how the prediction changes — some kind of sensitivity analysis. I'm going to present results for two of these, the second and the fourth options, in the rest of this talk.

On the first one, ablation studies: this is the same curve as I showed before, here with just the SVM baseline and our method in blue, where we use all the features. Then we remove some of the features from the input, retrain the model, and see how much the performance drops. The first thing we did is remove the vectors and substitute them with distances, so instead of feeding the 3D vector between two atoms we feed only the distance. What you can see is that for the very short times you have a drop in performance, and again that's not surprising: to predict the dynamics at very short times — and we know the network does that very well — the network is actually leveraging some geometrical information about the angles between particles, and it needs that to reach optimal performance. But what you can see is that as soon as you increase the timescale a bit more, and in particular when you start to reach the glassy regime that you are interested in, you don't need this vector information anymore.
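To make these ablations concrete, here is an illustrative sketch of how the edge features can be stripped down: the full relative vector, only the distance, or — as in the "connectivity only" variant I'll come to next — nothing but the existence of the edge. Names and shapes are the editor's assumptions, not the authors' code:

```python
import numpy as np

def ablate_edge_features(rel_vectors, mode="full"):
    """Illustrative ablations of the edge features (rel_vectors: [n_edges, 3])."""
    if mode == "full":          # keep the full relative 3D vectors
        return rel_vectors
    if mode == "distance":      # keep only the inter-particle distance
        return np.linalg.norm(rel_vectors, axis=1, keepdims=True)
    if mode == "connectivity":  # drop all geometry; only the fact that an edge exists remains
        return np.ones((rel_vectors.shape[0], 1))
    raise ValueError(f"unknown ablation mode: {mode}")
```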
There are two interpretations of that. One is that the angles are not relevant for the physics, so that you could have a simplified model of the physics described only in terms of distances. Another interpretation is that our network is just not powerful enough to leverage them, so it drops this information, although in theory it could do better by using all the features. A small caveat here is that there is also the hypothesis that the network reconstructs the angles from the distances — because of the way we define the edges, that is possible for the network to some extent; I don't think it's doing that, but it is a possibility.

Then we had another interesting baseline where we removed everything: we don't even feed the distance anymore, we just have a graph, so all the network has as input is which nodes are connected — which nodes are within a certain threshold of each other — that is, the connectivity graph. And you can do surprisingly well with that. At very short times you don't do much, but as you increase the timescale you start to do better, and this is very consistent with the picture where the dynamics becomes more diffusive: if the dynamics becomes more about the density rather than the exact details of the distances, then you would expect this kind of picture. You can do better if you modify the threshold — if you take a smaller threshold you are more fine-grained in terms of the estimation of these densities — and you can actually do better than the previous machine learning state of the art, even in the interesting regime, with such a simple description.

The second part of the analysis: now we take a trained model and we are just going to modify the input. We have pre-trained this network, and once it is pre-trained you can of course apply it to any input. So what we do is fix a particular node and look at the prediction for this node — the central node on this picture — and we remove all the nodes at a graph distance greater than D from this node. What the network sees during training is all the nodes up to a certain distance D, and this distance D is set by the threshold that we use times the number of times we apply the network: as I explained earlier, every time you apply the network you propagate information one edge further, so you can compute what input is required for the prediction at the center. So we just remove some of this input and see how the prediction changes. The thinking here is that, if you think about how the network works, the prediction for the central node comes from the states of the nodes one edge away, which themselves have had their embeddings updated based on the nodes two edges away from the central node, and so on and so forth. So the question is really: how much does the prediction for the central node depend on things that are far away, versus how much does it just depend on the nearest neighbors, with the information that comes from further away simply washed out?
Both of these things could happen: the network could be using only the information coming from very close by, with any longer-range correlation disappearing, or it could be propagating information from further away. And this is what we see, at different temperatures — this slide is again a bit dense, so let's look just at the bottom right, which is the lowest temperature, and at the green curve, the very short times. On the x-axis you have the number of shells you keep — shells meaning how many edges away from the central node you keep atoms — and on the y-axis the performance, the higher the better. What you see on the green curve, for the very short times, is that as soon as you keep two shells of atoms the performance is very good and it plateaus; you don't gain anything by adding more shells, and you don't actually lose that much even going down to one shell. This shows that, in that case, the network can rely on the very immediate neighborhood of a particle to make a prediction, which is of course very consistent with the short-time behavior, where a particle moves around and maybe bumps into its neighbors once, and that's all — it doesn't need to know anything about neighbors three hops away. The other two curves, the orange one and the blue one, are longer timescales: the blue one is the very long-time, diffusive regime, and the orange one is the interesting glassy regime. There you see that you have a drop in performance if you only consider, for instance, the first shell — you don't make any good prediction, because at these timescales knowing how close you are to your neighbors is not relevant anymore; what is relevant is more global information, information that comes from further away. You need at least three shells to make a good prediction in the diffusive regime, and in the glassy regime the curve keeps rising even longer, so the performance keeps improving until you include particles from quite far away.

That was a pretty drastic experiment, because we were removing a lot of particles — all the particles beyond a certain distance. A more infinitesimal change you can do is to change the features of the graph at a certain distance: we keep all the particles, we again pick a certain distance D — a certain number of edges D from the central particle — and we stretch all the edges between the shell at index D and the next shell. The idea is to push all the particles at a given distance a little bit further away, and to see whether this affects the prediction or not. Again there are two possibilities: if the network makes its prediction locally, this won't change the result, but if the network needs to know about things that happen far away, then it will change the result. In terms of physics this is similar to a susceptibility, because you are looking at the change in a quantity at a given location in response to an infinitesimal change of the system at another location. The difference with the usual susceptibility is maybe that it's not a true point-to-point susceptibility; you are looking at some kind of point-to-set correlation, where a whole set of atoms is changed — a whole set of distances is slightly increased — and you look at the response at one point.
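A rough sketch of both probes just described — computing graph-distance "shells" around a central particle, keeping only nodes within shell D, and stretching the edges that cross from shell d to shell d+1. This is the editor's illustration of the idea under stated assumptions (edge list as (i, j) pairs, a uniform stretch factor); the exact form of the perturbation used in the original work may differ:

```python
import numpy as np
from collections import deque

def graph_shells(edges, n_nodes, center):
    """Shell index (graph distance from `center`) for every node, via BFS; -1 if unreachable."""
    neighbors = [[] for _ in range(n_nodes)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    shell = np.full(n_nodes, -1)
    shell[center] = 0
    queue = deque([center])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if shell[v] < 0:
                shell[v] = shell[u] + 1
                queue.append(v)
    return shell

def keep_within_shell(shell, d_max):
    """First probe: indices of the nodes kept when everything beyond shell d_max is removed."""
    return np.where((shell >= 0) & (shell <= d_max))[0]

def stretch_across_shell(rel_vectors, edges, shell, d, eps=0.01):
    """Second probe: lengthen by (1 + eps) every edge that crosses from shell d to shell d+1,
    i.e. push that shell of particles slightly further away from the center."""
    rel_vectors = rel_vectors.copy()
    for k, (i, j) in enumerate(edges):
        if {shell[i], shell[j]} == {d, d + 1}:
            rel_vectors[k] = rel_vectors[k] * (1.0 + eps)
    return rel_vectors
```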
What you can see here — this panel shows three temperatures; let's start with the lowest temperature, the one with the most glassy behavior, on the bottom right — is the response of the system on the y-axis as a function of where you apply this perturbation on the x-axis, from the closest shell to shells further away. The green curve is again the very short-time prediction, and there you see the same as before: if you modify things beyond the second shell it doesn't change the result at all; you can do this modification and the network just ignores it, because this information is completely washed out as you apply the network several times. What is interesting is the orange curve, the one in the glassy regime, where you have this peak in the change when you perturb three or four shells away. This shows that even if you make a small change within the third or fourth shell, it impacts the prediction at the central particle: the information is propagated by the network, the perturbation doesn't decay away — it's not a perturbative change — it stays there and is propagated all the way to the central particle. And this effect is more pronounced for the glassy regime than for the longer timescale — the blue curve is the diffusive regime — so as you go from green to orange to blue, you have this peak in the response in the intermediate, glassy timescale.

This was our lowest temperature, but we also looked at higher temperatures to see how the effect changes, and you can see that at high temperature, on the top left, the effect is far less pronounced — you have a little peak, but almost nothing — and it grows as you decrease the temperature. Why are we doing this? This is one way in which you could try to define a correlation length: this is suggesting that there could be a correlation length around the distance corresponding to three or four shells in this instance, and that this correlation length appears as you lower the temperature. What would then be interesting is to study how this correlation length varies as the temperature is decreased. In our case we didn't really go beyond that, because we start to have finite-size effects — our system is constrained to a box, and this correlation length is starting to be of the order of half the box length, so we can't exclude boundary-condition effects and so on. You would also want to go to lower temperatures, and to have a greater range of temperatures, to be able to plot this correlation length as a function of temperature. But this is a good indication that there might be something there that the network is picking up.
The idea is that — as I said, if you come back to the question at the beginning — we know that the dynamics gets much slower, but we don't know how this relates to the structure; we don't know how to define a correlation length that captures it, we don't know what to measure on the structure to explain the difference in dynamics. One way to look at this result could be to say that maybe this network is starting to extract something from the structure that explains the dynamics. This result is purely structure-based: it comes from the network, which gets only a single snapshot in time as input, so maybe the network is reading off from this input something about the correlation length — maybe it is implicitly extracting a correlation length.

Conclusions. I think I explained why graph networks are well suited for modeling particle systems. One thing to emphasize here is that we did not learn a simulator directly, because we are predicting very far in the future: at the timescale we learn, each particle has collided thousands of times, so it's not that we learn the details of all these collisions — and this is also why we use the propensity and average over the velocities. We are learning some coarse-grained version of the simulator, if you wish. The fact that we use these graph networks, these structured models, allows us to get some qualitative insight into their inner workings and to correlate that with physical quantities. Another aspect that I have not shown here is that our systems can generalize over physical parameters: we tried to vary the temperature, to see whether you can train a model at a given temperature and apply it at another temperature, and we showed that this indeed works — there is a range where things generalize, and as you lower the temperature, generalization gets better, which was also a good sign because of this notion of a universality class for glasses, and that's where you want to be. I know some physicists are keen to apply this to systems that are too slow for them to study, and there are some natural systems to which you'd want to apply this and go further than what we did. And finally, what we tried to do with this work is to show that machine learning can be used not only to make quantitative predictions but also to gain some qualitative understanding of physical systems. With that, I'm going to thank you for your attention.

Great, thanks a lot — well, thanks to you for the nice talk. So we have time for questions; if anybody wants to ask something, please either type it in the chat window or raise your hand and we'll take it from there. Okay, if there are none then it's my chance — oh wonderful, Peter, you have a question.

Yes, so I very much enjoyed this talk, and it reminded me of another talk we had in our seminar last year on structure formation in the large-scale universe, work by a few people from Shirley Ho's group at the Flatiron Institute. I wonder if you had seen that work, because it similarly looks at a box of particles and their evolution. I guess the particles are much larger — they are maybe galaxy-sized — but the idea of predicting the dynamics and the distribution of the matter after fairly long time intervals is maybe somewhat similar. I think they used very different networks than you did, but I suspect you know much more about it than I do; I just thought I'd raise this as a discussion topic.
Yes, I think I know what you're talking about; it's an interesting work. You're right, I think — I hope I don't say something wrong, because I don't remember the work completely — that they were using convolutional networks, and I suppose that's because, if I remember, they are working on densities already. I think there's a question of scale here: we describe a particle system, which is why we use this kind of network that operates on objects, but if you have more objects, and depending on the scale you look at, you can also work with densities. We could make our system continuous by dividing our box into subunits, computing the average density in each subunit, and trying to learn on that, and I think that's what this other work you mention is doing, if I remember.

They use a U-Net, which is a convolutional network, with a bit of a twist I think, and they do more or less track the particles. One of the beautiful things is that it was quite difficult for them to predict the straight-line motion, but the rotation of these large structures around each other was captured very well by the model.

And if I remember, the model is learning a correction on top of some simpler model of the dynamics, right?

That's correct, it learns on top of this straight-line motion, because the straight-line motion was difficult to put in the model — I think that's called the Zel'dovich model — and then they model a refinement of that which includes the rotations due to gravitational forces. It may just be a superficial similarity, but the problem scale is also similar; I think they were talking about about the same number of particles in a box as you describe.

So I think one question about our work is that we are not really learning a simulator, because we predict this propensity, which is some averaged notion of where a particle is going to be in the future, but it doesn't allow you to sample a configuration in the future: you get some average position. So one interesting question is whether there is a way to go back from these average positions to a real configuration — could you reconstruct a configuration that looks reasonable, that has the right probability mass under the Boltzmann distribution, from this averaged prediction of the dynamics? That's something I have been thinking about, and I think it's difficult, because in our case it's hard to satisfy the constraints: you know where each particle wants to be, but when you try to reconstruct, you start putting particles nearby, and then you have constraints — too many particles close together, the energy becomes very high — and it's very hard to resolve these constraints, which I think comes back to the interesting difficulty of this problem, namely that it's a very constrained physics problem. And that's another interesting difference between these works: as I understand it, their system can be unrolled many times into the future — if you have learned something that goes from the initial time to the final time, you can apply it a second time and go further into the future, if it works correctly — while in our case we cannot do that, because we don't get a configuration out; we get some average over the velocities, an average configuration if you want.
Maybe let me latch on to that point. You said before that you train your model to predict the propensity at some distant future time, and in a sense you are completely sidestepping the entire time evolution as it occurs. So do you have some intuition of what the model internally is doing when you apply it? Does it have some analog of what would normally be called a time evolution, or is there really the possibility that it predicts the propensity without looking at it as an evolution through time, but directly has the answer, in a sense?

Yeah, I think that's a very good question, and I can't answer with more than my way of thinking about this. My thinking is that the system is not doing the time evolution, but I don't have evidence for that — except maybe that the system gets the same performance when you just feed it the distances instead of feeding it the vectors. Also remember that this system never sees the velocities, so it is always going to predict something which is averaged over the initial velocities. You could argue that the system might be doing this averaging implicitly: if the network were large enough, you could imagine it sampling velocities, creating an array of velocities, doing the dynamics for all of them and averaging the output — doing the dynamics implicitly. But I think the evidence is not going in this direction; it would be more natural to think that the system is doing something different. And I think that's where there is this nice duality: when you apply the graph network several times, one way to think of this repeated application is that it's doing the dynamics and you're going further in time — each time you apply it you go from one time step to the next — but another way to think about it is that you're doing some constraint propagation. For instance, you could think that the network is reasoning about mobility, about whether a particle is stuck or not in its position, and that when you apply it several times you update this constraint — you could think of a rule like "I'm stuck if all my neighbors are stuck", apply it repeatedly, and it would converge to something. I think that's one of the nice things with the graph network: because we apply the same network several times, you can see it as a time evolution, or you can see it as a sort of fixed-point equation where you converge towards the fixed point.

I see, thanks. Okay, there's a question by Juan in the chat — do you want to type it in, or — okay, go ahead.

Hi, so I was wondering — you showed these pictures of dynamic heterogeneity, from one of our papers actually — and the key there is that you have these blue and red particles, particles which are fast and ones which are slow. So does your network perform better on the fast ones or the slow ones when it predicts?
Good question — yeah, that's a good question. I'm trying to remember; I don't remember us doing this analysis, although that sounds like an analysis you'd want to do. One thing to note is that, because of the way the network is trained — with a standard L2 loss on the propensity — it has a prior for predicting things close to the mean, so the details of the distribution are a bit compressed. I thought I might still have this picture — oh, I don't have it here actually — but what you would see is that when the correlation is good, the network tends not to completely capture the tails of the distribution, the extreme events; it tends to put its predictions closer to the mean than they should be, which is why you don't really see that when you look at the correlation. But in terms of the error split by fast versus slow particles — that's a good question, and I don't think we have an answer. I don't think there was a very clear trend; when we tried to look at this we didn't see anything striking, but there could be something.

Because I would have expected that it should work better on the slow ones — these slow clusters, the blue regions in those pictures, where the particles haven't really moved from where they were before, so your network should have an easier time. Because one thing that happens at long times is that the connectivity structure you started from — these systems are ergodic over long times, right — so the structure you get much, much later is so different that there has to be a limit to how long you can predict for. Is that more or less correct as a statement?

Yes, it is, but I think what happens is that the network looks sufficiently far: we were unrolling the network several times — seven times — in the process I was showing, so the network can capture information which comes from relatively far away, and I believe that the regime where the graph would be too different for the network to make a good prediction would only set in after the times that we studied. So it could be an issue, that's correct, but I don't think our network is necessarily hitting it. Again, if you take the view that the network is doing some iterative computation, you could imagine that the network is capable of propagating some updated information about where each particle is — I guess I'm saying that this is something the network could potentially be dealing with. In our case I think it's probably a fairly small effect that doesn't really affect the results on the timescales we look at.

Okay, thanks, thanks for the question. Okay, anything else, any more questions? If not, then I think I should say thank you again on behalf of everybody, and on behalf of the people who asked questions — it was a great talk, I've learned quite a bit, so thanks a lot to you, Victor, and I think see you next week. Yeah, thanks for organizing. Bye. Bye-bye.