in spin systems, so thanks Santiago, take the lead. That's it. Okay, this is work done with Alessandro Laio at SISSA, and here's a little introduction to intrinsic dimension. Can you hear me correctly? I have this one. So this is an example of uncorrelated data in continuous R^3. The lack of correlations makes the data occupy all the space, but it's usually the case that correlations make the data distribute uniformly on some manifold with a particular dimension, here in this case two, which can be lower, and possibly much lower, than the full embedding dimension of the space. This particular manifold is curved, but it can also be topologically nontrivial. The important thing here is that the intrinsic dimension is smaller than the full embedding dimension, in this case three. This is another feature of the intrinsic dimension: if you look very closely, if you zoom in too much, you might see that your data is approximately two dimensional, but if you zoom out, then you see that actually your data lies on a line, right? So the intrinsic dimension at large scales is one here, but at short scales it might look like two.

Okay, I'm going to speak about the so-called binomial estimator for estimating this intrinsic dimension from data. I'm going to first introduce this paper, developed in the group recently, and then we are going to generalize it. In order to estimate the intrinsic dimension, we have the following setup. We are going to assume that in a generic domain D our points are uniformly distributed, but we don't say which D we have. We are going to take a set A in this domain, and we are going to assume that we can assign a volume to this set A. The strongest assumption of the setup is that the probability of finding n_A points in A is a Poisson distribution with mu equal to rho times V_A. In this sense rho is the density, right? Because it's the average number of points over the volume. With this setup, we are going to see how we can estimate the intrinsic dimension without knowing rho.

I would like to use the board a little bit. Given this Poisson distribution, we are going to consider the probability of observing n_A points in A conditioned on observing n_B points in B. And I'm going to say what A and B are, no? Given a point in your data set, we take a ball of radius r_A and another ball of radius r_B centered on that point, so that A is a subset of B. And we are going to compute this conditional probability. We can write it like this: the joint over the marginal, and the joint is P of n_A points in A times the probability of having n_B minus n_A points in B minus A, because the two sets are not overlapping by construction, so the probability factorizes and the two random variables are independent; below, the marginal is just P of n_B points in B. And this is Poisson, this is Poisson, this is Poisson. Our hypothesis will be that rho is constant throughout all the external ball, let's say, so rho is the same in A and B, in such a way that when you simplify everything you just get n_B choose n_A, times p to the n_A, times one minus p to the n_B minus n_A. So this conditional probability is binomial, that's why the name of the estimator, where p is just V_A over V_B, and rho doesn't appear anywhere.
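As a quick numerical check of that last step (a minimal sketch of my own, not from the talk; the density rho, the volumes, and n_B below are arbitrary made-up values), one can verify that the Poisson construction really yields a binomial conditional that does not depend on rho:

```python
import numpy as np
from scipy.stats import poisson, binom

# Setup: n_A ~ Poisson(rho*V_A) in A, and independently
# n_B - n_A ~ Poisson(rho*(V_B - V_A)) in B \ A, with A a subset of B.
rho, V_A, V_B = 3.7, 2.0, 5.0      # arbitrary density and volumes
p = V_A / V_B                      # the binomial parameter; rho drops out
n_B = 8                            # some observed number of points in B

for n_A in range(n_B + 1):
    # conditional P(n_A | n_B) built from the Poisson factorization ...
    joint = poisson.pmf(n_A, rho * V_A) * poisson.pmf(n_B - n_A, rho * (V_B - V_A))
    conditional = joint / poisson.pmf(n_B, rho * V_B)
    # ... agrees with Binomial(n_B, p) for every n_A, whatever rho is
    assert np.isclose(conditional, binom.pmf(n_A, n_B, p))
```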
And so, given this, this is what I just said, just in case: we consider the usual maximum likelihood estimation setup. We have N_s independent observations, so the likelihood factorizes like this, all the binomials, one for every data point. If we do maximum likelihood estimation, we arrive at this formula, equation (8), which actually has a very nice interpretation. The interpretation is as follows: you can estimate this expectation value of the number of points inside the set A from your data set, counting how many points fall inside the ball of the given radius around each point; you count and estimate this one, and you count and estimate this one, and whenever the ratio between the two expectation values matches the ratio of the two volumes with the corresponding radii, in some dimension d hat, then your data set is uniformly distributed in a d-hat-dimensional space. That is kind of the interpretation of this equation, see.

What is the i index on the left-hand side of (6), and what is the little d with respect to which you take the derivative in (7)? Good: the little d without the hat is the true intrinsic dimension, the hat is the estimator of that, and i is the index of each data point. On the left-hand side, should there be an i or not? I'm slightly confused. Yes, sorry, you can put it back; for example, it depends on all the n_{A,i}. But so, is one sample a set of points, or is it one point here? It's one sample, yes: you have your data set and you pick one point; at that point you take a ball, and you count how many points are in there. This is point i. Please interrupt me if there are any more questions. So the dimension enters through the volume, into the function L? Exactly, L depends on the volumes, and the volumes are taken in some domain, mathcal D, and so here you have the d inside.

But this is completely generic; we are not picking any space in particular for the moment, we will choose that later. In particular, if your data is real valued and lives in R^D, then the volume that you have to choose is the volume of a sphere in R^D, which is proportional to r to the d, and the omega is just the integral over the angles, just that. But we are going to work with spin systems, so we want to deal with discrete variables. If your variables are in Z^D, then you shouldn't use this volume; you should use a volume that basically counts how many points there are in a given set of Z^D. This is an expression that comes from what goes by the name of Ehrhart theory of polytopes. I'm not going to justify it, but I'm going to explain why there is a diamond there, just to understand a bit what we are doing. So let's stick to the case in which we are working in Z^2. We have a data point, a point in the lattice; this is just for this space, it's a geometrical argument. We have a point and we have to define volumes in this space, and the volume will be the number of points in the set. The number of points at distance zero is just one; let me put it here. At distance one, you have these four points, and at distance two, using the L1 metric in Z^2, you have all these points, including this one, because the distance we use is the graph distance, let's say, the L1 distance, not the Euclidean one.
And so the number of points at distance one is four, the number of points at distance two is eight, if you count them, and the volume of a ball of radius two in this space is 13. So if you put these numbers in that expression, you get 13. Is this clear?

Then the last piece of theory that I'll introduce before some results is the following, what they call model validation. We're claiming that, given our set of hypotheses, our construction, these conditionals are binomials, okay? But we have data, and we can count and take the empirical probability of n_A points in A, just counting at every point, and we can estimate the empirical probability of n_B points in B, the external ball. And if everything is correct, then we should see that these two quantities are equal, equation (12): the empirical probability should be the sum over all the conditionals; this is the law of total probability. If we don't see agreement between the two sides computed on the data, then we have a problem.

So this is the first test that they did using this estimator, in the simplest possible case, which is a uniform-density data set, and they saw the following. They worked with two variables, on the left, just two variables taking values between zero and 50, so the configuration space is a plane, or a chunk of a plane, 50 by 50. They took periodic boundary conditions to avoid edge effects as much as possible. And you see that this orange line is a perfect two, so they get the correct number in this case using this algorithm. And they check that, if they compute the cumulative distribution function of the number of counts, they see good agreement between the empirical one, which is the orange one here below, and what they call the model, which is the sum of the binomials. So everything seems under control: you get the two, and you see that these two curves are essentially equal. They do the same in a slightly different case, with d equal to six, but it's exactly the same: now the data is uniformly distributed in a solid discrete cube in six dimensions with periodic boundary conditions, and again they get the correct number.

But a less trivial example can be the following. Here they take fractals, or rather a chunk of a fractal. This is the Koch curve, and this is the Sierpinski triangle. These fractals are continuous, they are constructed with continuous variables, but they basically take an image, so they discretize the fractals, and then they can use this algorithm, which is meant for discrete variables in this setup, right? They wanted to see if the algorithm can handle fractal dimensions, and it can, eventually, when you enlarge the scale as much as possible, where the scale here means the external radius, right? That tells you the scale at which you are looking at the system, or the data. See? Exactly, yes, the scale is the radius of the external ball. So if your balls are too small, which means zooming in too much, then you see that the data is more or less two dimensional, because of the wiggling of the curve, if you look at this one. And if you make the scale larger, then you start to see the true dimension of the system, which is this quantity. Okay, any questions? Yes, it is, the fractal dimension; they just compared the two, to see whether the algorithms agree or not. But this is what they did.
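Before moving on to spin systems, here is a small sketch of my own (made-up function names; the brute-force count only works for small D and stands in for the Ehrhart-type closed form) that reproduces the lattice volumes from the board and shows how the condition of equation (8) can be solved once a volume function is chosen:

```python
import numpy as np
from itertools import product

def lattice_ball_volume(d, r):
    """Number of points of Z^d within L1 distance r of the origin.
    Brute force, so only sensible for small d; it stands in for the closed form."""
    return sum(1 for x in product(range(-r, r + 1), repeat=d)
               if sum(abs(c) for c in x) <= r)

# the Z^2 numbers from the board: shells of 1, 4, 8 points give ball volumes 1, 5, 13
print([lattice_ball_volume(2, r) for r in range(3)])   # [1, 5, 13]

def binomial_id(n_A, n_B, r_A, r_B, volume, d_values):
    """Equation-(8)-style estimate: pick the d whose volume ratio
    V(r_A, d) / V(r_B, d) best matches the measured <n_A>/<n_B>."""
    p_hat = np.sum(n_A) / np.sum(n_B)
    ratios = np.array([volume(d, r_A) / volume(d, r_B) for d in d_values])
    return d_values[np.argmin(np.abs(ratios - p_hat))]
```

In the continuous R^d case, where the volume is proportional to r^d, the same condition has the closed-form solution d hat = log(<n_A>/<n_B>) / log(r_A/r_B), so no scan is needed there.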
It is called a curve, the Koch curve, but it's a fractal; it's not really a curve. So, what we want to do, and what we did, is study the intrinsic dimension in spin systems. We are going to consider this lattice, in particular the honeycomb lattice; the particular lattice is not really important for the results. We put binary variables on each one of the vertices, and we take the energy to be a nearest-neighbor antiferromagnetic coupling: here there is a plus sign, so each variable wants to be exactly opposite to its neighbors. So every spin wants to be opposite to its neighbors. And we consider just the standard setup, the equilibrium probabilities: the probability of having a particular configuration, sigma in bold, is just this one, the Boltzmann weight, the exponential of minus beta times the energy, where beta is the inverse temperature. We know that for this system, in the thermodynamic limit, there is a second-order transition between a disordered phase and an ordered phase, and the transition is characterized by power-law (instead of exponential) two-point correlation functions, by scale invariance; it is a very well known transition. In particular, our simulations will be just for 2000 spins, which is a pretty small system.

But yes, because it's not frustrated: if you just draw the lattice, you can perfectly place an antiferromagnetic state without any problem. So here is plus one, minus one, minus one, plus one, minus one, and when you close the cycle it is consistent again. Ah, I see, yes: if you sum all the spins you get zero, but the order parameter is the magnetization in sublattice A minus the one in sublattice B. In sublattice A, say these sites, they are all up, and all the others are all down in this state, and so this is plus or minus one. You have two ground states. That's the Z2 symmetry of that Hamiltonian.

So this is our system. But we want to use this algorithm, which works with distances, so what we have to do is compute all the distances between all our configurations. We did that, and this is the first thing that you have to look at in order to choose the scale, because nobody tells you at which scale you have to look at the system. But first I have to explain what K is. These are the empirical probabilities, the histogram of the distances, the distances between configurations. If you have just three spins, and this one is one, zero, one, and this one is one, one, zero, the distance is two, okay? This is the distance in configuration space; it has nothing to do with the distance in the real space of the lattice, which is irrelevant for the moment. But this K that I'm defining there is the following. We have the whole system, and we want to look at just a chunk of the system, a chunk defined by a value of K. Given a central spin, which can be whichever because we have periodic boundary conditions, we pick a spin and we take a chunk in the following way: we consider that spin and all the spins that are its neighbors up to order K. So these three are the first neighbors, the second neighbors are these, at graph distance two, and so on and so forth; you can take the neighbors up to order K. We use the Floyd-Warshall algorithm to work out which spins those are, because on this lattice it's annoying, but you can pick them out and then just compute the distances between the chunks of the system. And so you see the following: when you look at a little chunk, taking K equal to 10, you are gathering basically 166 spins; the exact number doesn't matter.
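As a rough sketch of how such a histogram can be built (my own illustration with made-up sample sizes, not the actual analysis code), one can take the sampled configurations restricted to the spins of a chunk, written as 0/1 variables, and histogram all pairwise Hamming distances:

```python
import numpy as np

def hamming_distance_histogram(configs):
    """configs: (n_samples, n_spins) array of 0/1 variables, one row per sampled
    configuration of the chunk. Returns the histogram of Hamming distances over
    all unordered pairs of samples."""
    c = np.asarray(configs, dtype=np.int8)
    dists = (c[:, None, :] != c[None, :, :]).sum(-1)   # pairwise Hamming distances
    iu = np.triu_indices(len(c), k=1)                  # each pair counted once
    return np.bincount(dists[iu], minlength=c.shape[1] + 1)

# toy usage: random (infinite-temperature) configurations of a 166-spin chunk;
# the histogram then peaks around 166/2 = 83, as expected for independent spins
rng = np.random.default_rng(1)
samples = rng.integers(0, 2, size=(300, 166))
hist = hamming_distance_histogram(samples)
```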
And you see this peak at high temperature, so this is in the disordered phase. Yes, it is actually the L1 distance, which coincides with the Hamming distance for this setup, exactly. We have many configurations, each with, let me call it, N of K spins, which depends on your K, and we compute all the distances in our dataset and make the histogram. The important thing here is that, for example, if I take K equal to 14, then if I want to use a scale of 50, I will see exactly zero counts, always, and so I can't say anything about the system. So this histogram actually tells you the scale at which you have to look at the system; it's basically not free to choose. What we do, in order to be able to move K as we want, is define r_B to be the quantile of order alpha_B, meaning the point at which the cumulative distribution function of this PDF reaches the value alpha_B, which is arbitrary, and we take it more or less as a half, or a bit less: a half is exactly at the peak, and a bit less is a bit to the left. And we take r_A to be just a fraction c of r_B. Again, this is arbitrary, and a reasonable value seems to be c equal to a half. You can move alpha_B a little bit and c a little bit, and you will get the same results as we do; these are just reasonable parameters.

Okay, then if you lower the temperature a little bit, what you see is two peaks, because near the transition (the transition is around 1.52, this is 1.5, so this is near the transition from the left; that's the transition in the thermodynamic limit) you start to see the two ground states of the system. You get two peaks because you have the distances between configurations that share the same magnetization, which are low distances, the left peak, and you also have the distances between configurations with opposite magnetization, which gives you the high-distance peak. Eventually, if you lower the temperature further, you just see one peak, because the simulation picks one ground state, breaking the symmetry spontaneously, and you only see one peak at low distances, because the system is correlated and all the configurations share a similar magnetization at low temperature; it is more or less ordered.

Okay, then what we do is monitor our estimate of the intrinsic dimension, d hat normalized by the number of spins at the given K, and we plot that versus temperature. And we see a minimum around the transition. This has to be compared with this plot by Tiago Mendes-Santos, Marcello Dalmonte, Alex Rodriguez, and collaborators at ICTP. In their case they looked at the square lattice in the ferromagnetic case, but the difference is completely irrelevant, because the transition is the same in the two cases. And they see a minimum doing exactly the same thing, but computing the intrinsic dimension with 2NN, which is another estimator developed in the group of Alessandro that is actually meant to work with continuous variables; they used it nonetheless, to see what they would get. And they get this, and we get this. The nice thing here is that they did a thorough study of this minimum: how it moves when you increase the scale, the L, the number of spins of the system. And with that scaling they can, for example, differentiate between second-order transitions, like this one, and, for example, Berezinskii-Kosterlitz-Thouless transitions, which is the transition of the XY model in two dimensions.
It doesn't matter if you don't know the name; in that case they also see a minimum, but from the scaling of the minimum with the system size they can differentiate between the two scenarios. What I want to say here is that, for these two results, it is well known that d hat, and the intrinsic dimension computed with 2NN, are lower bounds of the true D that you don't know. And we can see this very evidently here, because this is the normalized d hat, and in the paramagnetic phase this number should be close to one; not exactly one, because there are correlations in the data even in the paramagnetic phase, but those correlations decay exponentially with the distance between two spins. So this should be close to one, something like 0.9, and it is 0.06. And in their case L squared is more or less 20,000, and they get an ID that is about 200. So it is in fact a lower bound. But what you can do here is a scaling analysis: here we plot d hat squared versus the number of spins, which is basically N of K, for different temperatures, and asymptotically in the number of spins we see a perfect straight line for d hat squared. That means that d hat scales basically like the square root of N, at least as far as the scaling goes, right? And so this is very nice. We see the same scaling using 2NN, we checked. And we have analytical results, within this framework, that with only geometric arguments predict that the estimator will give you the square root of what you want. But those results are not yet a formal proof, so they are not here at the moment; hopefully they will be soon.

But so, what is the problem? Why do we get the square root? The problem is that Ising spins are discrete, yes, so it is good to use a discrete estimator, but they are not in Z^D: they are in {0,1}^D, actually. That means they live on the corners of a hypercube, a hypercube that doesn't have a bulk. So all the spins are, in some sense, on the edge of this configuration space. Let me check, are there questions from the participants? No? Okay, so I was saying: we should not use volumes that are meant for that setup, because we are not in that setup; we have to use volumes for our setup, for Ising spins. And so we define this. Take a configuration, with only three spins or with N spins, it doesn't matter, and ask: how is a volume defined here? And we are going to say that the volume is the number of points in the set, because it's discrete. So given a configuration, whatever configuration, you can ask how many points are at distance zero in this space, and there is exactly one, because you cannot do anything, it's the same configuration. At distance one, you have to flip one spin; that gives you Hamming distance one. And in general, the number of points at distance l is the binomial coefficient, N choose l, because you have to pick the l places at which to flip a spin. So the volume of a ball in this space, using this metric, will be the sum, with l prime going from zero to l, of this quantity; that gives you how many points there are in a ball here with this metric. You can compute this and you get, again, a hypergeometric function, this one, which is different from the Z^D case in its arguments, and you get, again, this binomial factor. And you can actually make a normal approximation of this sum if you think of it as the cumulative of a binomial with p equal to a half.
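Putting the pieces together, the quantile-based choice of radii described earlier, the neighbor counts, and the Hamming volumes just defined, a rough end-to-end sketch could look like the following. It is my own stand-in, not the analysis code: it scans integer dimensions and sums binomial coefficients directly instead of using the hypergeometric closed form, the function names are mine, and the default alpha_B and c are just the reasonable values quoted in the talk.

```python
import numpy as np
from math import comb

def choose_radii(pairwise_dists, alpha_B=0.45, c=0.5):
    """r_B: quantile of order alpha_B of the empirical distance distribution;
    r_A: a fraction c of r_B."""
    flat = pairwise_dists[np.triu_indices(len(pairwise_dists), k=1)]
    r_B = int(np.quantile(flat, alpha_B))
    return max(1, int(c * r_B)), r_B

def neighbor_counts(pairwise_dists, r_A, r_B):
    """Per configuration, how many others fall within Hamming distance r_A and
    within r_B (the configuration itself excluded)."""
    n_A = (pairwise_dists <= r_A).sum(axis=1) - 1
    n_B = (pairwise_dists <= r_B).sum(axis=1) - 1
    return n_A, n_B

def hamming_ball_volume(d, r):
    """Configurations of d spins within Hamming distance r of a given one:
    sum over l = 0..r of (d choose l)."""
    return sum(comb(d, l) for l in range(r + 1))

def hamming_id_estimate(n_A, n_B, r_A, r_B, d_max=2000):
    """Pick the integer d whose volume ratio V(r_A, d)/V(r_B, d) best matches
    the measured <n_A>/<n_B> (the binomial maximum-likelihood p)."""
    p_hat = np.sum(n_A) / np.sum(n_B)
    candidates = np.arange(r_B, d_max)
    ratios = np.array([hamming_ball_volume(d, r_A) / hamming_ball_volume(d, r_B)
                       for d in candidates])
    return int(candidates[np.argmin(np.abs(ratios - p_hat))])

# toy usage: independent (infinite-temperature) configurations of 60 spins
rng = np.random.default_rng(3)
samples = rng.integers(0, 2, size=(1000, 60), dtype=np.int8)
dists = (samples[:, None, :] != samples[None, :, :]).sum(-1)
r_A, r_B = choose_radii(dists)
n_A, n_B = neighbor_counts(dists, r_A, r_B)
print(hamming_id_estimate(n_A, n_B, r_A, r_B))   # close to 60 for independent spins
```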
Then, coming back to that approximation, you can write this sum as the cumulative of a Gaussian. But it doesn't matter; the point is that we have nice analytical expressions to work with in this framework. And we can go back to the phase transition, exactly the same phase transition, but using what we called the Hamming volumes. And we actually see another shape for the transition. So this is the same, d hat normalized versus T, and we now see that in the disordered phase this quantity is close to one. Up to finite-size effects it doesn't matter: if you take K equal to 8, 10 or 11, they all give you more or less the same. And additionally, yes, sorry, I should have said: the intrinsic dimension is usually the minimum number of parameters that you need to describe the system, and at infinite temperature the spins are all disordered, all independent, so if you have N spins you need N variables to describe the system. Moreover, if they are all independent, then they occupy all the space, because there are no correlations, and then you need the full embedding dimension, which is N for this system.

Just as another remark: here I computed the standard deviation of the estimator taking a hundred realizations of the system, in the following sense. Each time I take a realization, that means I take 50 samples at random from the dataset and compute my d; then I do it again and again, a hundred times, and so for each number of samples I get a value of sigma. And we see that sigma is largest exactly at, or around, the critical temperature of the system, which is the expected behavior, because the system has maximum fluctuations exactly there. So this is the correct behavior that the estimator should have. And I forgot to say that this step-like behavior is also, we claim, the correct behavior that the d should have, because in the ordered phase the system is highly correlated, and the intrinsic dimension is basically like the opposite of correlations: when you have low correlations, the intrinsic dimension is high; when you have high correlations, the intrinsic dimension is low. This is the step that you should see if you do things properly. If you see the two ground states, then you need one parameter; if you only see one ground state, then you need zero parameters, because you don't even have a bit, you have just all zeros or all ones, exactly.

Okay. And here starts a section that I call thin shells, in the following sense. The thing is that when d is large, this implies, as we saw, large distances: when you put more independent spins, the histograms move to higher distances. Okay. So the larger the intrinsic dimension, the larger the scale at which you have to look at the system. But we have a hypothesis that says that the density should be constant over all our balls, and the balls are getting larger and larger, so the hypothesis might be compromised. So what we did here is, instead of balls, take A to be just the set of configurations at Hamming distance exactly r_A; so instead of a ball, A is just a surface. And B has to contain A by construction, so we take B to be that same surface together with the next one, the surface at radius r_A plus one.
So if you do this, you have to compute what I called p there, and the good thing is that it simplifies immensely: instead of hypergeometric functions and gammas and all that, you just get this, which is the scale, r_A plus one, over D plus one, with D the true dimension. And the fact that this quantity is trivially less than one tells you a nice thing, namely that the scale at which you look at the system shouldn't be too big. Of course we don't know D, we have to infer it, but this tells you that in general you shouldn't go too far with your radius. Using this, we can do maximum likelihood estimation again, and we get a different equation for d hat, which is just this one: d hat is the scale at which you look, rescaled by these quantities that you measure, minus one. So it gives you this, and you can go and see what you get with this little modification of the algorithm.

We did this for the ferromagnetic Ising chain, which is not too different. The main difference is that it is a one-dimensional system: you just put the spins on a chain, and this time they are coupled ferromagnetically, because there is a minus J, and you can think of J as being one. So the spins now like to be parallel to their neighbors, but they are at a finite temperature. The good thing about this system is that it is analytically solvable: you can compute Z exactly, even for finite chains. So it is a nice model for testing many things while you are learning how your algorithm behaves; that's why we picked it. And this is the histogram of distances for this particular chain, simulated with the Metropolis algorithm, again, as usual. For a 40-site chain we see this, and I have to make a comment here about this in particular: more or less below 10 and above 30 we see essentially no counts, so we won't be able to say anything if we are to the left of 10, more or less, or to the right of 30. And so we can use this thin-shells estimator, again as a function of the scale, for a given temperature, which in this case is just infinite, and we get a plateau at 40, which is the correct value that it should have. This is actually not a trivial result, in the following sense: the space of configurations has two to the 40 configurations, and we have 3000 samples, and we can say with confidence that our estimate is 40. So this is actually a very nice result with only 3000 samples: we don't get the square root, we get the correct value.

But this is for infinite temperature. When you go to finite temperature, the density is no longer constant over this hypercube. Each point in the hypercube corresponds to a given configuration; for example, this one may be plus, plus, plus, which has energy minus three, right? So each point has a different energy, and therefore a different probability of being occupied. So there is a shape here, but in N dimensions. And that means we don't see a plateau: for the moment, we just see a local minimum. We see an asymmetry between short scales and large scales, which is not that surprising if you use the heuristics of the last slide. And we are working on understanding this plot exactly; for example, we don't know why all the curves cross at 20, but this is what we are doing now. And the thing to notice here is that below 10 and above 30 the curves give you, for example, values greater than 40, which make no sense at all.
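For reference, the thin-shell estimate itself is compact enough to spell out. This is a sketch under the assumptions just stated (A is the shell at Hamming distance exactly r_A, B adds the shell at r_A plus one, so p = (r_A + 1)/(D + 1) and the maximum-likelihood solution is d hat = (r_A + 1) times the ratio of the summed counts, minus one); the toy sizes are mine:

```python
import numpy as np

def thin_shell_id(configs, r_A):
    """Thin-shell binomial ID estimate for 0/1 configurations.
    n_A counts neighbors at Hamming distance exactly r_A, n_B those at r_A or
    r_A + 1; with p = (r_A + 1)/(d + 1), the MLE gives
    d_hat = (r_A + 1) * sum(n_B) / sum(n_A) - 1."""
    c = np.asarray(configs, dtype=np.int8)
    dists = (c[:, None, :] != c[None, :, :]).sum(-1)   # pairwise Hamming distances
    n_A = (dists == r_A).sum(axis=1)
    n_B = n_A + (dists == r_A + 1).sum(axis=1)
    return (r_A + 1) * n_B.sum() / n_A.sum() - 1

# toy usage: independent 40-spin configurations (infinite temperature);
# choosing r_A near the peak of the distance histogram gives an estimate close to 40
rng = np.random.default_rng(2)
samples = rng.integers(0, 2, size=(1000, 40))
print(thin_shell_id(samples, r_A=19))
```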
So when the algorithm doesn't have counts, it basically gives you infinity. And this is the model validation again, taking the purple curve, here around the minimum and here at this completely wrong estimate, where the radius r_B is 10. If you put the radius at 12, which is more or less at the minimum, you see an approximate agreement between the two cumulatives, the empirical one and the one that follows our model. And in fact, if you make the radius smaller, then you see that the binomial CDF is always systematically above the empirical CDF of the counts in the shell that I call A. That's it; I'm not sure about the time, but okay.

So the conclusions and perspectives are more or less these. We saw how the binomial estimator works for data that can be continuous, discrete in Z^D, or Ising spins; you have to use a different metric each time. We saw how to use this to spot phase transitions in spin systems. What we have to do now is explain exactly why we see this behavior with the scale. We would also like to study the dependence on the length of the chain, and also look at chunks of a chain of fixed length, as we did with the honeycomb lattice. What we can also do is brute-force calculations for small chains, in order to have all the exact probabilities, so that we can compute the expectation values analytically instead of measuring them, to avoid any possible problem with the simulations. And what we would like to do after this is to see whether it is possible to take data that might be continuous, describe it in binary, so with spins, and then use this estimator to get the ID in the binary representation of the continuous data. This, if it is possible, would be very nice. And with this, I'm done.

Okay, so thank you very much, Santiago, for the nice talk. Are there any questions for Santiago? Yes. Thanks, it was very, very nice. I was surprised that with so few configurations of the spin system you can estimate this non-trivial quantity so precisely. And what it makes me wonder is: did you try the same kind of experiments in glassy systems? In spin-glass phases it is well known that it is very hard to estimate quantities like complexities, counting the number of metastable states and these kinds of things, and I wonder if this could be tried; maybe with a small number of samples you could estimate these kinds of quantities as well. That would be great. It is not done yet, because we are trying to take the simplest possible cases first, in order to understand fully how the algorithm works. Once that is done, then we can look at more interesting systems. Of course, that would be our...

I'm curious: I mean, a lot of realistic data doesn't binarize so well, right? So I wonder, would it be so hard to redo this where, instead of binarizing, you give it four different values, like a discretized spin? That would be more or less like a chunk of Z^D, right? Yeah, that is okay, you can do that, but it might happen that you see the square root of the true D. We would like to use spins because we see the right number when we test on the known case, right? Because you have to know which volume to use; otherwise it is just a lower bound. That's the thing. The calculation that we have... I have a scaling argument to explain why you should get the same scaling as for Ising spins.
So if your system has four colors, four states, then one variable occupies a space that is not a full line; it is just a chunk, right? You have four states. And when you put several variables together, then you are forced to go to larger scales, right? As I showed. So when you use large scales to measure the system, then your four is the same as a one, and you get the Ising scaling. Does that make any sense to you? Not entirely, but maybe we can talk about it afterwards.

Thank you for the great talk. I was wondering, as an example of a system that could be easily binarized: typically with neural systems you can say, okay, a neuron is either active or not active, so you have a zero or a one. The problem with this type of network is that you don't really know the underlying structure, so you don't know which is a neighbor of which. So do you have any idea of a possible extension of this method that could use, not a Euclidean space, but some kind of, I don't know, network, or correlation distance, to apply to this, maybe to more biological data that could still be binarized? I'm not sure I understood: is your distance between the physical realizations of the bits, or is it in Hamming space, in the Hamming hypercube? Yeah, so that's the thing. Sometimes, since you don't really know which neuron is a neighbor of which, what I have seen done is that you can still say: okay, I'm going to take a distance between neurons where neurons are closer to each other when they are more correlated; so you take the distance as a measure of correlation, so to speak. So I was wondering if you could maybe somehow estimate the dimension of your neural manifold using some variation of this. It might work, yes, because that distance is more or less like the graph distance, which you don't have in that case; but the other distance that we use, our metric, the Hamming one, is defined on the configuration space, and that is the same no matter what the graph of the physical realization of the bits is, right? Okay, yeah, thank you. So I think it might work. Okay, cool. Okay, so if there are no further questions, let's thank Santiago once again. Thank you.