Yeah, yeah, sure. All right, all right. Nice. Before we start, just the additional details on the afternoon that I promised. So we'll have a few talks, and then there will be the talk on Wednesday. And then at 5.30 we'll meet down there at the bus stop for our hike. And I should say that even though we put "hike" in the program, that is a massive exaggeration. So we'll go up here, we'll see the gardens of Miramare, and then we'll just walk along the seaside, and we'll get there at a leisurely pace — it's the kind of hike where we young people walk like old people. So I'm very happy to introduce Francesco, the first speaker of the session.

Thank you for the introduction. Hi, everybody. This is the session after lunch, so I'll try to keep it as light as possible. And yeah, as Marco said, I want to explain the title a little bit first. I will talk about matrix factorization, but I will introduce a scheme, a framework that we call decimation, and a way to analyze it with techniques that come from the analysis of neural networks for associative memory, whose most celebrated example is probably the Hopfield network. So I just have a couple of introductory slides about it. The Hopfield network is a recurrent neural network that is tasked with the memorization of p patterns, which are n-dimensional vectors, these bold xi^mu here. And we say that the network works if, provided we give it an input that is sufficiently close to one of the p patterns, it is able to recall it, in a sense that we will see in the next slide. This memory effect can be encoded through this Hebbian matrix, this J_ij, which you can think of as the weights of the connections in this complete graph on the left, between the neurons. And this matrix is such that, if you are lucky enough — which means if there is not too much noise in the problem — then the patterns you want to recall, if they are more or less orthogonal to each other (and this almost-orthogonality, which we also discussed yesterday with Spencer, will play a central role) and if there are not too many of them, will be the ground states of this energy. So this is the fact you must use to recall them. So how do we do it in practice, when it is possible? Well, one can run the so-called neural dynamics: you generate spin configurations starting from a certain initialization, which has to be good enough, and in the end, hopefully, you converge to something that looks like a pattern, more or less. Or you can run simulated annealing, which is another routine that aims for the ground states of a given energy; if the ground states correspond to the patterns, you have found one, provided you start from a good point. Or again, you can run message-passing algorithms, like approximate message passing — and there are some experts in this room, so I should be careful. This AMP algorithm is, in practice, what I used to test my computations. All of these work on this Boltzmann-Gibbs measure, which is proportional to the Boltzmann factor, e to the minus beta times the energy. When beta — the inverse absolute temperature — is very large, the idea is that you give a much higher probability weight to the ground states and an exponentially smaller weight to the higher-energy states.
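To make the setup concrete, here is a minimal sketch of Hebbian storage and of the zero-temperature neural dynamics just mentioned. The sizes, names, and the 10% corruption level are purely illustrative, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 10                            # n neurons, p patterns (alpha = p/n small)
xi = rng.choice([-1, 1], size=(p, n))     # binary patterns xi^mu

# Hebbian coupling matrix J_ij = (1/n) sum_mu xi_i^mu xi_j^mu, no self-coupling
J = xi.T @ xi / n
np.fill_diagonal(J, 0.0)

def retrieve(x, J, sweeps=20):
    """Zero-temperature asynchronous dynamics: align each spin with its local field."""
    x = x.copy()
    for _ in range(sweeps):
        for i in rng.permutation(len(x)):
            x[i] = 1 if J[i] @ x >= 0 else -1
    return x

# Start close to pattern 0: flip 10% of its components, then run the dynamics.
x0 = xi[0] * rng.choice([1, -1], size=n, p=[0.9, 0.1])
x_final = retrieve(x0, J)
print("overlap with pattern 0:", x_final @ xi[0] / n)   # close to 1 if retrieval works
```

With p finite and n large this overlap ends up close to 1; the regime discussed next is precisely what happens when p grows proportionally to n.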
So all these three methods share a common ground — they are, more or less, different faces of the same problem. They all aim at sampling the Boltzmann-Gibbs measure at very low temperature, more or less. And secondly, as we require, they all work provided we give the network a good initialization, sufficiently close to a pattern; otherwise, they don't work. Okay.

So there is a phenomenon here that is due to this near-orthogonality, which is not exact orthogonality. When we are in very high dimension, when these patterns are n-dimensional vectors with n going to infinity, by the law of large numbers you can indeed expect that, if the components of the patterns are drawn IID from a certain distribution, then they are almost orthogonal. Typically this scalar product is a delta, so the overlap matrix is diagonal, but it can have off-diagonal terms of sub-leading order, order 1 over square root of n. And this is basically what pattern interference is: it is true that these off-diagonal contributions are small, but there are many of them, so when you sum them up you obtain something finite that you cannot neglect in this precise scaling. And this poses an intrinsic limitation on the number of patterns you can store in the network, because it does not have infinite memory. Let's say the parameter that controls how many patterns you can store is this alpha = p/n, and the interesting regime is, of course, when p and n go to infinity together, which means that p/n is a finite number. This also has a nice geometrical interpretation, if you want: any sample from this Boltzmann-Gibbs measure — in fact, any vector in R^n — can have an extensive, order-n projection only onto a finite number of patterns, because its norm has to be controlled with n, more or less. The other projections must necessarily be of sub-leading order; but again, there are many of them, so when we sum them up we obtain a source of noise in the end. This will be explained a little better later.

So now we finally turn to the problem we are interested in, which is high-rank matrix factorization, which I formulate here as an inference problem — and let me mention that this is a very stylized version. The more general version would be asymmetric, so this Xi matrix would be different from the other factor, but here we consider the symmetric case for the sake of simplicity, because this is what we can treat. The problem is the following. You generate this matrix that I call Xi — and the name is not a coincidence — which has p columns that are n-dimensional vectors, so it is an n-by-p matrix, and its elements are drawn IID from a certain centered distribution with some finite fourth or sixth moment — regular enough, let's say. This ratio p/n will also play a crucial role later. Then you take the product Xi Xi^T, which is at most a rank-p matrix, roughly speaking, and you add some Gaussian noise. This yields the observations Y, and the task in the inference problem is to recover not only the matrix Xi Xi^T but Xi itself — one of the two factors, which in this case are equal — given just the observations Y. One of the most common approaches to this is the so-called Bayes-optimal approach.
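To fix ideas before the Bayes-optimal discussion, here is a minimal sketch of the generative model just defined. The 1/sqrt(n) scaling of the signal, the symmetrized Gaussian noise, and the Rademacher prior are my guesses at a sensible convention, not necessarily the ones on the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, Delta = 1000, 0.03, 0.1        # illustrative sizes and noise level
p = int(alpha * n)                        # extensive rank: p scales with n

# Hidden factor: n x p matrix with iid centered entries (here Rademacher +-1).
Xi = rng.choice([-1.0, 1.0], size=(n, p))

# Observations: rank-p signal plus symmetric Gaussian noise.
# The 1/sqrt(n) and sqrt(Delta) normalizations are assumptions, for illustration only.
Z = rng.normal(size=(n, n))
Y = Xi @ Xi.T / np.sqrt(n) + np.sqrt(Delta) * (Z + Z.T) / np.sqrt(2 * n)

# Inference task: recover Xi (up to column permutations and sign flips) from Y alone.
print(Y.shape, p)
```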
So you put yourself in the Bayes-optimal setting, in which the statistician tasked with this reconstruction knows everything about the generating process of this model: they know exactly how the Y's have been generated, they have the exact model through which the Y's are generated. This usually yields — I'm running quickly over this because of time constraints — what are called the fundamental limits. In the best possible scenario, which is when the statistician knows everything they could possibly know except for the ground truth Xi itself (otherwise the task is meaningless), the statistician makes, as is reasonable, the minimum possible mean square error in the reconstruction of Xi. And the problem is that in order to access this quantity, this minimum mean square error that constitutes a fundamental limit, you need to compute this object, this free entropy phi, which is a very complicated matrix model with quenched disorder that, to the best of my knowledge, nobody knows how to compute yet — at least if the prior here, on the xi's, is not IID Gaussian. So there is no solution for the limit of this quantity when n and p go to infinity together with a finite ratio alpha.

All right, so why is it interesting? Okay, I will go quickly over this as well because I'm no expert on it; it just helps me give a bit of motivation. In its asymmetric version, which would be the official one, let's say, matrix factorization is employed in image and video restoration, for instance, and in inpainting, so it is usually used to fill in matrices with missing elements, or to reconstruct them. In this setting one usually imposes that B is sparse — a very sparse matrix — in such a way that it selects, let's say, combinations of few elements of A, and the elements of A have to form an overcomplete basis in order to preserve the complexity stored in Y, in some sense. It is also used in recommendation systems, again in order to find the missing elements — I mean, the most likely missing elements — in the matrix. But you can also see it as a high-rank version of spiked models, so I thank this morning's speaker for the assist. When p is finite, this quantity here is actually computable; it is well studied, it can be computed by rigorous and non-rigorous means as you like, and the fundamental limits of this problem are well known. Here I put just one reference, but there are people in this room who have contributed significantly to this, so I apologize if I have not cited everyone. Okay, but now what happens when p and n scale together? This question stands.

And I want to quickly mention this related problem, which is denoising. It is not exactly matrix factorization, because denoising requires the statistician to do slightly less, in the sense that you need to find just the matrix Xi Xi^T and not the factor per se, okay? And there is a way to do it even when the rank of the hidden matrix is full, which is the rotational invariant estimator, for instance. It is not optimal in the information-theoretic sense, of course — it is not always optimal, I should say. The rotational invariant estimator works as follows: you assume that the best estimator you can produce diagonalizes in the same eigenbasis as the data Y, okay?
And once you have taken care of these orthogonal matrices, of the eigenbasis, you just need a cleaning procedure for the spectrum, which was started by Bun, Allez, Bouchaud and Potters, okay? And it is this one here — I will not enter into the details, of course. But I wanted to mention it because later we will test the performance of our procedure on matrix denoising: if you do matrix factorization, you also have an estimator for Xi Xi^T, for the entire matrix, so we test it against the rotational invariant estimator, with this matrix mean square error, which is simply the Frobenius norm of the difference.

Okay, so now, we have seen that the Bayes-optimal limits, the fundamental limits for this inference problem, are still not accessible — they are out of reach nowadays. So, together with my supervisor, we introduced a suboptimal but feasible approach that we call decimation, for reasons that will be clear in a while. This approach, despite being suboptimal, has the advantage of being completely analyzable from a theoretical point of view. It works as follows. Instead of having a matrix model that would, you know, force you to look for the entire matrix Xi — an n-by-p matrix — we look for one of its columns at a time, okay? So we split the problem into p separate problems. And it works as follows. Assume that at the first step you have this data matrix Y, and you need to produce an estimate of xi^p, the last pattern — without loss of generality, we can think of it as the last one. We denote this estimate by eta^p, and we will see that we obtain it by sampling from a certain suitable measure. Provided we are able to do that — and it is not obvious that we can — what we do next is build a rank-one contribution, this eta^p (eta^p)^T, which is a rank-one matrix, and we modify the observations by subtracting this rank-one contribution. Then we sample again from the same kind of measure — where the data, which of course enter the measure itself, are replaced by Y_1 this time — and then from this new measure we sample again, and again, and again, okay? Until, allegedly — I mean, we hope — we are able to do it p times, so that in the end we have p estimates, one for each column of the matrix Xi. We stack them together, and we have an estimator in the end.

Okay, so to be more precise, how does the r-th decimation step look? Let r be the number of patterns already estimated; it can also be zero, which means you are at the first step and you know nothing yet, okay? The modified matrix of observations will look like this: we have subtracted r rank-one contributions, and you sample the (r+1)-th estimate from this Boltzmann-Gibbs measure, e to the minus beta times a certain energy. And this energy is this little monster here — this is the ugliest slide I have, I think, so later it will be easier, hopefully. Okay, it is important to display it for different reasons. The first one is that it is rather easy, because it is basically the trace of Y_r against this x x^T, and these x's are n-dimensional vectors, okay? And there are many contributions, but the most interesting ones are the blue and the red. The blue one is, up to a sign, the Hamiltonian — or the energy, as you like — of the Hopfield model.
So here we have the connection with the first part of the talk, okay? This term is responsible for an effect that is desirable, but also for one that is not. It favors in probability — because this guy here goes in the exponent — those x configurations that are well aligned with the patterns, because it has a plus sign and it is a sum of squares, okay? On the contrary, we have the red term, which comes from the decimation itself, and this term repels: it penalizes in probability those x configurations that are too similar to what you have already estimated. And this makes sense, because at each step you don't want to fall again into the same energy valley; you want to sample something different and get an estimate of the whole matrix.

Yes? No, no, no, this is just... it's a theoretical object. Thank you for the question. Later there will be an algorithm that does not exploit this. Well, I'd say that all the information you need for the theoretical analysis is the likelihood of the model: you need to know that Y is Xi Xi^T plus noise, and this the statistician knows, okay? So for theoretical purposes this is allowed; it is not allowed for an algorithm. Good point.

Okay. So the effect for which we can blame this Hopfield Hamiltonian is pattern interference. Unluckily, it is inherited by this model and there is nothing to do about that, unfortunately; this is the greatest limitation of this approach. Okay, so you can also think of the decimation procedure as an action on the energy landscape. What you are doing is looking for the ground states of this energy — this cost function, call it whatever you like — and hopefully, again, if the patterns are not too many and the noise levels are kept under control, the ground states, these eta^mu here, look like patterns. What happens when you decimate is that you lift in energy the eta^mu — the state, the configuration you have just found — so that in the future you will not fall there again with ground-state-search algorithms, or with other algorithms.

Okay, so it is clear at this point that in this problem there are three noise sources. The first two we have kind of already discussed: there is the initial Gaussian noise, which is there by definition, and then there is pattern interference, because of this Hopfield-like Hamiltonian. And thirdly — and maybe this is the least obvious one — there is the decimation itself. The procedure artificially introduces some noise into the problem, because the rank-one contributions that we subtract step by step are not exactly the patterns; they are blurred versions of the patterns. And you can quickly convince yourself that if this eta has nothing to do with one of the columns of the hidden matrix and you subtract its contribution, you are actually increasing the rank of the hidden matrix, which is not good — you don't want this. In fact, the ultimate goal of decimation is that the estimated sample is similar enough to one of the patterns, so that when you subtract the corresponding rank-one contribution you decrease the rank of the hidden matrix, bringing the problem towards, let's say, the feasible, easier and already studied version. Whether decimation corrupts itself or not is not clear at this point; it will be clear in the end.
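Schematically, the decimation loop just described looks like the following sketch. Here `sample_ground_state` stands for whatever sampler is available (AMP, simulated annealing, ...), and the 1/sqrt(n) scaling of the rank-one subtraction is an assumption matching the toy normalization used earlier, so treat this as pseudocode made runnable rather than the exact procedure from the paper:

```python
import numpy as np

def decimate(Y, p, n, sample_ground_state):
    """Estimate the p columns of the hidden factor one at a time.

    sample_ground_state(Y_r) should return a candidate column eta (an n-vector),
    obtained by sampling / minimizing the Boltzmann-Gibbs measure built on Y_r.
    """
    Y_r = Y.copy()
    etas = []
    for r in range(p):
        eta = sample_ground_state(Y_r)                 # (r+1)-th estimate
        etas.append(eta)
        # Subtract the rank-one contribution of the estimate just found;
        # the 1/sqrt(n) factor is an assumed normalization, not from the slides.
        Y_r = Y_r - np.outer(eta, eta) / np.sqrt(n)
    return np.column_stack(etas)                       # n x p estimator of Xi
```

If eta has little to do with a true column, subtracting its rank-one contribution effectively increases the rank of what is left: that is exactly the "decimation noise" mentioned above.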
Okay, so we have seen that at each decimation step we have a different energy, because it depends on the modified observations: every time you subtract an additional rank-one contribution, the energy and the model itself change. There is a sequence of p models in total, provided you can get to the end. Of course, the accuracy with which you can hope to retrieve the next pattern, the (r+1)-th, depends — in a way that has to be clarified — on the previous steps: on the previous retrieval accuracies, on what you already know about the previous patterns, and on how good your reconstruction has been so far.

So how can we access these retrieval accuracies step by step? Well, the trick is the replica method. We need to compute these free entropies here, the sequence of free entropies, this phi_{r+1}, and we do it with the replica method, using the replica-symmetric ansatz. The reason we do it is that the retrieval accuracy — which in our jargon is simply the overlap, the expected overlap between the pattern you want to estimate and the typical sample from the... sorry, not from the ground truth, from the Boltzmann-Gibbs measure, errata — is a measure of the quality of the reconstruction. And this overlap turns out to be an order parameter for this model. So if you manage to write this free entropy through a saddle point, as we did with the replica method, then you can obtain the retrieval accuracy from the self-consistency equations: the stationary value m at which the maximum of this free entropy is attained has to be interpreted as the (r+1)-th retrieval accuracy. The other thing we can notice is that this free entropy depends, as it should, on the collection of the previous retrievals, on the whole history of the process, okay? So there is a sort of hysteresis, in some sense. Okay.

There is also another interesting thing, and it is this term on the second line here: it looks, more or less, like the free entropy of a Gaussian channel, for those of you who are familiar with it — or, if you want, the free entropy of a model of decoupled spins in a random magnetic field, where the variance of this random magnetic field is tuned by this parameter R. And if our computations make sense, this parameter R must comprise all three noise contributions we have discussed so far. And it is indeed the case, okay? This R is the sum of three contributions — these you obtain, of course, by extremizing the variational potential with respect to this parameter. The first contribution, R_a, is the variance of the noise due to the initial Gaussian noise. R_b is the pattern-interference contribution; this is very similar to what I presented for the Hopfield model. And R_c is the novel contribution here, because this is the decimation noise. In fact, as you can see, it involves an integral over the history, from zero to t, where t = r/p is the fraction of already retrieved patterns, of this ugly function. So this takes that into account, okay? So these two main questions still stand, in particular the first one: does decimation corrupt itself too much? Now we have the tools to say yes or no, because we have the theoretical analysis, okay?
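In practice, self-consistency (saddle-point) equations of this kind are solved by fixed-point iteration. The actual equations of the analysis couple the overlap m with the effective noise R and the whole decimation history; as a much simpler, purely illustrative stand-in, here is a fixed-point iteration for a schematic replica-symmetric equation of the form m = E_z[tanh(beta (m + sqrt(R) z))] with R treated as a fixed effective noise variance — this is not the equation from the slides:

```python
import numpy as np

def overlap_fixed_point(beta, R, m0=0.5, iters=200, n_grid=4001):
    """Iterate a schematic RS self-consistency equation m = E_z[tanh(beta*(m + sqrt(R)*z))].

    The Gaussian expectation over z is done by simple quadrature on a grid.
    """
    z = np.linspace(-8.0, 8.0, n_grid)
    w = np.exp(-z**2 / 2)
    w /= w.sum()                                   # normalized Gaussian weights
    m = m0
    for _ in range(iters):
        m = np.sum(w * np.tanh(beta * (m + np.sqrt(R) * z)))
    return m

# Small effective noise -> overlap close to 1 (good retrieval); large noise -> m collapses to 0.
print(overlap_fixed_point(beta=10.0, R=0.05), overlap_fixed_point(beta=10.0, R=5.0))
```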
And provided it starts from low enough noise levels, is it able to retrieve all the patterns, or does it stop at a certain point? And the second question, which we will see later: is there an efficient way to implement it? Because this is a theoretical analysis, as Marylou was pointing out; it is not an algorithm, okay?

For the first question, the answer is fortunately yes. It turns out that decimation does not corrupt itself, and in fact, provided we can start at low enough noise levels — with alpha small and delta, the initial Gaussian noise, low enough — decimation is not only able to retrieve all the patterns, but the retrieval accuracy increases along the procedure. You see here in this plot the red curve, which is the curve of retrieval accuracies predicted by the theory. The first retrieval accuracy, this blue dot here, is obtained with an approximate message-passing algorithm initialized with an informative initialization — a warm start, as we said this morning. And you see that the next magnetizations are higher and higher, okay? So the retrieval of the next patterns will be better and better, which means that what you gain by subtracting the rank-one contributions is more than what you lose by artificially introducing this noise during the procedure. So the procedure can get through. At least for all the tests we have run — this plot is with the Ising prior, plus or minus one, Rademacher if you want; then with a sparse Rademacher prior, so introducing a little, or even aggressive, sparsity, as we shall see later; and also with continuous variables like the uniform prior — this behavior is confirmed, okay?

So, question number two: is there any algorithm able to actually do it? The answer is yes, to this specific question. We have formulated this ground-state oracle, which is made up of three ingredients. The first one is a simulated-annealing routine to look for the ground states of this energy, which can be very rugged in principle, okay? The second one is, of course, decimation: once you have converged to something that you accept as a pattern, as a column, you subtract it, and then you run the simulated-annealing routine again. And the third one is a restarting criterion, because, as I said, since this energy landscape can be very rugged and very difficult for the algorithm to explore, it often gets stuck in metastable states, which is not what you want — you want the ground states, okay? So you need a criterion to decide whether the configuration you have converged to is really a ground state or not. What we do is compute the corresponding energy and test it against the energy of the ground states predicted by the theory; when this energy is too high, we discard what we have found and start all over again, okay?

All right, so how does this ground-state oracle — we call it an oracle — work? Well, it works pretty well. It is not in as striking agreement with the theory as the previous AMP was, but it needs no informative initialization. This is, again, for the Rademacher prior — the ground-state oracle, or at least the simulated-annealing routine, is particularly simple with a Rademacher prior. And here the dimensionality n is 1300 and alpha is 0.03, so it is not really high-rank yet, but it is extensive, strictly speaking.
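A rough sketch of the three ingredients just listed — simulated annealing, decimation (via the `decimate` skeleton above), and the restart criterion — for plus-or-minus-one configurations. The energy function, the annealing schedule, the tolerance, and the restart budget are placeholders, not the choices actually used in the talk:

```python
import numpy as np

rng = np.random.default_rng(2)

def energy(x, Y_r, n):
    """Schematic cost: favor configurations aligned with the remaining signal in Y_r."""
    return -x @ Y_r @ x / (2 * np.sqrt(n))

def simulated_annealing(Y_r, n, betas=np.linspace(0.1, 10.0, 2000)):
    """Metropolis single-spin-flip annealing over +-1 configurations.

    Recomputing the full energy at each flip is wasteful; a real implementation
    would update local fields incrementally. This is only a sketch.
    """
    x = rng.choice([-1.0, 1.0], size=n)
    E = energy(x, Y_r, n)
    for beta in betas:
        i = rng.integers(n)
        x_new = x.copy()
        x_new[i] = -x_new[i]
        E_new = energy(x_new, Y_r, n)
        if E_new < E or rng.random() < np.exp(-beta * (E_new - E)):
            x, E = x_new, E_new
    return x, E

def oracle_step(Y_r, n, e_theory, tol=0.05, max_restarts=100):
    """Restart annealing until the energy is close to the theoretically predicted ground-state energy."""
    for _ in range(max_restarts):
        x, E = simulated_annealing(Y_r, n)
        if E <= e_theory + tol * abs(e_theory):   # accept as the next column estimate
            return x
    raise RuntimeError("stuck in metastable states within the restart budget")
```

The number of restarts consumed here is exactly the quantity that, as discussed next, ends up growing exponentially with n.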
And, yeah, so the ground-state oracle manages to get through, okay? Sometimes it even reconstructs the pattern exactly, with no error, when the noise levels are low enough. So how does it perform in comparison with the rotational invariant estimator? Provided the noise levels are low enough that it can actually get through the first step, it performs better than the rotational invariant estimator. And this we could expect already, because the rotational invariant estimator is a purely spectral estimator: it does not take into account at all the prior structural information about the signal, okay? Whereas decimation does — the prior is explicitly inserted there. So you see that the red curve of the rotational invariant estimator is worse here than the two other curves obtained through decimation, okay? I don't want to stop too long on this. How much time do I have left? 15 minutes? Wow. Okay. Maybe I timed it too many times, I don't know — it went pretty fast. Okay, okay.

So the problem is that this algorithm is not efficient, not at all. As you can see from this plot — you remember there is a restarting criterion, right? — the number of times you converge to wrong states grows exponentially with n. Even though the exponential increase is mild — we managed to run it up to n equal to 2,500, which is already a considerable dimensionality — we cannot go further than that, because the exponential complexity kills it, okay? You need to wait forever to find just the first pattern. Then it becomes better and better, of course, towards the end, but the first one is a nightmare.

Okay, so all of this was for Rademacher priors, while the theory is for generic priors. So you might wonder what happens with other priors. Well, the theory is there, but we are still in the process of testing our decimation procedure numerically with other priors. We do have a rather complete picture with sparse priors, like this one here, where we introduce this parameter rho that interpolates between the Rademacher prior, when rho equals 1, and a very sparse prior when rho is very small. For instance, if rho is 0.15 or 0.2 — like here, let's say 0.2 — then 80% of the components of the patterns are zero. So the matrix Xi Xi^T is very sparse, okay? Only a fraction rho squared of the components of Xi Xi^T are nonzero.

Okay, so here, again — here I plotted the MSE and not the retrieval accuracy. This is the mean square error, which is, I mean, the squared norm of the difference between the pattern and our estimate eta^mu, okay? It is another measure of accuracy, which is more suitable for this case due to some norm fluctuations in the estimators. But you can see that this mean square error on the retrieval of the patterns decreases, so this behavior is still confirmed. In red you have the theoretical curve, and in blue you have some points obtained with the AMP, again with informative initialization, which are in good agreement with the theory. And this AMP was run with beta equal to 10.

Okay, so why were we interested in sparsity in the first place? It is because we expected — and it is actually the case — that sparsity helps the system become more robust against the sources of noise that we have in the model. In fact, what you are seeing here is the phase diagram of the first decimation step for different values of sparsity.
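Before going through the phase diagram, a quick sketch of the sparse prior just described and of the per-pattern mean square error used in these plots. The 1/sqrt(rho) rescaling, which keeps the variance at one, is my assumption; the talk may use a different normalization:

```python
import numpy as np

rng = np.random.default_rng(3)

def sparse_rademacher(n, p, rho):
    """Sparse Rademacher prior: each entry is 0 w.p. 1-rho, and +-1/sqrt(rho) w.p. rho/2 each.

    rho = 1 recovers the plain +-1 Rademacher prior; small rho gives very sparse patterns
    (e.g. rho = 0.2 means 80% of the components are zero). The 1/sqrt(rho) rescaling
    is an assumption, not necessarily the convention used in the talk.
    """
    mask = rng.random(size=(n, p)) < rho
    signs = rng.choice([-1.0, 1.0], size=(n, p))
    return mask * signs / np.sqrt(rho)

def pattern_mse(xi_col, eta):
    """Mean square error between a true column and its estimate, minimized over the global sign."""
    n = len(xi_col)
    return min(np.sum((xi_col - s * eta) ** 2) / n for s in (+1.0, -1.0))
```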
So when you are trying to find the first pattern — in particular, forget about the dashed lines, sorry about that, I didn't have time to erase them. For instance, when rho equals 1 and you start the decimation procedure below this curve, the solid red curve — so with small enough alpha and delta — then decimation is able to estimate the first pattern, and then the second, and so on; it simply gets through. But you see that when the sparsity is rather aggressive, like 0.1, this retrieval region increases considerably, and when rho is 0.05 it becomes huge. So you can store a lot of patterns here: alpha can be 0.48, if I remember correctly. So if you have n equal to 2,000, you can store 960 patterns in here, which is a lot, okay? It is absolutely not accessible with Rademacher plus-or-minus-one patterns, okay?

All right. So even if sparsity helps in theory, it also creates a nightmare on the other side, because of the energy landscape — and this is a plot I obtained just last week, so, well, I should be careful about that. But I have... what are you? It's a bee. I think what happens here is that, unfortunately, sparsity also creates a golf-course energy landscape, which is very bad. What is a golf course? It's flat everywhere. And this is due to the fact that when you have sparsity like rho equal to 0.15, 85% of the components of the patterns are 0, and the matrix Xi Xi^T — which is the one responsible for the attraction towards the ground states — has only a fraction rho squared of non-zero elements, which are very few, okay? So what happens is that as long as there are many patterns stored in this memory, there are many pits in this golf course, so you have a high chance of falling into one of them. But when alpha becomes very small, like 0.001 — in this case, in particular, there is only one pattern stored — you have only one pit in this huge golf course that is flat almost everywhere. So if you run an energy-based method like simulated annealing, like the routine we have, paradoxically it takes a long time to find the last pattern, but very few iterations to find the first one. And yeah, I think that with this, I am done. Thank you very much.

Yes, sure. Sure, here we used the replica-symmetric ansatz, but there is the de Almeida-Thouless line. We didn't venture into the replica symmetry breaking computation because it's a real nightmare, and, I mean, the AMP algorithm that we ran to test our plots was more likely to match the replica-symmetric prediction than the replica-symmetry-breaking one, so we settled for that. Even for the Hopfield model itself, I don't know of a replica-symmetry-breaking computation — but maybe there is one. There is one? Okay. A one-RSB scheme? Yeah, it could be, yes. People are leaving, okay. Okay, sorry, sorry. It could be, yes.

Yeah, sure. I mean, there are ways to regularize this effect. For instance, I've observed that if I add noise, of course the reconstruction is worse, but this basin of attraction gets wider, so it converges sooner — sorry, in fewer iterations — paradoxically. This is what I was telling Federica yesterday. I still have to understand this, to be honest. Yeah, 10 components different from zero in a thousand-by-a-thousand matrix is not much. Another question, please. This, sorry? Ah, this eta. These eta are obtained by sampling — I mean, they are the ground states of this energy here.
The simulated-annealing routine is a way to sample the ground states of this energy, which allegedly look like patterns — and they do, in feasible regions of the phase space. This is an excellent question. So the first one — okay, the first one is a simple AMP with Y, the observation matrix, as the data, okay? It is a rank-one AMP: I'm not looking for low-rank components, I'm looking for a single vector of components. Is that clear? Then once you find a final configuration, you build the rank-one contribution, you subtract it from Y, and you run AMP with Y_1. The denoising function, in the case of Ising spins, is the hyperbolic tangent. So maybe you can do something more accurate, but it matches the theory.

I don't know. It's not... Yeah, I mean, strictly speaking, this computation is not in the Bayes-optimal setting. I mean, the noise... Sorry, sorry. I know. Yes, yes. And here, maybe I should have said it: the statistician is Bayes-optimal, but since they don't know a Bayes-optimal strategy for denoising or for finding Xi, they use this suboptimal routine. This one? Yes. Yes. Well, it is that of the optimal one, I mean, it's not... That is the retrieval phase; the one you're asking for is the intersection with this dashed line. Yeah, but there the retrieval states are not thermodynamically stable — I mean, you still have a chance to converge there. Exactly, yeah. I mean, I overlooked this, but... So the red line... Yes, they're not only minima, they are global minima; they are pure states. Yes, to the left of the red line, yes. I mean, if you're lucky, it can work: if you're lucky enough and you initialize in the basin of attraction, yes, it can work, but you have no theoretical guarantees that these retrievals...

Yes? Yeah, this one, for instance. Well, there is delta. I mean, with delta equal to zero, the first overlap is the same, at the first step; then it becomes different — it's a different neural network model. The first one is the same, yes. Yeah, actually, this first phase diagram is all filled with some noise; it's nothing very fancy.

Yeah, also, maybe I should say this, because I think you have a poster here about this, right? About this dreaming or learning procedure. There is a way to augment, to increase the size of these basins of attraction and to make them more stable, right? So if you're interested, have a look at Enrico's poster. It's a hard one, it's a hard one — I think there is some literature on that. Yes? The capacity increases, yes, yes. That's here — yeah, it's here, these dashed lines; or the solid lines, if you want the thermodynamically stable ones. There is some literature about that, but it is different, because this is an inference problem. The paper I read — I don't even remember who it is by, that's my fault — but there you have to modify the patterns in order to store them more efficiently in the Hebbian matrix, in a way I don't recall. Here you cannot do this, because this is an inference problem: you are given some data and you need to work with them; you cannot modify the Hebbian matrix. So with this, working with the J_ij you saw at the very beginning — with this, yes. Thank you.