... rank-one matrices with mismatched priors. Please go ahead. Thank you.

Thank you everyone for the opportunity to speak. I'll be presenting some recent work I've been doing with my collaborators on estimating rank-one matrices, and in particular I'll focus on two problems: one is universality, and the other is large deviations. I'll first introduce the general setup for this inference problem. Since I'm a mathematician, I'll also go over how I see the theory of large deviations, then the main results and how large deviations connect to this Bayesian inference problem. Also because I'm a mathematician, I'm obligated to give at least some ideas of the proofs, and I'll end with some open problems and what comes next.

So let's start by defining the model we'll be working with. Consider a very simple and somewhat generic inference problem: I want to infer a signal with a very specific structure. The signal is a rank-one matrix of dimension n by n, generated as the outer product of a vector with i.i.d. entries. This vector x^0 encodes my signal matrix; it is a vector in R^n, and all of its entries are generated independently from a probability measure P_0. I observe a noisy version of this signal: a symmetric n by n matrix Y (symmetric because the signal matrix is symmetric) whose entries are generated randomly and independently, conditionally on the signal. Each entry of Y is drawn from an output channel, a conditional probability measure that I denote with the subscript "out", conditioned on the corresponding entry of the signal, and there is a 1/sqrt(n) normalization on the signal. This normalization is there because it is the right scale for the problem to be neither too hard nor too easy; it is the scale where interesting things happen.

A classical example that falls under this generic framework is the spiked matrix problem. Assume the signal is a random rank-one matrix, and generate the data Y conditionally on the signal as follows: each entry is drawn not from a standard normal but from a normal random variable whose mean is the corresponding (normalized) entry of the signal, with variance one. In law, Y equals additive Gaussian noise plus a normalized version of the signal matrix.
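As a small illustration (my own sketch, not part of the talk), here is how one could simulate this spiked Wigner observation in Python, assuming a Gaussian output channel and a signal-to-noise parameter lam; the function and variable names are mine.

```python
import numpy as np

def spiked_wigner_sample(n, lam, rng, prior="rademacher"):
    """Draw (x0, Y) from the spiked Wigner model Y = sqrt(lam/n) x0 x0^T + W,
    with W a symmetric Gaussian noise matrix."""
    if prior == "rademacher":
        x0 = rng.choice([-1.0, 1.0], size=n)        # i.i.d. entries from P0 = Unif{-1,+1}
    else:
        x0 = rng.standard_normal(n)                  # e.g. a Gaussian prior instead
    W = rng.standard_normal((n, n))
    W = (W + W.T) / np.sqrt(2)                       # symmetrize the noise
    Y = np.sqrt(lam / n) * np.outer(x0, x0) + W      # noisy rank-one observation
    return x0, Y

rng = np.random.default_rng(0)
x0, Y = spiked_wigner_sample(n=500, lam=4.0, rng=rng)
# Above the spectral threshold lam > 1, the top eigenvector of Y correlates with x0.
v = np.linalg.eigh(Y)[1][:, -1]
print(abs(v @ x0) / np.sqrt(len(x0)))                # |<v, x0>| / sqrt(n), close to sqrt(1 - 1/lam)
```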
So my goal, or my problem, is to estimate, or guess, this signal matrix from my observation. Classically, what is a good guess for the best estimate? One way to pose it: out of all possible functions of my data, I want the estimator that minimizes some error, for instance the average of the squared differences of the entries of the two matrices. The best estimator in this sense is given by the conditional expectation of the signal matrix given my data Y. Moving forward, I want to understand this conditional expectation, because it is the object that minimizes the mean squared error in this problem.

Associated to this problem is the conditional probability measure, and by Bayes' theorem we have all the ingredients for a very nice explicit form for it. We can write the conditional probability that my signal takes any particular value, given the observations Y, as a ratio, and we can use independence to write it as a product over the entries of these objects. The P_0(x) here is a product measure; I just did not put the tensor power on top, to simplify the notation.

Related to this posterior, which is related to the optimal estimator, is another quantity, a statistic I will call the overlap. The overlap is defined as the inner product between two vectors: one is the original vector x^0 that appeared in the signal, and the other is my guess, a sample x from the posterior probability measure, the Gibbs measure as I will call it. Loosely, this measures how close my estimator is to the original signal: its magnitude corresponds to how many entries I have guessed correctly, or to the angle between my estimator and the signal I want to recover. This statistic is a fundamental object: if I want to compute interesting quantities such as the free energy or the minimal mean squared error, the behavior of this overlap encodes the order parameters of these models.
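In symbols (my transcription of the formulas described verbally, with the 1/sqrt(n) normalization made explicit and the usual 1/n normalization of the overlap), the optimal estimator, the posterior, and the overlap are:

$$
\widehat{X}^{\mathrm{MMSE}}(Y) \;=\; \mathbb{E}\big[x^0 (x^0)^{\top} \,\big|\, Y\big],
\qquad
dP\big(x \mid Y\big) \;=\; \frac{1}{Z_n(Y)} \prod_{i \le j} P_{\mathrm{out}}\!\Big(Y_{ij} \,\Big|\, \tfrac{x_i x_j}{\sqrt{n}}\Big)\, dP_0^{\otimes n}(x),
$$

$$
R_n \;=\; \frac{1}{n}\,\langle x^0, x\rangle, \qquad x \sim P(\cdot \mid Y).
$$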
To be a bit more precise, and hopefully to motivate where this overlap appears and why it is interesting, let me return to the spiked matrix example, because everything is simple and explicit for this problem. For this model I have an explicit formula for the posterior; as mentioned before, I can write down what it is.

So I now have a somewhat explicit form for this conditional probability measure, and if I expand it out, there is a Hamiltonian term that appears in the exponent, minus a term that depends only on Y, which we don't really have to worry about since it cancels with the corresponding factor in the normalization. Associated with this is the free energy: one over n times the expected value of the log of the normalization term of this probability measure, with the Y-only terms subtracted, so that only the Hamiltonian H_n(x) remains in the exponent. This free energy encodes the behavior of the conditional probability measure: if I take derivatives with respect to the signal-to-noise parameter lambda, I recover certain moments of the overlap.

Classically, the limit of this free energy was proven by Lelarge and Miolane. It is given by a variational problem: a supremum over a scalar parameter of an explicit function that depends on the prior in a very specific way. The overlaps are important here because in these models we observe something called overlap concentration: in the high-dimensional limit, the overlap concentrates around its expected value. This was proven in general, after a small perturbation of the measure, by Jean Barbier. From there we get some very nice behavior of the overlaps for these problems, and the parameter q that appears in the optimization problem corresponds to the expected value of the overlap, up to some smoothing of the Gibbs measure.

That is the classical setting and the classical result for spiked matrices. We now move on to a slightly more challenging, and more realistic, setting: a normal statistician like myself might not know what the prior is, and might not know how the data were generated. So I make my own model, and from it I build another conditional probability measure, which might not be the right one. The Y here was still generated from the true P_0 and P_out defined earlier, but since I don't know them exactly, I make a guess for the prior and a guess for the output channel, and I define the posterior measure associated with these guesses. Again I can look at the overlap: it is the inner product between the original signal and a sample from this posterior, which is no longer the optimal estimator but an estimator based on my best guess. This mismatched setting has been studied recently by many people in this room, in particular Jean Barbier and Francesco Camilli, Marco, and Manuel.
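Schematically (my notation: P and Q_out denote the statistician's guesses for the prior and the output channel, while the data Y are still generated from the true P_0 and P_out), the mismatched Gibbs measure and its overlap are:

$$
d\widetilde P\big(x \mid Y\big) \;=\; \frac{1}{\widetilde Z_n(Y)} \prod_{i \le j} Q_{\mathrm{out}}\!\Big(Y_{ij} \,\Big|\, \tfrac{x_i x_j}{\sqrt{n}}\Big)\, dP^{\otimes n}(x),
\qquad
R_n \;=\; \frac{1}{n}\,\langle x^0, x\rangle, \quad x \sim \widetilde P(\cdot \mid Y).
$$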
Again, let's do a concrete example to demonstrate these ideas: return to the spiked matrix model. Say I know my data are generated from a spiked model with additive Gaussian noise, but I don't know the prior, so I guess: my guess is that the signal entries take the values plus one and minus one, each with probability one half. I can build the corresponding conditional probability measure and again ask, for example, what the free energy of this model is; the only difference is that the prior P appearing here is different from the true P_0 that appeared before. The limit of this free energy was proven by Francesco and co-authors, and it is again given by a variational formula. In this mismatched setting, however, the variational formula is already much more complicated than before: where there used to be an explicit term, there is now the solution of the SK model at a certain external field and temperature beta. I won't go over exactly what this is; I will make it a bit more precise later once I explain the other results, but there is an explicit, though more complicated, formula for it. I show this just to emphasize that in the mismatched setting even the limiting free energy becomes much more complicated.

Now you are probably wondering why I keep focusing on these spiked matrix models. They are nice examples because the objects are explicit, but how is this related to the general inference problem that I introduced at the beginning? The connection goes through something called universality of the overlaps, and I will be a bit more precise shortly about what I mean by universality.

Before I explain universality, I want to introduce, or at least review, what large deviations are in the context of these inference problems. Suppose we are back in the Bayes-optimal case. I know that my overlaps concentrate, but say I want something a bit more precise than concentration: an estimate of the probability that the overlap deviates from the value it concentrates on, that is, from its expected value. I want to make this precise with two parameters: how fast is the exponential decay, in other words what is the exponent, and what is the coefficient in front, which may also depend on n. This is a form of concentration inequality: the probability that the overlap deviates from its expected value. Large deviations can be thought of as a more precise version of this: for any rare event A, I want to find a constant and a speed so that the probability that the overlap lies in that set is, up to approximate equality at least to first order, of this exponential form. Taking logs of both sides and dividing by the speed, it means I want to compute probabilities of the following form: what is the probability that my overlap takes a certain value? We know it concentrates, but at finite n it is still a random object.
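In formulas, and schematically (my own shorthand for the two statements just described, with a_n denoting the speed and I the rate function):

$$
\mathbb{P}\big(|R_n - \mathbb{E}\,R_n| \ge \varepsilon\big) \;\le\; C(n,\varepsilon)\, e^{-a_n\,\alpha(\varepsilon)},
\qquad
\frac{1}{a_n}\,\log \mathbb{P}\big(R_n \in A\big) \;\approx\; -\inf_{q \in A} I(q).
$$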
So I want to know the probability that it takes a given value, and the rate function and the speed encode where this value is. All of these objects are encoded by something called the rate function and the speed, through the large deviations principle. We say that a sequence of measures, in our case the laws of these overlaps under some posterior measure, either the optimal one or the mismatched one, satisfies a large deviations principle if it is described by two objects. One is the rate function, a nice lower semicontinuous function; it is an even better, good rate function if in addition all of its level sets are compact. The other is the speed. The statement is that the log of the probability from before is upper and lower bounded by infima of this rate function over the set (its closure and its interior, respectively). The intuition is the formula from before: the limit of one over n times the log of the probability of a set A is minus the infimum of the rate function over the set. On the one hand, one over n times the log of the probability is like integrating my measure over A; on the other hand, when n is big you expect the integral to concentrate near the minimizers of I over A (minimizers rather than maximizers because of the minus sign), which is precisely what appears here. So the rate function I plays the role of a density for the measures mu_n: I(x) encodes the probabilities of these rare events in a simple way, and it encodes everything we need to know to understand them.

A classical, concrete example is Cramér's theorem, which describes the behavior of sample means. Take i.i.d. random variables and look at the sample mean; the sample mean concentrates by the law of large numbers, and we have an explicit formula for the probability that it takes particular values. The rate function is obtained from the log of the moment generating function by taking a Legendre transform: the supremum over lambda of lambda times x minus the log moment generating function. This Legendre transform encodes the probabilities of the rare events, in the sense that the log of the probability of a set is given by the infimum of this function over the set we are interested in. So this Lambda star is the rate function for sums of i.i.d. random variables.

This has a direct application to what we had before. Suppose we have perfect information, so my estimator is exactly x^0. Then the overlap is just the normalized sum of the squares of the entries of my signal vector, and this is a sum of i.i.d. random variables, since I assume the entries of the signal are i.i.d. Cramér's theorem then tells us exactly what the large deviations of this overlap are: the log of the probability that the overlap takes a value in any set is upper and lower bounded by the infima of the rate function, which is given by a supremum over lambda of a linear term in lambda minus a log moment generating function.
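Concretely, Cramér's theorem and its application to this perfect-information overlap read as follows (the theorem is standard; the application is the one just described):

$$
\Lambda(\lambda) = \log \mathbb{E}\, e^{\lambda X_1},
\qquad
\Lambda^{*}(q) = \sup_{\lambda \in \mathbb{R}} \big(\lambda q - \Lambda(\lambda)\big),
\qquad
\frac{1}{n} \log \mathbb{P}\Big(\tfrac{1}{n}\textstyle\sum_{i=1}^n X_i \in A\Big) \;\longrightarrow\; -\inf_{q \in A} \Lambda^{*}(q)
$$

for suitably regular sets A; with perfect information the overlap is R_n = (1/n) sum_i (x^0_i)^2, so its rate function is sup_lambda ( lambda q - log E_{P_0} e^{lambda x^2} ).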
This form, a supremum over lambda of a linear term minus a log moment generating function, will be important to remember for the results I state later. The question now is: can we generalize this result from the case where the overlap has this very simple structure to the case where the estimators come from the general posteriors we saw before?

Now I will explain the main results, at least what we managed to show. The first result is a universality result, in line with the theme of this conference. Let me define a few objects. I define two log-likelihoods: one, g_0, is the log-likelihood of how the data were actually generated, and the other, g without the zero, is the log-likelihood of my uninformed guess. From these I define three parameters, which I will call Fisher score parameters; they are functions of averages over the data Y assuming there is no signal. This is the behavior of the way the data are generated in the absence of any signal; in the spiked matrix model it just means averaging over standard normals, because with no signal only the additive Gaussian noise remains. These three parameters depend only on the first and second derivatives of the log-likelihood functions and on the inner products between them. In the simple case where our guess is perfect, the Bayes-optimal case where my guess for the log-likelihood and the actual log-likelihood coincide, all three parameters are encoded by the single parameter lambda; in the mismatched case they may all be different.

I then define two probability measures. One is the posterior measure we saw before, up to a normalization by a function that depends only on Y, which I subtract off since we need some extra normalization; this G_n(Y) is the probability measure corresponding to the original inference problem. The other probability measure, with partition function Z_n(beta), is a Gaussian measure corresponding to what appeared in the spiked matrix problem, except that the lambda parameters are replaced by the three temperature parameters beta. This Hamiltonian is not new: it is precisely the model studied in the work of Francesco and co-authors mentioned before, and we are just looking at its Gibbs measure.

Here is our universality result. We have to assume a few things, which could probably be weakened. I assume that the true data-generating prior and my uninformed prior are compactly supported. I need sufficient regularity of the log-likelihood: it has to be at least three times differentiable, with some bounds on the first and third derivatives. There is also a funny condition, a consistent-estimator condition. The intuition, at least in the spiked matrix model, is the following: in the absence of a signal my data are centered, since the additive noise is centered; the condition says that my guess for how the data are generated must also be centered, so the mean is preserved.
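To make the Fisher score parameters concrete in the Gaussian spike example (my own worked special case; the exact combinations used in the result are in the paper): with output channel density proportional to e^{-(y-w)^2/2}, the log-likelihood and its derivatives at zero signal are

$$
g(y, w) = -\tfrac{1}{2}(y - w)^2 + \text{const},
\qquad
\partial_w g(y, 0) = y,
\qquad
\partial_w^2 g(y, 0) = -1,
$$

and the three parameters are built from averages, under the law of Y with no signal, of the first and second derivatives of g and g_0 and of their products; in the matched Gaussian case they all reduce to functions of the single signal-to-noise parameter lambda.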
So that is the intuition behind this consistent-estimator condition. I will look at the joint law of two statistics. One is the overlap from before, the inner product between the signal and my estimator. The other is the self-overlap, the R_{1,1} overlap, which is just the normalized squared length of my sample. The reason to look at the joint law of both objects is that the norm of x influences the behavior of the overlap: if the vector x is very small, the overlap naturally takes very small values, simply by the Cauchy-Schwarz inequality. I want the overlap to measure how close we are; I don't want it to be dominated by the size of the vector. One option is to normalize the overlap by the norm, and another is to look at the joint law of the two, which is what I do.

What the universality result says is the following. If I take the beta parameters to be the Fisher score parameters defined before, the expected values of the partial derivatives of the log-likelihoods, so the betas take these very specific values, then the joint laws of the overlaps under the original Gibbs measure and under the Gibbs measure with these beta parameters satisfy the same almost-sure large deviations principle. In other words, the probabilities of rare events are the same. It is an almost-sure large deviations principle because these probability measures are random: they are defined conditionally on Y. So almost surely, no matter how the data are generated, the two measures satisfy the same large deviations principle and assign the same probabilities to rare events. You also get for free that the free energies of these inference models are the same.

That is the universality part: the spiked matrix models I introduced as toy models are precisely the right objects to look at. The next question is whether we can compute the rate function, and now I will go over the Parisi-type functional I alluded to before. This functional can be defined through two sequences of parameters; there is an explicit form, and I don't want to go too deeply into the details of how it looks, but I do want to point out its structure. Recall that in Cramér's theorem there was a supremum over lambda of a linear part minus a log moment generating function. The same structure appears here: there is a linear part, and there is the log moment generating function of another object. From Cramér's point of view, this generalizes the large deviations principle to the case where there are extra terms in the exponent coming from the Gibbs measure. From a spin-glass point of view, you can think of it as the standard Parisi formula with an extra part that accounts for the probability of the event we restrict to, which is the Cramér part. So you can think of it as a spin-glass part added to a Cramér part, or a Cramér part added to a spin-glass part, whichever of the two you are more familiar with. Either way, there is an explicit rate function.
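Heuristically, and this is my own Gärtner-Ellis-style caricature of the structure just described rather than the precise formula from the paper, the rate function for the pair (overlap, self-overlap) has the shape of a Legendre transform of a limiting tilted log-partition function of the Gibbs measure:

$$
\Lambda(\lambda,\mu) \;=\; \lim_{n\to\infty} \frac{1}{n} \log \Big\langle \exp\big( \lambda\,\langle x^0, x\rangle + \mu\,\|x\|^2 \big) \Big\rangle_{\beta},
\qquad
I(s,m) \;=\; \sup_{\lambda,\mu} \big( \lambda s + \mu m - \Lambda(\lambda,\mu) \big),
$$

where the limit Lambda is the piece that, in the actual result, is expressed by the Parisi-type functional.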
Now, to make things precise, there is also the possible domain of the overlaps. If I take linear combinations of my overlap and my self-overlap, the possible values are confined to certain sets: they cannot take arbitrary values, since everything is generated from probability measures whose supports satisfy our assumptions, so the extreme values are constrained, and the set C encodes the possible values.

Our result is an almost-sure large deviations principle for these Gibbs measures. The joint law under a general beta parameter, so for any choice of beta, not only the special one, satisfies a large deviations principle with rate function given by the negative of the functional from before, optimized over its parameters. This rate function is infinite if the values (s, m) of the overlap and the self-overlap are not feasible, which simply corresponds to taking the log of zero; if (s, m) lies in the set of possible values, then the probabilities of the rare events are encoded by this rate function. It is almost sure in the sense that the Gibbs measure is random, but for almost all realizations of the noise W and the signal x^0 it satisfies the large deviations principle with the same rate function; the result holds almost surely for these randomly generated Gibbs measures.

As a consequence, if I take the betas to be the Fisher score parameters from before, I get a formula for the rate function of the original inference problem; I can also take rectangular sets and optimize out the self-overlap if you don't want it. So I have the large deviations of the overlaps of the inference problems from before: almost surely over the observations Y, there is a rate function given by the same object evaluated at those beta parameters.

As another free consequence, we can compute the limit of the free energy just by optimizing over the rate function. And if the rate function has a unique minimizer at some (s, m), then the overlaps converge to (s, m) almost surely, so we get another way to prove concentration for these models, simply by looking at the minimizers of these rate functions.

So those were the main results. Basically, they say that the overlaps generated from these general inference problems behave like the overlaps generated from the spiked problems with a very specific choice of beta parameters, and that we have a function encoding the probabilities of their rare events. The overlaps are universal in the sense that their rare events are governed by the same kinds of rate functions, and these are almost-sure large deviations principles.
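As a small numerical illustration of the concentration statement (again my own sketch, not from the talk): in the matched Rademacher spike model the posterior is an Ising-type Gibbs measure, and a simple Metropolis sampler shows the absolute overlap settling near a deterministic value already at moderate n.

```python
import numpy as np

def overlap_mcmc(n=300, lam=4.0, sweeps=400, seed=1):
    """Matched Rademacher spike: P0 = Unif{-1,+1}, Gaussian channel.
    The posterior is proportional to exp( sqrt(lam/n) * sum_{i<j} Y_ij x_i x_j )
    on {-1,+1}^n, since the x_i^2 x_j^2 term is constant for +/-1 spins."""
    rng = np.random.default_rng(seed)
    x0 = rng.choice([-1.0, 1.0], size=n)
    W = rng.standard_normal((n, n)); W = (W + W.T) / np.sqrt(2)
    Y = np.sqrt(lam / n) * np.outer(x0, x0) + W
    J = np.sqrt(lam / n) * Y
    np.fill_diagonal(J, 0.0)

    x = rng.choice([-1.0, 1.0], size=n)            # start from a random configuration
    for _ in range(sweeps):
        for i in rng.permutation(n):
            dH = -2.0 * x[i] * (J[i] @ x)          # change in the log-weight when flipping spin i
            if dH >= 0 or rng.random() < np.exp(dH):
                x[i] = -x[i]
    return abs(x @ x0) / n                         # |R_n|; absolute value because of the +/- sign symmetry

print(overlap_mcmc())   # typically a value well away from 0, roughly stable across seeds
```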
Okay, so I have about twelve minutes to go over some ideas of the proofs, and I think I will only have time for the universality part. What I will do is first show a quenched universality statement; to get to an almost-sure statement we then use concentration and fairly precise estimates on all the errors. The log partition functions behave close to their expected values, so the quantity with and without the expectation are close, and we can just study the expected values for now.

To show universality, I want to show that the logs of these probabilities, which are just the numerators in the Gibbs measure, can be compared: for any set A, the log of the probability of the event under the original model can be reduced to the log of the probability of the same event under e to the Hamiltonian with the three beta parameters.

How do we do that? Because I assumed enough regularity on g, I can Taylor expand to first and second order, and because of the 1/sqrt(n) normalization on x_i x_j I can throw away the third-order term, which is of smaller order, so we can ignore it. This reduces the Hamiltonian to an object that depends only on the first and second derivatives. Next, by concentration, the random coefficient of the second-order term stays close to its expected value, so we can use matrix concentration inequalities to replace the second-order term by the corresponding beta parameter; this is precisely how the beta parameter of the second term appears. Then we only have to deal with the first-order term. Its coefficients are i.i.d., since we assumed the entries are generated independently, so it is an i.i.d. random variable times x_i x_j, and for very large n it behaves like a normal random variable with a shift: computing the mean gives one beta parameter, and computing the variance gives the other. Using universality in the sense familiar from spin glasses (we just have an i.i.d. coefficient times x_i x_j), we can unravel it and replace it with the equivalent Gaussian model. From there, the term in the exponent of the log partition function is precisely the Hamiltonian H_beta from before. This holds for all sets A, so it shows that, at least quenched, all of these restricted free energies reduce to the same thing, up to a small error that goes to zero in the limit; and since in the large deviations statement we take n to infinity, that is really all we care about.
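In formulas (my transcription of the expansion just described, with g the guessed log-likelihood and w its second argument):

$$
g\Big(Y_{ij}, \tfrac{x_i x_j}{\sqrt{n}}\Big)
\;=\; g(Y_{ij}, 0)
\;+\; \partial_w g(Y_{ij}, 0)\,\frac{x_i x_j}{\sqrt{n}}
\;+\; \frac{1}{2}\,\partial_w^2 g(Y_{ij}, 0)\,\frac{x_i^2 x_j^2}{n}
\;+\; O\big(n^{-3/2}\big),
$$

where the zeroth-order term does not depend on x and cancels with the normalization, the second-order coefficient is replaced by its expectation via matrix concentration (giving one beta parameter), and the i.i.d. first-order coefficients are replaced by an equivalent Gaussian with matching mean and variance (giving the other two beta parameters).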
Now I wanted to go over the proof of the large deviations principle; maybe I will just say a few quick words about it. The free energies conditioned on the overlaps taking given values are called Franz-Parisi potentials, and the log-probabilities of the overlaps that we wanted to study can be written as the difference between a Franz-Parisi potential and the free energy without the extra conditioning. So the objects whose large deviations I want to compute are differences of two log partition functions: one conditioned on the overlaps taking certain values, and one not conditioned. There were also some simple parameters that seemed to disappear from the rate function at the end; by a Varadhan-type argument one can set them to zero first and add them back in later, and that is where they come from. So to prove the result, we really only have to understand the behavior of these Gibbs measures under the first term: by universality, studying these Gibbs measures basically gives us the large deviations of the object we were interested in at the start. And because the free energy concentrates, we are able to work with the quenched quantity, averaging over the randomness in the way these measures are defined, and then recover the almost-sure statement afterwards. This is a bit easier said than done, since the indicator function involved is not smooth with respect to x^0, so the concentration here is actually more delicate to show; going from quenched to almost sure requires a bit of work.

Let's see, it is six minutes to lunch and this would take too long, so I will skip the details, but let me say a very quick word on how this is proved. The proof generalizes work on large deviations by my PhD supervisor, Dmitry Panchenko: he proved a large deviations principle for the self-overlaps, and what we did was add the R_{1,0} overlap, which introduces some extra challenges, in the sense that we lose some smoothness by adding these indicator functions. You can smooth the objects out in a nice way, and you can also add a perturbation to the probability measures without changing the large deviations principle; at the scale we are looking at we can smooth out the measures a bit, so we get ultrametricity and all the nice things that come with it. We can use the Ruelle probability cascades to prove the lower bound, together with the ideas behind Cramér's theorem; it is a bit more complicated, but you can basically use tools from large deviations to finish the proof of the lower bound. And you can go from quenched to almost sure by being a bit more careful with all the estimates.

Okay, so what are the future problems, and what are we interested in next? A natural question is whether we can generalize this to higher-rank models. This was done for the classical spiked matrix models, so can we obtain similar results beyond the rank-one case? There is also the tensor case: instead of x_i x_j, can we assume the rank-one model is generated by a tensor, and look at the corresponding p-spin models for these objects?
Another direction: we assumed that the Y_ij are generated independently with the same law, i.i.d.; can we generalize to a different channel for each i and j? These are the heterogeneous, or inhomogeneous, models studied in the works of Reeves and of Francesco. And lastly, now that we have rate functions, we can study the phase transitions for these models: replica symmetry breaking, when these overlaps concentrate, when the rate function has a unique minimizer. We haven't done that yet; we just have a rate function, and there is still a lot to understand about the behavior of these overlaps through these rate functions. I guess we have some time for questions, or else it will be time for lunch. So thank you for your time.

Thank you very much, Justin. Are there questions?

Question: Thank you, it was a very nice talk. One thing I didn't get: what was the mismatch assumption on the likelihood, the probability of the data given the signal? What was the assumption on that one?

Answer: What are the assumptions on the guesses in this mismatched setting? One assumption is that my guess for the prior must be compactly supported; this can probably be weakened. Another assumption is regularity: I need the log-likelihoods to be differentiable enough times. The last condition on my guess is that a certain quantity has to be equal to zero. The way I like to interpret it, at least in the simple case, is the following: in my original model without any signal, the noise is centered; so if I am guessing how my observations are generated, my guessed noise also needs to be centered. I need this to hold, because if this quantity were not zero there would be an extra mean term of higher order than everything else, and we would have a slightly different problem. One comment: in the Bayes-optimal case, if we choose the right log-likelihood, this quantity is automatically zero, so the condition is automatically satisfied; in the mismatched case it is an extra assumption you need. It says that the statistician's guess cannot be completely wrong: I need to at least match the mean of the noise.

Okay, thank you. Last question.

Question: What is the intuition behind the second overlap? Can you share the idea behind it?

Answer: Yes, so what is the intuition behind this second overlap, the self-overlap?
We don't have to look at the joint law: if I only want to understand the original overlap, I can just look at that. But suppose you want something a bit more precise. We want this overlap to encode how close our guess is, to encode the angle, and to isolate the angle you have to normalize by the lengths of the vectors, so I have to control the length of these estimates. In some cases, for instance if I am just flipping coins, plus or minus one, this self-overlap takes only one value; but in more generic problems, where the prior P may be supported on an interval, the self-overlap can take a range of values, and you don't want a small estimator to unduly influence the overlap; you want to isolate the angle for these problems. That is the intuition for why I want to understand the joint law.

Question: Very nice work; I read it, and I have a ton of questions, of course. The first one, and maybe you partially answered this already: have you tried to extend it to vector spins?

Answer: I have not done it, and I think it should be done. The main challenge in going from this to vector spins is that you essentially need a vector version of Cramér's theorem; that extra step should be doable, but there is a bit more work to be done. The other obstacle is that we would then have to understand the joint laws of many estimates: in the rank-one case there is only one x^0, while in the higher-rank case there are many, so I would have to understand their joint laws. The nice news is that the synchronization mechanism used to understand the classical vector spin models should also work for these models, because things are at the right scale, so we can still add the regularizing perturbations. So I think the main challenge will be generalizing Cramér's theorem from the scalar case to the vector case and making sure nothing breaks.

Question: A follow-up question to this: in the proof of your universality you need to expand g(y, .) in the second argument, right? And to do that you need the second argument of this g to be small, otherwise you cannot neglect the higher-order terms.

Answer: That's right, that's right.

Question: Do you know for what kind of rank scaling this breaks down?

Answer: I see, yes. So Francesco's comment is completely correct: one needs three-times differentiability, and one also needs some bounds on what these derivatives are; this is written in the paper. What are some examples of log-likelihoods where this will not work? I can say that it works in models corresponding to stochastic block models, for instance. For models where it will not work, I would assume that if Y has heavier tails, or something like that, then it will not work, since the logs of the probabilities might be too big. So implicit in the conditions on the behavior of this g are some rather precise, maybe somewhat restrictive, conditions on what the log-likelihoods can be. It will be interesting to see how much of that can be removed, but right now we just assume a uniform bound on these derivatives.
Question: I have a question as well. From the formulas that you get, from the form of the rate function, do you get insights into when you expect to have concentration in these mismatched settings or not? Is there any insight you can extract from this?

Answer: Yes, that is a very nice question. It is something we haven't done yet, and I guess one of the motivations for proving these results, at least for Florent and Lenka, was precisely to understand how these overlaps behave in these models. Right now we just have a rate function and we haven't done the analysis, but hopefully in the future we will be able to use it to understand what you mention; I guess that was one of the reasons they were interested in this problem. It has not been done yet in general, although I think it has been done in some settings, in work by Francesco, I believe. Yes, and that would correspond precisely to the unique-minimizer case: they were able to show that these overlaps concentrate for those models.

Okay. All right, I don't see other questions. Maybe we can go for lunch, and thank you again, Justin.