Okay, so everything is working. I will continue what we started yesterday on the generic problem: you estimate a symmetric matrix plus noise, and the Bayes-optimal estimator for the mean square error is obtained by sampling according to the posterior. I'm making the assumption (remember that the bracket notation is the Euclidean dot product) that the overlap, in absolute value, converges to q. I'm quite happy that Pierpaolo gave his presentation this morning, because this is exactly the same q as in his first lecture today. This is my ansatz, if you want. Why the absolute value? Because I'm only able to show that the square of this quantity converges to q^2; there is still a sign I'm not able to determine, I cannot prove that the sign is a plus, but this will be sufficient for what I want to show. Yesterday I gave you a proposition, and today we will prove only this part: that λ_max converges to q. Yesterday I also gave a bunch of other results connecting it to q. [Answering a question] Yes, sorry; otherwise there would be quite a problem between the two. Thanks. Is there any other question on this? The main goal here is, again, somewhat pedagogical: if you only care about the minimum mean square error you don't really need to show this, but it gives, I think, a nice overview of the mathematical tools required to understand what physicists call the ansatz on the overlap matrix.

So let's go for a small proof. How much time do I have, at what time should I stop? Okay, a little bit after. We proceed as usual: I'm just removing the bracket here, and you see that a dot product appears between two independent copies; with Nishimori you can say that this has the same law as the overlap with the signal, and by my assumption this converges to q^2. So if q equals zero, the trace already controls λ_max and there is no problem; but if q is strictly positive I need to do something else to prove the second part. One way is to take the third moment of my matrix M. The identity is always true, so what you need to prove, rewriting it with the Euclidean dot product, is that the product of the three overlaps converges to q^3; together with the second moment converging to q^2, this gives that λ_max converges to q, thanks to the upper bound. Once you prove this, you are done with my result. This particular result is of no great importance, but the way we prove it will, I think, exhibit a nice mathematical tool.
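To fix ideas, here is the moment computation written out as I understand it; ⟨·⟩ denotes the Gibbs (posterior) average, x^1, x^2, x^3 are independent replicas from the posterior, and I gloss over the finite-n error terms.

```latex
\operatorname{Tr}(M^2) = \big\langle (x^1\cdot x^2)^2 \big\rangle \longrightarrow q^2,
\qquad
\operatorname{Tr}(M^3) = \big\langle (x^1\cdot x^2)(x^2\cdot x^3)(x^3\cdot x^1) \big\rangle \longrightarrow q^3 .
```

Since M is positive semi-definite, λ_max(M)^2 ≤ Tr(M^2) and Tr(M^3) ≤ λ_max(M) Tr(M^2); in the limit the first inequality gives λ_max ≤ q and the second gives λ_max ≥ q^3/q^2 = q.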
The first observation, which is crucial, is a symmetry in this problem. Look at the array containing the Euclidean products of the replicas. This is not a real matrix, it is an infinite array, because you can take as many replicas as you want, i.i.d. from the posterior: x^1, x^2, ... i.i.d. according to what I called G_n yesterday, the posterior distribution given the observation Y. This is what is called a random array, and it has a lot of nice properties: it is random, symmetric, clearly positive semi-definite, and weakly exchangeable. Since I'm not dealing with a finite matrix, I need to define these terms a little. Definition: an infinite symmetric random array R is a collection of random variables (R_{kk'})_{k,k' ≥ 1} such that R_{kk'} = R_{k'k} almost surely; the symmetry part is obvious. The definition of weakly exchangeable is the following: for every finite n and every permutation σ in S_n, the group of permutations on n elements, the array (R_{σ(k)σ(k')}) has the same law as (R_{kk'}). So if you permute the first n components of your infinite array, you must have equality in law. For those of you familiar with the notion of exchangeable random variables, this is the extension to a two-dimensional array; it is hard to define a permutation of something infinite, so the condition is stated for every finite n, in law. The second property: what does it mean for the array to be positive semi-definite? It's what you expect: you take the first n components and you require this n x n matrix to be positive semi-definite, for any n. Does that make sense? It is clear that all of this holds in our case: if you permute the replicas, which are i.i.d., the law is unchanged; it's even stronger than needed.

Now I will cheat a little. I really have something that depends on n, the size of the problem, so a sequence of arrays; this sequence is indeed tight because all these quantities are bounded. There are a few more facts: on the diagonal you have ones, because (I did not specify this) I take x on the unit sphere, so the posterior samples are on the unit sphere as well; and by my assumption the off-diagonal entries satisfy |R_{kk'}| → q. So I'm cheating a little: strictly speaking I take a tight sequence, extract a converging subsequence, and pass directly to the limit, where I consider an infinite array with these properties. Now I have an infinite random array which is weakly exchangeable, positive semi-definite, and satisfies |R_{kk'}| = q off the diagonal, and what I want to prove (I'm already in the limit; you need to do a little bit of work to translate this into a statement of convergence, but here you see what I want to show) is that R_{12} R_{23} R_{31} = q^3. Does that make sense? This proposition is an excuse for me to introduce what I think is a very nice representation theorem for such random objects. You might know the well-known theorem of Aldous and Hoover for weakly exchangeable arrays; when you add the constraint of positive semi-definiteness, there is a specific instantiation of this theorem due to Dovbysh and Sudakov. I think Dmitry Panchenko has a proof of this theorem using essentially only the Aldous-Hoover representation. The theorem: take any infinite symmetric, weakly exchangeable, positive semi-definite random array R; then there exists a separable Hilbert space H, with the corresponding scalar product, and a random probability distribution η on H x R, such that R is equal in distribution to (⟨h_k, h_{k'}⟩_H + a_k δ_{kk'}), where, conditionally on η, the pairs (h_k, a_k) are i.i.d. with law η. So you can find a Hilbert space in which you pick your vectors h_k; the product is the scalar product of H, a_k is just a real number, and δ is the Kronecker delta, so this term only adds a scalar on the diagonal. Clearly, any random array of this form is weakly exchangeable and positive semi-definite.
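As an illustration only, not part of the proof, here is a minimal numerical sketch of an array of the Dovbysh-Sudakov form, with a finite-dimensional space standing in for H, a constant diagonal term for simplicity, and hypothetical parameters; it just exhibits the symmetry and positive semi-definiteness, and the joint row-column permutation under which the law is invariant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Dovbysh-Sudakov-type array: h_1, ..., h_n i.i.d. from a measure nu on a
# finite-dimensional stand-in for H, and R_ij = <h_i, h_j> + a * delta_ij.
n, d, a = 6, 3, 0.5
h = rng.normal(size=(n, d))                    # h_i i.i.d. ~ nu (standard Gaussian on R^d here)
R = h @ h.T + a * np.eye(n)                    # Gram matrix plus the diagonal term

assert np.allclose(R, R.T)                     # symmetric by construction
assert np.linalg.eigvalsh(R).min() >= -1e-10   # positive semi-definite

# Weak exchangeability: a permutation acts on rows and columns jointly, and
# since the h_i are i.i.d., the permuted array has the same law as R.
perm = rng.permutation(n)
R_perm = R[np.ix_(perm, perm)]
```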
The content of the theorem is that this is basically the only possible case. For Aldous-Hoover it's a bit simpler: you don't need to exhibit a Hilbert space; you can take all your random variables uniform on [0,1], and instead of the dot product in a Hilbert space you have a generic function taking your i.i.d. uniform random variables as arguments. Here all the randomness is in the h's, and the structure is the dot product in the Hilbert space. I will not prove this; I will just use it in my case. Again, I have already taken the limit, so I'm in this setting with the additional constraint, and I want to prove the claim. We will work conditionally on η. In the law given by the Dovbysh-Sudakov theorem, ν will denote the first marginal: ν is the law of the h's. I take only the distribution of the h's, not of the diagonal part; I don't really care about the diagonal, where I have only ones. So what we have is |⟨h_1, h_2⟩| = q, ν-almost surely; this is exactly the condition, and since I'm a poor mathematician who doesn't know the sign, I'm fighting with an absolute value.

Is that clear? Now, a simple argument shows what this implies, and I will just draw a picture. Draw the sphere of radius √q. Suppose the claim is not correct: then you can find an h charged by ν which is not on this sphere, say strictly outside. Take a small ball of radius ε around it: all its points have norm strictly bigger than √q, and if the ball is sufficiently small, for all pairs of points in this small ball the Euclidean dot product is strictly bigger than q. So you have a ball B(h, ε) somewhere on which, I claim, the inner product of any two points is strictly bigger than q, for example. This is clearly a contradiction: two independent draws from ν fall in this ball with positive probability, and then the absolute value of their inner product is strictly bigger than q, contradicting |⟨h_1, h_2⟩| = q almost surely. (Points strictly inside the sphere are ruled out the same way, with inner products strictly smaller than q.) So this cannot happen, which means that all the vectors charged by ν are on the sphere: ‖h‖^2 = q for ν-almost every h. So now I'm almost done. Take h_1 sitting here on the sphere, picked according to my measure ν. What are the possible values for h_2, given the constraint |⟨h_1, h_2⟩| = q, which holds almost surely, together with ‖h_2‖^2 = q? By the equality case of Cauchy-Schwarz there are exactly two points: h_2 = ±h_1. So the measure ν is concentrated on at most two antipodal points of the sphere. And now this is enough for me, because what I want to show is the value of R_{12} R_{23} R_{31}. Each ⟨h_i, h_j⟩ is ±q: say it is +q when h_i and h_j are the same point, and -q when they are different. If ⟨h_1, h_2⟩ = -q and ⟨h_2, h_3⟩ = -q, then h_1 and h_3 must be the same point, so ⟨h_3, h_1⟩ = +q; the product of the three signs is always +1. So even though I don't know the sign, I'm still able to conclude that R_{12} R_{23} R_{31} = q^3 almost surely, which is exactly what I was looking for.
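Since this sign bookkeeping is the punchline, let me write it as a display: ν is supported on {+h_0, -h_0} with ‖h_0‖^2 = q, so h_i = ε_i h_0 for random signs ε_i ∈ {±1}, and in the three-cycle the signs square away:

```latex
R_{12}\, R_{23}\, R_{31}
= (\varepsilon_1 \varepsilon_2\, q)(\varepsilon_2 \varepsilon_3\, q)(\varepsilon_3 \varepsilon_1\, q)
= \varepsilon_1^2\, \varepsilon_2^2\, \varepsilon_3^2\; q^3
= q^3 \qquad \text{a.s.}
```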
Yes, I know: it looks like I am using a very elaborate tool to prove something that might be obvious, to physicists at least, but I don't know what else to do. Note that I proved only the part about the infinite, limiting object; there is no difficulty in translating this into a result of convergence as n tends to infinity, but I'm not doing that part. And working on the infinite object is natural even from a mathematical point of view: it's like when you deal with exchangeable random variables, where there are definitions for finite vectors, but they are a bit cumbersome; you need an infinite sequence to really play with this object. So, even though I'm not able to show that the overlap matrix is the one you saw this morning, with ones on the diagonal and q off the diagonal (I have ±q, and I don't know the sign), it's enough to show the result I'm interested in for the minimum mean square error. It would be more satisfying to pin down the sign; I don't know whether Jean has a proof of this by now or not. [Answering a question] Yes, exactly; the only thing we were able to show is that the square converges to q^2, so we still don't know the sign. No, I don't know. [Comment] Well, you are not a physicist anymore, then! Yes, it boils down to this problem of determining x up to a sign; for us it's a pain, but okay. [Question about the case without planting] When you are not planting anything, so the plain Sherrington-Kirkpatrick model rather than the inference setting? Yes, but there the landscape is much more difficult than the one I'm presenting here. Here I'm basically in the replica-symmetric setting, I think this is how you call it in statistical physics; outside this setting you need to work a lot more on this overlap. [Another question] Right: I didn't actually prove anything unconditional just now, because I started from the assumption that the absolute value of the overlap converges to q. I'm making the connection, and we do have a full proof that the assumption is correct, so in the end everything is rigorous; but right now I did not show you how to prove that my first assumption holds.

Okay, so for the last part I wanted to come back to the spike; this is true in a rather general setting, but I will come back to the spiked model, and I'm afraid I only have about ten minutes. Let's start then; I will write it properly so that I can continue directly with this in the afternoon. Back to the spiked Wigner model: again we observe only the upper-triangular part of the matrix, and I'm assuming now that the x_i are i.i.d. with distribution P_0. I hope I convinced you that the right MMSE is not on the vector x, because of the sign problem, but on the matrix x xᵀ. So this is the framework. You might wonder why I spent so much time at the beginning presenting the Bayesian tools for a vector with Gaussian noise, but you should be able to see that we are exactly in the framework of the previous lectures: you basically just need to flatten your matrix, and the number of components is then n(n-1)/2.
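For reference, here is the spiked Wigner model and the matrix MMSE written out; the exact conventions (the √(λ/n) scaling and the 2/(n(n-1)) normalization) are my assumptions about what was on the board.

```latex
Y_{ij} = \sqrt{\tfrac{\lambda}{n}}\, x_i x_j + W_{ij}, \qquad 1 \le i < j \le n,
\qquad x_i \overset{\mathrm{i.i.d.}}{\sim} P_0, \quad W_{ij} \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0,1),

\mathrm{MMSE}_n(\lambda) = \frac{2}{n(n-1)} \sum_{i<j} \mathbb{E}\Big[\big(x_i x_j - \mathbb{E}[x_i x_j \mid Y]\big)^2\Big].
```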
Of course, if you flatten the matrix, the entries of this vector are not i.i.d. anymore; but I never assumed i.i.d. components for my x vector in the previous lectures. What is important is that the noise is white, with no covariance, and this is still correct in this setting. So all the I-MMSE theorems we showed, the free energy, all the computations we made and the connections between them are still valid here; you just need to rescale λ with n and to replace the number of components, which is no longer n but n(n-1)/2, and then you can use all the tools. What we did before is not lost, if you want, and we can apply everything I showed you.

I will end this morning's course with basically two basic facts; I will keep this one for later. I told you we are interested in the minimum mean square error, but it's also good to look at the performance of natural algorithms. The first one I'm calling the dummy estimator: the one where you do not even look at the data. What is the best guess you can make without even looking at Y? It is the constant estimator, equal to the mean: for the entry x_i x_j you take E[x_i x_j] = (E[x])^2. What is its performance? Call it the DMSE, the dummy mean square error: DMSE = E[(x_i x_j)^2] - (E[x_i x_j])^2 = E[x^2]^2 - (E[x])^4, and I'm making the assumption that E[x^2] = 1, so DMSE = 1 - (E[x])^4. This is the best constant estimator you can make, and it is clearly an upper bound on the MMSE.

Now, what I'm calling naive PCA. I tried to convince you that what you should do is look at the matrix M: sample according to the posterior, take x xᵀ, average, and take the leading eigenvector. But of course, if you knew how to sample from the posterior, the problem would basically be solved; you do not have access to the matrix M I introduced before. Instead you have access to the matrix Y, so what you might do is run the same algorithm, not on M but on Y, hoping for the best. This is why I call it naive PCA: your estimator x̂ is the leading eigenvector of the matrix Y, normalized so that ‖x̂‖^2 = n, the dimension, and your estimator of the matrix is δ x̂ x̂ᵀ; you still have one degree of freedom, and we will try to pick the right value of δ. For this you need what in math is called the BBP (Baik-Ben Arous-Péché) phase transition, which in this setting reads: when λ ≤ 1 the leading eigenvalue of Y/√n converges to 2, the edge of the bulk of the semicircle law, and the leading eigenvector carries no signal, the overlap being zero almost surely in the limit; when λ > 1 the leading eigenvalue converges to √λ + 1/√λ, which is strictly bigger than 2, so you have one outlier, and its eigenvector is correlated with the true signal: ⟨x̂, x⟩^2 / n^2 → 1 - 1/λ. Now I want to pick δ. You do the math, and the limiting matrix MSE is a function of δ: MSE(δ) = 1 - 2δ c(λ) + δ^2, where c(λ) = 0 if λ ≤ 1 and c(λ) = 1 - 1/λ if λ > 1. If you minimize this in δ, then for λ ≤ 1 you take δ = 0: in this regime there is no hope of recovering the signal anyway.
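Here is a minimal simulation sketch of this naive PCA, with a Rademacher prior as an illustrative choice of P_0 (so E[x^2] = 1) and my assumed normalizations; it compares the top eigenvalue and the squared overlap with the BBP predictions.

```python
import numpy as np

rng = np.random.default_rng(1)

def naive_pca(lam, n):
    """Simulate Y = sqrt(lam/n) x x^T + W and run naive PCA on Y/sqrt(n)."""
    x = rng.choice([-1.0, 1.0], size=n)   # x_i i.i.d. ~ P0 (Rademacher, illustrative)
    g = rng.normal(size=(n, n))
    W = (g + g.T) / np.sqrt(2)            # symmetric Gaussian noise, off-diagonal variance 1
    Y = np.sqrt(lam / n) * np.outer(x, x) + W
    vals, vecs = np.linalg.eigh(Y / np.sqrt(n))
    v = vecs[:, -1]                       # leading eigenvector, ||v|| = 1
    return vals[-1], (v @ x) ** 2 / n     # top eigenvalue and squared overlap with x/||x||

for lam in (0.5, 2.0, 4.0):
    top, ov2 = naive_pca(lam, n=2000)
    pred_top = 2.0 if lam <= 1 else np.sqrt(lam) + 1 / np.sqrt(lam)
    pred_ov2 = 0.0 if lam <= 1 else 1 - 1 / lam
    mse_pca = 1 - pred_ov2 ** 2           # limiting PCA matrix MSE at the optimal delta
    print(f"lambda={lam}: eig {top:.3f} vs {pred_top:.3f}, "
          f"overlap^2 {ov2:.3f} vs {pred_ov2:.3f}, MSE_PCA={mse_pca:.3f}")
```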
When λ is bigger than one, the right value for δ, the optimal value if you minimize in δ, is δ* = 1 - 1/λ, and so the mean square error of naive PCA is MSE_PCA(λ) = 1 if λ ≤ 1, and MSE_PCA(λ) = 1 - (1 - 1/λ)^2 if λ > 1. So here it's a very simple application of the BBP result. In the examples we will see this afternoon I will take a symmetric distribution for x, so the (E[x])^4 term disappears and the best constant estimator is zero; in this case the value 1 corresponds to random guessing, no information at all. As soon as you are above the BBP phase transition you are able to recover a little bit of the signal, and this is the exact value of the mean square error achieved by PCA. Now the question is: is this the best possible? Is it equal to the minimum mean square error or not? We will see; this is a teaser for this afternoon. Is there any question on all this? All right, what time is it? It's 12:15. We were supposed to restart at two; let's restart at 1:30 instead. Okay, so let's start again at 1:30 p.m. For those online: we restart earlier because we have some transportation problems here, so keep that in mind. See you.