All right, sorry, give me one second while I start my stopwatch. Okay, thanks. So today I'm going to talk about using AMP to characterize optimal type I and type II error trade-offs for sorted-l1-penalized estimation, or SLOPE. This is joint work with Zhiqi Bu and Weijie Su from the University of Pennsylvania, and with my colleague Jason Klusowski from Princeton.

While the notion of sorted l1 regularization can be used in many problems, I'm going to keep it simple today and just talk about high-dimensional linear regression. This is the framework we all know and love: the output is y = X beta + noise, where the X matrix is high dimensional, so short and wide. We're in the usual asymptotic regime that most of us have been talking about, where the ratio of rows to columns is some fixed delta. We'll assume the matrix has i.i.d. Gaussian entries, and I'm also going to assume that the coefficient vector beta is sparse with i.i.d. entries. In fact we have linear sparsity: a fraction of the elements are non-zero, and that fraction stays fixed as n grows. So again, I know y and I know X, and I'd like to recover beta.

For this problem, a well-known and widely used procedure is the lasso. What does the lasso do? It produces an estimate of the coefficient vector beta by minimizing a cost function that is the sum of a data-fit term, the sum of squared errors, plus a penalty term, where the penalty is the l1 norm. The penalty term forces a sparse solution, and the level of sparsity in the solution is controlled by the parameter lambda: the larger the lambda, the sparser the solution.

What I'd like to talk about today is a newer procedure referred to as SLOPE, or sorted l1 penalized estimation, which is a generalization of the lasso. The idea behind SLOPE is to break the monolithic l1 penalty of the lasso, which treats every variable in the same way, by instead using a penalty that applies different amounts of regularization depending on the coefficient magnitudes. The motivation for SLOPE was to control false discoveries by privileging correct support recovery over minimization of prediction error. When I say false discoveries, and this is a notion I'll use throughout the talk, I mean elements of the output that are non-zero but correspond to zero elements of the true beta. There's also the notion of true discoveries, which are non-zeros in the output that correspond to non-zeros in the truth. The way SLOPE controls false discoveries is by incorporating adaptivity into its penalization sequence: like the lasso, the cost function has two terms, a data-fit term and a penalty, but now the penalty is adaptive, meaning it penalizes entries differently depending on their magnitudes. The adaptivity is motivated by procedures like Benjamini-Hochberg for multiple testing, where more significant p-values are compared against more stringent thresholds. So we're going to marry those ideas with the usual advantages of l1 penalization, which gives us sparsity.
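To make the SLOPE objective concrete, here is a minimal numpy sketch of the sorted-l1 penalty and its proximal operator, which later sketches will reuse. The names slope_penalty and prox_sorted_l1 are my own, lam is assumed non-negative and sorted in decreasing order, and the prox uses the standard reduction to a decreasing isotonic regression followed by clipping at zero; this is an illustrative sketch, not the implementation from the work being described.

```python
import numpy as np
from sklearn.isotonic import isotonic_regression

def slope_penalty(beta, lam):
    """Sorted-l1 penalty: sum_i lam_i * |beta|_(i), where |beta|_(1) >= |beta|_(2) >= ...
    Assumes lam is non-negative and sorted in decreasing order."""
    return np.sum(lam * np.sort(np.abs(beta))[::-1])

def prox_sorted_l1(v, lam):
    """Proximal operator of the sorted-l1 penalty: argmin_b 0.5*||b - v||^2 + J_lam(b).
    Computed by sorting |v| in decreasing order, subtracting lam, applying a decreasing
    isotonic regression, clipping at zero, and restoring the original order and signs."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(np.abs(v))[::-1]
    w = np.abs(v)[order] - lam
    x = np.clip(isotonic_regression(w, increasing=False), 0.0, None)
    out = np.zeros_like(v)
    out[order] = x
    return np.sign(v) * out
```

Setting every entry of lam to the same value recovers the lasso penalty, and the prox then reduces to ordinary soft-thresholding.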
It turns out that the adaptivity in the regularization sequence for SLOPE gives it a lot of nice properties. For example, it can be adaptive to the level of sparsity, so you don't necessarily have to know that ahead of time, and it attains optimal performance in certain problems. The lasso is clearly a special case of SLOPE: if I take all of the lambdas to be equal, I get back the lasso, but the lasso isn't going to have all of these nice properties. So what we aim to do in this work is to quantitatively characterize the benefit of the freedom in the sorted regularization, and to further contribute to answering the question of what this adaptivity actually gives us, in particular by exploring the model selection properties of SLOPE. We'll do this with the help of AMP. I'm going to structure the talk by first describing the implications for SLOPE, and then I'll talk a little bit about the AMP aspects of the proof.

In order to characterize model selection, I'm going to talk about the true positive proportion, or TPP, and the false discovery proportion, or FDP. These give some idea of the rate at which a procedure makes true discoveries versus false discoveries (a small sketch of how these proportions are computed from an estimate appears at the end of this passage). In particular, it has been shown before that in the case of the lasso there is a fundamental trade-off between the true positive proportion TPP and the false discovery proportion FDP that can be rigorously characterized, so we essentially know the best the lasso can do in terms of these two quantities. This is equivalent to a type I and type II error trade-off, if that's more familiar language.

Formally, for the lasso the situation looks like this. The work that Weijie Su did in his dissertation computes an exact boundary curve that separates achievable (TPP, FDP) pairs from those that are impossible for the lasso to achieve, no matter what the signal-to-noise ratio is in the problem and no matter what regularization parameter you use. Obviously, what we'd like is for the FDP to be low, meaning few false discoveries, and the TPP to be high, meaning many true positives; that's the region of the (TPP, FDP) plane we'd like to hit. But it turns out this work shows that the whole favorable region can't be reached by the lasso; in the image, the shaded area is the region unachievable by the lasso. In short, what this work implies is that, no matter the choice of regularization parameter and no matter the signal-to-noise ratio, both types of errors can't be low simultaneously. Moreover, they show that when delta is small, so when my measurement matrix is really fat, or when epsilon is large, meaning my signal is dense, the TPP is asymptotically bounded away from one, again no matter what you do with the regularization or the SNR. This is sometimes referred to as a Donoho-Tanner phase transition, and it's visible in the plot on the left as a vertical line. The plot on the right shows the region of (epsilon, delta) space where such a phase transition occurs, that is, the pairs for which the TPP of the lasso is bounded away from one.
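As a concrete reference for these definitions, here is a minimal numpy sketch of how the TPP and FDP of a fitted coefficient vector could be computed; the function name tpp_fdp and the tolerance are my own illustrative choices.

```python
import numpy as np

def tpp_fdp(beta_hat, beta_true, tol=1e-8):
    """True positive proportion and false discovery proportion of an estimate.
    A 'discovery' is any coordinate whose fitted value exceeds tol in magnitude."""
    selected = np.abs(beta_hat) > tol
    true_support = beta_true != 0
    tpp = (selected & true_support).sum() / max(true_support.sum(), 1)
    fdp = (selected & ~true_support).sum() / max(selected.sum(), 1)
    return tpp, fdp
```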
In problems where such a phase transition occurs, I'm going to refer to the largest achievable value of the TPP as the Donoho-Tanner power limit, or DT power limit. In our plot, the DT power limit is around 0.6.

Here I'm plotting realized (TPP, FDP) pairs for both the lasso and SLOPE on the same problem, over 10 independent trials, varying the regularization of both methods across the trials (a toy simulation of this kind is sketched at the end of this passage). The message I want to convey with these plots is the following. If we look at the low-TPP regime, both methods seem to undergo a similar trade-off between FDP and TPP: the red and blue points look the same. What's more interesting is that in the high-TPP regime, in each plot there is a certain TPP past which we never see red points. This is exactly the DT phase transition from the previous slide, around 0.6 in one plot and maybe 0.4 in the other. But the blue SLOPE points clearly aren't bounded away from full TPP in the same way the red lasso points are; the blue points are able to pass this limit toward achieving full power. Recognizing this, we're tempted to ask: why is sorted l1 regularization better than the usual l1 regularization at high TPP, and why do SLOPE and the lasso exhibit similar trade-offs in the low-TPP regime? This is what we're going to try to answer today.
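For illustration only, here is a rough sketch of the kind of simulation that could produce such a scatter of (TPP, FDP) points. It reuses prox_sorted_l1 and tpp_fdp from the sketches above, solves SLOPE with a plain proximal-gradient loop, and uses scikit-learn's Lasso for the lasso points; the problem sizes, the BH-style weight sequence, and the regularization grid are arbitrary illustrative choices, not the settings from the actual experiments.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Reuses prox_sorted_l1 and tpp_fdp from the sketches above.

def slope_prox_grad(X, y, lam, n_iter=500):
    """Plain proximal-gradient (ISTA) solver for 0.5*||y - X b||^2 + sum_i lam_i |b|_(i),
    with lam sorted in decreasing order."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the smooth part
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        beta = prox_sorted_l1(beta - grad / L, lam / L)
    return beta

rng = np.random.default_rng(0)
n, p, eps, sigma = 250, 500, 0.2, 0.5       # delta = n/p = 0.5, linear sparsity eps
X = rng.standard_normal((n, p)) / np.sqrt(n)
beta_true = rng.binomial(1, eps, p) * 4.0   # a simple two-point prior on the signal
y = X @ beta_true + sigma * rng.standard_normal(n)

# BH-style decreasing weight sequence, rescaled below by an overall regularization level.
bh_weights = np.sqrt(2 * np.log(p / np.arange(1, p + 1)))

for scale in [0.002, 0.005, 0.01, 0.02]:
    # Lasso point (sklearn's objective divides the squared error by n, hence alpha = scale).
    b_lasso = Lasso(alpha=scale, fit_intercept=False, max_iter=50_000).fit(X, y).coef_
    # SLOPE point with the same overall regularization level on the largest weight.
    b_slope = slope_prox_grad(X, y, scale * n * bh_weights / bh_weights[0])
    t_l, f_l = tpp_fdp(b_lasso, beta_true)
    t_s, f_s = tpp_fdp(b_slope, beta_true)
    print(f"scale={scale}: lasso tpp={t_l:.2f} fdp={f_l:.2f} | slope tpp={t_s:.2f} fdp={f_s:.2f}")
```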
What we set out to do is to characterize the optimal FDP-TPP trade-off curve. By the optimal curve I mean the following: for every fixed level of the TPP, I'd like to give the smallest possible false discovery proportion that's asymptotically achievable by SLOPE, with any regularization and any SNR. So I fix my TPP and ask how low I can drop the false discoveries. It turns out we weren't able to do this exactly, but we could give upper bounds and lower bounds for this curve. I'm going to introduce the upper and lower bounds next and then talk about how AMP helped us get them. If I give an upper bound and a lower bound, the true curve is obviously somewhere in between, so the aim is for these to be tight in some sense.

In this plot I give an example of the bounds for a single problem instance; the plot on the right is just a zoom-in of the plot on the left. The blue and green curves form our upper bound, and the red curve is the lower bound, so we know the optimal curve lies between the two. I want to discuss a little bit about how we get these bounds, and then I'll go into more detail. Consider the upper bound first, the blue and green curves. Our proof of the upper bound is constructive, in the sense that we specify a prior on the signal and a regularization sequence for SLOPE, and then we show that this asymptotically achieves a given (TPP, FDP) point. If we can achieve a point with a particular setup, then the minimum FDP must be at or below that value. Below the DT power limit, the upper bound is simply given by the corresponding curve for the lasso; this is the green curve on the slide. This just recognizes that the lasso is an instance of SLOPE: the fact that the lasso can achieve these values means SLOPE must be able to do at least as well. The other part of the upper bound, above the DT power limit, is the blue curve, and it turns out that this part of the curve has the shape of a Möbius transform. I'm going to discuss the proof of the upper bound in what follows, but as I mentioned, it's constructive in the sense that it specifies a simple regularization sequence and then uses the AMP analysis to demonstrate that such a regularization achieves these (TPP, FDP) pairs.

I'll say a bit about the lower bound, but this is all I'm going to say about it in this talk. It turns out that proving the lower bound on the optimal curve is actually a bit more subtle, and to do it we developed a technique based on a class of infinite-dimensional convex optimization problems. I think the actual proof is pretty cool and uses some novel ideas that I imagine will have applications whenever we want to prove theoretical results about regularization schemes more generally, but for the sake of time I'm going to focus on the AMP aspects of the proof.

Before I get to the proof, I just want to show one more picture of what these upper and lower bounds look like more generally. On the last slide it was a single problem instance; now I'm varying the (epsilon, delta) pairs. In this image, the solid lines are the upper bounds, the dotted lines are the lower bounds, and the true curve lies somewhere in between. I'm happy to report that the upper and lower bounds are pretty tight, but I mostly show this slide because there's something interesting going on in the plot on the bottom right. In all of these plots we fix either delta or epsilon and vary the other. If you look at the bottom-right plot, and this is something I don't have a great explanation for, we fix delta, so we fix the dimensions of the measurement matrix, and vary the signal density, and it turns out the trade-off curve isn't monotone there. As the signal gets denser, the problem doesn't monotonically get more difficult, which is perhaps what we would expect and is in fact the case for the lasso.

I had one more plot, but I'm going to skip it for the sake of time and just talk a little bit about the proof. At a high level, I'm mostly going to talk about the upper bound beyond the DT power limit. To give you some intuition for how the freedom in these regularization parameters helps SLOPE beat the power limit, I made this slide.
The idea is that it's well known that the lasso estimator selects at most n of its p elements to be non-zero. Moreover, in this linear sparsity regime a significant proportion of false detections always occur for the lasso, interspersed along the lasso path, so the lasso always misses a fraction of the true variables, and that's what gives the DT power limit. For SLOPE, on the other hand, we don't have the same bound on the number of non-zeros out of the p coordinates; the corresponding bound is in fact on the number of unique non-zero magnitudes. So in theory you can construct an extreme case where the recovered SLOPE estimator is completely dense, in which case the TPP is automatically one. That's the high-level idea about what changes here, but what I want to talk about is how we actually prove this.

The big idea is that the proof is constructive: we use a simple two-level regularization sequence. Instead of penalizing with one lambda, we now have two lambdas; some fraction of the coordinates gets the big lambda and the rest get the smaller lambda, and we use AMP to characterize the trade-off curve exactly for this regularization sequence. The program, the one Cedric laid out before, is the following: construct an AMP algorithm, show that its fixed point solves our optimization problem, and then prove that the AMP converges to that fixed point. Once I have that, I can pass the statistical guarantees from the AMP state evolution onto my SLOPE estimate. That's where my proof sketch starts: my AMP analysis gives me an asymptotic characterization of the SLOPE solution from its state evolution, I use this characterization to say something about properties of the solution such as its sparsity, the FDP, and the TPP, and then this analysis gives me the conditions on the (delta, epsilon) pairs under which I can achieve various (TPP, FDP) values.
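To make the construction a bit more tangible, here is a heavily simplified numpy sketch of a two-level regularization sequence and of the kind of AMP iteration involved. It reuses prox_sorted_l1 from the first sketch, assumes X has i.i.d. N(0, 1/n) entries, and glosses over the calibration between the regularization sequence and the AMP threshold sequence that the actual analysis requires, so it should be read as a cartoon of the algorithm rather than the algorithm from the paper.

```python
import numpy as np

# Reuses prox_sorted_l1 from the earlier sketch; X is assumed to have i.i.d. N(0, 1/n) entries.

def two_level_lambda(p, frac_large, lam_large, lam_small):
    """Two-level regularization sequence: a fraction frac_large of the coordinates get the
    larger penalty value and the rest get the smaller one, returned in decreasing order."""
    k = int(np.floor(frac_large * p))
    return np.concatenate([np.full(k, lam_large), np.full(p - k, lam_small)])

def slope_amp(X, y, lam, n_iter=30):
    """Toy AMP iteration for SLOPE with a fixed regularization sequence lam; the threshold
    calibration needed for the fixed point to match the SLOPE solution is omitted here."""
    n, p = X.shape
    beta = np.zeros(p)
    z = y.copy()
    for _ in range(n_iter):
        beta = prox_sorted_l1(beta + X.T @ z, lam)
        # Onsager correction: the divergence of the sorted-l1 prox equals the number of
        # distinct non-zero magnitudes in its output.
        n_distinct = np.unique(np.abs(beta[np.abs(beta) > 0])).size
        z = y - X @ beta + z * (n_distinct / n)
    return beta
```

In the actual analysis, the fraction of coordinates receiving the large lambda and the two lambda values themselves are chosen, via the state evolution, to target a particular (TPP, FDP) point; here they would just be placeholders.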
To wrap up my talk, I think I have about a minute left, and I want to shout out a piece of work, and sorry, I'm skipping some slides, with a few colleagues at the University of Cambridge: Oliver Feng, Ramji Venkataramanan, and Richard Samworth. It's a tutorial article on AMP that was meant to introduce AMP to statisticians. In general, we wanted to derive and motivate AMP without using replica or belief propagation arguments, since these are ideas that statisticians tend not to be as familiar with. So if anyone has the bandwidth, we're definitely looking for feedback on the article, and most of you are cited in it a lot, so that's just a quick plug.

One of the things I think is cool about this article is that we tried to unify some of these ideas, and one of them is the program that Cedric used in his work and that I used here for SLOPE: deriving exact asymptotic characterizations for estimators that are solutions to convex optimization problems. In particular, the article provides a kind of recipe for doing this. Any time we have an estimator that's the solution to an optimization problem of this form, the paper gives a generalized AMP algorithm that has a fixed point solving that optimization problem. I'm still thinking of a GLM as my model, with a loss and a penalty that are separable and assumed to be convex. For this simple sort of optimization, we provide the AMP algorithm whose fixed point solves it, and the work left to the user is then to verify the existence and uniqueness of the fixed point of the state evolution and to prove the convergence of the AMP to that fixed point. Truly, that last step is the hardest part, since usually the cost functions are not strongly convex at the solution, so showing convergence takes a little bit of work, but we give some strategies that other papers have used to do this. What we're hoping is that with this recipe we've unified the work that was done for the lasso and for robust M-estimators, the work Pragya did for logistic regression, and what we did for SLOPE here, in a way that gives a recipe that could probably be used more generally for GLMs of this sort.

With that, I will stop. Thanks everyone for the invitation, and I will pass the baton to, I think, Joe is next.