All right, so this week we have our own Josh Cooper talking to us about the two equators of the permutohedron. Take us away, Josh. Thanks. Yeah, so this is joint work with a couple of New Zealanders, two at the University of Waikato — Eibe Frank and Geoffrey Holmes — and Rory Mitchell; it's actually a component of his PhD thesis, and he's now at NVIDIA. This is going to be interesting, because I've given some portion of this talk before, but it was a five-minute talk, so I spoke very, very quickly and didn't get to everything I wanted to. I've expanded it, and we'll see what five minutes turns into today.

All right, so what is this weird word, permutohedron? The order-n permutohedron is a polytope whose vertices are permutations. More precisely, it's the convex hull of all the n-dimensional vectors whose entries are 1 through n in some order. So think about what that looks like. For example, with n equal to 2 we've got (1, 2) and (2, 1); that lives in the plane, but with only two vertices you just get a line segment, so that's the first permutohedron. Then with n equal to 3 you get six vertices. They live in 3-space of course, but they all lie in the plane x + y + z = 6, so the polytope is actually one dimension lower, and it turns out you get a nice regular hexagon. It lives inside the standard simplex, which is that plane x + y + z = constant intersected with the first orthant. And then if you move up to n equal to 4, you get all 24 vertices; this is a truncated octahedron. Again it actually lives in the hyperplane x + y + z + w = 10, which has dimension three, and if you pull it back to three dimensions — just perform an affine transformation to put it in 3-space — it looks like this nice truncated octahedron. Of course you can continue, but then it's much harder to picture.

So this is the permutohedron. It has lots of amazing properties; it's really a beautiful object, highly studied. First of all, it has n factorial vertices, and it's not hard to see that they're all on the convex hull, so they really are all vertices. Furthermore, you might wonder what the neighbors of a vertex are, because there's this one-skeleton — the vertices and edges of the polytope. The neighbors of a vertex are the permutations you can get by performing an adjacent transposition. If you look, for example, at the vertex 2, 4, 1, 3, one of its neighbors is 2, 3, 1, 4 — here I've actually written the inverses of the permutations, so it's a little hard to read off which two places get swapped. But anyway, you get three different neighbors, and for each vertex its neighbors are the permutations you get by taking two adjacent elements and switching them. So the one-skeleton is the Hasse diagram of the weak Bruhat order on the symmetric group: the poset where the bottom element is the identity, the top element is the reverse identity, and you work your way up the poset by performing adjacent transpositions. It's also the Cayley graph of the symmetric group where the generators are the adjacent transpositions.
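A minimal sketch — not from the talk, and the helper names are mine — just to make the object concrete: it lists the vertices for n = 4 and the neighbors of one of them.

```python
from itertools import permutations

def permutohedron_vertices(n):
    """All n! vertices of the order-n permutohedron: the vectors whose
    entries are 1..n in some order."""
    return [list(p) for p in permutations(range(1, n + 1))]

def neighbors(v):
    """Geometric neighbors of a vertex: swap the two entries whose values are
    k and k+1.  (In the talk's slides the vertices are labeled by inverse
    permutations, so each such swap shows up as an adjacent transposition.)"""
    out = []
    for k in range(1, len(v)):
        i, j = v.index(k), v.index(k + 1)
        w = list(v)
        w[i], w[j] = w[j], w[i]
        out.append(w)
    return out

verts = permutohedron_vertices(4)
print(len(verts))                        # 24 vertices
print(all(sum(v) == 10 for v in verts))  # all lie in the hyperplane x+y+z+w = 10
print(neighbors([2, 4, 1, 3]))           # the n-1 = 3 neighbors of one vertex
```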
And then, like I mentioned a minute ago, because the sum of the coordinates is constant across all of these vertices, this actually lives in a codimension-one affine subspace of n-dimensional space. It's not full-dimensional, but it's only deficient by one.

Now, how did this come up? All three of those people, if you look them up, are computer scientists, and they mostly work in machine learning. It turns out this actually comes from a machine learning problem: the problem of estimating feature importances. There are all these contexts in machine learning where you've got lots of inputs you want to learn from — lots of features of the objects of study — and you want to figure out whether you can predict or estimate something based on all of those inputs. Those inputs are called features, which is a natural thing to call them; they're just statistics of the underlying objects. But those features vary in importance, and very often there are an enormous number of them. In an image, for instance, each pixel is a feature — really maybe three features, an R value, a G value, and a B value, red, green, blue, and maybe also an alpha value for transparency. So very frequently there are just too many features to work with, and you instead want to find a key set of important features, disregard the rest, and try to learn from those. It makes the problem more computationally tractable, and it makes it explainable — there's this whole area of explainable AI, where you're trying to produce a model whose learned behavior a human being can understand. So estimating feature importance — looking at the different features and determining, or at least estimating, how important they are to the final decision — is really a hot area of research in machine learning these days, and if you want to be computationally efficient it's important in applications too; apparently the stuff I'm going to talk about today is being used at NVIDIA at this point. And there's a really classical way to estimate feature importance, from before people even used the words "feature" and "importance" — well, they had those words, but they didn't refer to what we think of today.
The idea is this thing called the Shapley value. You adopt the features one at a time, and as you do, you watch how much impact each one has on some cost or benefit measure of the model — usually your correctness, the precision or accuracy of your predictions, some measure of how good your model is. You look, as you go from feature to feature, at how much the fidelity of your model improves. The problem, as you might anticipate, is that when you adopt a feature, the previously adopted features can make the newly adopted feature more or less important. There's a kind of coalitional effect: maybe all of the information gained by adopting feature number 17 is already present in features 1 through 16, in which case the benefit is zero, but if you had adopted feature 17 first, you'd have gotten a large benefit from it. So the idea is that what we really want to do is average over all of the possible orderings of the features: as you go through them one by one, what is the marginal improvement — the value of each feature — averaged over all possible orderings, in other words over all permutations of the features? That's a reasonable thing to look at, and indeed there's quite a bit of literature on the Shapley value. So, just to summarize, it's a way to measure how important all the different features are while taking into account these coalitions of features. The problem, as you may have noticed, is that n factorial is big, so you can't possibly list all of the n factorial orderings of the features — especially if the number of features is really large, n factorial is just completely out of hand.
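Here's a minimal sketch of that permutation-averaging Shapley estimate, with a made-up value function standing in for model quality on a feature subset; the toy features and function names are illustrative, not the authors' code.

```python
import random
from itertools import permutations
from math import factorial

def shapley_exact(n_features, value):
    """Exact Shapley values: average each feature's marginal contribution to
    value(S) over all n! orderings of the features (only feasible for tiny n)."""
    phi = [0.0] * n_features
    for order in permutations(range(n_features)):
        adopted = set()
        for f in order:
            before = value(adopted)
            adopted.add(f)
            phi[f] += (value(adopted) - before) / factorial(n_features)
    return phi

def shapley_sampled(n_features, value, n_orderings=2000, rng=random):
    """Monte Carlo version: average over sampled orderings.  The quasi-random,
    well-spread families of permutations discussed in the talk are meant to
    replace this uniform sampling and converge faster."""
    phi = [0.0] * n_features
    for _ in range(n_orderings):
        order = list(range(n_features))
        rng.shuffle(order)
        adopted = set()
        for f in order:
            before = value(adopted)
            adopted.add(f)
            phi[f] += (value(adopted) - before) / n_orderings
    return phi

# Hypothetical value function: features 0 and 1 carry the same information,
# feature 2 adds something on its own.
def value(S):
    return (1.0 if (0 in S or 1 in S) else 0.0) + (0.5 if 2 in S else 0.0)

print(shapley_exact(3, value))     # [0.5, 0.5, 0.5]
print(shapley_sampled(3, value))   # close to the exact values
```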
One way to deal with this is to replace the set of all permutations of the features with some representative set, something that does a good job of representing the space of all permutations. The notion is that you want a sort of quasi-random set of permutations — something that looks like a random set, well distributed throughout the space of permutations. You might try a random set; in fact, random things tend to be quasi-random, and that works, but it turns out it doesn't converge very quickly. And here's something a little surprising if you haven't encountered it before: it's often possible to get a better estimate of the thing you're estimating from a well-chosen, carefully distributed set than from a random set. In random sets the gaps between the objects tend to vary quite a bit, whereas if you pick a set that's well distributed, or equidistributed, you can often do better than random and get better convergence properties. So the question here was: is there a better family than just random? Because random, it turns out, doesn't work all that well in this instance. What do we mean by that? I want a set of permutations, and like I said we could use a random set, but really you want to enforce that they're distant from each other — you want the nearest pair of permutations to be as far apart as possible. You want some sort of orthogonality, or near-orthogonality, between the permutations. But the kernel with respect to which you are orthogonal — essentially the bilinear form — is going to matter a lot: it imposes a geometry on the space of permutations, and you want to choose a notion of distance so that the quasi-random set of permutations is, first of all, straightforward to compute, and second of all, does a good job of providing an estimate.

So, one possibility for a kernel — and this is a really classical notion of distance between permutations — is called the Kendall tau. It's basically just the number of inversions it takes to get from one permutation to the other: the minimum number of adjacent transpositions that takes you from one to the other. One way of saying that is that it's the number of inversions of the permutation sigma inverse times sigma prime, if you're measuring the distance between sigma and sigma prime. An inversion is just when two elements of the permutation are out of order: if i is less than j and sigma of i is less than sigma of j, those two elements are in order, but if i is less than j and sigma of i is greater than sigma of j, they're out of order, and that's called an inversion. A non-inverted pair is sometimes called a non-inversion — I would maybe like to call it a "version," but people don't do that. So we have inversions and non-inversions, and of course they add up to n choose 2, because that's the total number of pairs of elements.

The idea is that we'll measure the distance between two permutations using the relative number of inversions between them. Well, we won't use that raw quantity; we normalize it so that our measure goes from negative one to one, so that if the two are as far apart as possible it's negative one, and if they're perfectly aligned it's one. If the number of inversions is n choose 2, this quantity is negative one, and if the number of inversions is zero — meaning sigma and sigma prime agree everywhere — it's one. So when sigma equals sigma prime you get a Kendall tau value of one, and if one is literally the exact reverse of the other you get negative one, and there's a sense in which you can be in between. Two permutations are orthogonal with respect to the Kendall tau kernel if this quantity is zero, or approximately zero — the idea being that for about half of the pairs the order is reversed between the two permutations, and for about half of the pairs it's the same. I'll call that the combinatorial equator of the permutohedron, because of course this is a function of pairs of vertices of the permutohedron as well as of the symmetric group. So this provides us with a kind of equator — really, what I mean by the combinatorial equator is the set of sigmas whose Kendall tau with respect to the identity is zero, or approximately zero.

Now, choosing orthogonal vectors with respect to the Kendall tau is, as you might expect, slow, or hard: it's a difficult combinatorial problem to come up with, say, a maximal family of permutations which minimizes the maximum absolute value of Kendall tau over pairs, for a given number of permutations, or something like that. So instead we'd like to just use geometry rather than this strange measure of distance on the permutohedron. What if we could just use the ordinary dot product, and the distance it induces, the two-norm?
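Before moving to the geometric picture, here's a small brute-force sketch of the Kendall tau kernel just defined; it's fine for small n, and the function names are my own.

```python
from itertools import combinations
from math import comb

def inversions(seq):
    """Number of pairs i < j with seq[i] > seq[j]."""
    return sum(1 for i, j in combinations(range(len(seq)), 2) if seq[i] > seq[j])

def kendall_tau(sigma, sigma_prime):
    """Kendall tau kernel, normalized to [-1, 1]:
    1 - 2 * inv(sigma^{-1} sigma') / (n choose 2)."""
    n = len(sigma)
    position = [0] * (n + 1)
    for pos, val in enumerate(sigma, start=1):
        position[val] = pos                        # sigma^{-1} in one-line notation
    composed = [position[v] for v in sigma_prime]  # sigma^{-1} composed with sigma'
    return 1 - 2 * inversions(composed) / comb(n, 2)

identity = [1, 2, 3, 4]
print(kendall_tau(identity, identity))        #  1.0  (no inversions)
print(kendall_tau(identity, [4, 3, 2, 1]))    # -1.0  (all 6 pairs inverted)
print(kendall_tau(identity, [2, 4, 1, 3]))    #  0.0  (3 of the 6 pairs inverted)
```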
So what I mean by using the dot product is: we take those permutations sigma and sigma prime and apply the affine map that pushes them out onto the surface of the appropriate-dimensional sphere and centers that sphere at the origin. It's just the affine map that takes the hyperplane containing the standard simplex, projects it down by one dimension to get rid of the extraneous dimension, centers it at the origin, and also scales it so that the permutohedron is circumscribed by the unit sphere. Call that map A, and then we use the dot product.

Okay, so here's a visualization of the order-4 permutohedron, with n equal to 4, and I've pictured both the geometric equator and the combinatorial equator. The geometric equator is literally just where the plane encoding orthogonality to the identity permutation cuts through the permutohedron — and, more generally, through the sphere circumscribing the permutohedron; that's this blue circle. The vertices of the permutohedron that lie on the geometric equator are in blue here. The combinatorial equator consists of the vertices of the permutohedron that are combinatorially three steps away from both the north and south poles: if you notice, the south pole here is the identity permutation, 1, 2, 3, 4, and the north pole is the reverse identity, 4, 3, 2, 1. Since Kendall tau is just an affine function of the number of inversions, as you move from edge to edge Kendall tau is incremented by the same quantity, so really we just want vertices that are equidistant, in this one-skeleton, from the north and south poles. You can check, if you look at the 2, 4, 1, 3 vertex that's sort of near the middle here, that its number of inversions from 1, 2, 3, 4 is three: following an edge gives an adjacent transposition, so you move from 2, 4, 1, 3 to 1, 4, 2, 3, you pick up another inversion moving to the 1, 3, 2, 4 vertex, and then one more to get to 1, 2, 3, 4. So its distance to 1, 2, 3, 4 is three, and its distance to 4, 3, 2, 1 is also three. There are — one, two, three, four, five, six — the ones in pink here are the combinatorial equator, the vertices at distance three in this graph from both poles.

And you notice there's not a perfect alignment between the combinatorial and geometric equators here. There are some vertices on the geometric equator but not the combinatorial equator — those are in blue — some on the combinatorial equator but not the geometric one — those are in pink — and the ones on both are shaded with both colors. It's not a perfect overlay, but you can see they're close to each other. So you might wonder: how close are they? Is this actually a good approximation — are the two equators near each other or not? Because we'd like to use the geometric equator as a replacement for the combinatorial equator; the dot product is a lot easier to compute than Kendall tau. All right, so here's the theorem that relates the two. The statement is a mess, so let me try to illustrate it a little.
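As a concrete anchor for what follows, here's one way to realize the map A just described: subtract the common mean and rescale. Dropping the redundant dimension is an isometry of the hyperplane, so it gives the same dot products — a sketch, not the authors' code.

```python
import numpy as np

def A(sigma):
    """Send a vertex of the permutohedron to the unit sphere: subtract the common
    mean (n+1)/2 (centering the permutohedron at the origin inside its hyperplane)
    and rescale so every vertex has norm 1."""
    v = np.asarray(sigma, dtype=float)
    n = len(v)
    v -= (n + 1) / 2.0
    return v / np.linalg.norm(v)   # every centered vertex has norm sqrt(n(n^2-1)/12)

def geometric_kernel(sigma, sigma_prime):
    return float(A(sigma) @ A(sigma_prime))

identity = [1, 2, 3, 4]
print(geometric_kernel(identity, identity))      #  1.0
print(geometric_kernel(identity, [4, 3, 2, 1]))  # -1.0
print(geometric_kernel(identity, [2, 4, 1, 3]))  #  0.0: exactly on the geometric equator
```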
The blue curve is the upper bound, and the vertical axis is the dot product — remember, A is the affine transformation onto the unit sphere, so you transform sigma and sigma prime by A and take the dot product of the results; they're orthogonal if that's zero. The horizontal axis is the Kendall tau distance between the two. You can see they do track together, but we've got an upper bound in blue and a lower bound in red — those strange-looking functions I've put in red and blue boxes — and the range of possible dot products, at least as far as these bounds go, is in green. So for a given level of dot product, this is the possible range of Kendall tau values. It's not everything from negative one to one, but it's also not exactly the line y equals x; there's some wiggle room. In particular, look at the case where the dot product is zero — that's the most interesting case, because dot product zero is exactly the geometric equator — then Kendall tau is between negative one half and one half. I have little-o(1) terms here because this is really in the limit as n goes to infinity, but the idea is that if you're close to the geometric equator, then Kendall tau is between negative one half and one half. To say it another way: if two permutations are geometrically approximately orthogonal, then, in terms of the number of inversions, they differ in between a quarter and three quarters of the possible inversions — that's what negative one half to one half translates into. That's this horizontal green line segment going from negative one half to one half.

So of course a reasonable question is: is this tight? And actually, I don't know. Here is a sample — I think 15 million permutations, with n equal to 12 if I remember right, selected uniformly at random. Those are the blue dots; the upper bounds are in orange and the lower bounds in green. You can see the blue stuff doesn't go all the way to the edges, so maybe if you looked at all possible permutations you really would fill in this eye-shaped — almond-shaped — region, but it looks like the cloud sticks closer to some S-shaped curve passing through the points (-1, -1), (0, 0), and (1, 1). In fact, I don't know whether these bounds are tight.

So here is a demonstration that they're at least not far from the truth. What I've done here is: imagine you plot a permutation literally — you plot sigma of 1 at 1, sigma of 2 at 2, sigma of 3 at 3, and so on, so it's just the graph of sigma — and then you zoom out enough that it looks like this object. If you're familiar with permutons, this is really a permuton we're looking at. So this is a permutation that is decreasing with slope negative one up to about a one-over-the-cube-root-of-two fraction of the possible values, and after that it has slope one. That permutation, it turns out, sits essentially right on the geometric equator.
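Here's a quick numerical check of that construction as I read it off the slide — reverse the first roughly n/2^(1/3) entries and leave the rest fixed; the exact numbers are spelled out next. A sketch under that assumption, with made-up helper names.

```python
from itertools import combinations
from math import comb

def reversed_prefix_permutation(n, c=2 ** (-1 / 3)):
    """Reverse the first ~c*n entries (a run of slope -1) and fix the rest
    (slope +1) -- my reading of the permutation pictured on the slide."""
    m = round(c * n)
    return list(range(m, 0, -1)) + list(range(m + 1, n + 1))

def centered_dot(sigma, tau):
    """Dot product after centering and rescaling, i.e. the geometric kernel."""
    n = len(sigma)
    mu = (n + 1) / 2
    num = sum((a - mu) * (b - mu) for a, b in zip(sigma, tau))
    return num / (n * (n * n - 1) / 12)      # squared norm of a centered vertex

def inversion_fraction(sigma):
    n = len(sigma)
    inv = sum(1 for i, j in combinations(range(n), 2) if sigma[i] > sigma[j])
    return inv / comb(n, 2)

n = 2000
sigma = reversed_prefix_permutation(n)
identity = list(range(1, n + 1))
print(centered_dot(sigma, identity))   # ~ 0: essentially on the geometric equator
print(inversion_fraction(sigma))       # ~ 2**(-2/3), about 0.63 of the n choose 2 pairs
```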
The dot product of this with the identity permutation is — well, it's zero if you scale it appropriately. Of course the raw dot product is positive, because all the entries are positive, but if you recenter so that it runs from negative one to one instead of zero to one, you get dot product zero. So this is on the geometric equator. But what about the number of inversions? The number of inversions turns out to be about n squared times two to the minus five-thirds, which is about 63 percent of n choose 2. Well, 63 percent is less than 75 percent — that three quarters — and in terms of Kendall tau that 63 percent turns into negative 0.26, which is bigger than negative one half, corresponding to the one half above; there are two ways of viewing it, you can count inversions or you can transform into Kendall tau. So 63 percent is not quite 75 percent — not far — but it's certainly bigger than 50. So the answer is not 50 percent, but it could be anywhere between 63 percent and 75 percent. And I should say I haven't worked hard to try to find a better permutation; this was just playing around with a few examples, so maybe you could do better by trying a few more things. But it certainly shows that we don't know whether the bounds are tight.

So let me just sketch the proof rather than give all the details. This is the proof of the upper and lower bounds, and let me introduce bubble sort, because it's going to be a key ingredient. Bubble sort — if you've never seen the algorithmic problem of sorting before, it might be the first thing you think of. I say: here's a list of numbers, can you put them in order please? You might try to bubble sort: you just move from left to right, and as soon as you encounter something that's out of order, you bubble it one to the left — sometimes it's drawn vertically, so it's really like things are bubbling to the top. You just keep doing that: you read through the list, and as soon as you encounter something out of order, you bubble it one position to the left, then you start over, reading left to right again, until the process terminates. If you think about moving from the identity permutation to some prescribed permutation this way, it means that whenever you find an adjacent pair that's out of order with respect to the target permutation, you swap it. If you think about it for a minute, the number of inversions increases as you do this, so you'll never get into a loop — it's definitely a terminating process — and it ends with the permutation you're targeting. So this gets you from the identity permutation to the target permutation, and it does so by moving along the edges of the polytope, along the one-skeleton of the permutohedron. So that's the basic idea: you track what happens to the dot product of the current permutation with the identity permutation as you move along edges of the permutohedron following a bubble sort.

So let's just look at an example — this is a proof by example. Consider the permutation pi of the numbers 1 through 9 that, in one-line notation, is this completely randomly chosen 3, 1, 4, 5, 9, 2, 6, 8, 7. If you haven't
seen one-line notation before, this means that pi of 1 is 3, pi of 2 is 1, pi of 3 is 4, and so on — just a compact way to present a permutation. So here's what we're going to do: we'll perform bubble sort on the identity permutation to get to this permutation 3, 1, 4, and so on, and as we do it we'll track what happens to the dot product of the identity with our partial pi. Notice the number of inversions of this permutation 3, 1, 4, 5, and so on, is nine; that means our bubble sort is going to take nine steps. Okay, let's start with pi-naught equal to the identity; we're going to move to pi-nine, which is pi itself, our target permutation, and we're going to watch what happens to the dot product of the identity with the pi-j's as we move from one pi-j to the next, as these adjacent transpositions are applied.

All right, so compare the identity dotted with pi-(j minus 1) to the identity dotted with pi-j. Everything is the same except at some spot, which I'll call k: positions k and k+1 got swapped. Whatever value was sitting at position k — call it a-sub-k — gets swapped with a-sub-(k+1), which is just another name for the value at position k+1; I just didn't want to index the pi's in a confusing way. Those two values swap roles and everything else stays the same, so the dot product changes by the delta, let's call it: the identity dotted with pi-j, minus the identity dotted with pi-(j minus 1). How much is that? After the adjacent transposition the relevant terms are k times a-(k+1) plus (k+1) times a-k, and before they were k times a-k plus (k+1) times a-(k+1); nearly everything cancels and you're just left with a-k minus a-(k+1). That's handy: it means that when you perform an adjacent transposition on a permutation, the amount by which the dot product with the identity changes is just the difference of the two values you swap. So if, as we move along the edges of the permutohedron, we keep track of which two elements are getting swapped — not the positions of the elements, but the values — those differences, added up, give the total change from the starting point, the identity dotted with itself, and that gives us a way to compute, or really estimate, the dot product of the final pi with the identity.

So here's what happens when you apply adjacent transpositions to the identity permutation to get to that permutation pi. Look at the first step: if pi-naught is the identity, the first bubble you perform swaps the element in position two with the element in position three — you might say that's bubbling at position two — and that's the transposition of the values 3 and 2; I'm reading row one of this table, not row zero. How much does the dot product change? By the difference of the two elements we swapped, three minus two. Notice the change always goes the same way, because a larger
element is always changing places with a smaller element moving to its left — that's what happens in bubble sort, you never go backwards — so the dot product always drops. Here the change is one, and the number of inversions of course goes up by one, because every time you perform an adjacent transposition the number of inversions changes by exactly one. And Kendall tau: instead of changing by one each time like the inversion count, Kendall tau changes by two over the total possible number of inversions, which is 36 here — remember, it was one minus twice the number of inversions divided by n choose 2 — so it changes by two 36ths, or one eighteenth, each step; I've written them all as fractions of 36. And the dot product: we started at 285, since if you dot the identity with itself you get 285, and it goes down by one, because this delta was one. Now look at the second line. There you're swapping the first and second elements: the 1 and the 3 are out of order, so they get swapped, and 1 and 3 are the elements getting swapped; three minus one is two, so that's how much the dot product changes. The number of inversions is two, which means the Kendall tau is now 32 over 36, and the dot product is now 282, because we took 284 and subtracted two from it. And just to pick on another row: look at row eight. There you bubble at position five, which means swapping the elements at positions six and five — those are the numbers 2 and 9 — and when they get swapped, that's a delta of seven, nine minus two. So the dot product drops by seven; the number of inversions is now eight, so Kendall tau is 20 over 36, and the dot product is now 264, because if you've been keeping track of the changes all along the way, you land at 271 minus 7, which is 264.

Okay, let's try to be systematic about this — this table is of course just one example. Notice that under "elements swapped," all of those pairs are distinct. The reason is that you never swap two elements and then swap them back later: bubble sort only moves values in one direction, things only percolate to the left, so you never see the same pair twice. You don't see all possible pairs either, but you might wonder which pairs you do see. In this complete graph I've drawn on nine vertices you can see the pairs that got moved: 2, 3 occurs, and there's the edge 2-3; 2, 9 occurs, and there's the edge 2-9; and so on. These are exactly the nine edges we saw occur as pairs in the adjacent transpositions. And those red numbers written on the edges are the weights of the edges: just the absolute value of the difference of the two endpoints, because that's the delta — that's how much the dot product changed when we performed that particular adjacent transposition. So when we performed the transposition that swapped the values 2 and 9, the dot product changed by seven, so the weight of that edge is seven. And the total change is going to be the sum of all those red numbers.
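Here's a small sketch that replays this computation. The exact order of the rows depends on which bubble-sort variant you use — the slide's table is ordered a bit differently — but the multiset of swapped pairs, their weights, and the endpoints 285 and 263 are forced.

```python
from math import comb

def bubble_track(pi):
    """Bubble the identity toward pi: repeatedly scan left to right and swap the
    first adjacent pair that is out of order with respect to pi.  Each swap is an
    edge of the one-skeleton and creates exactly one new inversion; the dot
    product with the identity drops by the difference of the two swapped values."""
    n = len(pi)
    rank = {v: r for r, v in enumerate(pi)}            # where each value sits in pi
    cur = list(range(1, n + 1))                        # start at the identity
    dot = sum(i * v for i, v in enumerate(cur, 1))     # <identity, identity>
    inv, pairs = 0, comb(n, 2)
    print(f"start: dot = {dot}")
    while cur != pi:
        for k in range(n - 1):
            a, b = cur[k], cur[k + 1]
            if rank[a] > rank[b]:                      # out of order w.r.t. pi
                cur[k], cur[k + 1] = b, a              # bubble b one step left
                dot += a - b                           # drops by the weight b - a
                inv += 1
                tau = 1 - 2 * inv / pairs
                print(f"swapped ({a},{b})  weight {b - a}  "
                      f"inversions {inv}  tau {tau:.3f}  dot {dot}")
                break

bubble_track([3, 1, 4, 5, 9, 2, 6, 8, 7])   # nine swaps; dot goes 285 -> 263
```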
So the idea is: we've got the complete graph on nine vertices, all edges weighted by the absolute value of the difference of their two endpoints, and we have some subgraph of it that comes from which adjacent transpositions happened. The sum of the weights of that subgraph — the weight of the subgraph — is the difference between 285 and 263: it's the total amount by which the dot product changes as we perform the bubble sort. So 263 is 285 minus the sum of all of those edge weights. Where are these numbers coming from? First, notice the sum of the edge weights — one plus two plus two plus three and so on, just adding up the red numbers — and let's try to bound it. What's the least it could possibly be with nine edges here? Well, you've got eight edges of weight one, seven of weight two, six of weight three, and so on; there's only one place a weight of eight occurs, on the 1-9 edge, but there are two sevens, three sixes, and so forth. So with nine edges, eight of them in theory could have weight one, and then you run out of ones and have to use a two: the weight of the subgraph is at least eight times one plus one times two. And it's at most an eight, plus two sevens, plus three sixes, plus three fives — you don't get four fives, because we've only got nine edges to work with — if you use up nine edges greedily starting from the top, one eight, two sevens, three sixes, three fives, that's the most you could possibly get. So that's a lower bound and an upper bound. It seems like you couldn't get much out of this — I'm really giving a lot away — but if you think about it for a minute, when the number of edges approaches n choose 2 you actually get pretty good balance, and if the number of edges is near zero you also get pretty good balance, so the bounds are going to meet at both endpoints, which is why there's some hope for this method.

And where is that 285 coming from? That's the identity dotted with the identity: the sum of j squared as j goes from 1 up to n, and there's a simple formula for the sum of the first n squares — it's n times (n+1) times (2n+1), all over 6, so the 2 and the 6 give you a third and it's about n cubed over 3. That's where the 285 is coming from.

All right, so in general: if n is the number of vertices — the total length of the permutation — and m is the number of edges — the total number of inversions — then the starting value is about n cubed over 3, and the lower bound comes from taking n minus 1 ones, plus n minus 2 twos, plus n minus 3 threes, and so on, up to n minus t copies of t. Well, for that last weight you might not get to use all of them, but that won't have an impact asymptotically. And what's t? It's whatever gets you to m edges: the sum (n minus 1) plus (n minus 2) on up to (n minus t) should add up to m — again not exactly, because at the last weight you might have to leave off a few edges, but asymptotically it doesn't matter — and that sum is nt minus roughly t squared over 2.
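To keep the earlier numbers honest, here's a quick exact computation of those greedy extremes for the running example (a sketch; the function name is mine).

```python
from itertools import combinations

def greedy_weight_bounds(n, m):
    """In the complete graph on 1..n with edge {i, j} weighted |i - j| (weight w
    occurs on exactly n - w edges), the lightest and heaviest m-edge subgraphs
    are obtained greedily from the smallest / largest weights."""
    weights = sorted(abs(i - j) for i, j in combinations(range(1, n + 1), 2))
    return sum(weights[:m]), sum(weights[len(weights) - m:])

# The example from the talk: n = 9 values, m = 9 inversions/edges.
lo, hi = greedy_weight_bounds(9, 9)
print(lo, hi)              # 10 = 8*1 + 1*2   and   55 = 1*8 + 2*7 + 3*6 + 3*5
print(285 - hi, 285 - lo)  # so any 9-element permutation with 9 inversions has
                           # dot product with the identity between 230 and 275
```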
Again, I'm just shoving under the rug things that don't matter asymptotically. So if you solve nt minus t squared over 2 equals m for t — it's just a quadratic equation — you get that t is about n minus the square root of n squared minus 2m. Just to point out: m, the number of edges, varies between zero and about n squared over 2. At m equal to zero you get t equal to zero — no terms, which makes sense, because there are no edges. When m is n squared over 2, that is, you take all of the edges, the quantity under the square root is n squared minus n squared, which is zero, so t is n: you're using every weight, in other words all of the edges. So this quantity t runs from none of the edges to all of the edges. And if you add up the sum I'm after — one times (n minus 1), plus two times (n minus 2), on up to t times (n minus t) — that adds up to about n t squared over 2 minus t cubed over 3; again, I'm shoving the asymptotically negligible terms under the rug. If you plug in that expression for t, you get m times n, plus the quantity (n squared minus 2m) raised to the three-halves power minus n cubed, all over 3. You can check it: plug in m equal to zero, and n squared to the three-halves is n cubed, so n cubed minus n cubed is zero, and you're left with m times n, but m is zero, so the whole thing is zero. If m is n squared over 2, the bracket gives zero minus n cubed, over 3, so negative n cubed over 3, and m times n is n cubed over 2, and n cubed over 2 minus n cubed over 3 is n cubed over 6. So you can see where the bound is coming from. Then you apply a similar analysis for the upper bound — it's the same deal, a slightly simpler formula, and again you can check that plugging in zero and n squared over 2 gives the right quantities. Subtracting those from n cubed over 3 — remember, we had to subtract from the starting value — gives you the two bounds.

Okay, that doesn't look exactly like what I stated as the main theorem, but it's not hard to see that everything is related by affine transformations: the number of inversions times 2, divided by n choose 2, subtracted from 1 — that's the Kendall tau — and the identity dotted with the permutation has to be transformed to put it on the unit sphere, but again that's just an affine transformation. Performing the appropriate affine transformations gives you the bounds I stated in the original main theorem. There's a little bit to check, because of course I ignored all the asymptotically negligible terms, but it turns out everything is fine.

Now, like I said, this is maybe not tight. Why not? It seems like this pins down what's going on. Well, one reason this might not be tight is that we're not taking advantage of the fact that not every subgraph is possible. You have this weighted complete graph and we're bounding over all possible subgraphs, but you can't actually get all of them: n factorial, which bounds the total number of bubble-sort subgraphs you can get, is a lot smaller than two to the n choose 2. So you don't actually get all the graphs; there are some
graphs you can't get. Which graphs can't you get? Because if you could take advantage of the fact that not all graphs are possible, maybe you could tighten those bounds. For example, here's a labeled subgraph of that labeled complete graph that you can't get. Suppose i is less than j is less than k: you can't have i pass k in the bubble sort without either i passing j at some point first, or k passing j. You've got to get one of the two, maybe both, but you certainly can't get the ik edge without getting ij or jk, at least one of the two. So in that labeled graph you can't get the one-edge subgraph on three vertices that has ik but not ij or jk. So there are lots of subgraphs that are not possible. And actually, if you don't like the labeling — and I don't like the labeling — it turns out you can't get an induced C5. So these subgraphs are induced-C5-free, and even that might tighten the bounds: you look at what the possible values of that upper bound and lower bound are if, instead of greedily selecting the edges to include, you require that no induced C5s appear, and maybe you can do even better on the bounds. But I haven't really even tried. So there's a wide-open question here — can you tighten those bounds? — and this is how you might go about it. Thank you.

All right, thanks Josh. If everyone could thank Josh in some way — and I guess I'll open the floor for questions.

I have a question. This may have been what you were hinting at, but can you go back to that slide with the chart, with the permutation pi? Right — oh no, sorry, maybe the one with the — yeah, this one. So I was thinking, and this may have been what you were hinting at at the end there, but with this upper bound here you're just looking at the nine largest possible differences between the vertices. But when you go through this bubble sort one step at a time, the first time you swap, the largest the difference could possibly be is one, right? Because no matter where you make that swap, you're swapping two adjacent entries of the identity. And the second time you do it, I'd say the largest you could possibly swap would be, I guess, two — is that true? And so on and so on. So using that, could you get maybe a different upper bound? I don't know if that would help.

Yeah, I don't know, I haven't thought about it — it's interesting. Right, there's all this information we're not taking advantage of, about which edges can even appear, and like you said, at the beginning you only get low-weight edges. I haven't tried to take advantage of that at all with the upper bound. And you can only get these high-weight edges towards the end, when there's been enough mixing of the entries of the permutation that some large numbers can actually pass by some small numbers that they started far away from. That's exactly the kind of thing where, if you take better advantage of it, you could probably improve the bounds. Now, do they improve asymptotically? I don't know, but there's certainly something that could be said there. Maybe it's more than just saying there aren't any C5s; maybe it's looking closely at the fact
that these small-weight edges necessarily occur at the beginning, and you can only get the high-weight edges towards the end. Yeah, and only under these special conditions where, you know, you've already moved the number one a long way to the right, say. Yeah. Okay, cool.

Any other questions for Josh? I should also mention there are other kernels — there's the Spearman's rho kernel, there's the Mallows kernel — a couple of these classical kernels that measure the distance between two permutations, and some of their relations to each other are known; other folks have shown that you can bound some of them in terms of functions of the others. But we didn't try to compare the geometric dot product, the ordinary dot product, with any of these other kernels. It turns out that in this context Kendall tau is the most natural choice from the machine learning perspective, which is why we're harping on Kendall tau. It's also very combinatorially natural — it's just the distance in the one-skeleton of the permutohedron, or the distance in the weak Bruhat order. But you might think about Spearman or Mallows or some of the other ones too; we didn't make any attempt to analyze those. So for any given notion of distance between permutations there's an interesting question here about the relationship between the geometric equator and the equator with respect to your favorite kernel. I just want to point that out, because there are lots of completely unexplored questions here — low-hanging fruit.

Okay, so let's go ahead and thank Josh again. Yeah, thank you — it was an excellent talk, I enjoyed it. I'll just say I'll send out my email on Monday like I've been doing, but next week's talk is planned to be in person, so we can see how that goes; it'll be the first time in a little while. I think with that we'll go ahead and close up. Do you know which room it'll be in? It'll be in the Coliseum, 1000A I believe — the large conference room in front of Julia's desk. I think we still have a little bit of testing to do in that room, but we'll see how it goes. Thanks again, Josh, and with that we'll go ahead and close up the seminar for today.