First talk of the day: we'll talk about worst-case analysis, or beyond worst-case analysis. I warned Tim that while there'll be some technical content, this would be more of a cultural sort of talk, hopefully managing to be a bit more offensive than the panel, or at least to get people thinking. So, carrying on from the panel yesterday: it goes without saying that although we're trying to look beyond worst-case analysis here, worst-case analysis is clearly our friend in theoretical computer science. It's arguably the defining characteristic of the field, and it's dominant for a multitude of reasons that people have expressed here: its suitability for mathematical analysis, and the fact that it really is often the right way to look at the problem in many practical situations. I would also say, although this maybe hasn't come up, that particularly in the beginnings of the field it culturally differentiated us from others, which I think was important in getting theoretical computer science to take off and be respected as an independent entity. But in some ways its success is also limiting us. It focuses us on a way of looking at problems that may not always be the right way, and that's really what this whole workshop has been about. The other way it limits us is that when we think, well, we're not getting the sort of answers we want, how can we expand out?, we tend to look for things which I would call just variations of worst case. It's like: worst case, but let's add a knob. Competitive ratio, and now instance optimality, resource augmentation, even FPT-type stuff; I view all of these under that lens, as still worst-case analysis with something additional added.
So, to set off a way of thinking, or some controversy, or something to get us thinking about this: a couple of years ago Jennifer Rexford wrote a really interesting article about her ten favorite practical theory papers. Jennifer is a very well-known, highly respected person in networking at Princeton University, certainly not unfamiliar with theory, and many of us are not unfamiliar with her; she gave one of the tutorials at STOC or FOCS a few years ago. So she's certainly someone we can go back and forth with. I'm not going to go over all ten papers, although I do really encourage you to download the article and look at it. In her abstract she says, essentially: theory is great. We've got this networking research, and sometimes you can find just the right way of looking at a problem through a theoretical lens, and bang, it gives you the power to solve it. Sometimes there are people in practice, sometimes people in theory, but sometimes these things really come together, and it's amazing. I certainly believe that and think it's true. Hopefully everyone's read this by now, because I don't want to read it to you. Here are the first five, and I'll go through the next five. Again, I'm not going to outline what each of the papers is about; I encourage you to look at her article, where she summarizes them. But one thing that stands out and glares at you is that many of you probably don't recognize most or all of these names. They're certainly not what you think of as venues for theory publications, or even necessarily for theoretical publications.
On the next slide, maybe you see a few more names that you might recognize, not just me but some of the others there. But again, the list is light on what we as a community would typically think of as our pride and joy in theory. And to me, this says something. So what lessons can we take out of this, out of what a top, well-known computer scientist thinks? First, there's the obvious one: I am shameless; everyone knows I blog, so my name stands out right there. Of course, the other thing you might take from this is that Rexford clearly has impeccable taste. But maybe the real point is that there's some sort of disconnect here, that we in theory are missing something. I want to be clear that I'm not saying I agree with Jennifer's choices or her point of view. This is something one could argue about: didn't you forget about this, this, and this? But to me it's disappointing to read this and see that someone who respects the field has this view of what theory is, and it's not, for the large part, what we're doing. I view that as a cultural problem. We can argue about whether it's our cultural problem or their cultural problem, but it's a cultural problem that should be thought about and addressed. And one thing you notice, if you go through this list of papers, is that as a whole they really aren't focused on worst-case analysis. What they're focused on is coming up with a good workable solution to a good workable problem.
I wouldn't want to say that's the defining feature of these papers, but it certainly seems to be a theme throughout them, which again suggests that this workshop is a good idea: we really are maybe missing something, and this is something we need to look at. Tim asked yesterday in the panel where theory could be, or what we could be doing. Here's a different way of thinking about it, and I had this slide ready already: these are things that are growing, important, and major impetuses in other areas, where I think theory has been behind the curve, where essentially our approach has been, oh, they're doing some interesting stuff over there, maybe we should look at it, and maybe we have something to say, after it's already taken off and grown big somewhere else. And these are all things that, from other fields' point of view, are theory. Whether it's our way of thinking about things or their way, it really is theory. A lot of it comes from the EE side of theory, but these are all areas where people in CS theory have worked, yet we really haven't been in the lead, and I don't know why. It's sad and frustrating. Honestly, I think a big reason is that these sorts of work tend to focus more on statistical analysis, on a sort of randomized analysis; they don't start from a worst-case view, and because of that we come late to the party in terms of what we can offer. That's not the way I think of theory, and not the way I want CS theory to be. So there are two points I'm going to try to make in this talk. The first one, I think, is more controversial.
The second is probably less so, and it's really the theme of the workshop. The first point is that part of the culture on our end is that we don't always promote an environment where this sort of work, working on real-world problems and addressing real-world issues, is really appreciated, particularly when it comes at the expense of a more formal analysis. At the same time, formal analysis, as people expressed on the panel, is what we do. I'm not saying we should all give up on proving theorems and doing formal analysis. But we should be expanding our tools, and I think we can come at it from both directions and improve the way CS theory works and functions, and its relationship to the rest of CS and the sciences. In the rest of the talk I'll give some semi-technical context for this, with two examples: some work I've done on heuristics, and some work I did, with a co-author sitting in the back, on hashing and entropy and how we can use that to analyze systems with random-input-type analysis. All right. We had a fun discussion yesterday on the panel about heuristics. I like Dan's idea that we should stop calling them heuristics and just call them algorithms we don't have proofs for yet. Heuristics is a particular area where I think Dan's right when he said yesterday that we hold our noses at them a little. I did some work on heuristics, and I think I entered the project with that sort of attitude: OK, I'll take a look at this, maybe there's something funny there, but you guys are working on heuristics, that's not real stuff.
But after working on it a bit, I left with a lot more respect: in particular, even if they can't always prove things, there's significant thought that goes into it, and specifically what I'd call theoretical thought. You really do have to understand a problem in a very deep way to come up with good heuristics for it. What I think that means is that heuristics as an area, because it really is just algorithms, could benefit from more CS theorists looking at it; we have approaches and insight that could really help drive the field forward, even if we can't always prove everything we'd want to. The project I was involved in had to do with human-guided search. I started on something called human-guided tabu search, for those who know about tabu search. It was fun, and a lot of fun things ended up coming out of it, a series of papers on all sorts of different things. That's the other fun part of working on these sorts of problems: I do think it gives you insight into other sorts of problems and other sorts of things.
So, tabu search. Tabu search is a basic heuristic method where you're essentially doing some sort of local search procedure, but it's local search with memory: somehow you remember the solutions you've seen before, and you try to push yourself into new directions, new areas of the search space. You get some inkling that you should look for solutions unlike the ones you've seen before; the hope is that you'll target some new area of the search space and come up with solutions different from the ones you've seen, solutions that are perhaps more likely to get you to a new and better answer. It's a very general paradigm, and we applied it to lots of different applications; there are some cute pictures here. Another fun aspect of the project, again something we don't usually look at in algorithms, is that this was called human-guided tabu search: the idea was that humans could interact with the search process. They could stop it; they could say, oh, this part of the solution looks nice, I'm going to freeze it and only work on the rest of the pieces. If you think of a problem like a jigsaw puzzle, you could say: this part of the jigsaw puzzle is right, focus on the other pieces. There were various other interactions you could use to guide the search; you could say, go back to this solution you saw before that looked promising and wasn't explored enough, and go from there. One of the things we found is that with human-guided tabu search you could really get better answers faster than you could without the human, at the cost of human time, of course. One of the fun things that came out, just to highlight what I mean by theoretical thinking (this isn't anything having to do with a theorem, just design), is something we came up with that turned out to be useful both in the tabu framework, in terms of how you could
restrict what moves the tabu search could make, to push it into new spaces, and at the same time in terms of the human interaction, the human interface. It's what we call the stoplight framework; those colors are supposed to be red, green, and yellow. The idea is that you're looking at different orderings, and I'm doing a toy example where you have a line of elements, with the top row being your current state. The 3 being red means you're not allowed to move it: red light means stop. The ones that are green you can move any way you like: green light means go. The ones that are yellow, the 5 and 6, you can move, but only in conjunction with moving a green one; they can't move on their own, and you can't move two yellows together, only a yellow with a green. The moves here are swaps of adjacent pairs. What that means is that the 3 stays in place, so essentially you've partitioned the problem into two smaller problems, and the 5 and 6 have to stay in the same relative order, because they're never allowed to swap with each other, only with things that are green. This led to a fun interface that was both useful algorithmically and allowed human interaction, which, again, is a point I'll try to drive home later in the talk: it's not something we in theory usually think about. We think about algorithms; we don't think about how people are going to use or implement them, which is something else I think we could do a better job on. Another thing that came out of the work was a sort of greedy algorithm variant. If there's one thing I'm going to try to get you to come out of this talk with, it's that there's a very clean, basic heuristic, which I really like because it's my version of it, that I think should be taught in every undergrad algorithms class. So for the
next few slides I'll try to tell you what it is and convince you it's so easy that you can spend twenty minutes of some lecture in your algorithms class on it, and it will be a help to your students. Greedy algorithms are great, when you can actually find greedy algorithms that work. In computer science, we can generally think of greedy algorithms as having two steps. First, you order the elements according to some natural ordering. Then you do what I'll call placing the elements sequentially: you say, I've got a partial solution, I've placed this prefix of my ordering, and that tells me how to place the next element. You can also think of dynamic versions where the ordering itself changes according to the placements, but at each step you have an ordering. All sorts of algorithms fit this paradigm: in bin packing you can think of best fit decreasing and first fit decreasing; the standard greedy set cover and vertex cover algorithms fit; you can do job shop scheduling. Pretty much any time you think of a greedy algorithm, it fits this basic paradigm. Now, when I teach students algorithms, I start out by showing them greedy algorithms, because they're easy, you can prove things about them nicely, and they're easy to understand. Unfortunately, there aren't too many that are really optimal. The ones that are, minimum spanning tree and things like that, are really cool, but greedy isn't always optimal. On the other hand, greedy algorithms are a common, well-used, typical heuristic. So the theorist's viewpoint is: well, what can I prove about it in worst-case-analysis form? OK, we'll do the competitive ratio: how far from optimal can greedy be? And culturally, this is something I've been on record somewhat objecting to in the past, so it's not going to be a surprise.
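The two-step paradigm just described can be written as a tiny sketch. This is my illustration, not code from the talk; the function names are invented, and the placement rule shown is first fit decreasing for bin packing (sort items in decreasing size, then put each into the first bin with room), one of the examples named above.

```python
def greedy(elements, priority, place, solution):
    # Step 1: order the elements by the priority function (here, decreasing).
    # Step 2: place each element sequentially into the growing partial solution.
    for x in sorted(elements, key=priority, reverse=True):
        place(solution, x)
    return solution

def first_fit(bins, item, capacity=10):
    # Put the item in the first bin it fits in; open a new bin if none fits.
    for b in bins:
        if sum(b) + item <= capacity:
            b.append(item)
            return
    bins.append([item])

# First fit decreasing as an instance of the generic paradigm.
packed = greedy([6, 5, 4, 3, 2], priority=lambda x: x,
                place=first_fit, solution=[])
# → [[6, 4], [5, 3, 2]]: two bins, each exactly at capacity 10
```

Swapping in a different `priority` or `place` gives the other greedy algorithms mentioned (set cover, scheduling, and so on); the dynamic-ordering variants would recompute the order inside the loop.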
A problem I see is that the competitive ratio becomes the new metric. You start thinking in theory terms: how can I improve the competitive ratio, how can I get the 3 down to a 2, or log squared n down to log n? We can argue about whether that's interesting for specific problems, but in many cases I don't think it's the right abstraction for what we're ostensibly aiming to do, which is to find a better solution to the problem, not a better competitive ratio. We use the competitive ratio as a proxy because greedy isn't optimal, but improving the competitive ratio does not necessarily get us closer to solving the actual problem people want to solve. So here's a different way of thinking about greedy. To me, the problem with a greedy algorithm is that you run it once and you're done. You say, here's my greedy algorithm, I ran it, here's my solution, there it is. What do I do now? Well, you're done; there's nothing more you can do. And naturally you might think: the greedy algorithm took 0.01 seconds; it's very fast even for big problems, and I happen to have a full 2 or 3 seconds on hand, or minutes, or even hours. What should I do next? A very simple thing you can do is consider randomized greedy algorithms. Instead of just the greedy order, you should think about other orders. Now, if your greedy algorithm had a good design, there was a reason you sorted the elements in that order, so you don't want to place the elements according to some arbitrary order; you want to try something close to the greedy order. An older, standard, simple version of this, something that's been around for a while, is top-k: to choose the next element, just place one picked uniformly from the top k
according to the sorted order. There are variants of this; we were actually looking at it and came up with our own idea independently, and then found some of the other work like top-k. My version, which I think hasn't caught on because it has such a bad name (people told me it's a bad name; I should have come up with a better one), we call bubble search. The intuitive way to think about bubble search is: I want something close to the greedy ordering, but perturbed in some natural way. So I'll flip a coin: I pick the top element with probability one half; if it's heads I take it, if it's tails I go on to the next one, flip a coin, heads I take it, tails I move on, and so on. You can implement this easily in different ways; you don't have to actually flip the coins one by one. You find the element to place, you place it, and then you still have your sorted list and you start over again at the top. One thing that's not too hard to see, and the reason we call this bubble search, is that if you pick your coins to come up heads with probability p, the probability of choosing a given ordering is proportional to (1-p) raised to the power d, where d is the Kendall tau distance, the bubble-sort distance, between the original ordering and the ordering you end up with. So it's a bit more general, I think, than top-k. One property of bubble search is that you can get any possible ordering, at least in theory; you're not limited the way you are with top-k orderings. Both algorithms fall into what are called anytime algorithms: you can just keep running until you feel like stopping and take the best solution you've seen so far. But the key is that it's really trying to follow the intuition behind the greedy ordering, but in a
much more flexible sort of way, using whatever time you have available. There are other variants and things to discuss. One thing that tends to be useful, in fixed-priority algorithms where you don't change the ordering dynamically, is that when you find a better solution you can make its ordering your base ordering and keep going from there; that seems to give better performance, as you might imagine. [Audience:] In what sense is it more robust than top-k? [Answer:] You can change the parameter in a continuous fashion, as opposed to top-k, which is discrete. And it's more robust in the sense that you really can get deeper: if you're doing top-3 and it turns out that the best thing to do is to take the fifth element first, you're sort of stuck, whereas with ours you might do that only with much smaller probability, but eventually you're going to get there and take the fifth one first. Now, I have to say, although I've used it for things, I certainly don't think I've been successful in promoting this as the thing everyone should try. On the other hand, the people who have tried it always seem surprised: oh, this kind of works; it found things a few percent better than our greedy solution, which was otherwise hard to beat, quickly, in a few seconds. The reason I claim this is the heuristic you'd want to teach in your algorithms class, as opposed to some of the other sorts of heuristics, is that it's just brain-dead simple. There's not much more to the explanation than what I just presented. You can spend a few minutes thinking about how to code it up efficiently, but it's not hard, and it's very general: anywhere you have a greedy heuristic you can plug this in. In fact, when we implemented it, we implemented it as a wrapper. We just said: create your optimization
functions, your placement and sorting functions, and hook them into this code, and it will do the rest for you. All you have to write is the sorting routine and the placement routine, and you can just plug into our code. It's an additional layer you can implement on top of greedy. In that sense it's also simpler than other heuristics. I wouldn't deny there are more powerful techniques when you go to heuristics, tabu search and others, but this is something very general and very easy; if you just want to see whether you can get something a little better quickly, this is the way to go. This is also, as an aside, why I think it's a useful thing to put into an undergrad algorithms course. As came out in our panel yesterday, theorists think in terms of computational resources, primarily running time and computational complexity; we also think about space. I also like to teach, the way I phrase it in my undergraduate classes, that correctness is also a resource: I can get you an answer really quickly, but it might be really wrong; on the other hand, if you want an answer that's really correct, it might take longer. They sort of balk at the idea, but I think it's a great fundamental idea to teach students in an undergraduate algorithms class; it stretches their minds in an interesting way. But the one resource I don't think we think about, and that the systems side maybe hasn't expressed to us in a clean way, is that one of the most important resources is programmer time. We tend to think of coming up with the best algorithm in terms of computer time, but computer time is cheap these days; programmer time can be what's expensive. So one of
the things we should think about is things that are easy, and one of the things I liked about bubble search is that it was easy. So this is the paper, if you ever want to look for it online. Now, what about the theory of bubble search? Sadly, the truth is I just don't have one. I don't know how to think about this idea: if you have a greedy solution and you perturb it, what can you say? You can imagine various theoretical tacks. Maybe you could say something about the competitive ratio for certain algorithms. Thinking in smoothed-analysis-type terms, you could say: the greedy solution is bad in terms of competitive ratio, but when you look in a neighborhood around the greedy solution, according to perturbed sorted orderings, the competitive ratio has to go down, because there has to be a better solution nearby. I would love to find a problem, or a general class of problems, where I could say something like that is true, but I don't know how. I think it's an interesting thing to try to think about. [Audience:] If you look at the very powerful heuristics, like belief propagation, the ones I know were proposed by someone who was already famous. Suppose that instead of you, a first-year assistant professor with less name recognition wrote the same paper on heuristics; would they be able to get the same attention? When you prove a theorem, it's very concrete, and it doesn't really matter who you are: if I can improve some constant, it's going to get published. But if I were to come up with an alternative to belief propagation and I'm not very well known, would it get attention, even if it does a lot better? [Answer:] So, Dan will forgive me
for this example: when we published our coding theory stuff, we got a lot of pushback from the coding theory community. I went and gave a talk, and a very famous coding theorist, in the middle of the talk, while I was up in front of people, told me that the work was total crap, that Reed-Solomon codes had solved this problem 30 years ago, that I had no idea what the performance of Reed-Solomon codes was, and that I was clearly mistaken. So, I wasn't famous in that community. You have to justify that your solution is good; you have to show that it works on a lot of problems. The human-guided tabu search work was published at AAAI. I don't normally publish at AAAI; my co-authors did, and knew the ropes and what reviewers there were looking for. The way we showed this was an interesting idea was that we did a lot of experiments, and arguably that may be one of the things you have to do. [Audience:] So what's the killer experiment for bubble search? What do you show? [Answer:] I'd actually just give them the paper and let them look at the things we did. One fun thing someone at Harvard did, online, was packing music onto CDs. This is just a bin-packing-type question, and they found that you can do it greedily and do really well: you get the CDs 98% full. Then you throw bubble search at it, and you get something like 99.5% full after a very small amount of extra work. You can argue: if your greedy solution is already doing that well, do you really care about going from 98% to 99.5%? I think the answer is that, for a lot of problems, you really do. Norman Ramsey did that, and he has some really fun Haskell code, though I can't even interpret it. All right. Although I don't have a theory, I'd be remiss not to mention that there's a great line of theory we haven't talked about here on priority algorithms, a framework set up by Allan Borodin and many people he's worked with, that tries to look at greedy
algorithms, or priority algorithms as he calls them, and what you can and can't do in this framework. In some sense you can think of this theory question as sitting exactly in that framework: you have randomized priority algorithms in this class; can you say anything about what they can or can't do? So, to close off this part of the talk: strangely, the other thing I found out is that there's this whole field of heuristics. We never see these people or talk to them, because they don't publish theorems that get into FOCS or STOC, but they do work on actually coming up with solutions for problems they find interesting or that people care about. And it really is hard to prove rigorous results. People have tried; there's certainly been work on proving things about heuristics. The one that always comes to my mind, because it appeared while I was in grad school, is the go-with-the-winners paper by Umesh Vazirani and, who was it? David Aldous, yes. So it can be hard, but I think it's also possible; it's by no means impossible, and it's something we can try to do. Even when we can't prove things, I certainly think there are plenty of ways we could give guidance to the area: by coming up with new heuristic methods; by telling them, look, this really isn't a good direction or approach, because we can show that anything you can do in this model you can do at least as well, or equally well, in this other model; by trying to judge, compare, and understand different methods; or, again, by trying to come up with explanations of why these methods often work better than our solutions, even if that means throwing in some strong or bizarre-sounding assumptions just to get some insight into what's going on. OK. So in the second part of the talk I'm going to discuss some work I did looking at the
issue of randomness. This is something I brought up in the panel yesterday: theory sort of starts with worst-case analysis, and then maybe afterwards, as an afterthought, says, well, OK, we can look at these random cases, they're probably interesting too. That's funny to me, because many other fields seem to go the other way, coding theory and queueing theory being the obvious ones, but others as well. This is basic stuff, but as we've been discussing, many settings aren't modeled well by worst case, and they're also not modeled well by the standard uniformly-at-random case. The world is not G(n,p) random graphs; the world of power-law graphs opened our eyes to the fact that there are different models we could be looking at. So clearly, as in the subject of the panel, we need better ideas of what constitutes a good model, good random or semi-random models. It's certainly a direction CS theory has been moving in; it's not something we're lacking knowledge in, and I think we've been moving well in this direction. I'm going to talk about some work I did to that effect, a paper from a few years ago on why simple hash functions work. By simple I usually mean pairwise-independent, or k-wise-independent for small k, families of hash functions, although our results can be thought of in a more general framework; that's the natural way to place them. And what do I mean by work? This was something that came up for me in grad school when I was doing the whole power-of-two-choices thing that some people know about, on hashing and load balancing. I would program up some code just to run some experiments and see how things worked, and I would use whatever dumb hash function was lying around, and it always, always, always (well, except maybe once in a while; we can talk about that) would
behave just like the random analysis right and at some point you know someone smartly asked me well why right why should it do that I mean it's if you've got actual data and you're throwing some hash function at it there's no reason why it should behave like a random hash function and you know this was something that's been pretty much you know unexplained for a while so for me I was really motivated by trying to understand why my experiments always ended up this way right so again just if you haven't seen this universal hash families or KY's independence just means that when you take you know any collection of K things the probability that when you hash them you get corresponding values you know it looks like they're independent up to K items and there's a slightly weaker notion of universal you know but really I think if you for the purpose of the rest of the talk you can just think in terms of KY's independence so again this isn't a new question I noticed that back when I was in grad school doing experiments for the power of two choices I found earlier stuff on bloom filters so there's this paper I'll show in a second in the 1980s we just said oh yeah I used pairwise independent hash functions and they gave me the exact answer I was expecting for bloom filters but in all these cases the analysis of it was depending on the fact that we were having random hash functions because otherwise it's much harder to say useful things to do the analysis without that assumption of randomness oops sorry so I guess I should say there with some notably rare exceptions you know so for instance Siegel's work on hash functions showed how to get sort of big enough independence using sort of what I would call a very non-trivial class of hash functions right so this was the paper I was talking about from 1989 practical performance of bloom filters everyone knows well for most people bloom filters my favorite data structure this is to sort of quote you know the results and details of 
the experiment were from hash functions transformations chosen at random from the class H1 which is just a it's a universal two class of hash functions sort of the basic pairwise independent style hashing functions he's just theoretical performance under a random function yes yeah he's saying that you know I use these hash functions and it exactly matches my theory which by the way was which was for random hash functions okay so in fact if you go to sort of a worst case analysis you know simple hash functions just don't work there's been a bunch of work of this sort recently a lot of really good work the recent work on linear programming by I forget who the R is I should have checked all these things before I gave the talk that show that for pairwise independent hash families you can prove that linear probing hashing performance is actually worse than random so as part of this paper we showed it wasn't a difficult thing but we were able to show that for any Kwise independent hash families you can come up with an input set where it's provably worse than the random analysis there are other problems you could look at but the whole point is that worst case doesn't match what you actually see when you run these things and so the question is you know why is that so this is again sort of a data modeling issue right so what you'd like to do is to say well maybe I should just model the data as being random that sort of solves all my problems if the data is random then the hash values are random and you're done but that's not very compelling it doesn't seem like a particularly good model for data when you ask real people you know is your data random they tend to say no and so we're looking for some sort of model between the worst case sin and the sort of average case of random data and the model we came up with is based on certainly lots of previous models about semi-random sources the idea here is you can think of the items we're going to hash into a hash table or hash into 
a Bloom filter, or whatever you're hashing into, as a stream of random variables over some space. The idea is just that each element still has some entropy, some randomness associated with it, even conditioned on all the values of the previous elements. Each element is not perfectly random, but even after you've seen whatever you've seen so far, there's some randomness left: you can't predict particularly well what the next value will be. You may be able to make some prediction, but not a complete prediction.

The intuition behind our result was: well, if each element has entropy, then if you could extract that entropy, and if there was enough entropy left in the element, in particular enough to get the right number of bits to hash uniformly into a random cell, then that would be a way of thinking about what's going on. You say, okay, each new element has some randomness associated with it; I'll somehow extract that randomness and use it to tell me which location to put the element in in the hash table. So if I can extract, I should get near-uniform behavior.

Okay, so now we need to look at what exactly we mean by entropy. So that Tim doesn't have to cut me off, and because you get more out of actually looking at the paper anyway, we won't go into too many details here. There's the standard notion of entropy, there's collision probability, there's min-entropy; all these entropies are within a factor of two of each other, so it's not such a big deal. We ended up using collision entropy, which makes sense because we're thinking about collisions in hash tables; it turned out to be the right notion of entropy.

And really, if you boil our paper down to one idea, it was just: look at what the leftover hash lemma is telling you. It's telling you, or this is my interpretation, that if you have a hash function chosen from a pairwise-independent hash function family, and you have a random variable with small collision probability, that is, a suitable amount of entropy, then when you hash it according to this randomly chosen pairwise-independent hash function, you get something close to uniform. And in some sense this is exactly what we want: we want that when we hash an item according to a pairwise-independent family, the result looks close to uniform, and the leftover hash lemma says, yeah, we know that. I may have the math mixed up a little, but it's not such a big deal.

We have some refinements for our particular setting. The big idea is that we're not looking at one item; we're looking at a chain of items, a sequence of items that we're hashing, each one having some entropy conditioned on the previous elements. So what we need is a block form of the lemma, which says that given a block source, if you hash a bunch of items from that source, then the totality, looking at all the hashes of all the elements jointly, is close to uniform when the hash function is chosen this way. So really the right way to say it is: if each of the random variables from our source has at least some appropriate amount of entropy, then we get something close to uniform.

And that's really the main idea of the paper boiled down to one thing. Of course we had to do more in the paper, because that would have been a short paper, and I can't write those one-page papers. We did a bunch more in terms of improving the bounds, showing what settings it worked in, and so on. But that's the nice main idea: if you have a weak hash function, and your data is sufficiently random, for some reasonable theoretical definition of sufficiently random,
then you get something that really does behave like a random hash function, and that explains why these things always seem to work. And the joy of it is that this is also very general; it works wherever you're going to be using hashing: Bloom filters, the power of two choices, linear probing, cuckoo hashing, pretty much any place you use hashing you can apply this as a black-box result. Salil and his student improved the bounds in later work.

After our work, Martin Dietzfelbinger and Schellbach came out with an also very interesting paper, which is a big warning. It says: yeah, that's all well and good, but weak hash functions also just break in some very basic cases. They were looking at cuckoo hashing, and what they found is that even with completely random data, so you have some universe of keys and you are completely uniform over that universe, you could get cuckoo hashing to fail using pairwise-independent hash functions. It almost sounds like it contradicts our result, but when you look at the numbers, the way they were picking their key set, it just wasn't going to have enough entropy. So I think of this as a nice warning: if your data has enough randomness this idea will work, but even when it might look like your data has enough entropy, it might not; you have to be careful about what it means for your data set to have enough entropy. Someone's asking whether that means the number of random bits in the data, like the fraction of the bits; yes, that would definitely be true.

Other people have certainly looked at entropy in other ways in the analysis of algorithms. Eli Upfal and his ex-student Gopal Pandurangan looked at entropy-based bounds for online algorithms. So we're not at all the only ones to have had the idea that one way to stretch out randomized analysis is to look at notions of entropy and how they play into it. Again, it's an interesting potential way of thinking about things for future work. And I'd be remiss if I didn't mention the really cool paper that came out recently, even though it's a worst-case paper, by Mihai Pătraşcu and Mikkel Thorup on tabulation hashing. If you're interested in hashing for any reason and you haven't seen this paper: it's about worst-case hashing, but it gives an entirely new view of tabulation hashing and explains why it can work so well. I recommend it.

So certainly there are open questions. Are these ideas, this notion of entropy, more generally useful? One thing I feel bad about with this work is that, because it depends on the leftover hash lemma, it feels really particular to hashing. Can it give you insight into other problems? Maybe at the high level: entropy is a good way to view things, and maybe that's the sort of randomness measure we should be using and thinking about for our inputs. But not directly, not that I can see; it really is tied to hashing problems, and I wish I had some way of putting it in a broader context, but I don't yet. Still, it opens up the question of what weaker notions of random data are suitable. Maybe there are other notions besides entropy that could be useful in the analysis of algorithms, and at the same time be realistic in terms of what they model in the real world.

So, I don't know what my time is, but I'll make Tim happy by finishing up. The more debatable, or more controversial, point I want to make is that the way others view us, and maybe our own self-perception as well, is
that theoretical computer science is about proofs; that this is what defines or makes our field. And I wonder sometimes if we can loosen that a bit. Algorithms is not just about proofs; proofs are certainly a key component, but it's about solving problems, and that sometimes means that maybe we don't have a proof, but we still have a good way to solve a problem. The theory community sort of says: well, if you can't prove it, that's not theory. And I'm not clear why that should be the case.

Design, I think, is the key buzzword I'm hearing in the academic community; I don't know if you hear that too, but on the engineering side, design is all the rage. I don't really know what design means, except that you know good design when you see it. And I think theory can offer a lot in terms of helping people come up with good designs, because our approaches, our way of thinking, our proof-based mentality, help us understand things in a very deep way. Even when we can't prove things, we often understand things so well that we can explain them to people, or lead people to better designs than they would probably come up with themselves.

Now the less controversial conclusion: as came up in the panel yesterday, I'm on the side that says we need lots more models and techniques. I think we do have a lot of models, and people here have talked about a lot of interesting approaches, but I do think many of our models are worst-case models with add-ons, with some clear, notable exceptions. I personally like things based on randomness; I don't know why, I guess that's what my training has led me to, and certainly theoretical computer science has not been averse to introducing notions of randomness and using them. But I think we're far from done. What the explosion of power-law stuff showed us in the last decade was that we were missing something that in some sense was right in front of our faces, had we been looking a bit more carefully.

I do think we need these models to be somehow tied to reality, and what that means, in terms of knowing whether we're coming up with a good model that's tied to reality, is that some fraction of us, not everyone, but some of us, actually have to get down in the trenches and solve real problems, or work with the people who solve real problems and talk to them. Otherwise my concern is that theory will be viewed as the followers, not the leaders: the ones who clean up the messes that other people make, as opposed to leading the field ourselves. That's not a position I think we should be in. Because of that, I have a very expansive view of theory. I think we should be bolder, more aggressive; there were others on the panel who agreed with me, or made this point. We should allow ourselves to make more assumptions and take more risks in trying to come up with explanations for why algorithms work well on most inputs, while at the same time understanding that worst-case analysis may still help us in this regard, and can certainly show us why something doesn't work well on all inputs. And that was it; thanks.

From the audience: it's easy to come up with apparently different algorithms; it's hard to make comparisons, and therefore hard to establish equivalence. So at a certain level, that community's best practice is really expensive experimental evaluation: carefully controlling the experimental conditions, performing statistical analysis to make comparisons. One can also imagine sometimes doing theoretical analysis; in fact I believe it's
out there in that community to do some of this. So I guess I'm wondering, after your talk: what sort of analysis do you actually propose your community do to back this up?

So I really think that is part of the problem in this context. Part of it should be more connections with that community, and we actually should do some of those experiments, whether it's us specifically as individuals, or we find ways to farm it out to people who can do it better. I don't think I'm the first to suggest that theory has been lacking in experimental methodologies for the analysis of algorithms; it's a weak point in our culture. It's also, in some sense, a hard, challenging problem, one that I think this workshop has touched on: how do you know what a good benchmark is? Where should it come from? How do you do an evaluation that's fair? We discussed this: you have code from different sources, from different groups; can you call that a fair comparison? At some point you just have to bite the bullet and say, well, there's this code and there's this code, we ran them against each other, what else could we do? I don't think that's something we have systematized. On the other hand, I don't think that should be an impediment to good ideas. If you have a good, novel idea, a good, promising direction, then even if your experimental evidence is limited, other people will hopefully see the good idea. But I agree that if you have none, you're facing the fact that it will be hard to get anyone to take you seriously.

From the audience: I want to offer a couple of additional interpretations of some of the things you're saying. First, in that list of papers, the thing that stood out to me the most was that the papers by people I knew were actually survey papers, and maybe one takeaway from the list is that one way to have more impact outside is to be doing more surveys, and putting more value on that kind of work. The second takeaway: it's clearly a good thing if theorists do some work on real problems outside of theory, and conversely it's a good thing that people outside of theory do some math and a bit of theory. But the partition of venues maybe has a lot to do, and for good reason, with how one evaluates the quality of work; standards for work matter most, and maybe it's difficult for a single venue to do a good job of evaluating many different types of contributions. So maybe, for the same reasons it's key to do multiple kinds of work, we do need different venues for different kinds of work.

On the first point: I think Jennifer pointed to surveys just because they were nice placeholders for the ideas she was mentioning. The first one was really, I think, about Bloom filters being a really great solution, and she pointed to the survey, but she probably could have just pointed to the original Bloom paper from 1970; in some sense that might have been more accurate. What our survey did was point out that these weren't individualized instances; there was a nice cohesion to how Bloom filters could be used in network settings. Likewise with the power of two choices: she pointed to the survey, but she also pointed to some other papers of mine that were actual results. So I think it was not the surveys themselves so much as the concepts, that Bloom filters and the power of two choices are useful paradigms. On the second point, I agree with you that that's a problem, or a concern: communities develop certain expertise and certain ways of thinking. On the other
hand, SIGCOMM puts me on its PC; we don't put Jennifer on ours. We could open things up a little bit. The SIGCOMM people know that theory is important, and they'll shove me a bunch of papers that I can't entirely understand either, and say: here, you write theory papers, figure this out. I'm not sure we go out of our way to say, let's get some people who understand experiments onto our committees, so that when papers come in with actual graphs and performance charts, because these people can actually sometimes implement their algorithms, we have someone who can offer them good feedback.

From the audience: is it possible somehow to run a test that tells you whether you're close to the optimal, without knowing what the optimal is? Some sort of easy test that says, yes or no, you're near optimal?

Yeah, I don't know about doing that in general; there are hard ways of doing it. But quite a few people will take an instance and say, well, we can find a good lower bound, and then ask whether the algorithm achieves a good upper bound against it. Exactly, that's the standard move, and I'd like to see it become more standard.
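The kind of bound-based comparison being discussed can be sketched with a toy example. Here, for a made-up knapsack instance (a maximization problem), the fractional LP relaxation gives an upper bound on the optimum, so the ratio between a heuristic's value and that bound certifies the heuristic's quality without ever computing the true optimum. The instance, function names, and parameters are all illustrative assumptions, not from the talk.

```python
def fractional_bound(items, capacity):
    """LP-relaxation (fractional knapsack) upper bound on the optimum."""
    bound, remaining = 0.0, capacity
    # Take items greedily by value density, allowing a fractional last item.
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        take = min(weight, remaining)
        bound += value * take / weight
        remaining -= take
        if remaining == 0:
            break
    return bound

def greedy_value(items, capacity):
    """A simple heuristic: greedy by value density, whole items only."""
    total, remaining = 0, capacity
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight <= remaining:
            total += value
            remaining -= weight
    return total

items = [(60, 10), (100, 20), (120, 30)]  # (value, weight) pairs; made up
cap = 50
g = fractional = None
g = greedy_value(items, cap)      # 160 on this instance
ub = fractional_bound(items, cap) # 240.0 on this instance
print(g, ub, g / ub)  # the ratio g/ub certifies how close greedy is to optimal
```

So even without knowing the true optimum (220 here), the ratio 160/240 proves the heuristic got at least two thirds of it, which is the style of certification the questioner is asking about.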
It's a good example of that. But yeah, more generally: find a linear program, or a semidefinite program, that can give you the lower bound. That's something that's commonly done in the experimental evaluation of algorithms: you don't just say, I beat this other algorithm; you also compare against the lower bound, and then you can talk about how far you are from the bound you get from that technique. In some cases that already tells you you're not that far from optimal. So yes, I think that's useful; I just don't know of a general, easy methodology that would give you that for every algorithm.

From the audience: I just want to add that I think there's a lot that both communities can learn from each other here, because we're all interested in what we can learn from observation, and if you're interested in doing simulations and comparing them with the analysis, there's a lot to look at.
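As a coda, here is a minimal sketch of the kind of experiment described in the talk: draw a hash function from a pairwise-independent family, hash a stream of items that each carry some fresh entropy conditioned on the past, and compare the number of colliding pairs to what a truly random hash function would give in expectation. The family is the textbook ((a*x + b) mod p) mod m construction; the toy "block source" data model and all parameter values are illustrative assumptions, not taken from the paper.

```python
import random
from collections import Counter

P = (1 << 31) - 1  # Mersenne prime; assumes all keys are below 2^31 - 1

def make_pairwise_hash(m, rng):
    """Draw one function at random from the pairwise-independent family
    h_{a,b}(x) = ((a*x + b) mod P) mod m."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m

def count_collisions(h, items):
    """Number of colliding pairs when hashing `items` with `h`."""
    loads = Counter(h(x) for x in items)
    return sum(c * (c - 1) // 2 for c in loads.values())

rng = random.Random(42)
m, n = 2000, 1000
h = make_pairwise_hash(m, rng)

# A toy "block source": each key adds some unpredictable increment, so every
# element has fresh entropy conditioned on everything seen before it.
items, base = [], 0
for _ in range(n):
    base += rng.randrange(1, 1000)
    items.append(base)

collisions = count_collisions(h, items)
expected = n * (n - 1) / (2 * m)  # expectation under a truly random hash
print(collisions, expected)       # collisions should land near `expected`
```

This is the shape of the grad-school experiments mentioned at the start: a "simple" hash function on data with enough entropy tracking the fully random analysis.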