Hi, I'm Uri Stemmer, and this is joint work with Haim Kaplan, Yishay Mansour, and Kobbi Nissim on separating adaptive streaming from oblivious streaming. I want to start with some background on the oblivious streaming model and on the adversarial, or adaptive, streaming model. In the streaming setting, we fix in advance some function of the stream that our algorithm should track. For example, maybe we want our streaming algorithm to track or estimate the current number of distinct elements that we have seen in the stream at any given point throughout the stream.

Okay, so let's start with the oblivious, or classical, streaming model. We have a streaming algorithm and an input stream. We don't know this entire input stream of course, but we assume that it is fixed in advance, again unknown to the algorithm. Now, on every time step our algorithm gets the next element, or update, from the stream, and the algorithm needs to respond with its current estimate for the value of the function that it is tracking. Then the algorithm gets the next update in the stream, responds with its next estimate for the value of the function, and so on.

Okay, so that's fine. What happens on the other side, in the adversarial model? We still have our streaming algorithm, but now the input stream is not fixed in advance. Instead, there is some adversary in the background that chooses the updates in the stream one by one as time goes by. In the beginning, the adversary chooses the first update in the stream; the streaming algorithm gets this update and responds with its current estimate for the value of the function. The adversary gets to see this response given by the streaming algorithm, and based on that response, the adversary chooses the next update in the stream. Our streaming algorithm responds with its current estimate for the function, the adversary sees that response and, based on it, chooses the next update in the stream, and so on. As you can guess, the difficulty now is that the choice of the elements in the stream depends on the estimates that we give out, and as a result, the inputs in the stream depend on the internal randomness of our streaming algorithm, and that's a problem.

Okay, so let's see a little more formally the definition of the adversarial, or adaptive, streaming model. As we said, in the beginning we fix some underlying function g whose value we are interested in tracking throughout the stream, and in this work we focus on functions g that take a prefix of the stream and evaluate to a real number. We want to track, or estimate, this value throughout the stream, and we will use the parameter alpha to denote our approximation parameter. On every time step, we would like to obtain a value which is a (1 + alpha) multiplicative approximation of the true value of the function g at that moment. As an example of a function g, maybe we want to count the number of distinct elements throughout the stream.

Okay, now we consider a two-player game between a randomized streaming algorithm and an adversary. On every round, the adversary chooses the next update in the stream, and this choice can be based on all previous stream updates and all previous outputs given by our streaming algorithm. The streaming algorithm takes this current update in the stream and responds with its current estimate for the value of the function. The goal of the adversary is to make the streaming algorithm err on at least one time step throughout the game.
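To make the interaction concrete, here is a minimal sketch of this two-player game. It is not from the talk: the interfaces `algorithm.process`, `adversary.next_update`, and `adversary.observe` are illustrative placeholders.

```python
# A minimal sketch of the adversarial streaming game described above.
# Not from the talk; the interfaces are illustrative placeholders.

def adversarial_game(algorithm, adversary, g, alpha, stream_length):
    """Run the two-player game; return True if the adversary wins."""
    prefix = []
    for _ in range(stream_length):
        # The adversary picks the next update based on everything it has
        # seen so far: all past updates and all past outputs.
        update = adversary.next_update()
        prefix.append(update)

        # The algorithm processes the update and outputs its current estimate.
        estimate = algorithm.process(update)

        # Crucially, the adversary observes the estimate before choosing
        # the next update -- this is what makes the setting adaptive.
        adversary.observe(estimate)

        # The adversary wins if the estimate fails to be a (1 + alpha)
        # multiplicative approximation of the true value g(prefix).
        truth = g(prefix)
        if not (truth / (1 + alpha) <= estimate <= truth * (1 + alpha)):
            return True
    return False
```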
This means that the goal of the adversary is to make the streaming algorithm return an answer which fails to be a (1 + alpha) approximation of the true current value of the function g. Okay, so that's the definition of adversarial streaming. When you see this definition for the first time, you might think: well, okay, I see the difference between this and the classical oblivious setting, you added this adversary in the background that chooses the updates in the stream, but maybe nothing really changed, and maybe existing algorithms for the classical setting still work in the adversarial setting as well. And if your streaming algorithm is deterministic, then you would be correct. It's easy to show that deterministic streaming algorithms that work well in the classical setting also work well in the adversarial setting. However, the issue is that we know that for many problems of interest, your streaming algorithm must be randomized. And when your streaming algorithm is randomized, you need to be careful, because as time goes by the adversary might learn properties of your internal randomness, and then you can't really make valid statistical claims about it anymore, and you run the risk of the adversary causing you to fail. And as it happens, many of the existing randomized streaming algorithms are not adversarially robust. So the takeaway from this slide is that we don't want the adversary to learn our internal randomness.

Okay, so now let's see what the existing results in this context are and what our new results are. First, several papers presented interesting positive results. They presented transformations that take an oblivious streaming algorithm and transform it into an adversarially robust streaming algorithm, and these transformations incur only a small space overhead relative to the oblivious algorithm that we start with. Now, what do I mean by small space overhead? The answer is quantified using a parameter that was introduced by Ben-Eliezer, Jayaram, Woodruff, and Yogev, called the flip number of the stream. The flip number of the stream, which we denote by lambda, is the maximal number of times that the value of the true function g can change by more than a (1 + alpha) factor throughout the stream.

Let's think about an example. Maybe the function g that we are interested in counts the number of distinct elements, and let's suppose that there are no deletions in the stream. If there are no deletions, the number of distinct elements is a monotonically increasing function. So throughout a stream of length m, how many times can this number jump up by a multiplicative factor of (1 + alpha)? At most roughly one over alpha times log m. So in the insertion-only model, when there are no deletions, the flip number of the function g that counts the number of distinct elements is small. And that's good, because all of the existing transformations from an oblivious streaming algorithm into an adversarially robust streaming algorithm incur a space blowup that depends on the flip number. We have either a blowup linear in lambda by Ben-Eliezer et al., or square root of lambda by Hassidim et al., or alpha times lambda, where alpha is the approximation parameter of the oblivious algorithm, by Woodruff and Zhou, or square root of alpha times lambda by Attias, Cohen, Shechner, and Stemmer.
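As a sanity check on this definition, here is a small sketch, again not from the talk, that counts flips in a sequence of true function values (measuring each change against the value at the previous flip, which is one natural reading of the definition) and illustrates why a monotone quantity bounded by m can flip only about (1/alpha) * log m times.

```python
import math

def flip_number(values, alpha):
    """Count how many times a sequence of true function values changes by
    more than a (1 + alpha) multiplicative factor, measured from the value
    at the previous flip.  Illustrative reading of the definition."""
    flips = 0
    last = values[0]
    for v in values[1:]:
        if v > last * (1 + alpha) or v < last / (1 + alpha):
            flips += 1
            last = v  # restart the comparison from the new value
    return flips

# Insertion-only distinct elements: the count is monotone and at most m,
# so it can cross a (1 + alpha) threshold only about log_{1+alpha}(m),
# i.e. O((1/alpha) * log m), times.
m, alpha = 10**6, 0.1
print(math.log(m) / math.log(1 + alpha))  # ~145, roughly (1/alpha) * ln m
```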
Okay, so if lambda is small, the existing transformations produce an adversarially robust algorithm with small space complexity. But what happens when lambda is large? Let's see the negative results. First, in 2013, Hardt and Woodruff showed, informally, negative results for linear streaming algorithms for many problems of interest. As a negative result that's great, but it only holds for linear algorithms. So before our new result, it was still open that maybe everything you can do in the classical setting you can also do in the adversarial setting, just not using linear algorithms. We show the first separation between the adversarial setting and the oblivious setting. Specifically, we present a streaming problem such that if you want to solve it in the classical oblivious setting, you can do it very cheaply, using only polylogarithmic space; but if you want to solve it in the adversarial setting, you must provably use large, polynomial space, even if you are using a nonlinear algorithm. So that's an exponential separation. In addition, I don't know if the flip number lambda is the correct parameter to look at, but our result says that if you decided to use the flip number lambda as the parameter that controls the space blowup of your algorithm, then in general square root of lambda is the best you can get: we have transformations that keep the blowup at most square root of lambda, and in general a blowup of order square root of lambda is necessary.

Okay, so that's our result. How do we get it? We get it by formalizing a connection between the adversarial streaming model and the recent line of work on adaptive data analysis. Before I tell you how we get our results, let's first recall what adaptive data analysis is all about. To do that, let's start with the following warm-up; call it simple problem number one. It's a very simple problem, and I'm sure we all know the answer. We have an unknown underlying distribution D over a domain X. We also have a fixed, known predicate h mapping elements from the domain X to either zero or one, and we have some desired approximation parameter alpha. Our input is a sample S containing n i.i.d. samples from the underlying unknown distribution D. Our goal is to use this input sample in order to estimate the expected value of the given predicate h over the underlying unknown distribution D. The question is: how should we compute our estimate, and what should n be? How big should this input sample be in order for us to be able to guarantee small error with high probability? Okay, so that's an easy question; we all know the answer. We just compute our estimate using the empirical average of the given predicate h over the given input sample, and by the Hoeffding bound, as long as n, the size of S, is roughly one over alpha squared, we should be fine. Okay, so that's easy.
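Here is how that warm-up looks in code — a minimal sketch under toy assumptions (the distribution and predicate below are made up for illustration):

```python
import random

def estimate_mean(sample, h):
    """Answer with the empirical average of predicate h on the sample."""
    return sum(h(x) for x in sample) / len(sample)

# By the Hoeffding bound, a sample of size n ~ 1/alpha^2 makes the
# empirical average alpha-close to E_{x~D}[h(x)] with high probability,
# as long as h was fixed before the sample was drawn.
alpha = 0.1
n = int(1 / alpha**2)                  # n = 100
D = lambda: random.random()            # stand-in for the unknown distribution
h = lambda x: 1 if x < 0.3 else 0      # a fixed predicate with E[h] = 0.3
S = [D() for _ in range(n)]
print(estimate_mean(S, h))             # ~0.3, up to additive error alpha
```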
Here is a little tweak to that problem; nothing changes really. Let's call it simple problem number two. Everything is the same: we have the underlying unknown distribution D, and our input is a sample S containing n i.i.d. elements from this underlying unknown distribution D. The only difference is that now, instead of having only one predicate h, we have k different predicates, all fixed in advance, all given to us, all known a priori. Now we want to use our input sample S in order to approximate the expected value of each of these k predicates over the unknown underlying distribution D. What should we do? Exactly the same as before: we estimate the expected value of each of these predicates using its exact empirical average on the given sample S. And still, by the Hoeffding bound together with a union bound, as long as the size of our input sample is roughly one over alpha squared times log k, we should be happy.

Okay, that's easy. But an important assumption we made here, when we applied the Hoeffding bound, is that all of these k predicates were fixed in advance. In particular, in order to apply the Hoeffding bound, we need the sample S to be independent of the choice of these k predicates.

So now here is the basic formulation of the problem that is considered in the line of work on adaptive data analysis; let's call it the ADA problem. What changes now is that the k predicates are not fixed in advance; instead, they are given to us one by one. We still have the underlying unknown distribution D, and our input is still a sample S containing n i.i.d. elements from D. But now we don't have the k predicates in advance. Instead, for k rounds, on every round we get the next predicate and we need to respond with an approximation a_i to the expected value of that given predicate over the unknown underlying distribution. As you can guess, the difficulty now is that the choice of the predicates may depend on the previous answers that we gave out, and as a result, the choice of the predicates may depend on the sample S, which breaks our previous analysis completely.

Okay, I'm going to say exactly the same thing, but with a picture. In the adaptive data analysis problem, we want to design a mechanism M that, in the beginning of the game, gets a sample S containing n i.i.d. elements from some underlying unknown distribution D. Then, for k rounds, we get a predicate mapping the domain X to {0, 1} and we respond with our current estimate for the expected value of that predicate over the underlying unknown distribution. Then we get the next predicate, we respond with our next approximation, and so on. Our goal when we design the mechanism M is to ensure that all of its answers are accurate with respect to the expected values of the given predicates over the unknown underlying distribution. And the basic question in the ADA literature is: what should n be? What should the size of the initial input sample be in order for us to be able to guarantee all of these good things with high probability?

When you see this problem for the first time, you might think: okay, I understood why the analysis of our simple problem number two breaks, where we applied the Hoeffding bound, that I understand; but maybe it's just the analysis that breaks. Maybe answering with the exact empirical average is still a good idea. But if you think about it for a little while, you will see that the answer is no: in the adaptive setting, that's a bad idea that can fail already in the second round. Okay, but no worries, we can do other stuff. Here is a summary of the currently known upper and lower bounds for the basic ADA problem. First, we know how to construct a computationally efficient mechanism that answers k adaptively chosen queries using a sample of size roughly square root of k.
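To give a feel for what that "other stuff" looks like, here is a minimal sketch of a mechanism that does not answer with the exact empirical average. Perturbing the answers is the flavor of the differential-privacy-based mechanisms behind the square-root-of-k upper bound, though the noise scale below is purely illustrative, not a tight calibration.

```python
import random

class AdaMechanism:
    """Sketch of an ADA mechanism that avoids exact empirical answers.
    Noisy answers are the flavor of the differential-privacy-based
    mechanisms behind the sqrt(k) upper bound; the noise scale here is
    illustrative, not a tight calibration."""

    def __init__(self, sample, noise_scale):
        self.sample = sample
        self.noise_scale = noise_scale

    def answer(self, h):
        empirical = sum(h(x) for x in self.sample) / len(self.sample)
        # Noise limits what an adaptive analyst can learn about the
        # sample from the answers it receives.
        return empirical + random.gauss(0, self.noise_scale)
```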
In addition, we know that if you are interested in computationally efficient mechanisms, then square root of k is what you get. In other words, assuming one-way functions, every computationally efficient mechanism for answering k adaptively chosen queries must use an initial sample of size at least roughly square root of k. But we also know that if you are okay with computationally inefficient mechanisms, then you can do much, much more: suddenly, the number of queries k you can answer can be exponential in the size of the input sample, exponential in n.

Okay, so these are the current upper and lower bounds. What is the connection between that and adversarial streaming? We show a reduction from the ADA problem to adversarial streaming. More specifically, we design a specific streaming problem such that if you could solve this problem in the adversarial streaming model, then you could take such an algorithm and use it in order to construct an algorithm that solves the ADA problem with related parameters. And once you have that, you can apply the negative results we mentioned for the ADA problem to obtain negative results for adversarial streaming.

The streaming problem we present, which we call the streaming adaptive data analysis problem, or SADA for short, is defined as follows. Every update in the stream has one of two types: either the i-th update is a data point, call it p_i, from the domain X, or it is a function h_i mapping elements from the domain X to either 0 or 1. These are the updates in the stream. What do we want to estimate? Our goal is, on every time step i, after obtaining the next update in the stream, to approximate the average of the last given function in the stream over the multiset containing all of the data points that were given to us throughout the stream. That's the SADA problem.

Intuitively, why should this be a good problem to look at? First, let's agree that in the classical oblivious setting we can solve this problem very cheaply in terms of memory, because we don't need to store all of the potentially very many data points that were given to us throughout the stream: we can use sampling. We just maintain a small representative sample out of all of these data points, and whenever you give me a function, I evaluate it not over the many data points given throughout the stream but only over the small representative sample that I am maintaining, and that's cheap memory-wise. On the other hand, in the adversarial setting we don't have this trick of sampling. In the adversarial setting, by the lower bounds for the ADA problem, if you want to answer many, many adaptively chosen queries, your sample must be large; and intuitively, if your sample is large then your memory is large, and you're not doing very well. So that's the intuition.

I just want to mention that, if you recall, the negative results for the ADA problem were computational: they assumed one-way functions. We, on the other hand, are aiming for an information-theoretic separation, which means that we cannot plug in the negative results for the ADA problem as is; we need to open them up and modify them, and we do that using cryptographic tools from the bounded storage model.
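Here is a minimal sketch of that cheap oblivious approach, using standard reservoir sampling. The class and method names are placeholders, and this is only meant to illustrate why the oblivious setting is easy, not the paper's construction.

```python
import random

class ObliviousSADA:
    """Sketch of the cheap oblivious algorithm for SADA: keep a small
    uniform (reservoir) sample of the data points and answer each
    function query on the sample.  Names and sizes are illustrative."""

    def __init__(self, sample_size):
        self.sample_size = sample_size
        self.reservoir = []
        self.points_seen = 0

    def process_point(self, p):
        # Standard reservoir sampling: maintains a uniform sample of all
        # data points seen so far, using only O(sample_size) memory.
        self.points_seen += 1
        if len(self.reservoir) < self.sample_size:
            self.reservoir.append(p)
        else:
            j = random.randrange(self.points_seen)
            if j < self.sample_size:
                self.reservoir[j] = p

    def process_function(self, h):
        # Against an oblivious (fixed-in-advance) stream, the sample is
        # representative, so its average approximates the average of h
        # over the full multiset of data points.
        if not self.reservoir:
            return 0.0
        return sum(h(p) for p in self.reservoir) / len(self.reservoir)
```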
Okay, so to summarize: we establish a connection between adversarial streaming and adaptive data analysis, and we use this connection to present a streaming problem that separates the adversarial setting from the classical setting. This is the first general separation between the capabilities of these two models. And that's it. Thanks for listening.