Thank you so much. First, I am Ravi. I work at Google in Mountain View, California. I want to thank the organizers for inviting me. It looks like a great workshop, but I also want to apologize for not being able to attend: this week we have an internal event at Google Research where all of us are expected to show up, which is also the reason Andrew is not able to attend. Okay. So today's talk is going to be on random walks and graph properties. This is a body of work that I have been doing over the last few years with my colleagues. The way it connects to the topic of the workshop is that we all know PageRank is inspired by a random surfer; the point I am trying to make is that random walks can do much more than estimate PageRank or give a score for every node. Random walks are very powerful objects and can be used for a variety of statistical purposes. That is the aim of this talk. I prepared about 50 minutes of material so that you can all go home early; if I finish a little early, I apologize in advance. Okay. As I said, this is joint work with many colleagues: Flavio, Anirban, Silvio, and Tamás. The work appeared over a series of papers, some at WWW, some at KDD, and some at WSDM. I will not go into the exact citations, but at the end of the talk I will put up my email and you can ask me about any specific topic. Okay. So what is the problem we are looking at? It is very simple. We are given a graph, and the goal is to estimate its basic parameters. What do we mean by basic parameters? For instance: how many nodes does the graph have? How many edges? Or, going one step further: what is the fraction of nodes or edges of a certain type? What is the largest or the average degree of the graph?
And you can make it even more complicated by asking about clustering coefficients or higher-order objects like triangles. So that is the basic problem we are looking at: given a graph, we want to know its basic parameters. Why is such a question interesting or useful? I would claim there are two main use cases. The first is business intelligence. If I am a person outside a social network, say Facebook, I might want to know how many art lovers are present in that network; that amounts to computing the fraction of nodes in the social network that have expressed some love for art. Another business intelligence example: given two social networks, say Facebook and Twitter, one of them might want to know how well the other is connected in a particular region, for instance whether its network in Paris is as well connected as elsewhere. So for business intelligence reasons, where I do not have full access to the social network, I would like to ask such questions about it. The second reason is algorithmic, just algorithmic curiosity. A typical question is: is the triangle density, the number of triangles, unusually small in certain portions of the graph? That is useful because we all know the number of triangles is a good indicator of the health of a social network. If a particular section of the graph has far fewer triangles than you would typically expect, that is an indication this portion of the graph is not doing that well, and maybe some corrective action can be taken. Another example of a question is: how does the graph's average degree vary over time? We all know these graphs change over time.
They do not remain static, and the question is how the average degree, the connectivity, changes over time. In fact, one of the problems we will talk about later arose out of exactly this kind of business intelligence question; I will come to that when we get there. Now that we have motivated why understanding graph properties is important, let us turn to the tools available to study them. The big tool available to us is sampling. Sampling is a critical and important tool for understanding large graphs, and the focus of this work is studying graph properties using samples. In fact, this turns out to be the only realistic option in many situations. Suppose I, as an individual, would like to understand the structure of Twitter's graph. Of course, Twitter is not going to give me the entire graph; in fact, it is going to put restrictions on how many times I can query the network about connectivity. So it is not that I have the entire graph at my disposal; I need to make queries to Twitter's database to get the nodes and the edges. In this setting, the only realistic option I have is to sample Twitter's graph in a careful way that lets me make inferences. The second reason sampling is the only method is that there is no fixed graph: the graph keeps changing. Taking Twitter as an example again, we all know thousands or tens of thousands of people join or leave the network on a daily basis. There is no fixed graph on which I can really run my algorithms; it changes constantly. So there is already randomness built into the underlying object, and sampling is one way to cope with that randomness.
Another important reason we use sampling is that it can give rise to provably good algorithms. If you can say something about the quality of the samples, how representative or how statistically unbiased and independent they are, that directly impacts the quality of the output. So it is important to have provably good algorithms for sampling, and that is the paradigm we will carry through this talk. Okay. But sampling is not a new problem; it has a very long history. It can be traced back at least to World War II, and the famous German tank problem. I do not know how many of you know it; it is a very elegant problem. During the war, German tanks were produced with sequential serial numbers, and some of them were captured by the Allied forces. Every tank had an ID, generated sequentially as the tanks were produced. The question is: by looking at the IDs of the tanks I capture, is it possible to estimate how many tanks the opposing forces have? There are various ways of looking at it; depending on what kind of modeling you do, whether you are a frequentist or a Bayesian, you can come up with various statistical estimates. But this notion of looking at samples and making inferences about the whole underlying data is very old, and the whole field of statistics is in some sense devoted to this question. On making inferences from samples, there are, even today, many applications in field ecology that rely on sampling. For example, suppose I want to count the number of birds in a particular region. How would ecologists do that? They go out on day one, catch a bunch of birds, and tag them.
Then they come back the next day, again catch a bunch of birds, and see what fraction of them carry the tag. Under some very simple assumptions, essentially that the population does not change much between the two days, you can write down the equations for estimating the number of birds from these two catches. This is the Lincoln-Petersen estimator (with Chapman's correction), used in ecology and, even today, in multiple fields where one cannot have full access to the data. Another kind of sampling is done to estimate a subpopulation: as I said, I want to know all the people in a network or community that are interested in, say, tennis. If you are looking at a subpopulation with a very specific property, you can use samples to infer its size. So the places where sampling turns out to be very important and useful are those where the underlying population is too large, or too difficult to access directly. The bird example is one case: there is no way I can go into an area and catch and count all the birds, or all the animals, in a region. The same can be said about social networks: there is no way I can access Twitter's network and count the number of users. Sampling has turned out to be extremely useful in many areas, including statistics, computer science, sociology, economics, and so on. For example, every two years in the US, we rely on polling to estimate people's political leanings and preferences and who is going to win. And every ten years, the US has been resorting to sampling for the census, estimating average income, education level, the racial distribution, and things like that. So sampling is present everywhere.
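To make the bird example concrete, here is a minimal sketch of the mark-recapture computation. The population size, catch sizes, and the `chapman_estimate` helper name are all hypothetical choices for illustration, not from the talk.

```python
import random

def chapman_estimate(marked, caught, recaptured):
    """Chapman's bias-corrected form of the Lincoln-Petersen estimator:
    N_hat = (M + 1)(C + 1) / (R + 1) - 1, where M birds were tagged on
    day one, C were caught on day two, and R of those carried a tag."""
    return (marked + 1) * (caught + 1) / (recaptured + 1) - 1

# Simulate two independent catches from a population of 10,000 birds.
rng = random.Random(0)
population = range(10_000)
day1 = set(rng.sample(population, 500))          # tag 500 birds on day one
day2 = rng.sample(population, 500)               # catch 500 on day two
recaptured = sum(1 for b in day2 if b in day1)   # how many carry a tag
print(chapman_estimate(500, 500, recaptured))    # close to 10,000
```

The key assumption, as in the talk, is that the population is closed and both catches are uniform; then the fraction of tagged birds in the second catch reflects the fraction tagged in the whole population.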
Okay. So, as I said, sampling is a very important paradigm, especially when the population is too large or inaccessible. Now that I have motivated sampling, let me get to the specific problem we will discuss: sampling graphs. How do you sample things in graphs? To be able to study sampling, the most important thing is the graph access model: what operations are available to touch the graph, and once I touch the graph, what information comes back from the underlying system? Here is a very simple assumption: you can query any node by its name and get its out-neighborhood. That means I can crawl somebody's home page and get all the links coming out of that web page, or I can go to somebody's Twitter account and get all the accounts that follow them or that they follow. It is a very natural assumption, and it is exactly what crawling enables: if you write a standard crawler that goes around fetching web pages, this is the access you get. This holds for both the web and social networks, so it is a very standard assumption; in fact, things like PageRank are built on it, on being able to go to a web page and get all its out-neighbors. The second assumption we would like to make is that a very small number of truly uniform nodes is available to the algorithm. Notice that truly uniform nodes are really expensive: how do I get a truly uniform Facebook user? I have no idea. How do I get a truly uniform web page? That is even trickier.
Because, as I said, the graph is changing, and we really have to make a lot of assumptions to be able to lay our hands on a truly uniform node in the graph. Another reason we like this graph access model is that it supports random walks. As I said, this is the classic model used in, say, PageRank and other graph algorithms, and random walks in particular fit it: once I query a node and get all its out-neighbors, I can pick one uniformly among them and continue my walk. On the other hand, even though this model is simple and elegant, querying is inherently an expensive operation. Every time I touch the graph, I incur some cost, be it time, resources, or rate limits; Twitter, for example, will not let me query its graph more than, say, 10,000 times a day. There is a limit on how frequently I may touch the graph, and therefore any algorithm we develop under this model has to perform as few queries as possible; minimizing the number of queries is the goal. So now that I have motivated sampling in general, and the kind of access we assume on the graph, let me state the actual problem we will look at next. Here is a simple problem. You are given an undirected connected graph; let n be the number of nodes and m the number of edges. As I said before, you do not have the entire graph accessible to you. And you are prescribed a distribution D on the nodes of the graph. The problem is: using this graph access model, output a node of the graph picked according to the given distribution D. Of course, you can never get an exact answer, so you are allowed some small slack, say within plus or minus epsilon additive error.
And you want this algorithm to perform as few accesses to the graph as possible. The distribution here is very general; I am stating an abstract problem and have not yet told you exactly what the distribution is. We are given a distribution, and we would like to design a process that outputs a node according to this distribution using a random-walk-like model. That is the abstract problem. Okay. So why is this interesting? Here is a very easy case. Suppose the prescribed distribution is proportional to the degree of a node; for reasons that will become obvious later, let me call this distribution D1. So D1(v) is proportional to deg(v); this is the uniform-edge distribution. I basically want to output a node of the graph with probability proportional to its degree. The solution here is very simple, and very appropriate for this workshop: just run a simple random walk on the graph. We know that once the walk reaches its limiting distribution, that distribution is exactly D1, proportional to the degree. And the number of steps you need to walk until the limiting distribution is (approximately) reached is exactly what is called the mixing time of the graph: the length of the random walk I need to run so that the node I output at the end is chosen according to D1. So that is the very easy case. Let us look at a slightly more interesting case, in my opinion. What if I want to output a node uniformly at random? Call this distribution D0, so D0(v) = 1/n, where n is the number of nodes. I want to pick a node of the graph uniformly at random from all the nodes. How would I do this using random walks? Okay, here is the first idea, the natural one for a computer scientist: rejection sampling.
So the paradigm is generate and reject. Run the simple random walk for T_mix steps, that is, continue the walk until it mixes, and suppose at the end of the walk you reach a node u. Then, with probability 1/deg(u), output u and stop; otherwise, go back to the first step and restart the walk all over again. Why 1/deg(u)? Intuitively, if a node has very high degree, the walk reaches it much more often, so you need to discount it: the higher deg(u) is, the lower the probability with which you should output u. So the acceptance probability is 1/deg(u). In fact, you can show this formally; assume the minimum degree of the graph is one, just for the analysis. The claim is that this algorithm takes T_mix times D_avg steps in expectation to output a uniform node of the graph: a product of two quantities, the mixing time of the graph times the average degree. Why is this true? The process generates u according to the distribution D1, which is proportional to the degree, and outputs u with probability 1/deg(u). So the probability of outputting some node is the sum over nodes u of the probability that the walk ends at u times the acceptance probability 1/deg(u). Since the probability that the walk ends at u is exactly D1(u) = deg(u)/2m, where m is the number of edges, if you work out the algebra, the sum comes out to exactly n/2m, which is one over the average degree. Therefore the probability of outputting some node in one round is 1/D_avg, and if you repeat the process about D_avg times, then with good probability you output a sample.
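As a sketch, the generate-and-reject loop above looks like this in code, under the talk's access model: the only way to touch the graph is a `neighbors(v)` query (a hypothetical stand-in for a crawl API), and `t_mix` is assumed to be a known upper bound on the mixing time.

```python
import random

def uniform_node_by_rejection(neighbors, start, t_mix, rng):
    """Walk for t_mix steps to land on a node u with probability roughly
    proportional to deg(u), then accept u with probability 1/deg(u);
    on rejection, restart the whole walk."""
    while True:
        u = start
        for _ in range(t_mix):            # one neighbors() query per step
            u = rng.choice(neighbors(u))
        if rng.random() < 1.0 / len(neighbors(u)):
            return u                      # accepted: u is (near) uniform

# Toy graph: a 4-cycle plus a chord, as an adjacency dictionary.
G = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
rng = random.Random(0)
counts = {v: 0 for v in G}
for _ in range(4000):
    counts[uniform_node_by_rejection(G.__getitem__, 0, 20, rng)] += 1
print(counts)  # each node should appear roughly 1000 times
```

Note that on this toy graph the acceptance probability in one round is n/2m = 0.4, so the expected number of restarts matches the 1/D_avg analysis above.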
So the expected number of queries of this algorithm is D_avg times T_mix: each round runs for T_mix steps, and you repeat the process about D_avg times to get one sample. A very, very simple analysis. Okay. And the goal of this exercise is to see how good this bound is; can it be improved at all? Here is one attempt. The intuition is that if the graph were regular, if every node had the same degree, then outputting a degree-biased node would already be equivalent to outputting a uniform node, because deg(v) is the same constant for all nodes; the standard random walk by itself would output a uniform node. So how about we make the graph effectively regular, by artificially slowing down the random walk at low-degree nodes? Here is an example. Take a graph whose maximum degree is, say, three. At a degree-one node B, you artificially add a self-loop that keeps you at B with probability two thirds, so you stay at B longer. The advantage is that every node now effectively behaves as if it had degree three; the graph is effectively regular. And since you hang around at low-degree nodes like B for longer without touching the graph, the hope is that this walk will query the graph fewer times. So maybe that helps us. First, though, we need to analyze the process. We have taken the original graph and modified it by adding self-loops; the question is, what is the expected number of queries on this modified graph?
Now, first of all, it is easy to convince yourself that the stationary distribution of this modified walk, call it the max-degree walk, is exactly D0, the desired uniform distribution. That is the easy part. You can also see that the expected number of consecutive steps you spend at a node u is d_max/deg(u), the maximum degree divided by the degree of u; that too is easy to see, because that is exactly the extra weight the self-loop adds at the low-degree nodes. Then, using a simple variational inequality, one can relate the spectral gap 1 - λ2 of the modified walk to the stationary distribution of the walk, and show that the mixing time of this modified random walk is on the order of 1/(1 - λ2) times log n. Putting all of this together, the expected number of steps is, up to this log n factor, the same quantity as before: the mixing time times the average degree. So even though we took a more complicated, and intuitively appealing, route to try to gain something, the expected number of steps is still essentially the product of the mixing time and the average degree. Now let us look at how a statistician would think about this. If you ask an undergrad in statistics how to do this problem, they immediately go to Metropolis-Hastings. It is a very classical method, more than 50 years old, to generate a sample according to any given distribution. In fact, it is a way to sample from any target distribution D starting from an arbitrary proposal transition matrix Q. In our case, we will take the transition matrix Q to be the simple random walk given by the adjacency matrix of the graph.
The way Metropolis-Hastings works, as a quick recap, is that when you are at a state u, you first generate a node v according to the specified proposal matrix Q, and you then move from u to v with a certain probability, namely Q(v,u) times D(v) divided by Q(u,v) times D(u), capped at 1. This transition rule can be shown to achieve the desired stationary distribution D. Basically, it is a way to transform any transition matrix Q into a chain with a given target distribution D; think of it that way. In our case, the target distribution D is the uniform distribution, and the arbitrary proposal Q is the simple random walk on our adjacency matrix. If you work out Q and D in this expression, you can calculate that the probability of moving from u to v is exactly 1 over the maximum of the degrees of the endpoints, 1/max(deg(u), deg(v)). That is the Metropolis-Hastings transition rule for achieving the uniform distribution. And what happens if you run Metropolis-Hastings? Working things out as before, the expected number of steps is T_mix, the mixing time, times the maximum degree; the proof is similar to the one for the max-degree walk. The important point to notice is that instead of the average degree you now have the maximum degree, which is at least the average degree; in power-law graphs it is substantially larger, so this bound is worse than what the max-degree walk or rejection sampling gives. And this is not a pessimistic bound, an artifact of the analysis: you can construct a graph on which the expected number of steps for Metropolis-Hastings is Omega of this, at least T_mix times d_max. The construction is very simple: you put a clique, say on d_max nodes, in the center of the graph, and attach many long paths, of length k, to its nodes, and you can set up the parameters appropriately.
Notice that on a path, the mixing time is about k², which is also the order of its cover time: you need to touch all the nodes on the path, and that takes about k² steps. Since there are k different paths, and a large clique in the middle, you can work out the details and show that you need at least T_mix times d_max steps to actually see and touch all the nodes of this graph. That gives the lower bound for MH: if you work out the probabilities for Metropolis-Hastings, this is the bound you get. So the upper bound we had for MH is not pessimistic; in fact, it is tight. What this shows is that even though MH seems like a very natural way to use random walks to achieve the uniform distribution, it may not be the optimal one to use; we have to be careful when designing algorithms based on methods like MH. Okay. So now we have argued that we keep running into this barrier of T_mix times D_avg: whichever route we take, we seem to hit that wall every time. Let us try to see whether this is actually optimal. First, it is easy to show an Omega(D_avg) lower bound. The construction: take a random G(n,p) graph with average degree D as one instance; for the other instance, take the same graph but, independently at each node with probability 1/D, attach a small tree. These two instances are quite far apart: the uniform distributions over their nodes differ by roughly a constant in total variation, because a constant fraction of the mass sits on the attached trees.
However, an algorithm that uses o(D_avg) queries cannot distinguish these two instances, because it will essentially never hit the vertices where an extra tree was attached. Similarly, it is easy to see that T_mix is also a lower bound: without mixing, even for regular graphs you cannot hope to get a uniform edge, let alone a uniform node. So these two give a lower bound of D_avg plus T_mix, whereas the upper bounds we have been getting are D_avg times T_mix. It turns out that, unfortunately, the upper bound is the tight one: a very recent result of Chierichetti and Haddadan shows that any algorithm obtaining an additive approximation to the average of a bounded function of the degrees of the graph must use on the order of D_avg times T_mix queries. Basically, this is the best you can do on a general graph, and there is no way around it. That concludes the first part of my talk, where the message was: random walks are great, but you have to be a little careful in selecting the right algorithm, even to solve the very simple problem of outputting a uniform node of the graph. If you choose MH, then, let me skip this construction, you might run into efficiency issues. Let me quickly go through some experiments to convince you that this is not just a theoretical artifact but is also observed in practice. There are two ways to test the whole thing: I can check how uniform the samples from these algorithms are, or I can use the samples to do something else, for example to estimate the size of the network or to compute clustering coefficients and so on.
We will come to those questions in the second part. The experiments on real graphs show that MH is actually quite bad. In the first plot, the y-axis shows the L1 difference between the distribution you obtain and the uniform distribution, and the x-axis shows the number of queries made to the graph. You can see that MH does poorly, while rejection sampling does pretty well; the same phenomenon exists on other graphs as well. And the same holds if you use the samples for downstream estimation, such as estimating the average degree: rejection sampling and the max-degree walk (MD) still do pretty well. Okay. I have been using the symbols D0 and D1 for a reason. You can define a generic distribution D_{1+epsilon}, where the goal is to output a node with probability proportional to its degree raised to the power 1+epsilon; I basically want to pick high-degree nodes with even higher probability. Unfortunately, it turns out that you cannot do anything here: the number of steps becomes polynomial in n. The simple reason is that if you amplify the degrees like this, I can always hide some really high-degree nodes in the graph so that you have to look at essentially the entire graph to be able to output them with the right probability. So there is a trade-off here: the D0 and D1 that we talked about are very special, and if you try to extend to higher powers of the degree, you run into this lower bound; basically, you have to look at the entire graph. Okay. So this is the first part of the talk, where I tried to motivate sampling as an important problem, and to show that generating a uniform node of a graph, even though it looks very simple, is technically quite intricate because of pitfalls one might face by using things like Metropolis-Hastings. Okay.
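For completeness, here is a sketch of what the Metropolis-Hastings walk from this part looks like in code, again assuming a hypothetical `neighbors(v)` crawl query: proposing a uniform neighbor v and accepting with probability min(1, deg(u)/deg(v)) yields exactly the transition probability 1/max(deg(u), deg(v)) derived above.

```python
import random

def mh_uniform_walk(neighbors, start, steps, rng):
    """Metropolis-Hastings walk targeting the uniform distribution D0:
    propose a uniform random neighbor v of u, accept with probability
    min(1, deg(u)/deg(v)), otherwise stay put (a rejected move acts as
    a self-loop)."""
    u = start
    for _ in range(steps):
        nbrs_u = neighbors(u)
        v = rng.choice(nbrs_u)
        if rng.random() < min(1.0, len(nbrs_u) / len(neighbors(v))):
            u = v
    return u

# On the same kind of toy graph, the walk's stationary law is uniform.
G = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
rng = random.Random(1)
counts = {v: 0 for v in G}
for _ in range(4000):
    counts[mh_uniform_walk(G.__getitem__, 0, 25, rng)] += 1
print(counts)  # roughly 1000 per node
```

On a tiny graph like this MH is perfectly fine; the T_mix times d_max penalty discussed above only bites on graphs with highly skewed degrees.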
The second part of the talk is the more statistical part: suppose I have samples like this from D0 or D1; how would I use them to estimate interesting graph properties, once again using random walks? Okay. Here is a very simple property: I want to count the number of nodes of the graph, that is, estimate n. Very basic: I want to know how big the Twitter graph is. This has been considered extensively in the social networks and web communities, and the rough idea in most of these works is the birthday paradox: if you have k uniform samples, then the expected number of collisions among them is roughly k²/n; with about √n samples, I expect two of them to coincide, just as two people share a birthday. Using this as the basic tool, Katzir, Liberty, and Somekh proposed the following simple algorithm. Sample nodes proportional to their degree, and if x1 through xk are the samples, output the sum of the degrees of the samples times the sum of the reciprocal degrees, the whole thing divided by twice the number of collisions. Why am I insisting on the first point, sampling nodes proportional to degree? Because that is exactly what a random walk achieves. As I argued, getting uniform nodes is very hard, but getting uniform edges is relatively easy; sampling nodes proportional to degree, which a random walk gives me, is the same as sampling uniform edge endpoints. And the quantity they output can be shown to relate to the expected number of collisions, which for degree-biased samples is (k choose 2) times the sum over nodes i of (d_i/2m)². In fact, a slightly sharper analysis is possible: even though the general bound looks pessimistic, if the graph is, say, regular, you can show that √n samples suffice.
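Here is a sketch of that estimator. To keep it self-contained, I draw degree-biased samples directly from the walk's stationary distribution rather than running an actual walk, and the graph's degree sequence is an arbitrary illustrative choice.

```python
import random
from collections import Counter

def katzir_estimate(samples, degree):
    """Katzir-Liberty-Somekh size estimate from degree-biased samples:
    n_hat = (sum of degrees) * (sum of reciprocal degrees) / (2 * C),
    where C counts pairs of samples that landed on the same node."""
    sum_deg = sum(degree(x) for x in samples)
    sum_inv = sum(1.0 / degree(x) for x in samples)
    collisions = sum(c * (c - 1) // 2 for c in Counter(samples).values())
    return sum_deg * sum_inv / (2 * collisions)

# Toy network with 3,000 nodes: half of degree 2, half of degree 6.
rng = random.Random(3)
n = 3000
deg = [2 if v < n // 2 else 6 for v in range(n)]
samples = rng.choices(range(n), weights=deg, k=800)  # degree-biased draws
print(katzir_estimate(samples, lambda v: deg[v]))    # roughly 3000
```

The two correction sums undo the degree bias: in expectation their product scales like k² times E[d] times E[1/d] under the stationary law, and dividing by twice the collision count cancels everything except n.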
And if the graph has a power law with exponent two, then even fewer samples suffice, about n^{1/4}. So by making more assumptions on the degree distribution, you can get sharper and better bounds on the number of samples needed for collision counting. There was also follow-up work by Cooper et al., who used return times of the random walk to get even better estimates: so far we have used the random walk in a black-box manner, but if you also measure how long the walk takes to return to a given node, you can do better. The next question we turn to is the average degree; estimating the number of nodes was the simple part. In fact, the motivation for this came from a conversation with a friend who works at LinkedIn. I asked him: I know roughly how big LinkedIn is, but tell me, what is its average degree? He came back saying, sorry, I cannot tell you, it is a business secret. And that actually motivated us to look at this problem from an algorithmic point of view: if I can only access LinkedIn as a crawler, how can I use that to estimate its average degree, by which I mean twice the number of edges divided by the number of nodes? First of all, there is a very simple answer: I can estimate n and m independently using collision counting, which takes on the order of √m plus √n samples. Or I can do it directly using node collisions, which turns out to give a bound of √n times quantities depending on the average degree. So basically you are stuck at √n. And in fact, here is a very natural algorithm to output the average degree of a graph: sample nodes uniformly at random and output the average of their degrees. A very natural algorithm, right?
You pick uniform nodes in the graph and output the average degree that you see. Let's forget for the moment that getting uniform nodes is hard; even given access to uniform nodes, this is the natural algorithm. And Feige proved, in a beautiful result, that if the number of samples is square root of n divided by some bound that depends on the average degree, then this algorithm gives you a 2 plus epsilon estimate of the average degree. And you might ask, why only a 2 plus epsilon estimate? Because there's no way around it. Think of the example on the right-hand side, the top graph. Unless a uniform sample hits the blue node, I cannot distinguish this graph from a graph consisting of just disjoint edges. This graph has a higher average degree; the other one has average degree 1. To distinguish these two, I really need to see this blue node. Therefore, any algorithm that uses little-o of n, say square root n, samples will never see this blue node, and so it will be off by a factor of 2. And in fact, that is basically the best you can do: you can get only a factor 2 plus epsilon approximation, and you have to use square root n samples; there are good lower bounds for that. So there are limitations of this algorithm. And Goldreich and Ron proposed a slightly different estimator where they bucket the sampled nodes by degree and throw out the small-degree buckets, because those contribute higher variance. So they compromise on the unbiasedness of the estimator, and still they only get a 2 plus epsilon approximation. And if they make the additional assumption that a random neighbor of every node is available, then you can get a 1 plus epsilon approximation. But still the number of samples is square root n. So basically, we seem to be running into this barrier of square root n.
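The factor-2 obstruction is easy to see numerically. This is my own toy example in the spirit of Feige's hard instance: on a star graph the true average degree is about 2, but uniform samples almost always land on degree-1 leaves, so the naive average-of-sampled-degrees sits near 1.

```python
import random

def naive_avg_degree(adj, k, seed=0):
    # Feige-style baseline: average the degrees of k uniform node samples.
    rng = random.Random(seed)
    nodes = list(adj)
    return sum(len(adj[rng.choice(nodes)]) for _ in range(k)) / k

# Star on n nodes: hub 0 joined to every leaf.
n = 100_000
adj = {0: list(range(1, n))}
adj.update({i: [0] for i in range(1, n)})

true_avg = sum(len(nbrs) for nbrs in adj.values()) / n  # 2(n-1)/n, about 2
est = naive_avg_degree(adj, k=50)  # 50 uniform samples almost surely
                                   # miss the hub, giving about 1
```

Unless one of the 50 samples happens to hit the hub (probability about 50/n), the estimate is exactly 1.0: a factor-2 underestimate, matching the lower-bound intuition.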
So the square root n comes from the naive collision counting, or from Feige's method, or from Goldreich-Ron. And that seems to be a barrier we are not able to overcome for estimating the average degree. And in fact, as I said, no uniform sampling algorithm can beat this: there is a sample lower bound of square root n for any uniform sampling algorithm. Okay, but now that we've argued random walks are really good, what about non-uniform sampling? What if you use nodes picked according to a random walk, say, degree-biased? Again, this may not give a good answer; look at the example on the right-hand side. The top graph has a very low average degree, because there is a huge component with average degree close to zero. And the bottom example is, say, n-over-4-regular, so it has average degree n over 4. But if you just run random walks on the top graph, you will never see those isolated components. Therefore, you would wrongly think the top graph has the same average degree as the bottom one. So you have to be careful: you can't just take random walks and use their samples to estimate the average degree, because you give too much importance to high-degree nodes and never take low-degree nodes into account. And that's the intuition here. Uniform sampling is bad for high-degree nodes, because you end up not taking the high-degree nodes into account properly. And degree-biased sampling, like random walks, is bad for low-degree nodes, because you never see them. And the question is, can we get the best of both worlds? Can we combine uniform sampling and degree-biased sampling and do something asymptotically better for this problem? And here is the idea.
So instead of sampling nodes uniformly or proportional to degree, you sample nodes with probability proportional to degree plus a small constant, a smoothing constant. That's the paradigm I'm trying to promote. And notice that this mode of sampling is still random-walk friendly, because I can do a kind of lazy random walk: I can still do a random walk, but I add this constant to make the walk stay a little bit longer at every node, similar to the max-degree walk that we talked about. And the only interesting question is, how do you choose this constant carefully? Because a poor choice is bad: setting the constant to zero just gives the degree-biased walk, while a very large constant makes the walk essentially uniform, and we saw both are bad. So there has to be some sweet spot between the two that hopefully gives us an asymptotic improvement in the number of samples. That's the idea. So the algorithm actually involves three steps; it's like a bootstrapping algorithm. First it runs what is called a coarse estimator, whose goal is to get a constant-factor approximation to the average degree; think of a factor-12 approximation. Then there's a bootstrapping step that takes this constant-factor approximation and makes the error arbitrarily small; we call it the refined estimator. So it takes this constant factor and makes it like one percent. And the algorithm is just putting these together: you first run the coarse estimator to get a very crude estimate of the average degree, and then use that estimate as the smoothing constant and run the refined estimator. That's the whole algorithm. So first let me talk about the refined estimator, because it's easier. What are we given? We are given some coarse estimate C; think of C as a constant.
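The smoothed sampling really is random-walk friendly. Here is a sketch (my own rendering of the lazy-walk idea, not the paper's code): at the current node v, stay put with probability c/(deg(v)+c), otherwise step to a uniform neighbor. Detailed balance with pi(v) proportional to deg(v)+c checks out, since pi(v)·P(v,u) = 1 = pi(u)·P(u,v) for every edge (v,u).

```python
import random

def lazy_walk_sample(adj, c, steps=300, seed=0):
    # Lazy walk whose stationary distribution is proportional to deg(v)+c:
    # from v, stay w.p. c/(deg(v)+c), else move to a uniform neighbor.
    rng = random.Random(seed)
    v = rng.choice(list(adj))
    for _ in range(steps):
        d = len(adj[v])
        if rng.random() >= c / (d + c):  # leave w.p. d/(d+c)
            v = rng.choice(adj[v])
    return v

# Sanity check on a small star (hub 0, 20 leaves) with smoothing c = 2:
# stationary hub probability should be (20+2) / (22 + 20*3) ~ 0.27.
adj = {0: list(range(1, 21))}
adj.update({i: [0] for i in range(1, 21)})
hits = sum(lazy_walk_sample(adj, c=2, seed=s) == 0 for s in range(2000))
freq = hits / 2000
```

Running 2000 independent walks, the empirical hub frequency should land near 0.27, confirming the walk samples proportional to degree plus c rather than proportional to degree (which would give 20/40 = 0.5).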
What we do is sample k nodes x1 through xk with probability proportional to degree plus this constant C; as I argued, that's easy to do using a random walk. And then you output the ratio of two quantities A and B, where A is the sum over samples of d_i over (d_i plus C), and B is the sum of 1 over (d_i plus C). And you can calculate that the expected value of A divided by the expected value of B is exactly equal to the average degree. Notice that it's not that you're looking at E of A over B; the numerator and denominator are treated separately, and the ratio of their expectations gives you the average degree. And that's actually quite important, because it says that this estimator is biased, not unbiased: it's not that we average these estimators and get the average degree. Okay. And how do you prove that it works? It's fairly simple. Suppose the crude estimate you got is some constant factor away from the average degree, say alpha times the average degree. If the number of samples is something like one over alpha divided by epsilon squared, then you can show that the refined estimator outputs a 1 plus epsilon estimate. And the way to show it is that the two quantities A and B, the numerator and the denominator, are concentrated given this value of k. You can analyze the second moment and use Bernstein's inequality to show that the concentration happens. And in fact, it's the denominator that needs the coarse estimate. The numerator is fine, but with the denominator you have to be careful: remember, the denominator is a sum of terms of the form 1 over (d_i plus C), and that requires the coarse estimate to be good. You can show that the deviation of B from its expectation is bounded in terms of the minimum degree and the coarse estimate. Okay. And as I said, this is not an unbiased estimator, but you can bound the bias.
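The refined estimator can be sketched in a few lines. This is my own illustration: direct weighted sampling stands in for the lazy walk (to keep it short), and the identity E[A]/E[B] = average degree is exactly the one described in the talk.

```python
import random

def refined_avg_degree(adj, c, k, seed=0):
    # Sample k nodes with probability proportional to deg(v) + c, then
    # output A/B with A = sum of d_i/(d_i+c) and B = sum of 1/(d_i+c).
    # E[A]/E[B] = (sum of degrees)/n exactly; A/B itself is a ratio
    # estimator, hence slightly biased, but the bias can be bounded.
    rng = random.Random(seed)
    nodes = list(adj)
    weights = [len(adj[v]) + c for v in nodes]
    samples = rng.choices(nodes, weights=weights, k=k)
    a = sum(len(adj[x]) / (len(adj[x]) + c) for x in samples)
    b = sum(1.0 / (len(adj[x]) + c) for x in samples)
    return a / b

# On a star with 1000 nodes, the average degree (sum of degrees over n)
# is 2*999/1000, about 2; a coarse guess c = 2 is within a constant factor.
n = 1000
adj = {0: list(range(1, n))}
adj.update({i: [0] for i in range(1, n)})
est = refined_avg_degree(adj, c=2, k=2000)  # should land near 2
```

Note that this succeeds exactly where the uniform-sampling baseline failed: the degree-plus-c bias makes the hub show up in the sample with constant probability, and the 1/(d_i+c) weighting keeps the leaves from being drowned out.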
You can calculate that the bias is at most something related to the coarse estimate and the average degree, and if the estimate is good, then the bias is bounded. And as I said, you can implement this algorithm as a random walk; the number of samples you need is related to the eigenvalue gap of the graph, and again, you can work it out, it's fairly easy. Now we come to the coarse estimator. Remember, the refined estimator assumed that you already have a constant-factor approximation to the average degree, but how do you get that in the first place? That uses a guess-and-verify paradigm. You guess in logarithmic buckets: you try C equal to 1, 2, 4, 8 and so on. And for each guess, you sample nodes with probability proportional to degree plus C, which again we can do easily using random walks. Now you calculate the fraction of low-degree nodes among the samples. If the fraction of low-degree nodes is more than, say, 5/12, then you stop and declare the C corresponding to that experiment as your coarse estimate. And why does it work? It works via some version of Markov's inequality. You can show that if C is alpha times the average degree, then the probability that a sampled node has degree at most C is bounded on both sides by a function of alpha. Therefore, if C is very small, then the fraction of low-degree nodes is less than 5/12, and if C is an overestimate, then the fraction of low-degree nodes is large. And since we have bounds on both sides, by trying the values of C one by one with the right stopping criterion, we will output an appropriate value of C; all we lose is a factor like 3. So we get a factor-3 approximation to the average degree from this. And that's the entire algorithm. So this is the final bound.
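The guess-and-verify loop can be sketched as follows. Again this is my own illustration: direct weighted sampling stands in for the walk, the 5/12 threshold is the one from the talk, and the other constants are illustrative.

```python
import random

def coarse_avg_degree(adj, k=400, seed=0):
    # Try c = 1, 2, 4, ...; for each guess, draw k nodes with probability
    # proportional to deg(v) + c and measure the fraction of "low-degree"
    # samples (deg <= c). Stop at the first c where that fraction >= 5/12.
    rng = random.Random(seed)
    nodes = list(adj)
    c = 1
    while True:
        weights = [len(adj[v]) + c for v in nodes]
        samples = rng.choices(nodes, weights=weights, k=k)
        low_frac = sum(len(adj[x]) <= c for x in samples) / k
        if low_frac >= 5 / 12:
            return c  # constant-factor approximation of the average degree
        c *= 2

# On a 4-regular graph every sampled degree is 4, so the loop
# deterministically sees low fraction 0 at c = 1 and c = 2, then
# fraction 1 at c = 4, and stops at the exact average degree.
n = 50
adj = {i: [(i - 1) % n, (i + 1) % n, (i + 2) % n, (i - 2) % n]
       for i in range(n)}
guess = coarse_avg_degree(adj)  # returns 4 on this graph
```

On irregular graphs the returned power of two is only guaranteed to be within a constant factor of the average degree, which is all the refined estimator needs.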
So the final bound is that you can estimate the average degree, with probability 1 minus delta, using log U log log U plus 1 over epsilon squared queries, where U is some upper bound on the maximum degree; you can take U to be the maximum degree. The point to note is that the maximum degree is less than n, so you get on the order of log n log log n samples. Contrast that against the earlier algorithms, which all took root n; in fact, there was even a lower bound, so with uniform samples you can't do better than root n. But we are able to get around that lower bound by using degree-biased samples. That's the point: you get an exponential improvement, almost like log n log log n samples. And it's not just theory but also practice: these random walks obtain better estimates of the average degree than the Feige or Goldreich-Ron algorithms. So not only do they do well in theory, but also in practice; it's a strict improvement. So that basically concludes my talk. Here's the summary. The message I want to convey is that random walks are really powerful. They can go much beyond PageRank, to estimate other interesting properties of a graph like the average degree, or to generate uniform nodes; we showed there are interesting bounds for generating uniform nodes, and one can extend that to other distributions on V. And one has to be careful using random walks for this purpose, because if I use the wrong random walk, like Metropolis-Hastings, then I can get strictly worse results, both in theory and in practice. And maybe this line of thinking also suggests that there are better notions of mixing for social graphs. Maybe lambda-2, or the mixing time, is not the right notion for such graphs; maybe there are average-case notions that are more appropriate.
Maybe one is willing to discard a large fraction of the graph in order to define a mixing time; maybe that's a better notion. And the second thing I wanted to say is that there is a lot of power in random walks, in that they give rise to non-uniform sampling of the nodes, and that can lead to better algorithms, even for a simple problem like average degree. And the question is, can this idea of non-uniform sampling be used for other estimation problems in order to improve their bounds? There are so many other estimation problems one can think of on a graph, like the number of triangles or the clustering coefficient, things like that. Can these non-uniform samples, in particular from random walks, give rise to improved upper bounds by circumventing the lower bounds that already exist? So that's all I have. For any questions, please reach out to me on my Gmail. Once again, apologies for not being there; it looked like a fun workshop. Yeah, thanks everyone. Can you, one second, can you see me? No, we don't see you. Can you see me now? So, other questions: you mentioned return times. Can you discuss situations where you have, say, Poincaré recurrences and power-law distributions of return times, versus graphs where the return times are always exponentially distributed? So I'm not fully aware of that line of work. The return-time work of Cooper et al. was, I think, more on the theoretical side; I don't know how much it has been used in practice or in stylized models. But the point they wanted to convey is that in addition to just random walks, you can also use quantities that come out of the random walks, like return times, in order to do estimation. So I don't know much about the Poincaré recurrence or power-law return-time work. Because in dynamical systems, with chaotic and integrable components, it's rather generic that you have a power-law distribution of return times, because you can have sticking near some islands of stability.
And then I wonder if your estimates will work in such situations, and also for directed networks. So directed networks are very tricky, because you might get stuck in some local regions, and typically some of them will have a huge, exponential mixing time. So I don't know how much of this will work in directed settings. You may have to repeat the question. Yes, the question was: what about non-backtracking walks? So these walks are not non-backtracking; these are regular, uniform random walks. Actually, that's a very good direction, because if you disallow backtracking, maybe you can improve some of these bounds even further. I don't know how to analyze them, but it's a good direction to study these questions. Okay, so