And I'm excited so many of you are choosing to attend. It's a little intimidating to come and give a talk on a topic that I'm interested in but that hasn't really been an area of specialty for me, and it's a specialty area for many of you. So I'm hoping to really get some discussion going. So let's see. I thought I would start with why I got interested in this and what's compelling about it to me. So let's start with a personal story. This summer I was working for Judge Kozinski in Pasadena, and I lived with the Hillis family in Los Angeles. Danny Hillis is somewhat famous in the technical world. If you don't know him, he did things like parallel computing and he invented the RAID array, and he's just a thinker that I admire very much. And we got into a conversation. He's like, why do you have this crazy background? Like, what are you? Are you quantitative? Are you like a fuzzy person? You're swinging all over the place. And I explained to him that it actually makes sense. What's always been driving me, even since my undergrad days as an econ student, is this idea of social welfare economics: what components do we need to have a society that really allows people to be as well off as they can be? That sounds very idealistic and a bit naive, but it's something that's driven me, and it's still driving me, actually, throughout my career. And that is what drove me into statistics, because if you're doing something like that, you really need to come up with theories and you need to measure them against data and understand how your theories are actually playing out. And I think if you're not well versed in statistics, it's going to be more difficult for you to do that.
And then the law school brought it all back together, where I could really talk to a group who were as interested in the policy issues as I was, and I could bring my empirical skills into it. It felt like a natural home to me in the law school. So that's sort of my trajectory. So I explained this to Danny. And he said, well, what's your goal? What do you want to do with all of that? And I said, probably something along the lines of academia. I mean, I'm not wedded to the idea, but that's what I'm thinking. And he said something that changed my life, I think. He said, well, it's great to come up with these theoretical ideas that you speak of, and maybe test them out or whatever you're talking about. But you're not going to change things unless you really build tools and put these ideas into people's hands and make it something that they can fold into their lives. If you just come up with the ideas and you put them out there, and you're just waiting for other people to build the tools, I mean, they might. And there are cases where they have done this kind of thing. But you're probably losing a lot of the potential impact you could have. And I really thought about that. And I think he's right. And that's one of the things that appeals to me a lot about Berkman: impact is a component of the work that goes on here. And it's made me really refocus my career. I was really thinking I would probably just write papers my whole life, which I love to do. But now I've really reached out a lot more in my own thinking in terms of tools and things like that. So you hopefully will see that a little bit in my talk. So OK, these two images. These are really dramatic images that I wanted to start with. So the one on the left here, this is Wafa Sultan. I heard her talk at Stanford Law School, and she was really an incredible speaker.
And she illuminated for me a lot of the women's rights issues that are happening now in the Middle East. And for some reason, I just hadn't really thought about it. You know that there isn't much in the way of women's rights under Islam. But she explained it and what it was like to grow up there. She was born in Syria and moved to America when she was 30, and she has been really a leader in educating people, I guess, about what's happening with women in the Middle East. And on the right, this is a blog post that was posted by Kareem Suleiman in October of 2006, I believe. And this is just on Blogspot. So he just put it up there. I don't read Arabic, so I don't know exactly what it says. But he's critical of the government in Egypt and how they were handling some Christian unrest. And this is what he's in jail for, for four years. I believe he was warned before this. It's not only this blog post. But this, to me, is another compelling story where you have people expressing themselves and facing some pretty dire consequences for it. And you can also see this ad; that was an accident. They just had one about censorship in Bahrain. So both of these issues I find compelling. And it ties into what I'm interested in. And also, you'll see how the tools play into this. OK, so as a starting point for thinking about this, I looked at Benkler's book, The Wealth of Networks. He talks about the impact of the internet on democracy, or on society more broadly. And he makes these four points on where he thinks the impact is going to be maximized. He thinks it will increase individual freedom. There will be a platform for democratic participation. He's very interested in the cultural impact, like a more reflective and self-critical society. And then he talks about a mechanism to achieve human improvements everywhere. So, to me, this didn't give me enough of a framework to know how I could shape my research if I was going to do research in this area.
They're extremely normative. And I don't have a clear sense exactly what each of them means. I sort of have a blurry sense about most of them. So what I wanted to do was see if I could build on what he was doing. And I came up with a framework that I think is one way to look at the impact of the internet on democracy. And by the way, for democracy, I'm not going to define the term. I'm just thinking of it in terms of everybody voting in some sense, like a referendum or something like that. And that's a fairly minor detail. OK, so there are three ways we can think about this relationship. First, the internet increases the flow of information about issues. So in some sense, it's a facilitator and a disseminator of information. That's one important role that the internet will have. I'll talk about all of these more. The internet's also a tool. So even if you don't want to think about the informational aspects of it, you could actually use it for voting itself. Like, it's a facilitator of the actual infrastructure of democracy. And so maybe there's a sense in which you can circumvent regimes or something like this through the use of the internet. There are other things you can do. There are other tools I'll talk about for decision making or opinion formation. And by the way, all these things blend, right? Like now I'm back into opinion formation, and they overlap. There's not a lot that can be done about that. The other one that I thought of, the third prong, is how the internet can educate about what democracy even means, when you might have areas that really just don't know about other alternatives and aren't cognizant of the debate and what you could actually accomplish and what democracy really means. So I'll talk about all of these a little more, but that was the framework structure. And by the way, I'm very happy to be interrupted with questions. I don't know if that's your normal format, but feel free, absolutely.
So for changing our minds on issues, this is positive in the sense that it builds autonomy. I mean, each of us is completely responsible for our opinions and our vote. But this makes it easier for us to form the opinions, to get information. And there's also a sense in which it builds community, because maybe it enhances the amount of communication that you have with people, and you're doing community building. There's another sense in which it could be very powerful: there's a lot of restructuring of information going on now on the web. For example, I think of changes in the web as we move towards a semantic web, and restructuring of information so that machines can actually read and understand what's out on the web, and it's not just us parsing our Google searches. I think this can have a phenomenal impact on how we understand the information. An example of that is the Transparent Federal Budget Project. I think the website is weareallactors.org. The information on how our federal budget is spent in the different departments is out there; it's something that we can get. If we just click around and we're patient enough, we can understand exactly how the spending is happening and where our tax dollars are going. But that's a lot of work, and I can't actually imagine sitting down and doing that. So what they're doing is building a tool that allows you to see in real time the flow of money and where it's going in the federal government. And that's an example, I think, of something that could have an enormous impact on how we vote. Because, I mean, I have something of an idea of the federal budget, but I don't really know exactly where things are going. Things like MAPLight. And, well, this one is my idea: MAPLight exists, and the other is just my thought. So MAPLight tracks campaign contributions and voting. I think it's just in California.
I thought maybe we could do something also, like tracking, not just looking at campaign spending in other countries, but tracking aid and things like that and looking at its impact. So these are things that are giving people more information in more useful ways. But there are a lot of criticisms of this. So there's a lot more information out there, pages and pages of documents. And how do you really know what's just being put up there to obfuscate? And so this is, I think, the main criticism that people come up with against the internet: that you just have a lot out there and you don't know what to believe. So you can look at this very positively and say that it feeds back into autonomy and self-determination and so on, in that you're going to have to make these judgments about information. But that just seems like way too much work, right? None of us can just sit there and really sift through all this information. So then you get into this idea of the gatekeeper, which is something that traditionally, I think, has been very valuable: who certifies the information. And it hasn't been so much of an issue with traditional media, because we had certain prestigious newspapers and we had channels that we somewhat trusted and things like that. And you know, it used to matter, or I guess it still matters, who the New York Times endorses for president. And this is the gatekeeper function. And so in the context of the Middle East, I think that's really an open question, how the gatekeeper role is going to be served there. And it's really vital to the internet, I think, if it's going to influence things in a positive way. So I think Benkler called it the attention backbone, and how you see sites that sort of gather attention. I mean, I don't have a model in my mind of people just sort of randomly clicking on the web and trying to gather information.
But they do sort of point to sites where they're going to get their information, and how these sites become trusted and what levels of trust are out there, I think, is really important for avoiding things like manipulation. Another thing is, with information, I'm just really concerned about some groups kind of overwhelming other groups. So I think that also plays into information and gatekeeping. There's a debate out there, with more people than just these two, between Cass Sunstein and James Fishkin, who's a professor at Stanford in communications. I'm assuming Cass Sunstein is pretty famous in the legal community. But Fishkin, I think, is not as well known. He does a lot of research in deliberative democracy. So Sunstein has a very powerful idea. He says, OK, even if you have information getting out there into people's hands and you have this attention backbone of people who are sort of finding their way through the web on their own, aren't they going to reinforce opinions that they already have? And aren't you going to actually get a lot of polarization? And I think there's some argument that this has been happening in America during, I guess, Bush's term, as the internet has been in its ascendancy. People really just read things that they're interested in and they're not actually changing their minds. Fishkin has done research where he gathers people together and looks to see how information impacts what they think about an issue, before and after they actually get the information. And he doesn't agree; he thinks that with further information people do actually tend to open their minds. So I think that's an interesting question. I'll come back to it, and it keeps coming up a little. I mean, it's a theme in the talk. OK. So the internet is a tool in itself. We could vote directly through the internet. So that's one way that the internet is useful, and this has been happening in Estonia. It's also a tool for, literally, mobilization.
So separating that from the informational context, just a few examples. Apparently the outcome of the 2002 South Korean presidential election, or at least the factors that changed people's minds on the margin, was widely attributed to who was more involved with the internet. Apparently there is a story that one candidate's wife made a really good savory pancake recipe video and people liked that a lot. So I don't know. I mean, that's an anecdote. And then I've also heard that the mobilization for the mass demonstrations that ended up really pushing Swartz out of power was coordinated through the internet. So this type of thing, I see that as more of a tool for political structure and change rather than an informational aspect. There are more tools out there. I've talked about the semantic web a little bit and how that feeds back into the information that you get from the web. Danny Hillis has a new tool for collective reasoning where he's using the web, and it's very early. I've been helping him a little bit, and it's a project that's just right now being tested out. But the idea is that maybe we can even map out an argument through a table or through a flow chart or something like that. An argument like, what should we do about global warming? Some huge thing like this, where there are tons of arguments, and it's very difficult to actually speak concurrently. If we had a conversation about this for half an hour, there's no way we would ever get to the bottom of it, because there's a lot. And so I think the internet can actually put tools like this out where people can, on their own time, sort of fill in the argument and find the pieces they disagree with. The goal of the tool is just to map out the argument itself, not to try and corral someone into a certain opinion or anything like this, but just to see what people think.
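As an editorial illustration of the argument-mapping idea described above, here is a minimal sketch in Python. It is not the tool the speaker mentions; the `Claim` class, its field names, and the helper functions are all hypothetical, chosen only to show how an argument could be stored as a tree of supporting and opposing claims that different people fill in over time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    """One node in a hypothetical argument map (names are illustrative)."""
    text: str
    author: str
    supports: List["Claim"] = field(default_factory=list)    # claims arguing for this one
    objections: List["Claim"] = field(default_factory=list)  # claims arguing against it


def count_claims(claim: Claim) -> int:
    """Count this claim plus everything attached beneath it."""
    children = claim.supports + claim.objections
    return 1 + sum(count_claims(child) for child in children)


def outline(claim: Claim, depth: int = 0, label: str = "claim") -> List[str]:
    """Render the map as indented lines: '+' marks support, '-' marks objection."""
    lines = ["  " * depth + f"[{label}] {claim.text} ({claim.author})"]
    for child in claim.supports:
        lines.extend(outline(child, depth + 1, "+"))
    for child in claim.objections:
        lines.extend(outline(child, depth + 1, "-"))
    return lines


if __name__ == "__main__":
    # Contributors add pieces on their own time; the map only records positions.
    root = Claim("We should act now on global warming", "participant A")
    root.supports.append(Claim("Delay raises the eventual cost", "participant B"))
    root.objections.append(Claim("Current cost estimates are uncertain", "participant C"))
    print("\n".join(outline(root)))
```

The structure deliberately stores only who said what and how claims relate, with no scoring or ranking, which matches the stated goal of mapping the argument rather than steering anyone toward a conclusion.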
And I think that kind of clarity in reasoning, not telling you how to reason or anything like this, just understanding the issues that are important and what other people think, that's another way that the internet can really serve the democratic function. Okay, so, caveats. This is an argument that Pam Karlan actually brought to my attention. She's thinking about voting and counting votes and groups, and she's thinking about it in the American context, but I think it generalizes. The internet pushes us towards virtual communities. Traditionally, for democratic purposes, especially in America, you would have a very mobile and geographically based community that would, and it still does, dictate the hierarchy of power, I guess. So as we move to more virtual communities, this washes out geographically based groups. So far I haven't said anything particularly controversial, but there may be reasons you don't want to do that. For example, different racial groups may cluster together geographically, and that's something you may want to be sensitive to and may not want to just wash out. There are other issues that are inherently geographic and that you would want to maintain a geographic structure around. So, again, I pointed out things that I find very compelling, but I think there are reasons also to be a little bit careful. Okay, so then my last prong is really just about, I guess, enlightenment. The word sounds very patronizing and I don't mean it in that sense, but it's about other ways that you could actually have rule, and also your sense of empowerment over what kind of rule you accede to, in some sense. So my biggest problem with this prong is, I think it's a little bit pie in the sky for me to say that in the most rural parts of some of the most underdeveloped countries, it'll be just great when they're surfing the web and they get all this knowledge, right?
It requires the internet to be pervasive there, and actually I don't know, and I'm not sure, that it really is, in the sense that there are a lot of countries that really block access to the internet, and I believe that they're probably doing it for this reason, right? So this may be somewhat limited, but as far as it gets out there and ideas can get out, then things like a bill of rights can be an idea that people are able to talk about or learn about that they might not be exposed to otherwise. So this is also something that I think is important: that there are rights that can be internalized, or maybe even should be internalized, and people will be willing to fight for them. The first one is freedom of speech, and that's obviously the one that bears most directly on the internet. But other ones, like I talked about, women's rights or environmental rights, opposition to aggression and violence, all of these things can come out in ways that they may not have been able to before, in a way that might be more efficient, in the sense that people can learn and see what other countries have done and what other experiences have been with it. Okay, and this is also the Suleiman example too, in that we actually see what's going on in regimes like Egypt's and how oppressive they can be when things like what happened to Suleiman happen. So I'm really concerned about these, and I know there are a lot of experts here on this and I'm not one of them. Okay, so I had a few empirical ideas to support the framework that I'm thinking about. I think it'll be really interesting to know what regions are using the internet and how deeply it has actually reached into these countries. I don't know, and a lot of what I'm thinking is contingent on that. If it's not really reaching into areas, it's not clear that it would have an enormous impact. I'm also very interested in what types of internet use there are.
So this is a difficult problem, but I think it might be feasible to try and see how much of it is government, how much of it is really people blogging, and get that kind of very high-level understanding of what it's being used for. In talking to Bruce before the talk, I realized that this attention backbone mapping idea is going on, I think, in Iran, and people are looking at what sites are really being accessed and where people are finding their gatekeepers and using that information to get their, I guess, political understanding from the web. So that to me is really an interesting problem, and I think we can understand a little more about the polarization too. I'm not exactly sure how, but maybe we could look at traffic on the different sites and who's repetitive and so on, and how the backbone changes over time may give some indication of polarization in, at least, how people are looking at the web. Okay, so this also seems to be something that has been heavily covered, as far as I can tell from Bruce. I'm very interested in what the different regulatory structures are in the different countries, and I see this as a cornerstone to effective use of the internet, especially access, or what the free speech rights are and what the intellectual property rights are. I don't have a good understanding of it and I would like to. And I think all of these can work together too, so I'd like to look at correlations between them. So those are the things that just jumped into my mind. So this is a joke. By the way, I got it from Gawker.com. This is Ahmadinejad's blog. He put several blog posts up there, and they did something humorous. It's a little tongue-in-cheek even for me to put it up here, but I actually thought the idea was cool. In talking to Bruce, I think I'm certainly not the only one to have had this idea.
So they're mapping out what he spends time blogging on, and they're actually not that wrong, because they looked at the blog and essentially what they did is, like, he's got a post saying Merry Christmas, so that's like 8%, and so on. And so I think this kind of thing is really useful and I like the idea. Being a statistician, I'm very compelled by effective information display, so that's why a lot of my empirical ideas were things like maps or these kinds of charts, where people can understand what's going on very easily and it's effective communication. So this kind of thing I thought was cute. Okay. So I feel like, at least personally, I get very excited about the internet and I get very excited about technologies, and when I thought about it more, I think there are disadvantages and a lot to be careful of. But I think the framework of information facilitator, tool, and democratic educator is one way to understand how the internet could be impacting the democratic process and democracy in general. We need an understanding of the regulatory structure. I think there are particular types that we need to have in place, and the technology of course has to be there, and then I'm worried about these trusted intermediaries. I think they're also important. So, at the last minute Bruce said, can you also talk about your license you've been working with the LISAC on? So I don't know if you want me to talk about that, or I can take a break and we can have questions about the other stuff and maybe I'll get to the license and we can ask them about it. What about the license? Okay, go for it. I think you should ask anyways. We care about these things. Well, I'm really interested in the license too, so I'm happy to talk about it. So this is a license I'm proposing because I saw a problem in our current licensing structure.
I would say I've been a computational scientist for a number of years, and one of the components of my PhD was building a software tool that housed all the software that I used to work on my thesis and these other problems. And, I mean, it's not just my project; it was a joint software project, and so we just put it out there on the web. And I remember getting questions from people who wanted to use it, and they were like, well, what license is it under? You know, I need to know this for a grant. And I'm like, I don't know. And the reason I was having trouble with it is this: I think if you're going to take your science seriously as a computational researcher, and by a computational researcher I mean anyone who's using the computer, writing code, and doing some kind of empirical experiments like that. So this is really broad, and this is something that more and more people are doing. You see it in all sorts of fields, like geology, or all over biology, and people are running these experiments. And it's not just mathematicians and statisticians and so on anymore. And then I think there's a lag between how they communicate the results of their experiments and what's necessary for science. So for example, typically you have a new algorithm or something like this, that's some research product, and then you publish it in this four-page IEEE conference paper, six pages maybe, and it's all very cryptic and no one really knows exactly what parameters you used to get those beautiful figures and so on. So my advisor and another researcher, Jon Claerbout at Stanford, well, it's Jon Claerbout's invention, this idea of reproducible research. So if you're really serious about your science, don't just put your results out there, like the six-page paper or what have you. Put everything out there that people need to reproduce your science, because this is really what science is about anyway.
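To make the idea of publishing everything needed to reproduce a result more concrete, here is a small editorial sketch in Python. It is an illustration under stated assumptions, not Claerbout's system or the speaker's own tool: the toy `run_experiment` function, the parameter names, and the manifest layout are all hypothetical. The point is only that the exact parameters, a fingerprint of the code and data, and the result can be recorded together so a reader can rerun the experiment and compare.

```python
import hashlib
import json
import random


def sha256_of(data: bytes) -> str:
    """Fingerprint a blob so readers can check they hold the same bytes."""
    return hashlib.sha256(data).hexdigest()


def run_experiment(params: dict) -> float:
    """Toy stand-in for a computational experiment: the mean of seeded Gaussian draws."""
    rng = random.Random(params["seed"])  # fixed seed makes the run repeatable
    samples = [rng.gauss(params["mean"], params["sd"]) for _ in range(params["n"])]
    return sum(samples) / len(samples)


def build_manifest(params: dict, code_text: str, data_bytes: bytes) -> dict:
    """Bundle what a reader would need to rerun the experiment and compare results."""
    return {
        "parameters": params,                                   # exact settings behind the figure
        "code_sha256": sha256_of(code_text.encode("utf-8")),    # which code produced it
        "data_sha256": sha256_of(data_bytes),                   # which data it ran on
        "result": run_experiment(params),
    }


if __name__ == "__main__":
    params = {"seed": 7, "n": 1000, "mean": 0.0, "sd": 1.0}  # hypothetical values
    manifest = build_manifest(params, "print('analysis script')", b"raw,data\n1,2\n")
    print(json.dumps(manifest, indent=2, sort_keys=True))
```

Rerunning with the same parameters reproduces the same manifest, while any change to the code or data changes the corresponding hash, which is the property a published research bundle would want.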
So put your code out there, put all the parameters you used out there, put all your figures and files, your data, everything out there. If you do this, and the NSF encourages you to do this and so on, there isn't a license that actually covers you, right? Because you have media in there, so maybe a Creative Commons license can cover the media. You also have code in there. Different types of code, maybe just scripts, or maybe you have some kind of compiled code or something, and so that's maybe pushing you a little more towards the GPL. Even Creative Commons says, don't put code under our licenses; we don't really want you to. So my idea was, let's have a license that's actually tailored towards this type of research output. And it's important because scientists need to be encouraged, I think, to put their entire research product out on the web. There's one more interesting issue in the release of data. Data, of course, is not copyrightable, but the arrangement of the data is. And when I asked a lot of scientists, why aren't you putting your stuff out there? Why are you just doing this? They're like, wow, it took me six months to put together this data. I'm not gonna give it away, and that guy will just get a paper off it really fast. But of course that's very anti-science, and it's actually not in their best interest. If you stick your stuff out there, over and over you see that people work on it. They get more interested in your problems. They cite you, and you just get to the truth of the matter, I think, as you have more brains working on it, and that actually helps your career. So I think the scientists are being a little short-sighted in doing that, but I understand where they're coming from. So I thought about whether there was a way to encourage them to release it, and so my license is essentially just worried about attribution.
And if they use the certain arrangement of the data that you've come up with, then they have to attribute you. I mean, the currency of science is really citation and attribution. So I think that will encourage them a little more. So my idea was to have something that uses the Creative Commons aspects for the media and so on, and maybe some of the GPL aspects for the code, and then also explicitly includes this arrangement of the data sets in there. I'm concerned really only about viral attribution, and that's what the point of the license is. So, in a nutshell, that's what I was working on. So, lots of ideas about democracy in the framework and ways to think about it, and I'm sure Wendy and others will have questions on the license, but I'll open it up to the floor for questions or debate, discussion. Yes. So, the fact is that yesterday Science Commons released an open data protocol dealing with exactly what you've been working on, and I'm also from Science Commons and would be delighted to work with you. Okay, well, that's exciting. I have a couple of points. One on the license: I'm vaguely aware that the neuroscience community has been working on some kind of arrangements for sharing of the basic data on scanning, for instance. And I don't know whether you're aware of that, but again. I wasn't, but that's exciting. Okay, but what they've done with that, I don't know. I'm just saying you should be aware that they're out there. What does your license do with respect to this kind of attribution? Because is it time to rethink what authoring a paper means? I mean, is it time to say data authors are a kind of subset of authoring, even if you weren't? Yeah, I don't go that far. Like, I don't really question the traditional norms, and I leave it up to people, as they have before, to decide who a co-author is and what the wording contribution is.
And so I haven't really put in anything explicit for the data, but the license does have a component that you're pointing to, which is, suppose I use your data, then I'm attributing it, right? But a simple citation doesn't seem to do justice to the fact that if I put my data set up there and I spent two years getting the data, and you find some clever new way to draw some statistical conclusions out of it, you've done something clever, but just saying, oh, and here is the data citation, doesn't seem to be quite enough. I disagree, and here's why. I think the scientists in that field understand what it takes to create the data, and attributing it to someone is non-trivial. I mean, I think that's a really big deal. And like I said, I think this attribution is why the scientists are doing it, just to get the attributions, right? I mean, it's not gonna be fame and fortune, maybe, but that's not really what they're about. And then people getting quick papers from your data is a good thing for you, because suppose you put together a really great data set and there are 50 quick papers; that's 50 more citations for you in probably 10 years, right? So I think it's good, and I think people recognize that there are differing levels of work behind every citation. And then finally, if I may, before I cede the floor. Sure. Can I follow up? What addition to attribution do you have in mind, Oliver? Well, I'm just thinking that in scientific literature, there's traditionally the authors of the paper, and that string has gotten longer and longer in recent years with the collaborative nature of science, and also as people get fairer about giving credit to the actual people doing the experiment rather than the professor overseeing it. And then there is the bibliography at the end where you cite all of the publications that you've already learned in your thing. Say I was the scientist who sat down and did the 12 months, and you then take that data set.
I just think it might call out for a new kind of citation, like a super citation, which isn't just, oh, by the way, here's the literature on the subject. But, by the way, you know, had you actually called this person up and said, will you collaborate, they would have become a co-author. I mean, normally, if you did that kind of reliance. Let me say two things. One is, who becomes an author and who doesn't on the paper is very field dependent. And so I think in biology, they have even certain protocols for the author listing, like the guy who owns the lab, who's the big dude, goes at the end. Whereas in my field, it's like you should have put them in order of priority, or I've always done it alphabetically, which kind of sucks because my advisor starts with D. So I was on the front. But so, I mean, I understand what you're saying, but I don't know if I'd want to impose a hierarchy over all these fields that have their own way of doing authorship determination. And then I had one other point that I've forgotten, but it was a friendship. You can chat a little more about it. On democracy, just while I've got the floor and before I cede it: very interesting descriptive work, I think. And the notion of bringing some statistical rigor to the good, let's get data and mine it as fast and as hard as we can. What I didn't feel you necessarily really had yet, and I'm not sure anybody else does, except maybe a few people at this table, is a model for how it works. Because data is interesting, but it's best and most interesting when you use it as a tool towards then saying, are we dealing with a neural net kind of operation here? What's the model in all that? And as an economist, economists have models. Yeah, yeah, yeah. So I'm going way out on a limb; I've just kind of started looking at this. But it seems to me this is something where, ideally, you'd like to have a lot of repeated episodes.
Like we can say, oh yeah, you know, look, the internet came in and then this happened. And that's just really difficult. It's like other social science problems where you have all sorts of confounding factors, most of which you can't even measure. So I think my impression is the place to start is really the case studies with this, until we get a stronger idea of what exactly the model would actually look like. But I'm not convinced that necessarily one model is going to be appropriate in a different area. I mean, I'm just going off the cuff, just sort of saying, I haven't really thought about it. But ideally, as a scientist, it's very appealing what you're saying, like that's what I would want to go for, to find out whether there are guiding principles here. Can we draw any rules out of it? I think one of the things we've sort of found, in no small part around this table, is that these case studies end up being enormously controversial. And so what was interesting was, in sort of listing possible scenarios, you actually hit sort of three or four topics that have been big controversies around this table. Mary's written at some length about Korea in 2002, and there's all sorts of questions about whether the internet had any meaningful influence at all. The Kareem Suleiman case is one that I've written about to a huge extent, and the main retort to that is to basically say within Egypt his case isn't interesting at all compared to Abdel Monem and some of the other people that Egyptians are actually supporting, rather than people outside of the country supporting. What's been really hard for us is getting beyond anecdote sort of into data. And I don't know if the answer is necessarily where we sort of build a model and then try to sort of make anecdote fit within the model, but I guess where I'm sort of coming at all of this, and where specifically your background is very interesting, is sort of questions of testability.
And so, recognizing that you're very, very early on in your work on this, how do anecdotes like this sort of turn into testable, statistically rigorous questions? Because we're having the same problem here. We've got the same set of anecdotes. The bad news, by the way, is that there's only 16 anecdotes in this field. We can write them all on this board and then we're done with them. The question is then how do we get from those 16 anecdotes, after we argue over them, into something that actually becomes statistically tested in some fashion. No, that's a great question, and of course that's the elephant in the room. How do we actually do this? So I wish I was a little further along, to where I'd probably have enough knowledge to answer that a little more coherently. I am worried about the amount of data that's out there. Like even aside from just the anecdote issues, I'm not convinced that North Korea is gonna turn over a bunch of accurate data to us, for example. And so it seems to me like a really difficult problem just even getting data that you can trust about any of this. So I had a few ideas, I mean, that's why my slide about the empirical stuff was fairly limited. I had some ideas about how to get it: we could just track where IP addresses are coming from, like, I don't know, maybe talk to Google and see if we can look at where they're getting their search data from, and then we can get an idea of pervasiveness or something. I'm probably talking about things people have long debated here. And then with regard to the anecdotes, I think I would probably have to dig into them more deeply and see if there are things that could come together in a data structure. I'm not exactly sure. Like 16 is a problem, but maybe there are rich enough characteristics that we can come up with something. I don't know. Yeah, it's a really big problem. Yeah, yeah.
What's your sense of the social science and political science literature, and how do you think they're using statistics and empirical analysis around, for example, maybe democracy in the US or the US blogosphere? And do you think that's sort of transportable to models overseas where you may have very limited levels of internet penetration, especially with, like, filtering in the Middle East? Yeah, so I wish I was more knowledgeable on the literature. Normally I would be for a talk, but this was short notice. I'm not exactly conversant with precisely what they're doing. But I can give a broader impression, which is that there tends to be a very heavy reliance on regression. I don't think they need to stop there, and I don't think it's the tool that solves everything. Probably that's what they're doing, trying to structure their problems around regression models. As far as how it carries over, I'm not, it's just, I need like a couple of days to answer your question. So it's basically mostly, and this is actually still the case from like 1993 to now, trying to figure out which way causality goes between democracy and the use of the internet. It's Chris Kedzie in 1993 at the RAND Institute, but that's pretty much been the kind of most useful quantitative thing that I've seen, and that's really kind of not that useful. And the problem with all of this stuff is isolating factors, because if it were as simple as, here are two factors, we believe they're correlated, and here's the causality, that would be fine. The problem is, this isn't an isolated experiment, there's 50 other possible factors you can throw in there. They may actually have stronger effects than the introduction of the internet. And all of this, as you guys know around this table, I'm always tempted to tell you that the mobile phone is probably a 10x more important factor on top of any of the internet factors.
So how do you isolate any of that and pull it out of the data, particularly when the data we have generally, the other thing about this is, with those 16 anecdotes, you only know they're important two years after the fact. So then going back and trying to retrieve, whether it's search records or phone records or whatever the hell it is, you would need to know ahead of time that it was going to be important so you can then go and try to isolate it. Yeah, right, right. And there's also maybe some hindsight bias there too, like you sort of look back and see, but really if you look at a bunch of spots and I see what you're going to want to develop. Just to follow up on that, I was interested in your number two point about the tools, the kind of new quantitative tools for complex reasoning and complex arguments. And I think I've sort of vaguely thought about those as kind of next steps to getting past some of the regression stuff and trying to understand things. I don't know if you're familiar with, like, NetLogo or Stella. They're basically tools where you're trying to map out, like what you said about Hillis's collective reasoning tool, a very, very complex situation and all the parts of it, in something that people can look at and say, hey, that's wrong. And I think there's something there, and it would be great to hear more about that from you as a statistician along those lines. Yeah, that's actually probably, like personally, that's probably the part I'm most excited about. Yeah, I mean, there's a lot of stuff out there, and in graduate school they like to tell you that that will replace, you know, a lot of the stuff in social science along the lines of regression. I'm not even saying that, but it's pretty ambitious. Yeah, there's some crazy people around. Um, David and then Mary. Sure, this is a question exactly. It sounds like we're waiting for history to happen. We just, it hasn't happened yet. We're waiting for history to happen.
And in the meantime, how does anyone gather data? We don't know where the effects are gonna be. We don't know, maybe the big effect will be on campaigning and not on democracy. Or maybe it'll be on governance and not so much on, you know. How does anybody get careful around this in statistically significant ways, since we're still waiting for the history to happen? Yeah, no, that's the key question, the hard question, especially when you're dealing with countries where you don't have a reliable way of watching methods and release methods. No, I think that's exactly the right way to think about it. I mean, that's how. I was just gonna ask a question about taxonomy. Though it seems like we're now going from the opposite direction, of looking at the data and then taxonomy, but I wondered why you'd pulled education out of dissemination, because it seems like education would be part of dissemination. Yeah, I think you can argue all about taxonomy, I may not be doing it right. The thing that I just thought was compelling about education is you can't kind of have democracy taking off with people who don't know what it is. So I sort of gave it its own little area for that reason. Because I guess there's some sense in which even if you can just get some ideas out there, then that can really just have this phenomenal sort of law effect, look with those particular ideas. And so it seemed worthy of its own consideration rather than the first one, which is really just let's get a lot of information out there about issues, whatever people are voting on, and so it's a little more fundamental. But I mean, I think all of these overlap. So I think, you know, it's a good point. Well, true to type, I'll ask further questions about the license, because I'm curious how you've attacked the problem of attaching rights to something that many of us would argue intellectual property law doesn't cover and shouldn't cover.
So like the thing, copyright in a collection of data, where are you? So I think there's, I think that, so the code that I've been exposed to, that I've released and that, you know, I've been in collaborative groups on, we understand that as copyrighted. So we always put little copyright tags on the bottom, even though we know that people are just gonna copy it and steal it, which I mean, it's fine, right? Like that they're just gonna reproduce the data. So I don't know, like you mean the scientific papers themselves aren't copyrighted? No, you were talking about a collection of things, and programs, their functions wouldn't be copyrightable, but the particular way in which the program was written might be, but then there's the data collection, where nobody else is interested in the arrangement but rather in the raw data itself. There, the license doesn't attach any kind of copyright for you. I don't think there's any way to do that. I mean, that's just the data. I mean, like- So you haven't reached into contract? No, I think that's a lost argument, I think. So my goal was just, like I was making several assumptions, which is people were hesitating about releasing the data because they feared they wouldn't have attribution for it, and so I was trying to put it a little more in people's faces that they needed to attribute. That was one of my motivations, at least for the data arrangement. There really is, I think there are a lot of deep questions here. I don't think there's anything a license can do about things like if I look at your data set, your hard-fought data set, and I just strip out the contents and use it, or strip out one column or something and use that. Hopefully, in the scientific spirit, people will attribute, but the license certainly isn't gonna make that something that you can sue for damages over or anything like that, right? I mean, it's just data. So is it trying to strengthen the scientific norm around attribution even if there's no legal recourse?
Yeah, and just I think make people a little more aware because I think the scientists are easily, I mean, I'm a scientist, I'm gonna say they can get lazy or I don't quite wanna say they're greedy but maybe, I mean, I don't know. And so I was thinking there is a marketing aspect to the license in that it makes people really aware that there are things you need to attribute for and in case you weren't aware with the data then that's something that you might wanna think about. So I think, so that was something that was actually compelling to me was this sort of marketing aspect for the scientists. I was gonna suggest as I hear you talk that the license may not actually be the right context that perhaps what you need is a data cooperative because it's what you're describing is you can't through the license create obligations of disclosure but perhaps you could create a cooperative which limited access to people who would in effect agree in advance to abide by the rules of the cooperative and that would give you a different kind of hook into it. So I think you're thinking a little bit of the neuroscience example there as well which is, I think that's just a very specific and different problems have specific data associated with it that people will know how to work with. And so I think what you were actually postulating is a whole series of cooperatives that people would generate around whatever data source they're using or something like that, which I guess it's possible but I don't see any reason why they couldn't also work together, right? Like you could have a cooperative if that's what the scientists think. Well you can solve by contract but you can't solve by intellectual property to some degree so that if you're stumbling over well the data sets are once you put them online what can I do with them because you can take them anyway without the license. 
If you make them available through a context where you have to affirmatively agree to something before you go in, then you've got something different, then it does become at least potentially legally binding. Yeah, so the overriding one for me here is that I want to disseminate as much data as possible with as few encumbrances as possible. So if they're gonna steal, I mean, I guess I could do something more rigorous about this, but maybe there's some trade-off between the amount they're gonna steal versus attribute, and hopefully the license gives that balance a little bit, but I'm willing to allow some theft for getting the data out there and so on. So, I mean, maybe an affirmative agreement would work in some contexts, like especially when you have really difficult lab data that's very expensive to get and so on, but I'm not sure that that barrier is right for all contexts. I would agree there, but if you're postulating that your basic problem is convincing some scientist who says, I just spent a year doing this, why should I put it out there for free? It may be that actually erecting a slightly higher barrier to folks getting to it will be more persuasive to them than, oh, you know, be a good scientist and let it go, which is what you're saying isn't working for you. I think that's putting it too strongly. Like I don't think I quite said it doesn't work. I think by and large it is working, but really what I want to do is, I guess there's several prongs. Like I want scientists to put everything out there. Like I think when the data's out there, I'm not so sure that there really is much of a theft problem, but there is a perception that that'll happen. And so maybe I can help encourage them through that, and then just, like, get all the research out and get it to be the norm to put everything you've done out on the web so people can really copy and understand everything you've done and build on it and change it.
And there's all sorts of exciting things, like if that can happen more, you know, it sort of opens science up, and people, if they've got MATLAB or what have you, can start clicking, they've got your data, they can figure things out and read stuff on the web, and it's not something that's only going to happen in the cloistered halls of Stanford or what have you. Which I think is really interesting, but so ideally I'd like to have, you know, NSF require something like this when they're giving a grant, saying that you need to go ahead and put everything out there under this license, and then you have the viral attribution, but maybe that's ambitious. But it's in line with their philosophy in the grant. Terry. I want to go back to Stephen's question about how you get beyond the 16, and I wonder if you can draw on your training and expertise to think about how, looking forward, we could do better. So two possibilities occur to me. One of them is, we often know that a critical political event is about to happen. A major election, a convention. Is there some way in which we could enhance prospectively our data gathering capability, so that after the fact we'd be in a position to better assess the relative roles played by different factors, including one or another usage of the internet? So imagine some Eastern European country or African country on the brink of one of these. Is there some way in which you could think about ahead of time what data gathering techniques would be useful? Right, that's one. So the second thing is, as you know better than probably anybody in the room, there's been a lot of fairly fundamental work recently in statistics. We're moving away from old models of controlling for variables that distort your understanding of the impact of a central variable on an outcome. Moving from that well-established traditional model toward different statistical techniques designed to isolate more rigorously the impact of one or another variable.
Jim Greiner here has been doing a lot of work in that area. So these new techniques offer ways of addressing, so even the speculation that cell phones are 10 times as influential as the internet is an example, deliberately whimsical, of this generalization. Well, that's what statistics is designed to address, the relative magnitude of effects. So what about either of these as a track where you could use your skills and get us further? So for your first question, I'm not sure exactly, I guess there's two components. What data do you want to gather, and then how feasible is it to gather and how do you actually do it? So the data that you'd want to gather, I'm not sure, off the top of my head, I know exactly. I'm sure I could probably find out in five seconds in this room what they would consider important. I think one thing that I would say is, it's true that we can isolate where we expect big effects to happen. It might be interesting to, so this is about controlling for factors, which I think is still an enormously important part of doing the research. Maybe we can sort of pair areas and find similar, like somewhat similar areas that aren't undergoing the change or are doing a different kind of thing, and maybe do something comparative. So that's just like a first thought, that it might be possible to do something like that. I'm not convinced it would always be possible. But... Otherwise similar regions within a country? Yeah, within the country. Where, for some reason, we have a compelling reason to think it's similar in some ways that are important to us. Yeah, so maybe we could understand and isolate some effects like that. But this is where, I mean, statistics just, it's not a silver bullet in the sense that it can solve every problem. Like sometimes there are just problems where there's just a huge amount of noise in the data and the techniques we have just can't bring out the signal even if it's there.
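The suggestion above, pairing otherwise similar regions and comparing the one undergoing a change with the one that is not, can be made concrete with a difference-in-differences calculation. This is only one way to read the idea, not a method the speakers specified. The function name `diff_in_diff`, the outcome being measured (say, a turnout percentage), and every number below are hypothetical and chosen purely for illustration.

```python
from statistics import mean

def diff_in_diff(pairs):
    """Average difference-in-differences over matched region pairs.

    Each pair holds a 'treated' region (where the change happened) and a
    'control' region (similar, but without the change). Each region is a
    (before, after) tuple of some outcome measure.
    """
    effects = []
    for pair in pairs:
        t_before, t_after = pair["treated"]
        c_before, c_after = pair["control"]
        # Change in the treated region minus change in its matched control.
        effects.append((t_after - t_before) - (c_after - c_before))
    return mean(effects), effects

# Hypothetical outcome values for three matched pairs of regions.
pairs = [
    {"treated": (40.0, 55.0), "control": (42.0, 47.0)},
    {"treated": (30.0, 38.0), "control": (31.0, 35.0)},
    {"treated": (50.0, 52.0), "control": (49.0, 53.0)},
]

estimate, per_pair = diff_in_diff(pairs)
print(per_pair)   # per-pair effects
print(estimate)   # average effect across pairs
```

The per-pair spread matters as much as the average: with only a handful of pairs, one unusual region can dominate the estimate, which is the noise problem raised in the discussion.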
So we do the best we can in terms of arrangement, but it may be the case that some of these problems are just actually difficult to measure. But I think it would be interesting to get into actually understanding how difficult they are by trying to do it. And then on the second point, I'm not exactly sure what techniques would help. I need to think about it a little more and see if there's something that could obviate this, the traditional sense of controlling variables and so on. But I think no matter what we do, we would end up in a situation where you're gonna have so many confounding factors that are changing rapidly. I mean, it's your traditional social sciences problem. So, I think- But I think Terry's onto something with it, as far as trying to figure out, one of the problems with this is that when we're working purely from anecdote, and particularly when we're working in retrospect from anecdote, we're basically looking for extraordinary situations, right? In the fall of Suharto, we believe that mobile phones were interesting because, well, suddenly a government fell. And we looked around, and we saw a whole lot of people with mobile phones. But if we sort of look in comparison, we see several thousand other situations where there have been governments that haven't fallen despite several thousand mobile phones. It might be very, very interesting to sort of look for moments at which we think there's generally political cleavage of one fashion or another. So maybe a subset of elections that we expect to be unusually controversial, or elections that we expect to be unusually close. And then do we then start attempting to collect certain types of data going through it? Now, the data collection, I think, is hard. I mean, you mentioned Google search results, and a whole bunch of us immediately raised our eyebrows because we've tried to get those. You ain't getting those.
Similarly, call records from government mobile phone carriers, it's not gonna happen, as much as we would desperately like it to. But I think the question becomes, if we sort of lined this up and said, I know that Ghana's election in 08 is gonna be fascinating. What would I want to get ahead of that? And then looking at Ghana's election in 08, who else would I look to comparatively around it? Would I look for similar income? Would I look for similar levels of tele-density? Would I look for similar levels of internet penetration? And then what would I try to collect around maybe a two-year section of elections? If I were able to do it for five nations around five elections, what data would I look for? And then what would I do longitudinally with that data to try to get some results? Yeah, so the data you've described about income and all these things, that would be my best guess too at this point, like just gathering that kind of data. And then I also had the idea of doing it for similar regions like we were discussing. I also like the idea of maybe not even a priori focusing on a hotspot, like knowing Ghana's election is gonna be interesting. But I mean, this is, you know, assuming money's no object. Even just maybe looking at random places and then getting the data for, say, like 10 random countries or something like this. And then see if we can identify any patterns in the data that lead us to anticipate hotspots coming. I think that would be an interesting way to look at it too. As far as analyzing the data, I think there's several different types of things you could do. I mean, I think what would be interesting is just doing time series analysis and then looking for any kind of anomalous state shifts or anything like that in the data. We could also do different types of panel regressions on the data across time and things like that.
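One simple way to look for an anomalous state shift in a time series is to search for the single split point that best separates the series into two segments with different means. The sketch below assumes exactly one shift and a roughly constant level on either side of it, which real political or traffic data may well violate. It is not the analysis the speaker had in mind; `best_mean_shift` and the monthly figures are made up for illustration.

```python
def best_mean_shift(series):
    """Find the split index that best divides `series` into two segments
    with different means.

    Returns (index, gain): `index` is the position where the second segment
    starts, and `gain` is the reduction in the sum of squared errors
    compared with fitting a single mean to the whole series.
    """
    n = len(series)
    if n < 4:
        raise ValueError("need at least 4 observations")

    def sse(xs):
        # Sum of squared deviations from the segment mean.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs)

    total = sse(series)
    best_idx, best_gain = None, 0.0
    # Keep at least two observations on each side of the split.
    for k in range(2, n - 1):
        gain = total - (sse(series[:k]) + sse(series[k:]))
        if gain > best_gain:
            best_idx, best_gain = k, gain
    return best_idx, best_gain

# Hypothetical monthly counts of some activity measure for one country.
monthly = [10, 11, 10, 11, 10, 20, 21, 20, 21, 20]
split, gain = best_mean_shift(monthly)
print(split)  # index where the level appears to change
```

A large gain at one index only flags a candidate shift. Whether it lines up with a political event, or with a change in how the data were collected, still has to be argued case by case.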
So I don't, I mean, those are probably things that you thought about, but without actually looking at the data, I'm not sure. And I mean, I wish I was more of an expert in what data's available. I should add quickly too, we gave her literally no time besides the flight over to prepare her speech. So, just sort of, yeah, ways to think about it. Yeah, Jonathan. Question is at the intersection of the computational sciences research license on our open network. We're hoping to figure it out what countries filter what and when and would love to release that information to the world. Some among us are cautious about it because if it works, we might be giving a roadmap to other countries that just tell them where they missed a spot. You know, oh, the Saudis are filtering it, we should filter it too. So I'm curious how you would deal with that maybe as a policy matter, so the SRL and us are very distinct to what we're doing and not really more generally. And secondly, if you can think of any creative way as to how we still might basically share our information with the world, but make it difficult for it to be useful for bad purposes. Oh, that's a really hard question. That's why I thought I'd put it here. Yeah, I don't have an immediate way to release data and somehow not to people who are bad. You know? Yeah, I mean, but you're making an assumption here that when you see that this country has a more strict filtering policy than another country that that will somehow signal and cause fall. But the reverse might happen, right? Like one country might be like, wow, I just look like a real jerk compared to my neighbors. Maybe I'll scale it back or something. So I mean, to me, I don't know, maybe I'm really naive. It's true that the problem could just evaporate. But I mean, I'm just saying, I don't think it's necessarily clear that there's gonna be this race to the bottom of filtering, right? I mean, maybe. 
And in which case, and if there isn't, then we don't need to be as concerned about it. But yeah, I don't know. Are you thinking maybe like some kind of license that allows, I mean, I guess it could allow, like, certain countries to get the data and not others. I'm not sure that it's not gonna leak out. Well, one of the bigger questions actually is not so much whether the Saudis filter or the Chinese don't. But what have we tested that we thought might very well be filtered but, surprise, surprise, isn't? That's actually really interesting to know. And you can see like the hay out of it, academically speaking. Right, and then they don't go. But that really does get into the you-missed-a-spot account. I'm really disappointed, for those of us who keep waiting to get filtered and then you don't get filtered. It's really, I mean, for Saudis, we just, you know, what do I have to do? Yeah, I don't have a really great solution, but I'll have to think about it a little more. And if we accepted funding from an agency that gave us a big pile of money under the computational sciences research license, would we be up the creek? In terms of releasing your data? Yes. Well, I don't really see how, in the sense that, well, okay, so there's other pieces of this computational sciences research license. One other little piece I've kind of glossed over, in that I said that my concern is the viral attribution, which is really my concern. So I'm dropping things like share-alike and forcing, like, the entire next research project to come under the license, because my goal is just to get research product out there and used as much as possible. So I think maybe you could skirt it a little bit that way, in that you bring certain pieces under, but you can kind of do what you wanted with other pieces, maybe. Yeah, so I started calling this, like I do call it the CSRL and I started calling it the, oh no, wait, I think I actually misnamed it.
So I called it the computational research public license first and that was crap all the time. Yeah. As far as data collection, if it's a big problem to get say large sets of data from Google or from government, have you considered a possibility of doing something like random sampling by say setting up your own website, Ghana Elections 2008 and you can do the same thing for a survey or something like that? And what kind of potential would that have sort of doing your own? Yeah, no, that's a really cool idea because of course you can see the IP addresses like as far as that is informative. Yeah, I like that idea. So I guess what would be interesting is if you set it up for a variety of different countries and then you just looked at what countries were hitting your site more but you'd have to really control for the link, it has to somehow get into this like communication backbone and people need to be somehow equally aware of it in order to really assume that it's reflective of the pervasiveness of the internet in the country. So if there's something that could make us feel okay about that or at least we had a handle on any kind of bias that it was introducing, which it sounds like we could because we can kind of map these backbones a little bit. Then yeah, then that's a pretty interesting idea. Back to the data question. I'm probably gonna make more of a statement actually, they're not a question but if you can answer my statement, no, it'd be great. It seems like there's a mismatch between what we'd like to observe and understand being democracy and data in order to assess it. There's not enough observations out there. We're looking at the movement of the mob. We're looking at large scale things. We're looking at revolutions and there aren't many of those out there. There are statistical methods used for small probability things but even there, we're looking at the internet. We're looking at cell phones. 
All the, what we would like to be independent variables are so highly collinear that we'd need monster data sets in order to carry that out. There could be perhaps plentiful data in the spirit of what Terry was saying. If we were looking at the actions of individuals, Josh just wrote a case study about Ukraine, and if we had observations on individuals saying who went out into the cold on those nights and who did not, that would be fantastic and we'd learn a lot about that. How do you figure out when people are gonna go into the cold and get a sample of them? I don't know, I think that would probably be interesting to think about in that sense. RFIDs. Was that tanky? Subcutaneous, it'll be great. Cell phone traffic. These are the temperature. I mean, the other question is, the problem is not just statistical. I think the other problem is in theory as well, which is, I think that the theories that link the individual to the mass movement are very weak, and in order to put all the pieces together you would need observations on individuals and you need an understanding of what the actions of those individuals in the literature are. Yeah, no, I think that's right. And I feel a little bit like, I wish I could say, yes, with my statistical training all we need to do is this, you know? But yeah, I mean, I think the lack of data is like an enormously convoluted issue. I mean, drawing off of that, when I was thinking about the case that I did about South Korea in 2002, I was thinking, what data do I wish I had? And I wish I had data that connected people's online activity with their voting activity, which is, so I think, I mean, that's probably something we wouldn't really wanna know about hits because that would just go like hairdressers. We wanna know what was the offline activity of those people that were reading boards. And that seems like something that would, unfortunately, have to be more of a survey.
Yeah, I mean, that's what, in my mind, I'm coming back to with a lot of the questions, this idea of a survey as well, which is really expensive and very difficult to do, I think, in countries where you are obviously a foreigner. I mean, I haven't. But yeah, that would be really interesting. I mean, we may be able to find proxies for that. Like if we have, like, some Ghana, like Uganda or whatever. And maybe some IP addresses are coming back a lot and you can see that these people are engaged, and maybe we can get like a sense of political engagement through that, even though that, I mean, will have its own problems, but there might be something that we can even do with IP addresses, just looking at their patterns. Yeah. One thing that ties into that, the most interesting thing I've seen is from George Washington University. Everyone's been looking at IPDI. Institute for what? Institute for Politics, Democracy and the Internet. Right, so they were looking first at the 2000 election and the role of what people did online, or it was how influential they were based on a number of activities they did, and it was just an online survey, and someone actually took the same methodology and did that in the Ukraine case. But again, that's just a random sample, like, based on a banner ad or something, based on what people can do. I don't even think that's a random sampling, right? Like, no, I'm calling it a case for a survey because it was actually based on a huge list of registered voters, but it was only people who chose to respond to the survey. So they totally knew that they were already getting people who were active online, but their point was they did a survey on a lot of other attributes, like what offline media they consume, what groups they belong to, how often, and the other thing was they had their voting records. So they asked people whether they voted in the election and compared that against the records, since some of them were lying.
Or they forgot, or anything. That's probably the place to start in terms of someone who's tried to mess with methodology. I absolutely agree with that, yeah. Yeah, it seems like it would certainly be cheekless to rise pale. What about exit polling? Who does exit polling in other countries? I don't know. The International Republican Institute. You're thinking there could be an exit polling question about the internet use of that particular voter. Yeah, there's one, but you know, it's nothing to the chance. Yeah. And maybe cell phone use too. One of the reasons why I ask who does it is that some of them may be media. The United States certainly has networks that are engaged in this, but you hear. And they carry the huge cost of this. And I think it's a way to allow a research enterprise to be built upon an infrastructure that they've already created. Now, you know, elections are not the only thing in politics, but I'm just sort of groping here for some way of. Elections are useful because they're scheduled, whereas most coups and demonstrations aren't. So just in terms of data collection, maybe that's as good a place as any to start working on this. I think that pool exists. Well, we've already gone nearly 15 minutes over. So please join me in thanking Victoria.