Hi, everybody. I'm Jonathan Zittrain, and I'm one of the people behind a new Institute for Rebooting Social Media, hosted at the Berkman Klein Center for Internet and Society at Harvard University. I can talk a little bit about the theory of that center, which is that we are at an odd moment as collective guinea pigs in an experiment that's been going on for about fifteen to twenty years now, without a whole lot of institutional review board clearance, in which we see: if we completely rearrange the ways in which we meet up with one another, interact with each other, get exposed to links and news about one another and the world, and experiment with who assigns what to whose newsfeed, interesting things will happen. Really positive things can happen and really awful things can happen, in an era in which we're not entirely sure what we would want to happen if we were to decide it collectively, whatever that means, and where we don't trust any institution to implement what we would agree upon. We have a lot of dilemmas about what to think about this space, and certainly the notion of rebooting it is a way of saying we should both be taking up issues around incremental and productive changes that intercede in current systems, and allow ourselves the chance to brainstorm around big changes, in the true rebooting sensibility. Our new Institute hopes to join what is already a really rigorous and varied effort, from many quarters, to think about this. It's both exhilarating and terrifying that nobody is in charge here. And those who might be the closest to being in charge, nobody thinks should be in charge. And we're the last people to think we should be in charge, but we don't want that to translate into an abdication of responsibility or thought, either. We hope you're out there listening to this right now, either live or later, having found a link to it on social media, if it has not been shadow banned.
I'm just kidding. We hope that you'll join us, and we'll be offering, over the weeks and months, different ways to join the activities of the Institute, or to offer up what you're doing and have a chance for us to help amplify that, and vice versa. In particular, we have a call for visiting scholars up on the website right now; I'm hoping that someone from our team can pop the link to it into the chat room, and if not, I'll find a moment to do that. But we really hope that you'll join us, because this is the kind of thing that touches nearly everybody, even those who are abstaining from social media use, and for which public participation seems particularly vital. And I'm really glad now, with the help of Hilary Ross, Will Marks, and Madeline Metsui, to welcome my colleagues, whom you see also arrayed in the Zoom screen, who each in their own ways bring an incredible wealth of expertise, deep thought, and pressing questions around these things, and maybe have no shortage of ideas around these topics as well. We thought we would focus today's conversation on the private collections of data, in the first instance available only to the likes of, say, especially in the US context, Facebook and Twitter, and on how to think about the use of that data to understand the platforms themselves, and maybe something beyond that. Because if we're to actually assess what's a problem and what isn't, what is just an anecdote versus what is really a systemic issue, then having information, having data, and having people independent from those who have a great stake in what the data might show seems quite relevant. Again, we have a wonderful group of people to have that conversation. I'd just like to introduce each of them, give them a chance to set a small fire, metaphorically speaking, and then open it up to a fuller conversation.
Going alphabetically, weirdly enough, our lead person begins with P: Nate Persily, a longtime colleague who has a wealth of data behind him in his work thinking through elections in the United States. Nate, you're the James B. McClatchy Professor of Law at Stanford Law School. So great to have you here. Tell us a little bit more that you think might be relevant about your background, and anything around what's keeping you up at night, what you wish would happen, legislation that you might or might not have drafted and whether you think it should pass, anything you'd like to share. We're all ears.
I'm not sure about the legislation I might or might not have drafted; let's just say the answer may surprise you. I look forward to the time, by the way, Jonathan, when you go alphabetically and Zittrain is first; I've got even lower on the list. That's right. That's right. So, let me cut to the chase here a little bit, which is that I have for the last five years been working with the platforms, particularly Facebook, trying to develop a system for outside researcher access. I've both learned a lot and lost much hair in the process, and it has been extremely frustrating and difficult. Some of it is because of, to use a technical term, the general mishegas that afflicts these platforms, but it's also the legal environment in which they are operating, and other obstacles that we confronted in trying to build Social Science One, which was the effort to try to get platforms to share their data. Let me just start with the 10,000-foot view, which is that the current status quo, where the platforms are the only ones who can analyze their data, and we have to wait for whistleblowers like Frances Haugen to blow their whistles, is unsustainable. We cannot live in a world where the only people who have the insights on what is most of human experience right now are those who are tied to the profit-maximizing motivations of these firms.
And so therefore I became convinced that government regulation is really the only answer here. It is extremely difficult to craft a law like this, particularly because you want to make sure to protect user privacy. So privacy has to be first and foremost in our minds, but I want to make clear that the question is not whether the data will be gathered or analyzed; the question is whether the only people who can analyze it are those who are tied to the firms themselves. And so the legislation that I put up, which people can see on my Twitter feed or elsewhere, and that now Senate staffs, I'm hoping, are adapting to their own preferences, attempts to create a federally mandated system, administered by the FTC, under which platforms would share data, in very secure, privacy-protecting circumstances, with outside researchers who are vetted by a combined process of the FTC and the National Science Foundation. There are a hundred different ways to skin the cat here, and I don't have strong preferences, except for the fact that these platforms should be subject to this kind of oversight and access: we need to find some way to break up their monopoly on the insights that can be derived from their data, and it's independent researchers, not those selected by the platforms themselves, who need to be the ones with access. I'll just emphasize that, as I said, it's not just that the details matter; they're the only thing that matters. It's extremely difficult to craft this, which is why I have been working on it for seven months, but I look forward to incorporating any suggestions people have. More importantly, I put it up as a Microsoft Word document so people can use it as a template that they can edit and call their own. I don't own this field; there are a lot of us working on this, and many of them are on this call.
So I look forward to hearing what others have to say.
And Nate, can you just explain why? It's not like we feel a great press of need for car dealerships to tell us data about their customers and what they bought, or grocery store customers, or, I don't know, clients of accountants. What's different here, that I hear this sense of urgency and passion for the federal government and the Federal Trade Commission mandating a whole bunch of complicated things for data sharing?
So I think there are several responses. The first is that these companies are unprecedented in their power and scope. And so they pose unique dangers, both to the information ecosystem and to democracy, or at least they are alleged to pose those dangers. We are in a position right now where policymakers are necessarily legislating in the dark, because they don't really understand what the problems are; they have to trust the platforms to explain them. That, I think, is one critical fact. I also think that, whether we're talking about antitrust, or privacy legislation, or content moderation and the like, well, let me put it this way: if the platforms are being watched by outside researchers, if there is some kind of transparency, it will change their behavior by that fact alone. And so I think it's quite important that we have a rigorous system of transparency. And frankly, you're right that there are other industries where it's not independent research, but we do require other kinds of disclosures. These, though, are incredibly secretive but powerful institutions that control a lot of the communication that's happening in the US and elsewhere.
Got it. All right, thanks very much for setting the table. This was BYOL, bring your own legislation, and I'm delighted you're willing to share it, so we can edit it and do the wiki thing with it. Nabiha Syed, thank you so much for joining us.
I feel like you've had a kind of front-row seat, documenting for the rest of us not in the room what's been going on out there, and occasionally you're on the stage as well, institutionally speaking. Tell us a bit about both the Markup and the kinds of stories you've been pursuing and views you've been putting into the public sphere about this stuff.
Totally, I would love to. Thank you so much for having me. I come from sort of a different tradition than the research tradition: I am the head of a nonprofit news organization called the Markup, where we do a lot of independent, adversarial reporting into different platforms. I also spent the last decade as a media lawyer, where I spent a lot of time doing transparency work, like First Amendment right-of-access work or FOIA work. And so I am very heartened by this moment and the framing of this panel, where we're not saying "what's the solution," right; we're trying to figure out what is the right system. This is a system of checks and balances for very powerful actors, and it's not the first time in the last hundred years that we've had to do that kind of thing, and I take a lot of solace in that. So, continuing to set the table: in the 1930s we saw that banks had a lot of power, maybe enough power to tank the entire economy, and then we saw the rise of the SEC, and then generally accepted accounting principles, and I know Ethan has a lot to say about inspiration that can be drawn from that. In the 1960s we saw the rise of the administrative state, a variety of actors in the government with extraordinary power who were not elected officials, and the desire to have the Freedom of Information Act as some way to wrench information out of the opaque goings-on of the federal government. And then also decades of access litigation, First Amendment right-of-access work saying we get to have access to what happens in the courtroom.
What happens in court records, which often contain trade secrets and commercially sensitive information as well as private information, and we have a system where we've seen that unfold. That's the area in which I've litigated, and reporters I work with and reporters I've represented in the past do quite a lot of work there too. And then of course now we're in this moment where big tech companies hold a lot of power over our economy in a variety of different ways, and we're thinking about the system that we need in order to have checks and balances there. And so, continuing from the journalism and media frame: I think it's a really interesting setup to be at the table with researchers in this moment, because researchers and journalists have different time horizons, they have different incentives, and they have different tactics, and that's how I'll get to what the Markup's been doing. In terms of time horizons, journalists are working on a faster flywheel: we're taking in information and we're trying to report out, if not immediately in real time then close to real time, opportunities to make sense of what is happening in the world. So we had an investigation just last week about Amazon and Amazon's marketplace: the things that you see on the Amazon marketplace are also at times, many times, owned and sold by Amazon, so Amazon gets to be both a seller and the holder of the marketplace. What does that mean for the consumer? That's the type of research and journalism that we try to put out to the ecosystem in close to real time, a faster time horizon than researchers get to enjoy.
The other is thinking about incentives. I think it was Brandie Nonnecke from Berkeley who I first heard use the science-versus-compliance frame: thinking about journalists engaging in oversight as a form of compliance, rather than having the pursuit of science as the north star. What's super interesting about this moment is that, for all of the political will calling for accountability, they're not calling for it for love of science; 2021 is not a good year for the love of science. They're calling for it because they want accountability and they want oversight, which is a slightly different frame that of course researchers can also claim and should claim, but I just want to call out how that's slightly different from the traditional framing. And what that leads to is a difference in tactics. For journalists, including ours at the Markup and others I've worked with, being truly independent, and being seen by the public as truly independent, is essential. And so saying, "hey, come on in, Facebook is going to have some data that they vetted for you to look at and play around in," is going to feel unpalatable. That's not going to seem like it necessarily fits with the type of independence that many journalists would want. And so what that leads us to is a desire for independent, at times adversarial, outside research: we're collecting data directly from platforms; we're collecting, and I would use the language of newsgathering, what we can see and observe from the platforms. There are pros and cons there, right. And so I think the proposals that to me are the most interesting are the ones that create safe harbors and space for that kind of adversarial collection to happen. There are a lot of complications: who's the right journalist, what are the right methods, who's actually faithful to the privacy-protective, public-protecting logic. But that's the kind of fire I'm excited to start here and get into.
Got it.
It's really interesting to see people looking to other analogies and other situations to draw from, as you've hinted at. Even just putting a stake in the ground around the Freedom of Information Act is very interesting: citizens and journalists, perhaps ten years later, get documents from the government, properly redacted, then appealed and less redacted, unless someone does the PDF redaction wrong and you can just take off the layer. The government is obliged, once asked, to turn them over. It's also interesting when you think about, as you talk about checks and balances, journalists doing this. For researchers, those at public universities might find themselves having FOIA obligations, since they're state employees, and often that's been seen as a truly, like, quadruple-edged sword, potentially interfering with their work, so there's something interesting to be worked through there. But let me just bookmark it: I'm curious how much you're already parting ways with Nate, because Nate is saying there should be a state-created and enforced way to have regular, proactive, don't-even-have-to-FOIA data flows from the private parties, and I hear you talking about an adversarial system of the kind you alluded to: browser plugins, with people willing to support the Markup sharing with you everything they're seeing from Facebook, kind of slurping data through the door, rather than having a pallet of data dropped off by the grumpy truck driver from Facebook, who has to do it because the government said so. Are these complementary approaches? If Nate's thing works well, do we still need you with a grappling hook going over the wall?
Yes, everything. Yeah, because those are a form of check and balance too: if the government gives you something, the agency says here you go, and we say, well, wait a minute.
We saw something different when we were observing it; what explains the difference? That's actually a helpful inquiry, its own form of check on whether the agency is also remaining accountable. Because, you know, it's very funny to have come from a tradition of media lawyering, where the government was the big baddie not giving up the information, and to move into this moment where we're like, oh, the tech companies are the big baddies, quote unquote, and the government's going to give us information. It's a little bit of whiplash. So, recognizing that we need all actors, this is an ensemble cast: we need everyone to participate in order to have the kinds of checks and balances we need, which is why I will give the hungry-hungry-hippos answer of all of the above. We need everyone involved in this.
Got it. I agree with that. And I'll say that in this legislation there is a provision that provides immunity for people who scrape publicly available data, but we can talk about that a little later. If I could have developed a research-access-for-society bill, instead of access just for academic researchers, I would have written that.
Sorry to just go after the rabbit as I see it, Nate, but would your bill be favorable to Clearview AI scraping everybody's photos so they can power universal facial recognition, or to Cambridge Analytica?
No, no, no, no, no. What I'm saying is, it doesn't do it for private firms to do scraping; it is still nested in the academic framework.
Okay, I thought you were saying, no, if you could, you would.
I'm saying if I could figure out a way to identify all the legitimate projects that should be allowed to scrape, then I would do so. We can talk about that more later.
Very good. Thank you.
Thank you. Now to Nicole Wong, former deputy US chief technology officer, but you have been on both sides of the castle walls here, having been in the belly of the beast as legal director for products at Twitter, and then moving to a different chamber of the belly, to really mix my metaphors here, at Google as vice president and deputy general counsel. There was, I think, a wonderful New York Times Magazine piece years ago that said you were "the Decider," trying to figure out the impossible balancing of censorship and law enforcement demands from governments, interacting with the governments through a semi-adversarial lens. Okay, I should let you complete your bio, but I'm so curious how you're thinking about all this, having seen it from so many angles.
Oh, thank you. Thank you, Jonathan, for inviting me; this is the dinner party I've wanted to be at. I'm such a fangirl of everyone's work in this group, so I appreciate all of it. And particularly to Nate: thank you for actually putting pen to paper to give us something to start to struggle with, because I think some of the questions raised by research and access are really hard. The balance with privacy, the balance with misuse by researchers, and trying to fit all that into at least our US framework, is a really difficult task. I guess maybe let me start with the question that was circulated for this panel, which is, what does genuine data-driven platform oversight look like? This is where I started to put my head. And I apologize, because I have a cat crying in the background; she's really missing being able to be in the room with me, so apologies.
I think I speak for everyone when we say, let the cat in.
For that question of oversight, I think there are a bunch of embedded questions: oversight of what, which platforms, what harms, which public are we trying to serve, which public interest, because I don't think that we can guarantee it's universal.
One of the things that came to my mind with Nate's legislation is: are the issues we're really looking at confined to US users, which is where the authority would vest, or do we actually want to see things at a global scale, and how do we decide what the public interest is at a global scale? So there are questions I have about oversight, and about what set of norms and laws we rest that oversight on. Once you can answer those questions, you can get to questions like who conducts it, with what tools, and to what end. So I think one of the things to appreciate about why the research is so important in this moment is that we're early in the regulatory process, notwithstanding the fact that we've been working with this for twenty-five years. We are early, and what you see is policymakers struggling: they're not sure what they're regulating, because it is different to regulate privacy versus human rights versus misinformation, and the type of data and tools you want in order to do that research are different. So they don't know what to ask for, and we don't know how to standardize compliance. Privacy is the most advanced, I think, but content moderation is more difficult. And so I think the research part of it is actually just probing: where do we get to standardization? And on your question about why we don't ask accountants and car dealers to unearth all of their information: part of it is because we have safety standards around cars. We can compare a car that is fit and safe versus a car that is not, and we have groups that do that, so we don't need that sort of research layer there; it already happened. With platforms, we need to get to that point. And just to incorporate the hungry-hungry-hippos theory, which I agree with, and to put it in a framework:
I think we want multiple layers of oversight, which means that the companies have to have multiple layers of making data accessible. I think there's a layer of access for the end user, so that they understand the context of their experience and can manage it for themselves. Then I think there's a layer of access for journalists and consumer groups. How does Consumer Reports do its job? It's because we can open up the hood of a car and take a look, and that's something that has to be evident. I think companies need to have something that looks like that, and the question is, what is the standardization of what they must show for those groups to be able to access it? That might be APIs, like Twitter's academic API or something like that. And then I think what Nate gets to is the hard stuff: the stuff that is sensitive, that we may not want just freely floating out in the ether where the Russians and the Chinese and everyone else can get to it, but where we want verified, credible researchers to be looking at hard things. And that happens in a much more restrictive way. I think all three layers are necessary. And the hard part is, again, going back to: who are the platforms we're focusing this on, what are the harms we're trying to address, who is the public we're trying to protect? Those are really critical.
Well, I kind of feel my anxiety level dropping as you complete your introduction, because you're like, look, I've sliced and diced it. Here's a way of not making this a big pile of spaghetti: work on this, work on this, hungry hungry hippos, all layers of things, at this Institute sponsored by Mattel. And so all of that seems helpful as people are feeling anxious but not really knowing in what direction to go.
I'm intrigued by a quote that you offered up on Twitter a while ago: "Tech companies need to be upfront about the problems they cause, willingly or not. I'm frustrated by stale talking points about the promise of technology. Show the work. Cynical attempts to play at policy debate are a disservice to everyone in the field." Amazing use of 280 characters; that was a lot of ideas packed into a small package. But I'm just thinking: if you were to hit the trifecta, having been at Google and Twitter, and now it's the perfect moment to join Facebook, just as the ship is sailing high, tell us what you would say. What's the first thing you'd say in a room of Facebook executives, gathered with some free food and drinks and wanting your advice? What should they do next in this area, or is it all in the details?
Oh, you're inviting me to really step into it. I am separating this out from Facebook, to the kind of advice I give to companies generally, because I do some consulting for companies overall. It does no one any good to try to hide the ball. Be honest, and when you can't say something, just say you can't say it and explain why, because I don't want to invite a bunch of litigation...
And do you think Facebook has been hiding the ball?
I think that some of their blog posts, at least as I read them, are very cynical. This is not industry standard; this is the kind of thing we would never have done at Twitter. I think that there are some companies out there that are genuinely trying to be upfront and trying to balance what we have all acknowledged: there is a user privacy issue; there are things where, if I say them, I don't actually know how they spin out; there are questions they can't answer; there is data they don't have; there is a bunch of mess that they're worried about. And I think that that's fair.
I think they should be able to say, you know what, we didn't build our system for that, and then get into the honest conversation of, so what's the system we're supposed to be building? But pretending you've got all the answers, pretending that we'll just add some more AI to it and it'll be fine, that does none of us any good, and that's the kind of stuff where I'm like, stop with the toxic positivity about where tech is going, because it's not helping the public discussion. That was my frustration.
Got it. My anxiety level is back up again, so thank you for achieving homeostasis. Very good. Ethan Zuckerman, professor of public policy, communication, and information, and director of the Initiative for Digital Public Infrastructure at the University of Massachusetts. You're joining us from Western Massachusetts right now, is that correct?
On a beautiful chilly day, in a beautiful chilly state office building, on our flagship campus here in beautiful Amherst.
Well, for one thing, let me just thank you for investing in a high-quality microphone instead of using a tinny webcam mic like, at least, I am at the moment. It's great to hear your voice in full dimension. In a discussion with Elizabeth Hansen Shapiro on a podcast, you made reference to Social Science One, which Nate had invoked as an earlier iteration of an earnest attempt to work on this data-sharing problem, and you said: "One of the things that happened with this report is that it's in the wake of a research project called Social Science One, this very ambitious research effort organized by a pair of just top social scientists to work with Facebook, open up a data set, study political influence on elections,
and Facebook's role within it. Despite the fact that the academics who started the project had great contacts within Facebook, despite the fact that the people the data was opened up to were a set of accomplished academics who had been very carefully screened, they had a really hard time opening up the data, and many of the people involved with the project ended up just incredibly frustrated." A nicely non-judgmental, "mistakes were made" assertion of frustration by all, and I'm sure each party to that adventure would be able to describe some frustrations. So here's a chance to situate whatever you're thinking about now in terms of lessons learned from that effort, figuring that most of the people watching, now or later, will have heard of it, and what you're thinking now as V2.
Sure. So let me back up on this and say that at the end of all of this, what I'm going to be doing is saying "yes, and" to what Nate put on the table, but I'm actually going to focus on something slightly different, which is that I think there's an enormous amount we can do without the cooperation of the platforms, if the platforms allow us to do it, or if we can get the legal or regulatory structure to allow us to do it, which is kind of what Nabiha was talking about around Citizen Browser. I'm very much on Team Nabiha, and I'm going to tell you a story to get there. Okay, just as a bit of background on this: I've been building what's now politely referred to as an unauthorized data set for about fourteen years here. It is an independent index of news online; it's called Media Cloud, and it's used by hundreds of researchers out there. We started building it at the Berkman Klein Center, not by asking permission, but by essentially invoking the principle that Google used to index the web: this is a useful thing, and we're going to start assembling it.
And there are now other collections out there, like Pushshift, which is a very high-quality index of the Reddit service, that are working along these lines. What's basically happened in this space is that social scientists want to research these big, powerful platforms because they appear to have such a huge influence on individuals and on our civic life in general. Projects like Social Science One have tried to hold hands with the platform and get access to the data. It's been incredibly time consuming; people have put thousands of hours into it. And at the end of the day, what Facebook actually released was bad data: they were missing more than half of it, and they had mislabeled it, so that people's work built around it was compromised.
And was that intentional? It just was?
What's amazing about it is that it doesn't matter if it's intentional. Obviously it's worse if it's intentional, but what's extraordinary is that the only way we figured out that Facebook's data was inaccurate was that one of the researchers who had access to it was able to cross-check it and say, this cannot possibly be a complete data set. And that led to the revelation that the Social Science One data set had excluded Americans who hadn't expressed a political preference on the left or on the right, essentially the middle of America. So, we've tried to work with the platforms. You now have efforts around things like data-donation panel studies, things like the Markup is doing with Citizen Browser and Mozilla is doing with Rally, and you've seen where that gets you: it gets the NYU Ad Observatory a cease-and-desist from Facebook, essentially saying no, no, no, no, you can't construct your own data set based on data donated by individual contributors. So we're now at the point where researchers, I think, need to start asserting, and finding ways to defend, rights. And specifically, we need to assert a right to publicly available data.
So let's consider YouTube. YouTube is surprisingly understudied, despite the fact that it's enormously influential, and it's because it's a pain in the ass to study: you actually have to grab the videos and transcribe them and do text analysis on them; it's very, very hard. What's an example of a question you'd love to see an answer to about YouTube that none of us knows the answer to? How much hate and extreme speech is on YouTube, and how often is it recommended to the average user? You'd really love to test Kevin Roose's rabbit-hole hypothesis. And the alternative to the rabbit-hole hypothesis is that there's racist content on YouTube and mostly racists find it. The best effort we've had at testing this comes out of Brendan Nyhan's lab up at Dartmouth, where he's tracked a thousand users and finds very little evidence for the rabbit-hole hypothesis. And I'd really like to do the work of trying to map extreme content across the site. We have a project under the Media Cloud banner where we're going to create a random sample of a million YouTube videos, work toward transcribing them, and try to create a searchable index out of this. And our argument for this is going to be that we are doing what Google has done for the web, and we are simply doing it to YouTube. So, first of all, you need the ability to assert the right to study public data like that. And second, there needs to be some sort of right of data donation, of data altruism: what people are doing in allowing users to put programs in their browser and say, I'm going to share my data, whether it's with NYU or The Markup or somewhere else. That needs to be protected. Under some of the data regulation coming out of the EU, we see this notion of data altruism: you should have the right to share your data with researchers in those contexts.
In addition to what Nate is doing, which I'm completely supportive of, I want to see legislation around a safe harbor for researchers. And I want to get around the problem that you're absolutely right in putting on the table, Jonathan, around Clearview, by essentially saying: you don't want to solve this in terms of saying this is technically possible or this is technically impossible. If you make Clearview AI technically impossible, you break a whole lot of other things, including the ability to do research on publicly accessible data. What you have to do instead is something that the law is actually pretty good at, which is distinguishing intent. There is a big difference between me scraping thousands of YouTube videos to try to find extreme speech and me generating a database of faces that I can sell to law enforcement. And one of the great things about law is that we can distinguish why something is being done and how it is being used. That's one of the critical distinctions we need to make here. The tech platforms tend to insist that if this is possible, it will be abused, but the notion of research for the public good, for public purposes, is an idea that we really have an obligation to formalize and to support here. I can't tell: at first, as you were winding up, I thought you were saying that this shouldn't be left to the cat-and-mouse game of whose grappling hook goes over which wall, or how tall the wall is; that public policy should just say what the balance should be between those who want the data, and for what purposes, and those who have the data. But given the focus on carving out exceptions for altruistic data sharing by citizens, so that The Markup and others can do the kinds of really eye-opening studies that they do, to which platforms often say, "but that's incomplete, because we have the rest of the data, and here's where you got something wrong," I guess: is it just all of the above?
There ought to be an opportunity to do that kind of ad hoc sampling, and that should not be legally penalized. But at the same time, there should be a FOIA-like pathway for data shared through the auspices of something like Nate's proposal. Nate's proposal is critically important for certain types of questions where you're just never going to release all the data. Take content moderation: an open question in content moderation is whether some groups are moderated more aggressively than others. Let's get political and say that you're much more likely to have content pulled down if you're Palestinian than if you're Israeli. I've certainly seen that claim. Yes, that claim is often made. That's very hard to evaluate with public data, because you can't just look at the content that remains on the site; you actually have to look at the content that's been pulled down. So that's data where you need a method like Nate's. You could also do this with some sort of audit method, where you have an organization that comes in, looks at the data, runs some tests against the algorithms, and says: you, Facebook, are complying with generally accepted algorithmic-fairness principles, in the same way that a financial audit says you're complying with generally accepted auditing principles. There's not actually a huge amount of difference between the two; you're basically talking about creating a firm to do some of the testing that Nate would be doing. I'm really talking about something different. I'm talking about the stuff that is out there in public. You can, in theory, go to Reddit and create an index of all the public content there, and that's what Jason Baumgartner has done for Pushshift. You can, if you are a glutton for punishment, go through the different YouTube IDs, find valid YouTube videos, and create an index out of them.
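The "glutton for punishment" approach described above can be sketched in a few lines. This is my illustration, not Media Cloud's actual method: YouTube video IDs are 11-character strings drawn from a base64url-style alphabet, so in principle you can generate random candidates and probe which ones resolve to real videos. The `is_valid` checker is deliberately left as an injected function, since how you probe (for instance, a rate-limited request against a public metadata endpoint) is an implementation choice outside this sketch.

```python
import random
import string

# Character set used by YouTube's 11-character video IDs (a base64url-style alphabet).
ID_ALPHABET = string.ascii_letters + string.digits + "-_"

def random_video_id(rng=random):
    """Draw one candidate ID uniformly from the 64^11 ID space."""
    return "".join(rng.choice(ID_ALPHABET) for _ in range(11))

def sample_valid_ids(n_candidates, is_valid):
    """Probe n_candidates random IDs and return the ones the checker accepts.

    `is_valid` is injected so the actual probe (an HTTP request, say) can be
    rate-limited, cached, or stubbed out; this function only does the sampling.
    """
    hits = []
    for _ in range(n_candidates):
        vid = random_video_id()
        if is_valid(vid):
            hits.append(vid)
    return hits
```

Because the ID space holds 64^11 possibilities and only a tiny fraction map to real videos, naive uniform probing has a vanishingly small hit rate, which is presumably why the panelists call it stupid and why real sampling efforts lean on less brute-force tricks.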
We are doing a slightly less stupid version of that, but yes, we're doing some variant of that to get a true random sample of YouTube, so that we can actually say things like what languages YouTube is in. There's alarmingly little basic research on this platform, which for my child and for yours is essentially going to be their exposure to the media universe. And in this case it's not because the work is legally constrained; it's that the tools are really, really hard to build, and we as a field actually need to invest in building them and making them accessible to researchers across the board. So, back over to Nicole for a second, with it of course being understood that you couldn't possibly offer legal advice publicly through a Zoom video, because that wouldn't be a good idea. I'm curious for your academic ruminations, as a skilled lawyer in this area, about both the level of genuine risk that Nabiha and Ethan are taking on as they're trying to organize this sort of adversarial (I think that was the delicate word used) data collection. Is receiving a nasty letter from Facebook just a rite of passage, and, you know, we put it on the wall and we're done, or does it potentially represent a real problem? And secondly, you alluded to the need for standards and coordination, maybe not just so there'd be apples-to-apples data collection across platforms and circumstances, but because there are a lot of jurisdictions here. And I'm curious how much, even if a company wanted to somehow come to an accommodation, whether under pressure or out of its own sense of what it should do, it's looking at the EU and saying: my God, if I share so much as the Pantone color of somebody's screen, I'm running afoul of the EU privacy directive. Is this the kind of thing for which, for ideas like Nate's, we've got to do some kind of global coordination, or can the US just do one thing and the rest of the world adapts?
I'm going to travel one road, because I think Nabiha and Ethan will actually speak better on the issue of scraping than I will. The one part where I'm really interested, and where I don't have a good solution yet, is the data donation idea, which I support and which I also think is fraught. But I believe there's a way to do it. So, I think about data donation in the medical context, right? Patients donate their data all the time, and we have a system for doing that, and doing it safely, so that research can be done and advances can be made. So I believe this is possible. I also think, in the social media context, that when I donate my data, it might not just be my data, right? It's all the other people in my network's data too, sometimes, or the people who are shown in my video. Because if you've got a plug-in in your browser that Nabiha made, which shows her researchers everything you see, and we're friends, you'll see the full account that the public doesn't see and then send it over. Or if I port my Twitter feed, for whatever purpose, to the Ad Observatory: there are other stakeholders involved in that, and I guess I'm wondering how researchers and journalists are thinking about that problem, because it does feel like a potentially real problem in the hands of the wrong researchers. One of the comforts I take, and I know that Nate's solution is very cabined to very strictly vetted researchers, is that that, at first blush, feels a lot safer: that there will be both an appropriate scope of research and accountability for any breach of trust with the subjects that you have. And I don't know how to balance this out, but these things do trouble me a lot in this process. Okay, so now Nicole is anxious; I'm wondering if either Nabiha or Ethan can put that to rest. And I think my lights just went out, but everything else is going fine.
So, I think one of the things that is critically important to make data donation work well is to have careful privacy review. For instance, when the NYU Ad Observatory started building their plug-in, they had Mozilla come in and do a thorough privacy review, which came back with some suggestions and otherwise a clean bill of health. We as a group, I think, perhaps have some healthy skepticism about whether the IRB is entirely up to doing this work, both because some of the most interesting work is being done by activists and journalists, and also because there are really technical issues that may require literal code review. At some point, fairly soon, we probably need a body of researchers who are doing privacy reviews, who are doing code reviews on code before we're throwing it into the browser. You're saying you need oversight for the oversight? No, I'm saying that the browser plug-in is doing the oversight of Facebook, and now you're saying there had better be oversight of what we're doing. It's interesting to think that researchers might actually successfully regulate ourselves if we built a professional industry body and did the work of reviewing each other's code and reviewing each other's research strategies. I think that's probably more believable than believing that Facebook is going to release all the data that we need to see about the internal research. It's a critically important concern, but I don't think it's a game-stopper in the ways that it otherwise might be. Yeah, I want to jump in to say that I do think it's a hard question, and it's a hard question that we've grappled with in the law, although I don't have the optimism about legal brilliance that Ethan just surfaced.
So, we faced a version of this question before, on a different scale, when we were confronting the digitization of court records: when court records went from being things cloistered away in a courthouse to being available on PACER, where you pay and all of a sudden get access to every single exhibit for all of these lawsuits, with a variety of information that, yes, is perhaps relevant to the parties at hand but is now available to everybody. So, picking up on the structure of the question that Nicole presented: people who have nothing to do with the dispute at hand, or the research at hand, all of a sudden have their information available. And there, you know, there's a regime of redaction and responsibilities and structures and standards. But what that surfaces is that you don't necessarily need gatekeepers; you need some form of gatekeeping, and the gatekeeping can be agreed-upon standards. As some have referenced, in order to play in the sandbox, you've got to agree to the rules of the playground. It could be, and I think lots of people retreat to this, that some institutions are appointed with the gatekeeper authority, and academic institutions could feel more trustworthy than journalists, although, you know, lots of people can be researchers or academics and cause all kinds of trouble. But I do think it's worth thinking about what that gatekeeping function should look like, one that is maximally inclusive but gives some escape hatch to say: yeah, you know what, your methods are for commercial, competitive reasons; you're actually scraping all of this information to build a business like Clearview AI that's not in service of the public, so you can't play. Or: we don't think that you actually have the capability of safeguarding this data; just technically, we don't think you have the capability, so you can't play.
We're going to have to create those rules, and I think we have to accept that there's going to be some gatekeeping function that plays in there. Very briefly: who's the best gatekeeper, do you think? What would that look like institutionally? Well, you can't ask a First Amendment person about gatekeepers; I hate that you just said it has to happen. But if someone has to gatekeep, the focus on standards is the one that feels the most comfortable to me. That is, really looking at: can you certify in a credible way that you are following the set of procedures that are agreed upon? I think where it gets really hard, and this is where I want to raise your anxiety level again while agreeing with Nicole, is: who does the certifying? Is it a government agency? Is it self-certification? Is it an independent body like the bar for lawyers, or whatever the equivalent is for other fiduciaries who hold information? I mean, I think this is what makes it hard, but I think there are options. A really interesting question that I struggle with is that when I look at the body of media law, a lot of standards are made of mistakes: something goes sideways, and it's like, yeah, maybe we don't want to do undercover journalism; or some mistake prompts a correction. And in this universe, what do mistakes look like, and what's the level of mistake that we're comfortable with as we stumble toward the right balance? That's what keeps me up at night. Well, maybe this is a good opportunity to put it back to Nate, because here you've offered up something potentially writable in the stone of law that would mandate forms of data sharing. Can you provide any form of gatekeeping or vetting for what your law would provide that the companies have to share, vetting of the consumers of that data, or is it kind of on a public page? Knowing that there have been rare instances,
I think, in which FOIA answers have been titrated so that they don't just get released to the public: there's a reading room, and you have to physically present yourself at the reading room to read the material, and you can't have a camera with you, that kind of thing. So there are a few answers there. First, there's the scraping provision of this proposed bill, and then there's the secure-data-access provision. Most of what we're focusing on is the secure-data-access provision, which does have a whole system of vetted researchers, modeled on the sort of NSF procedures one would use for grants and research projects. The other analogy is what's called sworn-researcher access to census data. So there are these analogies out there for this, but remember, what I'm trying to prevent is another Cambridge Analytica. We've been talking about this as if the people who want access to these data are all acting in the public spirit, or as if it's easy to distinguish between good and bad people when it comes to access to the data. So, while I support everything that's been said so far, I would like to have many different tiers and strategies for access by researchers, however we define them, and journalists. I just can't figure out how to define some of those other categories. It is also quite challenging, and I would even be self-critical of the bill on this, when it comes to scraping: it's not so easy to define what is public and what is private, and people need to understand that. If you just declare that everything on Facebook is public, or something like this, then there are ways that they can, as you went through, use password protection or whatever else to start changing the nature of the platform in order to prevent it from being completely scraped.
The way I anticipate this is that there is a vetted procedure for the secure access, and a much lighter touch when it comes to scraping, but my proposal is still limited to university researchers. That doesn't mean it shouldn't be expanded; I would love to see it expanded, and Ethan and I have been on some other calls where there are some other proposals out there. I do think it is actually very hard to make the trigger be the intent of the researcher, because the platforms don't know what the intent is. And so they're going to be in a position of enforcement, where they might actually even be liable for the kinds of data that might be scraped by someone on the outside. They can't be in the middle of policing this; they're going to be overly restrictive. I was just saying, I guess, that Cambridge Analytica was an example of a cat's paw: you had an academic fronting for it. I know we're just about at time, and you especially may have to dash, so let me just ask a final question, which is this. I feel like so far our conversation has presumed that if we can get the data flows going in all of the above kinds of ways, we'll have a chance to start knowing better what's going on on the platforms and how it's affecting people, and then start forming public policy interventions where they might be reasonably called for. That's kind of the background shared assumption here, and I just want to test that out as a foreground question: is it still the kind of idea that everything not prohibited should be permitted, until we detect a problem through good data collection, analysis, and cogitation? Or are there areas where everything should be prohibited until it's permitted? I guess I'm just trying to ask the fundamental value question: is innovation "move fast and break things," and is this "move fast and break things, but then fix them," or is it not that? Well, look, that's the most meta of meta questions there.
No, but part of it is, if you're talking about the design of technology, then of course there should be rules as to prohibitions at the front end as well as at the back end. I don't know; that's what I'm wondering. Look, we deal with this in cloning or CRISPR, all kinds of ethical challenges, and so we want to have ethical restrictions at the front end as well. The point about transparency legislation like this is that it will both have a chilling effect on bad practices at the companies, because it will be harder for them to keep those secret, and it will inform public policy, so that if we want to deal with content moderation or other issues, we can do so in an intelligent way. Got it. Any other thoughts on that question among Ethan, Nicole, or Nabiha? I just want to flip the script a little bit. The Silicon Valley approach to this has basically been: let's try all the maximally profitable ways of organizing people online. At the end of the day, the innovation has been around how much attention we can capture and monetize, and how much data we can pull from you in the process. It would be really helpful if we were innovating with some other criteria in mind. One useful set of criteria would be: do these tools make us better democratic citizens? Do they make us better neighbors? Do we take care of people online in better ways? Those aren't things that are well valued by markets; those are things that are sometimes valued by states. So for me, I'm really interested in these questions of how we get more data out of these platforms, but I also feel like we're nipping around the edges. The real conversation we should be having is: what do we actually want social media to be, and then how do we build the structures to get to that new form of social media?
For me, that's a much more interesting question than these questions, which largely come down to fixing the problems of trillion-dollar corporations. And frankly, everyone on this call is underpaid for doing that work; we should actually be doing the work of imagining and building better systems, systems that are better for us as members of society. So instead of asking what the regulatory framework should be, and how preemptive it should be, for profit-maximizing companies looking to invest big in this area and make lots of money, maybe it should be an entirely different framework to begin with. By all means regulate; by all means increase transparency; but some non-zero portion of our effort needs to go toward the question of what good digital public spaces would look like, and, if we were willing to get really creative and use public money, charitable money, all sorts of different ways to do it, how we would design and build those spaces. Great question. Nate, we know you have to bounce; thank you. Nicole and Nabiha, last words on this front? Delighted to hear them. I just want to plus-one what Ethan had to say. You know, we're in a moment where Web3 is rising; we have to understand how we got here in order to build something better, with granularity. Regulation can change the incentives, as perhaps it should, but I'm really interested in making sure that we do not repeat the mistakes we have made in the last 25 years as we build a whole new infrastructure. So the time is now for all of this. Very good. Nicole, you get the last word, or your cat does. I have no last words; I had a cat, and that was my last word. No, this is such a great conversation; I do think we're at the start, right?
And so I think recognition, for all of us but also in our communication with the public, that we are at the start is essential in terms of what we make open and what we research. But I also agree with the importance of Ethan's work: having a digital public infrastructure, or some alternative to the world we live in now. We have to have a positive statement of what we're going to be, not just continually play defense with what we have. That's a terrific note to end on. Thanks so much; I hope we can convene again at some interval and see what progress we have made, or not. I somehow, at this moment at least, feel less anxious. Thank you all for this discussion and for lending a start to our modest effort, joining so many others, including those by the folks here. And another thank you to Hillary Ross, Will Marks, and Madeline Metsui for organizing and prepping for this panel today. Cheers. Thank you. Thank you.