All right, why don't we get started? Hello, everybody. I'm Jonathan Zittrain, and I'm so pleased to be welcoming all of you to our session today, where we will hear from the participants in the 2020-2021 Berkman Klein Center Assembly Project fellowships. This marks, I think, the fifth year that we've done the Assembly program. The idea was to bring participants together from academia, industry, government, civil society, and across all sorts of disciplines to work on problems arising from technology that touch the public interest, that are really hard, and that might call upon multiple sectors or parties or types of thinking to make progress on. And they're every bit as pressing as they are difficult. The original theory behind the program, encouraged by our colleague Jordy Weinstock, was that if you had just the right amount of structure, a little bit but not too much, and managed to yank people out of their day-to-day, wherever they might be working in a day job or occupation, put them in a new configuration, and load them into a particle accelerator, figuratively speaking, and try to inspire some collisions, then good particles might result. And that has borne out over the five years of this program, where we've touched on, in different theme years, issues of privacy and security, the ethics and governance of artificial intelligence, and, most recently, disinformation on online platforms. Not that we've solved all this stuff, to be sure, but it's just been fascinating to see people who consider themselves already somewhat expert, or steeped in one of these sets of problems or the relevant fields, encounter some of the people and projects that have coalesced through this program and say, oh, I'm thinking of a new way of looking at that stuff.
Over the course of the five years we've had more than 75 fellows, and out of that, there have been, I think, just under 20 projects, and every year one or two of those projects has continued beyond the life of the program, whether as a proof of concept or beyond, trying to keep addressing the problem it originally identified. And a number of people, as you'll hear from them today, have said that the program changed their own professional trajectories and ideas about what they might want to work on. But this pandemic year, instead of picking an entirely new topic and starting from ground zero, we invited five teams who were continuing elements of work they'd done in earlier iterations of the fellowship to come back for further support. We matched them with advisors on the Harvard campus, created kind of a new all-star cohort among everybody, were able to facilitate some small grants, and offered some opportunities to share their work for feedback with other relevant players, some of whom might be able to help implement some of the ideas they've had. So, today, we'll be hearing from those five projects. And before we do that, I just want to first give a huge thanks, not only, as I already mentioned, to Jordy Weinstock, who had the initial vision for this program, but also to Hillary Ross, David O'Brien, and Sarah Newman, who have really helped pull all this together and create just that right amount of structure, inspiration, and creativity to get things really cooking over the course of this initiative. I'm also extremely grateful to the advisors on these projects, both in past years and most recently this year, including Kade Crockford, Mary Gray, James Mickens, and Margo Seltzer, who have each spent their time offering honest thoughts and suggestions on what you're about to see tonight. I'm so grateful you could be here. And, by way of warning, we will be recording this for posterity.
And I think now, back to you, Hillary, again with thanks, to introduce our five teams. I think there's probably a banner already up. Oh yes, there it is: the project fellows, the five teams you'll hear about today. There's also the opportunity to put in questions in the Q&A tool, and at the very end of the five presentations we'll have time to pose some of those questions to the teams. So thank you all, and thanks to our Assembly fellows. I'm just so pleased at how much you've really brought it this year, and at the work you're doing on questions that really matter. Hillary, over to you. Thanks. Thank you so much, Jonathan. I'm Hillary, a senior program manager at Berkman Klein for the Assembly program. I'm really thrilled to be introducing these five projects, which are working to address disinformation, privacy, and the governance of AI technologies. As Jonathan said, all of these projects started during Assembly fellowships over the last five years; they've been working on their own, and they've rejoined this year to dig deeper and further some piece of their project. So tonight they'll be sharing about their work broadly and about what they've done over the last few months. Before we jump in, just a huge thank you to our staff team, as Jonathan said, to Jonathan himself, to the fellows, our advisors, and the Assembly alumni community, with a particular shout-out tonight to Jail and Michaela Lee for their peer advising with project teams this spring. As Jonathan said, for the flow of tonight's event: over the next hour these five teams will be sharing about their work, with brief intros in between from me, and then after their presentations we'll shift to Q&A in the same order that they presented.
And that Q&A will be moderated by Jonathan. We'll be collecting questions as we go, so please, please share your questions through the Q&A tool as you have them, and I will remind you to do that as we go, too. So with that, I'm excited to get going and to introduce the AI Blindspot project. AI Blindspot came together during Assembly 2019, working on the ethics and governance of AI. The team is Ania Calderon, Hong Qu, Dan Taber, and Jeff Wen. We asked each project team, as you might have seen in the promotion for this event, to create a riddle for their project, and theirs is: we all have it, so make a blur; if we don't spot it, harms will occur. So I'll hand it over to Dan from here to untangle that. Thank you. I'm excited to kick us off by presenting AI Blindspot, which is a tool for advancing equity in AI systems. You can go to the next slide. I'm presenting this on behalf of Ania, Jeff, and Hong. The four of us have been working on AI Blindspot for the past two years with the mission of dissolving barriers between those who build AI systems and those who don't, particularly civil society actors who advocate for more equitable AI systems. You can go to the next slide. I'll begin by introducing the AI Blindspot framework and what it represents. Then I'll discuss how we take an art-based storytelling approach to change the narrative around how these AI systems are used, and then dive deeper into a specific case study of tenant screening algorithms and show how we're using AI Blindspot to improve how these algorithms are utilized. You can go to the next slide. This is the AI Blindspot framework. From the very beginning, the framework was designed to mirror the steps that a data scientist would typically go through when designing a model or AI system. There are four stages to the framework, planning, building, deploying, and monitoring a system, and these represent a typical data science workflow.
And through each stage we identify oversights where unconscious biases can give rise to structural inequalities. You can go to the next slide. Our goal, as I said, is to dissolve barriers between those who build systems and those who don't, and in that effort we always strive to create accessible content. Two years ago we designed a deck of cards, which we've used in interactive workshops at different conferences, such as MozFest. And the latest iteration of AI Blindspot is the character you see on the right, whom we call Al. We go to the next slide. Al is a childlike being with a human body and a gear for a head. Al represents the idea that, just like children are a reflection of the values of the people and environment around them, AI systems reflect the values and priorities of those who create them. We don't believe that AI and technology are inherently bad, but when they're implemented, they often represent the values and systems of those who create them, and as a result they often reflect the structural inequalities that exist in our society. You can go to the next slide. This encourages people to ask the question: what kind of AI are we raising? If you go to the next slide. In the past few months we've transformed the original AI Blindspot icons into these illustrated scenarios, which represent how the blind spots would play out if AI were a person. We plan to create animated versions of these illustrations, which we could use in social media and educational campaigns, and we're already using them in interactive workshops, including one that Hong is doing in two days at Stanford as part of their tech and racial equity conference. If you go to the next slide, the workshop worksheet you see here is an example of an activity participants will do, where they identify what Al is: what type of AI system, which could be law enforcement or candidate screening or robot-assisted surgery or anything at all.
And then they identify the stakeholders who are designing Al, who's being harmed by Al, and what types of values and systems Al is reinforcing. And then, most importantly, how can you change that story? How can you create a more equitable version of Al? I'll give an example of that by going into tenant screening. If you go to the next slide. Just to give some background, tenant screening algorithms are a billion-dollar industry. 90% of landlords say they use some type of algorithm to screen potential tenants, and these algorithms are almost entirely unregulated. They take data from people's credit scores, employment history, and criminal history, but really virtually anything they want, because they're unregulated. And many critics have argued that these algorithms disproportionately harm Black and Latino communities. Not only can this deny an individual housing, but it can create a feedback loop: because an individual has now been denied, they have that denial on their record, which affects their likelihood of getting housing in the future. It can also affect their likelihood of getting employment, because of the lack of housing, and this feedback loop can persist for generations, particularly as the federal eviction moratorium is lifted. So, to go to the next slide. This goes back to the worksheet I showed a second ago. This is an example of what participants work through in the workshop, where they identify the stakeholders who are influencing Al's design, not just those who use the algorithms but other organizations like the Consumer Data Industry Association and related companies like those who do background checks; identify communities who are harmed and what values and systems are being reinforced throughout, such as society's history of housing discrimination; and then consider how you can change that narrative.
A lot of communities are trying to change this narrative, but one thing we often hear is that communities struggle because they don't really know how these algorithms work, so they don't know what changes to call for. So another thing we're doing is designing a case study in which we walk through each blind spot and show how it manifests itself in tenant screening algorithms and what changes you can call for. I'll show you a couple of examples in the next two slides. If you go to the next one. Success criteria refers to the idea that an AI system's specific metrics for success determine whether it's judged successful or not. But when you try to optimize for one thing, you're often going to give rise to harm, because you're ignoring other priorities. Just like in the case of Al: you can evaluate whether Al is successful based on whether he can water a plant, but then you're ignoring the fact that he's not putting out a burning building. In the case of tenant screening algorithms, if you go to the next slide, these algorithms are likely rewarded for minimizing false positives. This means companies want to make sure that when they recommend a tenant, it really is a quality tenant. But that means you're making a statement that you're willing to make a mistake and potentially screen out those who actually are qualified. And when you make that mistake, you're most likely to mislabel those who come from disadvantaged backgrounds, who really are qualified yet are being denied housing. So a specific solution in this case is to not just require some type of audit, which is very vague and really could look like anything, but specifically to require an audit that measures whether false negatives are being equally distributed across communities. If you go to the next slide, another blind spot is explainability.
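The kind of audit just described, checking whether qualified applicants are being wrongly screened out at equal rates across communities, could be sketched roughly like this. This is a simplified, hypothetical illustration: the record fields (`group`, `actuallyQualified`, `recommendedByModel`) and the tolerance are invented for the example, not taken from any real screening system.

```javascript
// Hypothetical audit sketch: measure whether false negatives (qualified
// applicants wrongly screened out) are evenly distributed across groups.
// Field names and the tolerance are invented for illustration.

function falseNegativeRateByGroup(records) {
  const stats = {};
  for (const r of records) {
    const s = stats[r.group] ?? (stats[r.group] = { fn: 0, qualified: 0 });
    if (r.actuallyQualified) {
      s.qualified += 1;
      if (!r.recommendedByModel) s.fn += 1; // qualified but screened out
    }
  }
  const rates = {};
  for (const [group, s] of Object.entries(stats)) {
    rates[group] = s.qualified === 0 ? 0 : s.fn / s.qualified;
  }
  return rates;
}

// The audit fails if any two groups' false-negative rates differ by more
// than a chosen tolerance (0.05 here, an arbitrary illustrative value).
function auditPasses(rates, tolerance = 0.05) {
  const values = Object.values(rates);
  return Math.max(...values) - Math.min(...values) <= tolerance;
}
```

An auditor running something like this over a screening system's historical decisions would flag the system whenever one community's false-negative rate is substantially higher than another's.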
Explainability represents the idea that algorithms are a complex maze of decisions, and it's very difficult to understand why decisions are made; even to those who design them, the algorithms are often a complete black box. But the people who design these algorithms have a responsibility to be able to justify the decisions that are made, particularly high-stakes decisions that affect individuals' well-being, including their access to housing. If you go to the next slide: applicants who are denied housing currently don't have any way of knowing why they were denied, except for knowing that their credit report was involved. So they have no mechanism to understand the reason, and they can't really contest the decisions that are made. This is a problem particularly because these algorithms are vulnerable to inaccurate data that can lead to false recommendations. So a solution here is to give people the right to understand why recommendations are made, so that they can correct any inaccuracies and contest decisions. Those are two examples; we're working through all 11 blind spots, looking at what they look like in the case of tenant screening and what solutions we can recommend, and we're doing this in collaboration with the National Fair Housing Alliance. If you go to the next slide: tenant screening is just one example, and you can visit our website, where you can access our workshop materials if you're interested in designing your own workshop for a different area, or connect with us to potentially do a case study on a different area. We purposely designed AI Blindspot so it can be adapted to any type of AI system. I'm quickly running out of time, but I want to go to the last slide just to quickly acknowledge our collaborators: And Also Too, a design justice studio, have been great collaborators in creating this newest version of Al.
Also Kate Crawford, our advisor, who's given us excellent guidance through Assembly, as well as Vincent Lee, Tawana Petty, and Serena Oduro, who participated in user testing as part of this latest version. The last slide is thank you. Thank you so much, Dan, and thank you to the AI Blindspot team. It's really great to hear about your work with civil society organizations and broadly. So now I'm really pleased to introduce the Clean Insights project, which came together during the first year of Assembly in 2017, working on privacy and security. The next riddle is: what takes what you need, but nothing more, to give the knowledge you want without gathering a hoard? Nathan Freitas leads the project, and I'll turn it over to him to share more. Excellent. Next slide please. And next. Today on our agenda we'll be talking through the problem of analytics. A lot of you might think of analytics in the context of the thing you'd put into your website, but in fact this sort of measurement of data and signals is everywhere. Then we'll talk about how we've tackled a more private solution, what we're doing working with developers and designers today around consent user experience, and how we've moved the project forward through Assembly. Next. So at Assembly 2017 I came into this cohort as an independent, open source, privacy-preserving-focused developer; it's the work I do with my team at Guardian Project. And on my Assembly team were people from Apple and Google and Square, people that love data, but they listened to my struggle and the things that I was uniquely challenged with. Next. At Guardian Project, we have been working for over 10 years on building privacy-preserving software. We work with projects like Tor and Signal, building encrypted databases and messaging apps in human rights and humanitarian contexts.
But we don't add analytics; we don't measure our apps, because there was nothing that we trusted, and this is a big struggle that we had at the time and still do today. Next. You might have heard recently in the news about Audacity, if any of you are bloggers or podcasters who use that software. Well, they were acquired, and they're in the same boat that we at Guardian Project are in: they have this great open source community and support, but they don't know how many users they have, what features need to be improved, or where people are struggling. So their new owner, after the acquisition, just dumped in Google Analytics and Yandex Metrica and proceeded to piss everyone off. You might have also heard in the news about Apple and Facebook and others fighting around the new moves Apple has made to lock down app tracking. And really, what we've seen there is that most analytics, and most things around usability, are tied up with advertising identifiers, and everything's kind of a big mess if you actually just want to responsibly learn how to help your users and improve their experience. It's very hard to find the right toolkit to do that. Next. This extends all the way to smart cities and contact tracing and health systems and traffic, and everything in our lives that's being connected and instrumented. So this is really an existential problem in our lives, where everything is constantly monitoring us, and the way these things are being implemented is often very privacy-invasive. Next. On top of all this, the consent experience for the user is confusing and hard to understand, it's full of bias and blind spots, as you'll hear about today, and that consent, and all of this data, can often be weaponized and turned back onto users in unexpected ways. So a big issue there. Next. So next please.
Since the 2017 cohort I've been shepherding this project along, expanding it, and building a new team around it, who've been working for the last year; it's been an amazing process, and we've built a team with a wide variety of skills. Next. We held a really fun symposium last year, online, that brought together a number of interested developers, designers, and data scientists to think through what people should know about data, how they should use it, and what developers need, and we started building from there. Next. We also did a number of interviews with open source tool teams who need this code like we do, across the internet freedom and human rights spaces, and published a report on that. Next. Fundamentally, the idea of Clean Insights is that, yes, data is amazing, but if you just hold on to all of it, the data becomes a toxic asset, and there's a lot of liability there. What we want to do is separate out the true knowledge and retain privacy; that knowledge is the actual valuable thing that you're looking for. Next. We have a number of ways we approach this, which we talk about on our website and in a number of places, but our focus really is: just take what you need. Next. We have toolkits that developers of any sort of thing can plug into to enable this. Next. Again, there's a lot to think about here, but we have a different process than the tools developers use today, where you just sort of activate and forget it; there's actually some thinking and planning to do at the forefront, but it leaves you much better off in the end. Next. This tool is available today at cleaninsights.org and on GitLab. We've actually shipped a lot of working code and specs, and we're really proud that this is starting to be adopted and implemented. Next. If you're a developer and know some JavaScript, this is kind of what it looks like.
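In spirit, that flow might look something like the sketch below. To be clear, this is not the actual Clean Insights SDK API; the class and method names here are invented, simply to illustrate the three ideas described: explicit consent, a bounded measurement window, and aggregation on the device before anything is sent.

```javascript
// Simplified sketch of a consent-first, aggregate-on-device analytics
// client. NOT the real Clean Insights SDK; all names here are invented.

class MinimalInsights {
  constructor({ campaignId, start, end }) {
    this.campaignId = campaignId; // which measurement campaign this is
    this.start = start;           // aggregation window start (Date)
    this.end = end;               // aggregation window end (Date)
    this.consented = false;       // nothing is recorded until consent
    this.counts = new Map();      // key -> count, kept on the device
  }

  // Consent is requested explicitly, like a camera permission prompt.
  grantConsent() { this.consented = true; }

  record(key, when) {
    if (!this.consented) return;                      // no consent, no data
    if (when < this.start || when > this.end) return; // outside the window
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }

  measureEvent(name, when = new Date()) { this.record("event:" + name, when); }
  measureView(path, when = new Date()) { this.record("view:" + path, when); }

  // Only these windowed totals would ever be submitted to a server;
  // individual timestamps never leave the device.
  aggregate() {
    return { campaign: this.campaignId, totals: Object.fromEntries(this.counts) };
  }
}
```

The key property is that raw, timestamped events stay on the device; a server would only ever see the windowed totals from `aggregate()`.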
The interesting thing is that you define these aggregation periods, the start and end dates, these windows of measurement. We have the idea of consent built in, and how you can get it from the user, much like you might request permission to use the camera on a phone. And then we have the idea of measuring events and measuring views, and we have ways that those get aggregated and averaged on the device itself before anything ever leaves and goes to a server. Next. There are videos and more that you can learn about on our website, including me and others talking about how to tackle this, and a great new blog post on our consent UX, which I'll talk about in a second. Next. So, consent: we had to rethink this too. Going back to the Audacity example, the other thing they really failed on is that they just threw up a big dialog box with a huge paragraph of text about all the things they were going to measure forever, and they thought that was good enough. It's not good enough. So we really have some new thinking around how to engage your community, talk about the benefits for them, show them the value of what's being measured and what it's going to look like, and allow them to measure it and participate in different ways at different times. So really thinking about how you collaborate, co-design, and co-measure with your community is important. Next. And I'm so excited: we have one of the coolest open source decentralized app stores, probably the only one, now implementing this. Here's a super privacy-centric app store that's trying to rival the locked-down systems of Google and Apple. They didn't have any sort of tracking or analytics and measurement either, and we worked with them to find a way, across decentralized app stores, to measure and understand where their users can be better served, and everyone's happy about it. Next. So we had some new outcomes. Next.
We wanted to add more computational privacy support, new techniques beyond the framework we have now, and we were trying to figure out which one. Next. And we decided, with the support of our mentor, Margo Seltzer, that we should really add all of them: that we should add all sorts of filters around measuring events and batch measurement filtering, plug all sorts of things in together with researchers to figure out exactly what is useful, and then bring that to market to our developers. So we're really excited that we found a way to not lock ourselves into one system and to find new ways to collaborate moving forward. Next. And adoption awareness is obviously important. Next. We also realized we needed better back-end integration across different infrastructure. It's really important that we're not locked into our own proprietary, closed back end, so we're really excited that, say, with a civic deployment of this, we could integrate into something like Tableau or pull the data into other systems. Next. Lastly, we have a lot of value that we can offer out of the gate to developers, and more events and symposiums are happening in the future. And finally, next. We're so excited to have our own merch. I drink a lot of coffee to give myself insights, and tea, and we're actually going to have our own coffee and tea that you'll get for free if you come and participate in any of our events or try to implement Clean Insights. We'd love to hear from you, so stand by for that great benefit. So that's all. You can reach me in lots of places and find us at cleaninsights.org. Thank you. Thanks, Nathan. I can't wait to get my bag of coffee. Great presentation. It's great to hear about your progress and the implementation of Clean Insights. For our third presentation, I'm thrilled to introduce the Cloak and Pixel project, whose riddle is: what hides you so you can be seen?
Our next team is comprised of three members: Gretchen Greene, Tom Miano, and Danny Pedraza. They originally came together in 2018, working on the ethics and governance of AI, and evolved out of an earlier 2018 project called EqualAIs. Before I hand it over to them, just a reminder to share your questions, thoughts, and comments through the Q&A tool so Jonathan can pose them to the teams after their presentations. And with that, over to you, team Cloak and Pixel; over to Tom. Thanks, Hillary. We're Cloak and Pixel: brand new name, brand new identity, and very loud yellow slides. Next one please. In 2021, a subset of the EqualAIs team came back to build on the momentum that we had established and continue iterating on the prototype that we had launched in 2018, an adversarial attack on facial recognition. In the course of the semester, we began to consider additional aspects of the larger problem space of face detection and recognition models. Some other folks had also been working on the problem, and so we felt it was time to look at the problem from a more broadly representative perspective. We wanted to develop additional interventions to advocate for responsible use of face detection and recognition (FDR) and highlight the risks while others continue to deploy the technology. Defining risks in this space makes them more transparent, so that ordinary people are aware of these issues and can make better decisions; that's really the goal of Cloak and Pixel. Next, and over to Tom. In the world today, everywhere, there are more and more cameras collecting data, and with that there's growing automation and machine learning built on top of this data. This data can reveal highly accurate and detailed information about the individuals captured, while simultaneously being fundamentally flawed and susceptible to failure. The problem is that the level of surveillance going on is often opaque to ordinary people, and they have very little control or say in it.
Additionally, the information collected is often not in their interest. Finally, both the positive and negative impacts of this surveillance affect different communities disproportionately. A well-documented though notably clandestine example of this is Clearview AI. Since 2017, Clearview has amassed over 3 billion photos of individuals by scraping social media sites without the consent of those sites or the users who posted those photos. Among the multiple issues with this is that US law enforcement agencies are using Clearview's tools without consistent guidelines, regulation, or policy defining appropriate use, and without clearly defined quality assurance practices. Next. And so, as Tom alluded, this technology affects people disproportionately, and I wanted to highlight an example here. Robert Julian-Borchak Williams's case may be the first known account of an American being wrongfully arrested based on a flawed match from face recognition; this was reported by the New York Times. It may have been a surprise to Mr. Williams that he was being arrested, since he did not commit the crime, but it is no surprise, as we have witnessed continued debate about the systems used to surveil communities and to identify people for prosecution. Facial recognition has been used by police forces for more than two decades, and recent studies by MIT, NIST, and others have found that this technology, while it works relatively well on a subset of the population, does not work well on demographics that haven't been included in the data sets, notably people of color and women. And in part because of this lack of diversity, the images used to develop these databases are lacking signal, and that really was the genesis for our project in 2018, EqualAIs. Next slide please.
EqualAIs was established in 2018, the year Tom, Gretchen, and I met, and pushed forward with an additional, larger group of folks; it was built as a prototype to fight pervasive surveillance. At the time, there was no published work on adversarial attacks to protect people from facial recognition, and so we wanted to give something to the public. We built a functioning prototype to thwart face detection, and to really invigorate the missing discussion around the power imbalance that the social platforms have over their users. The tool was successful in reducing detection by the largest vision APIs available; the example shown here on the slide is EqualAIs confounding Google Cloud, reducing the detection rate quite significantly. What also came about was that we wanted to highlight the levels of consent that users could express, and so we really wanted to look at what was possible in terms of control of the data, and the discussion and debate around what was desirable and who should control it. Next slide please. And although we took a technological approach, in the Cloak and Pixel iteration we've now found that the solution isn't just technology. As Heather Burns from the Open Rights Group often says in presentations: if you add digital on top of a thing that is broken, you will have a broken digital thing. Deploying and implementing face detection and recognition technology has gotten a whole lot easier and will only continue to do so. However, understanding its social impact once this technology is out in the world has gotten much harder. And so, to the next slide, and I hand over to Gretchen. We thought not only about the harms of face detection and face recognition, but about the potential harms of overreliance on tools like the one that we helped pioneer. So start with the harm of face recognition, and then imagine a user using an adversarial tech tool to protect themselves: what happens when that tool fails?
So we looked at an example where the face recognition was working, not failing like the example Danny brought up, and a federally protected witness was found after using an adversarial tech tool that failed and posting their image on social media. Through that example, you can see that really it was the overreliance that was the problem. And we looked at three areas of the law to see how society expects tool makers to interact and communicate with users, and to design and manufacture tools, like chainsaws. If you look at product liability, there's a lot about failure to warn; consumer protection is about unfair and deceptive acts and practices in commerce; and negligence is the part of common law about the protection of others against unreasonable risk of harm. All of these come together and say: tell your users about the risk of failure, and tell them especially about the kinds of failures that they could not predict. Adversarial attack tools should be expected to fail tomorrow or the next day, even if they work on every single system today, because those systems are constantly changing; they're getting more data, and the whole architecture might change. So we presented a poster paper at ICLR, a machine learning conference. And then, next slide. How can we position the work that we did this semester, and the work that we did with our teammates in 2018, in a larger arena? There are three main themes that I see. One is thinking about the societal impact of cheap, easy access to face detection and recognition models; even just in the last few years, barriers to use have plummeted. The second is the levers of control and points of intervention: public communication, user choice and tools, platform companies as partners for user protection, developing adversarial attacks and other technical research and engineering, and taking inspiration from and influencing the law. And the third is that it's not just face detection and face recognition.
It's the whole question of control of information, of data: how it can be analyzed, how it is analyzed, and who should have that control. And then back up to numbers one and two: what is the impact and what are the levers of control. Next slide. Thank you very much. Wonderful. Thank you, Cloak and Pixel team, excited to see your work thrive and evolve. For our fourth presentation, I'm so happy to introduce team Data Nutrition Project, which came together during Assembly 2018 working on the ethics and governance of AI. The team has expanded since then and now includes Kasia Chmielinski, Sarah Newman, Matt Taylor, Josh Joseph, Kemi Thomas and Jess Yurkofsky. And they're going to speak to the riddle: what is something you can't eat that can still be nutritious? And before they go on, just a reminder that you can share your questions through the Q&A tool, and we will pose them to the teams after this round of presentations. And I will hand it over to the team to kick things off. Hey folks, can you hear me okay? Yes, we're good. Awesome. Hi, my name is Kasia Chmielinski, I'm the lead for the Data Nutrition Project, which we also call DNP. We're super excited to walk you through what we've been working on. Next slide please. So we're a team that is actually constantly changing and growing. I just want to make sure that even though not everyone is speaking today, you get to see the faces of our lovely team, as I will be representing their work along with a few others. Our mission as a team is to empower data scientists and policymakers with practical tools that improve AI outcomes. And we do this by building things and also talking to folks. We try to be as inclusive and equitable as we can. We really try to, you know, walk the talk, because we think that's really important. Next slide please. So today, just to tell you a bit about what we're going to talk about.
I'm going to give an overview of our approach and also our impact, and then I'm going to hand it over to Matt, who will walk through some of the improvements we've made on the tool that we call the label. And then Newman will close us out with some of the outreach work that we've been doing and a few exciting announcements. Next slide please. So, the problem that we are trying to address here. Actually, I love the way that Cloak and Pixel put it: they said if you add digital on top of a thing that is broken, you will have a broken digital thing, with a giant smiley face. And this is basically what the slide is saying. You know, the problem that we're trying to address here is that artificial intelligence systems that are built on bad data will have bad outcomes. And what I mean by bad is that if you have data that is historically biased, or has some completeness issues, composition issues, then the model that you build on top of that data is going to have recommendations that actually exhibit the same issues. So one example I'll just pull out from this slide, and these are really recent media examples, there's a lot out there: on the right-hand side, Amazon created a hiring tool. It used historical hiring data, and historically, you know, women and people of color were not hired as frequently as men, regardless of whether or not they were qualified. And so when Amazon created this tool on that data, it immediately started to say the same things as were in the historical data, which means that it was discriminating against various groups, including women. And the other examples on the slide are telling some of the same kinds of stories. So these have real harms, and a lot of these models are already out there making decisions, and that's typically when the problems are identified. So if we go to the next slide.
The opportunity that we saw as a team, when we were first in Assembly, which is a few years ago now, was actually to try to identify the bias before the model is even trained. So currently, like I said, you go all the way through to the end, you deploy the model, and only then do you notice there's something wrong with it. That's problematic for two major reasons, right? The first is that there might already be people who have been harmed. And the second is that it's really expensive to go back and try to retrain that model. And so we said, well hey, maybe there's something that we can do at the point at which someone grabs a data set, to interrogate the quality of that data set. If you go to the next slide. The analogy that we ended up using was a nutritional label for data sets: in the same way that I can pick up a can of Coca-Cola and look at what's in it and see if it's healthy for me to consume, we wanted to do the same thing for models and data sets. So before a data scientist decides to use a data set, can they actually look at what's inside the data set through a nutritional label and decide whether it's healthy for their model? And so in 2018 we launched a prototype and a paper. And in 2021, while we're in this fellowship actually, we ended up launching the newest version of the label, which focuses a lot more on the use cases, so the intended use of the data set, and less on a generic solution for everything. So in this example here on the right-hand side, this is a melanoma classification challenge data set. This is a data set that we worked on with a partner out of Memorial Sloan Kettering. And we said, basically, what are you trying to use this data set to do? And, you know, the use case this data set is mostly used for is to identify melanoma from images.
So we pulled that use case right out and asked: is this data set about humans, has it undergone ethical review, quality review, what are the kinds of harms that could come out of it, and what are the known issues and mitigation strategies? So that's the most recent version of the label. If we go to the next slide. I just want to point out that we've been seeing the impact of our approach, methodology, and standard, and we're really excited about this. I won't say too much about it, but basically we've been having a lot of conversations with large tech giants, a few of them are here, but also other organizations, nonprofits, governments in some cases. Our methodology has been used by RAI, the Responsible AI Institute, in their responsible AI certification, so we're kind of part of the data standard. And we're also, as a standard, becoming cited in places like NeurIPS, which is a conference where they say if you're going to submit a paper and it is based on a data set, then you should think about documenting that data set with a data set nutrition label. So we're really excited to see how our work is getting out there. I'm going to hand it over at this point to Matt, who's going to walk through the work that we've done during Assembly on the label itself. So, yeah, during Assembly we've mostly focused our label improvement work in two domains. One is getting feedback on the current version of the label. And another is work trying to build an ingestion engine, so think kind of like TurboTax, but for getting information about data sets, to try and automate some parts of the label creation process, not the whole thing, but just some parts. So when it comes to label feedback workshops: basically, we made a lot of changes, like Kasia mentioned, from the 2018 version of the label, and we wanted to get feedback on the qualitative direction the label was going, so thanks to Assembly we were able to get a lot of focus groups set up.
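The fields the label pulls forward (intended use, human subjects, ethical review, known issues, mitigations) can be sketched as a simple record. The field names and example values below are hypothetical, loosely following the talk, not DNP's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical label structure; field names follow the talk, not DNP's schema.
@dataclass
class DatasetNutritionLabel:
    name: str
    intended_use: str          # the primary use case, surfaced up front
    about_humans: bool         # is this data set about humans?
    ethical_review: bool       # has it undergone ethical review?
    known_issues: list = field(default_factory=list)
    mitigations: list = field(default_factory=list)

label = DatasetNutritionLabel(
    name="Melanoma classification challenge",
    intended_use="identify melanoma from images",
    about_humans=True,
    ethical_review=True,
    known_issues=["under-representation of darker skin tones"],
    mitigations=["report performance per skin-tone subgroup"],
)
```

The point of the structure is that a data scientist can check these fields before training, rather than discovering composition issues after a model is deployed.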
And we're still sifting through a lot of the feedback, but some big takeaways we have so far are that it's really important to prioritize information that helps people build trust when they're trying to find out about a data set and whether to use it. Also, while the qualitative information is super helpful to bring into the conversation, having some views into the data associated with that qualitative information could be really helpful. On the ingestion engine side, we've been prototyping what questions to ask to help surface some information to add to the label as a starting point, since most of our work building labels right now has been working directly with data set owners and data set creators. And we've also been doing some research into how to build the technical infrastructure to make an ingestion engine, and I'm going to pass it over to Kemi to briefly talk about the technical infrastructure research that she's been doing. Thank you, Matt. During my research, I wanted to make sure that we use the best tools for the ingestion engine and ensure that it will provide the best properties, such as maintainability, scalability, and appropriate data storage for the architecture we want to build. And I also wanted to alleviate a lot of the unnecessary constraints that some tools, frameworks, and other technologies impose on the way we want to develop our code. So through the influence of AI Global's platform and architecture, and personal research, I was able to conclude and choose a mixture of front-facing and emerging technologies. And the next step now is to use these technologies together to build our ideal architecture. Now we can go to Newman to talk about the exciting progress on the children's book and podcast. Great, thanks, Kemi, and thanks, everybody on the DNP team; not all of us are represented here, but also a shout-out to Jess and Josh, who are equally part of the team.
And the last thing we're going to share today is our educational work. So alongside the label work we're doing and the research and the feedback workshops, we've been, all along, really invested in public education about these issues. We've hosted AI demystification workshops, and we're thinking about public messaging, not dissimilar to the way that, in addition to having nutrition labels on food, you also want to educate the public about the consumption of food and the risks associated with consuming certain kinds of foods. So, as we have a number of creatives in our group, we've always had this kind of creative bent to our work as well. And so we have a podcast in the works, and there's more information coming on that; I'm not going to share too much today, but you will be hearing about it soon, and you can get announcements if you sign up on our website for email updates, which are extremely infrequent, about once a year. And then the other thing we're working on is a children's book; it's called I'm Not a Tomato. And these are some sketches, not final at all, from the book that's in the works. It's a story about a red, round thing that rolls down a magical mountain, and about the adventures and exchanges that it has along the way. The story is inspired by how both humans and machines learn, and it really underscores the importance of diversity in training data. So we're going to be announcing the launch of that book soon, I would say this summer, early summer, and if you would like to get a notification when it is live and announced, there's a form here for you to fill out, and we'll just send you an email blast when it is up.
And lastly, and probably most importantly, we want to thank our advisor Mary Gray for her support all along the way, as well as other advisors that have been helpful to us, including JZ and James Mickens, and then huge thanks and props to the Assembly team, primarily Hillary, for all the terrific work they've done to make this possible in a pandemic. Thank you. I'm really excited to see your work, and excited to read more about the children's book and share it with the children in my life. So, we have reached our fifth and final presentation, from team Disinfodex, which came together last year during Assembly 2020 working on disinformation. This team came together during the pandemic and has continued working throughout the pandemic, so just a big shout-out to them for building something in the midst of a difficult situation. And the team is Clément Wolf, Rhona Tarrant, Jenny Fan, Neal Ungerleider, Ashley Tolbert and Gülsin Harman, and they are going to answer their riddle: where do you go to answer what you don't want to know? And before I hand it over to them, just a quick note that after this we'll shift to Q&A for all the project teams in the order that teams presented, so not your last chance, but a chance to share questions before we jump into Q&A, and then you can keep sharing questions as we go. But get your questions in now. So thank you so much, and over to you, team Disinfodex. Thanks so much, Hillary, and thanks for the shout-out about organizing a project in the middle of a pandemic; it was certainly interesting. And, as Hillary mentioned, we came together last year with a lot of different backgrounds, so we're from journalism, communications, technology, policy, and design, and although it's just me presenting today, it really has been a massive team effort since last February, so shout out to that.
And so the question that we came together to address as a group was: how do we help people working in the disinformation space to better access and analyze all of the publicly available information about disinformation, so that they're fully informed and can best address the problem? And this is what we've created. It's called Disinfodex, and it's a database of publicly available information about disinformation campaigns. It currently includes disclosures issued by major online platforms, including Facebook, Instagram, Twitter, YouTube, Google and Reddit. It also has the third-party reports that come with those, which are from Graphika, DFRLab and the Stanford Internet Observatory. And it's really designed for anyone in the disinformation space: people researching influence operations, tracking disinformation networks across platforms, and generally exploring broader trends. So just to go back really quickly, I want to talk about why we created the database, because I think it's kind of important. We initially, as I said, set out to help journalists that cover disinformation, and there are these two major problems that we saw. The first is that it's a really new space in journalism, so you're talking about newsrooms pivoting to cover a space that's rapidly evolving and involves new skills, new verification methods, new vocabulary. The second was that there's a lot of information out there for journalists to navigate, so you've got academic papers, media reports, platform releases, civil society groups, and the information we found was quite disorganized; it's hard to know where to go, where to start. So we decided to focus on that aspect of the problem, putting everything into one place and sort of navigating the disorder about the information disorder. So just really quickly, what do we mean by disinformation?
The way we would define it is that disinformation is a kind of falsehood that's fabricated or distributed deliberately, presumably with the purpose to do harm, and I suppose in real life things are a little bit messier. No one has a truth detector; no one can map out all the things that should or should not be in a comprehensive database about disinformation. So we realized we needed to find a place to start, and a small place to start. And we decided to start with the actions taken by platforms and those working with them. This obviously is not the whole story, but we feel it's a reliable data set, in that the platforms may not tell the whole story, but whatever they do publish is, you know, vetted by lawyers; it's reliable. And even then, having found this data set, we found that it was actually a little bit messy and hard to navigate. So here's what you need to know about platform disclosures in one minute. Facebook, Google, YouTube, Twitter, they all release information periodically about what actions they take against disinformation campaigns, or influence operations, or information operations, as they might call them. And these are typically actions based on behavioral signs they've noticed, so they might call it inauthentic behavior or deceptive practices. And the definitions vary across the platforms. And so does the frequency of communication, so some are more frequent than others in their disclosures. And also the formats and the length vastly differ across platforms, so things don't emerge in the same way. And then sometimes these operations are attributed to a country or to a government or a specific entity. And sometimes there are very clear targets, you know, this targeted the US for example, and then sometimes that information is not established. And so that's the platform disclosures.
So then, beyond that, there's a small group of open source investigators, and currently we include Graphika, DFRLab and Stanford, that have access to the information before these disclosures are released, and what they do is write up their own reports, where they might give you more context and show what they were able to establish in addition to the platform. So that's what we currently index, and our first challenge was mapping out all of the data in a way that made sense for readers, so they could make sense of all that, and quickly navigate and compare it. So, just to dig into the weeds a little bit more on this: there's also the question of understanding the relationship between the different disclosures and knowing where to find them. So this is one example: there's a Twitter disclosure from April 2nd, 2020. It tells you about a specific network of accounts that was removed by Twitter for violating their platform manipulation policy, which you may be interested in as someone who tracks this field or wants to understand the strategies behind these influence operations. What you may not know just by looking at it is that it's related to a campaign that Facebook removed at about the same time, linked to an Egyptian PR firm. And here, by related we mean that there's at least a strong suspicion that we're talking about the same network. And you may also not be aware that it's related to a prior action taken by Twitter, by which we mean it's likely that this prior action, taken in December 2019, was against the same network, or the same entity. And here's one more action from Twitter as well that may also be related, this time dating back to September 2019. So, how does this all tie together? Well, one piece of good news is that there's another report that does this, and that's from Stanford, also from April 2020, that outlines how all of these actions are related.
So what the database does is bring this all together, so you can immediately make those connections between them, and, I suppose, just more quickly analyze the information. Next slide. So here are our goals: we wanted to find reliable public information, we wanted to help create connections between different operations, we wanted to make it easy to merge this with other data sets if you're a researcher and you want to do that, and also just fundamentally answer some basic common questions, for example, how many operations involve country X, or show me all the information about operation one. So when we started the fellowship this year, we had a very basic website online, and we quickly realized that it needed some major upgrades. And this is what we've been focused on for the last few months. So there are two views that you're seeing here, the card view and the table view, which are now available at disinfodex.org, if you want to go to the next slide. So an entry in the database opens up to a detailed network card, which has all the general information, such as which platform was affected and which countries were involved, as well as all of the related disclosures produced by social media platforms and the third-party investigators. And you can also copy links to specific networks, which you could not do before this fellowship. Also, we worked a lot on filtering, and a shout-out to Jenny for fantastic work on this: you can filter by country of origin, by name of company, by date of disclosure, by entities or individuals involved, or by policy violation as well. Next slide. Yeah, so our latest database improvements really helped clean up previous data and allow for filtering of named entities, target and origin countries, and policy violations, which were previously quite difficult to navigate.
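A toy version of the linking described above, using the Egyptian PR firm example: if each disclosure carries a shared network id, then "show me related disclosures" and "filter by origin country" fall out of simple lookups. The ids and schema here are illustrative only, not Disinfodex's actual data model.

```python
# Illustrative miniature of a disclosure index; not Disinfodex's real schema.
# Each disclosure records its platform, date, suspected shared network, and
# attributed origin country.
disclosures = [
    {"id": "twitter-2019-12",  "platform": "Twitter",  "date": "2019-12",
     "network": "egypt-pr-firm", "origin": "Egypt"},
    {"id": "twitter-2020-04",  "platform": "Twitter",  "date": "2020-04",
     "network": "egypt-pr-firm", "origin": "Egypt"},
    {"id": "facebook-2020-04", "platform": "Facebook", "date": "2020-04",
     "network": "egypt-pr-firm", "origin": "Egypt"},
    {"id": "twitter-2020-05",  "platform": "Twitter",  "date": "2020-05",
     "network": "unrelated-op", "origin": "Russia"},
]

def related(disclosure_id):
    # All disclosures believed to concern the same underlying network.
    net = next(d["network"] for d in disclosures if d["id"] == disclosure_id)
    return sorted(d["id"] for d in disclosures if d["network"] == net)

def by_origin(country):
    # Basic question: how many operations involve country X?
    return [d["id"] for d in disclosures if d["origin"] == country]
```

With this structure, a reader who opens the April 2020 Twitter disclosure immediately sees the related Facebook takedown and the earlier Twitter action, instead of having to discover those connections across three differently formatted disclosures.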
And so we hope this paves the way for the work visualizing the data, which is kind of our next goal, and also just making the database and website a little bit more accessible. The next slide there. So, our next steps. We're updating the website. We're doing some user testing, and some more user testing. We're also hoping to reach out to the community, so more, you know, journalists who work in the space, researchers, and people who work in policy, anyone sort of related to that, so if that's you, or if it's someone you know, please send them our way, or visit the database and let us know what you think. And a few researchers are working with the data at the moment, so we're hopefully going to showcase that. And we're going to do some more outreach to the platforms and open source investigators to see how we can better collaborate. And then finally, something that we've been talking about for a while: if we were to add new sources, what would the new sources be? We don't have an answer to that question yet. Just want to run to the next slide there. Yeah, so that's it. To end, we would love to hear from users. As I said, we're still ironing out the kinks on the website, but that being said, if this is useful to your work, or if you want to use it, please just send us an email. And I also want to mention that we're an official partner of Carnegie, who've been incredibly supportive and insightful; they've really helped us to get where we are, and we're going to continue collaborating with them. And then also just a massive thanks to Berkman again for inviting us back; it has truly helped us get to a place that we just wouldn't have been otherwise. And thanks to James Mickens as well for his incredible guidance. And yeah, thank you. Wonderful. Thank you so much, Rhona. And thank you to all the teams, and I'm going to pass back to Jonathan to moderate our Q&A. Thank you, Hillary.
It might well be apocryphal, in which case it should probably be in Disinfodex, but Mark Twain is reputed to have said that everybody's always talking about the weather but nobody ever does anything about it. And I'm just so struck with each of these incredibly tight, incredibly detailed, and, I basically say this as a compliment even though these days maybe it isn't, erudite presentations that grapple with very real problems down in the weeds, rather than just with grand pronouncements, which of course also have their place. It's just so exciting to see what you've been cooking up collectively and to contemplate what might happen next, so just a huge thanks for the seriousness, the detail orientation, and the commitment to building a world that we don't have yet. We're, you know, 25 years into the build-out of this digital space, and there's still so much up for grabs, both in a positive way and, as it is more often portrayed and understood, in a negative or invasive way, and what to do about it. So we have a little bit of time to just visit with each of the project groups again with a question. And for our friends from AI Blindspot, I guess this is kind of a two-parter. First: do you aspire to more of a product or a service? And by that I mean, can you drop off these kinds of really helpful, tangible ways that somebody who is about to implement or build an ML system, and expose people and their data to it, can actually stop for a moment and have a very consistent way of walking through it? Or is it more like a Good Housekeeping seal or certification, some kind of thing where a company can say to the world: hey, this independent group, we've used their tool and they've given some form of blessing, however powerful or dangerous that might be.
I'm curious what you're thinking about on that. And the second part, only loosely related, more on the educational function, is from Amelie Sophie Vavrovsky, and she was curious about Blindspot's workshops with civil society organizations: how much do you get into tech policy questions, such as how to regulate AI more generally, and how do you balance technical complexity with accessibility in your workshops? I don't know, Dan, if you want to start with that. Yeah, I guess I can maybe start with the first part. I think, certainly when we were getting started, we had ideas about some sort of certification, potentially a product we would create that would certify companies, saying that they had passed, you know, a blindspot assessment. But in the course of working over the last two years, we really found that it resonated more with civil society organizations that were in a position, like I said, of wanting to advocate for changes, but really needing some specific recommendations that they could be advocating for. And I think that was really the greater need right now. So we've sort of pivoted in that direction, away from certifying companies, with whom it just wasn't quite resonating to the same degree, and who may need something different, something more like a product to certify them, and toward providing a service for civil society organizations, though it could ultimately turn into a product benefiting that audience as well. I think developing a product out of Blindspot is maybe a stage we haven't gotten to yet, but potentially, if we're invited back to do Assembly a third time, maybe we could do that. I don't want to put anyone on the spot today, but that actually perfectly tees up Amelie Sophie's question about, all right, what would I expect to find at a workshop like this?
I don't know if Hong is able to jump on, because he's actually leading the workshop that is being done at Stanford in two days, so he's more directly involved with it. Is he able to join? I can speak to that. So, on Thursday we're running a workshop that's kind of a beta version of this workshop-as-product, in the sense that the workshop has a script, has a presentation, has a worksheet for participants to fill out. And we're hosting about thirty civil society folks, and having them basically go do this workshop. And if it does work out well, the plan is, you know, it's already open source, but our plan is to train the trainers, to have as many people as possible be able to run this workshop on their own for other folks, and kind of get out of the way and have a community take ownership and evolve this workshop. So that's the idea: spreading it to as many people as possible. And regarding the complexity versus the accuracy of the content: I believe the blind spots are pretty technically accurate and robust, because we have Jeff and Dan, data scientists, who helped, you know, come up with these blind spots and explain them. So we believe it's very clear but also very grounded in the technology and the science. We're happy to get any help if anyone can help us. Great. Thank you both. And I don't know if there's a way to once again kind of blast into the chat room how somebody might reach out to you all; it might have been briefly in the slides. Certainly feel free to do that. But thank you both, and thanks to the whole team for this effort. Let's move on to Clean Insights.
For which, so many interesting things at once. I love how, Nathan, you just dropped as a parenthetical the F-Droid store, it's like, yes, it's a decentralized, safer marketplace, and people wouldn't normally take those two things together, and that was a parenthetical in the larger presentation, although one that jumped out at me given the current epic battle between Epic and Google over the antitrust implications of Google's app store. And I actually bring it up now because that's a great example of when people talk about remedies for problems that exist, or what it would look like to build some guardrails when amazingly there are none, as in so many of these areas, something like the F-Droid store is an answer to the question of, well, is there any other way to safely have third parties offer apps for your phone? And you're producing that answer. And similarly for Clean Insights, you're producing a kind of set of tools meant to make it much easier to undertake the complex work, when you're not really getting paid to do it as a company, of worrying about your subscribers' or your consumers' privacy, even as you're trying to make the most that you can, that's the insights part, of the data that you're collecting. So my question here is twofold. First: do you think that there could be, once you have demonstrated the utility of these tools, and I don't know if they include differential privacy, that was a question that Sharma had, so feel free to briefly speak to that, but by demonstrating these tools, does it offer a roadmap to regulators to then insist upon the use of something like that, since they exist and it shows that you can? Or is that kind of too specific? I don't know; take the Europeans.
The Europeans really regulated cookies, specifically in browsers, and was that kind of a wise move, to get that much in the weeds, or is that way too specific from a regulatory perspective? And of course the team may have different views on that, but I'm curious if you have any thoughts. And second, there's that yes/no screen that Apple, in its fight with Facebook and others, has now offered up in iOS, to say, hey, do you really want to be sharing your info, would you like to request not to be tracked? Is that screen something that you think Apple should be revising, so that if a company were willing to adopt the sorts of limitations that Clean Insights would have them place on how they collect and use the data, there would be some reward in a differently configured screen from the likes of iOS, so that you wouldn't have as many people opting out? And apparently the statistics say a ton of people are, maybe unsurprisingly, opting out when presented with that screen. So that's what I was curious what you all thought. What we have around a kind of consent and engagement process is definitely not this, you know, binary, one-time opt-in. I think that's going to fail, and the numbers are proving that it's failing, or, depending on what you were hoping people would do, it's wildly succeeding in the right kind of way. I think what's confusing about it is that someone could actually implement Clean Insights today in an iOS app and not be prompted with that screen, because that screen is specifically around usage of the advertising tracking identifier.
So Facebook is being a bit disingenuous on that front. But, going back to F-Droid: F-Droid actually pioneered the idea of tracking code being an anti-feature. If you go into F-Droid for an app, it'll say these are the anti-features that this app contains, and they've worked with Exodus Privacy to show how many trackers an app has, right in your face. And so we've also asked: now that you're implementing Clean Insights, how do we talk about this, given that we've kind of crossed that threshold for you? So I think there's definitely a lot to think about there, and we're advocating kind of a more nuanced approach. And on the front of regulation, as well as differential privacy: today, a lot of what we see with differential privacy is implemented as, first, you still gather all the data, then you put the data in a private, super secure database, and then you implement differential privacy at the analysis stage, when someone wants to query the data. And there are still cases where maybe, you know, that's done in the right way, say epidemiological studies, and that's how you have to implement it. But there are so many cases where the people aren't trained to handle that, where there's so much liability, and, as we've seen time and time again, these databases get infiltrated and exfiltrated. So I think we are hoping to show that you can still achieve the outcomes you want, and understand if your product is succeeding, with a much more constrained way of measuring. And we just need to make it as easy as Google Analytics, you know, so that's been our approach. So, I mean, I think regulation, yeah, should be part of it, though there's a place for trained professionals that have a reason to gather all the data.
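The contrast drawn here, constraining the measurement at collection time rather than guarding a raw database and adding noise only at query time, is the idea behind local differential privacy. A minimal sketch using classic randomized response (an illustration of the general technique, not Clean Insights' actual implementation):

```python
import math
import random

def randomized_response(truth, eps, rng):
    # Local differential privacy at collection time: each user reports
    # their true bit with probability p = e^eps / (1 + e^eps), otherwise
    # the flipped bit, so no raw answer ever leaves the device.
    p = math.exp(eps) / (1.0 + math.exp(eps))
    return truth if rng.random() < p else 1 - truth

def estimate_rate(reports, eps):
    # The analyst knows the noise process, so they can invert it for an
    # unbiased estimate of the true rate without seeing any raw answers.
    p = math.exp(eps) / (1.0 + math.exp(eps))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

rng = random.Random(42)
true_bits = [1 if i % 10 < 3 else 0 for i in range(10_000)]  # true rate: 0.30
reports = [randomized_response(b, 1.0, rng) for b in true_bits]
estimate = estimate_rate(reports, 1.0)  # aggregate insight, no raw bits kept
```

The point is that the aggregate answer ("about 30% of users have this property") survives, while the server holds nothing worth exfiltrating, which is the anxiety-reducing property the team describes.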
I know the OpenDP project at Harvard, for instance, is operating at that analysis stage, and they have a really solid approach. So it's got to be granular and in the weeds; that's where we're at. And finally, just working off a question B Cavello put in, which was, roughly: this is so cool, how do we help raise awareness? I'm certainly curious about your answer to B's question, and along with it: what's the receptivity been so far? Are you seen as a friend, or as a buzzkill, by companies that are working with a lot of data? Companies don't really want or need everything that something like Google Analytics provides; they just want a few things, a few things they need to understand. So when we present, when we talk to them, they're impressed with how broadly we've thought about it, from the way we aggregate and average to the consent stage. And then they're like, cool, this sounds great, but it's a little bit like HTTPS or TLS in the early days: yes, of course, but that's too hard on our servers, we don't have the budget for that. And then it just happens. So I think we're not seen as a foe. In some ways we alleviate the anxiety about liability that they're worried about, and we can reduce what is seen as the increasing surveillance by Google and Yandex and others. So it's definitely, oh yeah, that's awesome, it totally makes sense. And then when they look at their budget, they're still struggling a little. So we're trying to do as much as we can, especially in civic and humanitarian tech, pro bono, and I think it'll just take more time for folks to adopt us, like good old HTTPS and end-to-end encryption. We'll be on that train, and hopefully we'll come right along with it.
Wonderful, thanks so much. If there's more you're thinking people might do to amplify or assist, please put it into the big chat room, and look out for the coffee; we'll get a tweet storm going and drink our coffee. Yeah, it's coming. Terrific. So, I referred to Cloak and Pixel, and that team. It's funny: part of my reaction to this extraordinary project and approach is a point Nathan was just ending on. There was a time when web connections were all unencrypted, because it was seen as just too onerous for servers to encrypt every connection. Even Google at one time said it would be too hard to do for regular Google search; when you interacted with Google.com for web search, it was not encrypted. And then at some point, thanks to the good work of many people, everything did get encrypted. I think we're at ninety-some percent in a random sampling of web links, which, interestingly, mooted in large part the big legal debate about deep packet inspection, since there's a lot less you can inspect when the packets are encrypted. And that makes me think we're still living in a world that is like unencrypted web search, or unencrypted website transfers, with respect to releasing our photos online. If at any point any photo of you has a label attached to it with your name, then any future photo of you, whether or not on the web at the time, even a brand-new photo extracted from a surveillance camera or from anywhere else, like a picture of you walking down the street or attending a protest, can now be associated with your identity. That's of course the point you all were making with Clearview AI. And I guess the question back to you would be: how automatic would you like this kind of invisible masking through adversarial attacks, used in a productive way here, to be? Would you like a standard camera on an iPhone or an Android phone to automatically apply a little bit of this adversarial fuzzing, so that wherever the photo goes next, it'll be a lot harder to match it back to you through these new facial recognition systems? Ditto for when you post a photo on Facebook or Twitter: just as they strip out location data that might normally be captured and embedded, should they be doing this kind of fuzzing? And is your answer affected by the point Gretchen was making towards the end of the presentation, that this is kind of cat and mouse, that the broken-shield paper is pointing out you can't really lean too hard on this technique, because you never know if there's a way to crack it later? So I'm curious how you all are thinking about a retail use of this versus a wholesale use. Well, I'll definitely let Danny and Gretchen throw in their thoughts here too, but I think my first reaction would be to take it even a step back from what you were just proposing. It would have to be automatic in the sense that the policy, and the perspective of technology companies and anyone building products like this, is user privacy first: before you can even begin to collect that type of data, you have to get informed consent from the user. Right now we sort of live in a world where, well, I don't like the idea that we would even have to use an adversarial attack to try to proactively prevent third parties from using machine learning or other processing on the data. I'd rather none of that data be collected, and that individuals have to affirmatively provide that data to someone who wants it in the first place.
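The "adversarial fuzzing" raised in the question can be sketched in miniature. Real systems perturb pixels against a deep face-embedding model; here, purely as an illustration, the matcher is a toy linear similarity score and the perturbation is a single FGSM-style signed-gradient step. Everything in this snippet is a hypothetical stand-in, not the team's actual method:

```python
def match_score(features, weights):
    """Toy stand-in for a face matcher: a linear similarity score
    between a photo's feature vector and an identity template."""
    return sum(f * w for f, w in zip(features, weights))

def adversarial_step(features, weights, epsilon=0.05):
    """FGSM-style step: nudge each feature by at most epsilon against the
    sign of the score's gradient (for a linear score, the gradient is just
    the weight vector). The change is bounded, so the photo looks the
    same, but the match score is guaranteed to drop."""
    def sign(w):
        return (w > 0) - (w < 0)
    return [f - epsilon * sign(w) for f, w in zip(features, weights)]

photo = [0.5, 0.2, -0.1]     # hypothetical embedding of a photo
matcher = [1.0, -2.0, 0.5]   # hypothetical identity template
fuzzed = adversarial_step(photo, matcher)
assert match_score(fuzzed, matcher) < match_score(photo, matcher)
```

The cat-and-mouse caveat applies exactly here: against a different matcher, or a matcher retrained on fuzzed photos, this particular perturbation offers no guarantee, which is the broken-shield point.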
So that's, I think, an extreme position; not extreme in the sense that it's necessarily unreasonable, but extreme in the sense that it's very far from where we're at right now. To get closer to where we're at: I'd like to see it implemented at the platform level. So, for example, with Facebook, and Clearview harvesting their data as we discussed, I'd like to see Facebook try to devise more clever ways of protecting that data. I mean, there's information that you can only access by logging in, and there's information that is only available if you manually make it public yourself, and that's how a lot of platforms have, I think, handled that sort of thing. But by and large, ordinary people don't really expect other people to come and just collect their information. And what people also don't expect is for that to be done en masse, and the impact of collecting things en masse: how you can actually figure out things about people, where they were or who they were, that they may never have told anyone. Facebook has built a social network; but for third parties to be able to build a sort of network from information on Facebook and other sites online, to draw a picture of individuals that they haven't knowingly offered themselves, that's where I take issue with it. I'd just like to see more done at the platform level, beyond tools like EqualAIs or the other tools where the user has to manually do something themselves. Danny or Gretchen, do you have anything you want to add to that?
Yeah, so I think absolutely I would like to see this sort of security measure much more automated. Individual consent, and the burden on individuals to opt into any kind of security measure they want, is a problem, especially when they have to go further than a really simple opt-in. I think the presumption should be that users, by posting something on social media, for instance, didn't mean that every possible use by everyone in the world would be fine. And we have a lot of precedent for that in intellectual property licensing: you don't have to give it all away, and no one assumes you gave it all away by allowing one use, by granting a limited license. The fact that adversarial attacks won't always work is no reason to say they shouldn't be one of the layers of security we use, because you could say that about any cybersecurity issue. Cybersecurity is a huge problem; in the last few days we've seen some major problems in the news, and it's exponentially exploding. But we don't say, well, just because whatever you're thinking of using now will probably fail eventually, if not for you then for somebody else, don't try. Yeah, the existence of locksmiths doesn't mean you shouldn't use locks, and a mask might not be 100% effective but you still might want to wear one. Well, if it's going to be like locking your door, it'll at least make them attack the other guy instead. And we had some other methods that Tom implemented this semester: steganography, where there's an encoded signal that a human user won't see but that would be easy to decode, or a watermark.
So there are various kinds of signals, engineering solutions, which, if you combine them with public-opinion pressure or law, and then combine that with automation, could mean that systems will not process images bearing these marks, and images will bear these marks unless someone has specifically said they wanted to give permission. Society doesn't want you to process these things, which is not necessarily the same as the law saying so. It's just genius to offer a technical proof of concept, because it gets us out of a stale dichotomy that says: either people are sharing their photos or they're withholding them, and if they share them, this is just part of the cost. This is a way of saying, no, you can share some but not all. And if somebody really wants to go to lengths to get around it, that shows just how hard they're trying to override the desire of the person producing the photo. We should probably move on to the next project. Danny, I don't know if there's anything you wanted to throw in at the very end? No, just that the engagement and conversation aspect is also a really important part of all of our projects, I think, because all of these issues are things that we should all, as citizens, be talking about; a sort of civic duty, if you will. That's the only point I was going to make. And that nicely connects to a point that Laura made in the Q&A about how to educate technologists and consumers about equity and privacy at scale, and to do it as much by showing rather than just telling; maybe that's part of that puzzle. Great to mention that function the project is serving, and really all of the projects.
Terrific. Well, onward then to the Data Nutrition Project, which also feels like a reminder of Nathan's observation about the web pre-encryption: the idea that the state of best practice, even for big companies using machine learning data sets, is not to attach any labels of enduring quality to the stuff, including when they get the data set from some public source or from somebody else. It's really like Christmas fruitcakes getting passed around without even an original tin indicating what's in them, and the idea that that's the state of the art in 2021 is just mind-blowing. So it certainly shows how much of a role this project could play. And I guess, pivoting off of Joseph Ben Simon's question, I'm wondering how hungry companies are for this kind of thing. Do they see it as a hassle: it's good enough for government and corporate work, it's really hard to label stuff, let's just keep going as we are? Or are they open to it? And should regulators be open to saying: if you're going to be mucking about with big data sets like this, making recommendations or decisions for you or your customers based on them, maybe you need some basic labeling? So, again, a similar question: how should this fit into the ecosystem? I can start, and I'd love to hear what others on the team think, because we've all had conversations with different folks. It's a great question, and I think the answer has also changed over time. Our conversations were more aspirational at the beginning, and I feel like there is now a greater appetite for self-regulation, or just internal standards, especially in large organizations where you have many people touching the same data sets and maybe the same models, because people are, quite frankly, worried about risk and liability. And as for whether regulators should be open to it: I think so; I would like them to be.
Yeah, as someone who's also a practitioner in industry, I think that's a good idea. Having something like a label is actually a nice tooling solution that doesn't require huge outside regulation saying you have to do it in this particular way, but lets folks show that they're trying to provide something along those lines. So it's definitely a moving industry and a moving ecosystem, but I'm seeing, at least from the conversations we've been having, increased appetite for something. That's also why we're not so tied to the DNP label as a standard specifically, but also as a methodology, or even just as a conversation: you might want to track different things about your data sets internally, and about your models internally, but really you should do something. And you're right, right now there's just nothing, so even saying "you should do something" seems to be a move in the right direction. Others on the team have heard things as well, because we've all had different conversations. Do you want to jump in? I agree with Kasia on that. I think that as long as there's awareness and demand from the consumer, it should be available from big tech companies, because their main priority is profit, and if the demand is for what we do, which is the data labels, then I think it would help a lot. But we also live in an era where we're kind of blind to the data and information. Before joining this team I wasn't even fully aware of all the unfair practices being implemented at these companies, and I think a lot of consumers aren't aware either. It takes a lot of research, and actually being in that industry and following it, to know that.
So I guess the bigger question is: how do we create that awareness, so that the average consumer is also concerned? Yeah, and it connects right back to Laurie's question around education, either so they can somewhat self-protect, or, even more important, as you're suggesting, be aware enough to pressure the companies, and those who regulate them, toward the idea that, hey, maybe we should be doing some of the basic stuff that would respect the gravity of the kinds of technology we're building. Any other thoughts before we move on? I would just add that there are some very well-intentioned people at the big tech companies, of course, and they also, because of their business models, have mixed incentives; not the individuals, but the companies themselves. So I think we have this multi-pronged approach: raising public awareness and public demand, and also doing broader education, including for those working in these spaces. I'll just, oh, sorry. Yeah, real fast: companies also aren't monoliths, and pressure can be external and public as well as internal, because a lot of the people we talked to are data scientists. I don't know what, say, the executives of a company might want, but a lot of the data scientists we talked to really do want this kind of stuff. Yes. Thank you all very much. I've been informed that there's nobody barging into this Zoom room for the next scheduled use, so as long as people have the time, we'll take five more minutes to make sure we check in with Disinfodex before we adjourn.
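The basic labeling being discussed can be as simple as machine-readable metadata that travels with a data set the way a nutrition label travels with food. Here is a minimal sketch; the field names are illustrative, not the Data Nutrition Project's actual schema, and the example data set is hypothetical:

```python
import json

def make_dataset_label(name, source, collection_method, known_gaps, intended_uses):
    """Bundle provenance and caveats alongside a data set so downstream
    users inherit them. All field names here are hypothetical."""
    return {
        "name": name,
        "source": source,
        "collection_method": collection_method,
        "known_gaps": known_gaps,        # e.g. under-represented groups
        "intended_uses": intended_uses,  # uses the publisher considered
    }

label = make_dataset_label(
    name="city-service-requests-2020",
    source="municipal open-data portal (hypothetical)",
    collection_method="resident-submitted 311 reports",
    known_gaps=["neighborhoods with low smartphone adoption under-report"],
    intended_uses=["service planning"],
)

# A label is only useful if it ships with the data, so keep it serializable.
label_json = json.dumps(label, indent=2)
```

Even a checklist this small changes the default from "no labels at all" to "every handoff carries a record of where the data came from and where it falls short."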
So, speaking of that, over to Disinfodex. I have a similar curiosity: how much have you found you're just trying to make the best hay you can out of what the companies already share in their disclosure reports, versus trying to encourage them to say a little more, say it a little more consistently, and create a true public good, cooperating rather than competing in this area around what they're seeing and when, and doing it in a way the public can see? Whatever council of elders there might be, where Twitter and Facebook get together to compare notes at the security level, could be doing it in a way everybody can look into. I'm just curious about your sense of the willingness and appetite of the companies to look at something like Disinfodex and say: where have you been all our lives, this is great, we're in, we'll do more. I can kick off there. That's definitely something we've been thinking about a lot. As we said, it's a really evolving space, so the frequency with which these disclosures come out varies widely. And, as I said as well, we're also looking at contacting the platforms ourselves and working more directly with them. So for us it's really been about trying to build something on the fly in an area that's constantly evolving, where the platforms themselves are constantly evolving their own disclosure practices as well. And to your point about the data, and the fact that they all call similar things by different names: that was a big challenge, and something Jenny helped us out with quite a lot was trying to streamline the data in a way that lets us compare it.
But in a way that doesn't distort the underlying data. So, just to answer that: it's definitely something I think the platforms are still working through, and something we're still working through as well. We're trying to be flexible in terms of our database and how we enter data, and hopefully at some stage we'll get to a point where the data can be submitted to us directly; that's a hope and aspiration for us, but right now it's very manual. Yeah, it's a really complicated problem. David O'Brien was remarking on how hard this sort of thing was five years ago, and I'm not sure whether it's gotten easier or harder since then; probably vectors in both directions. Jenny? Oh, sorry, yeah, I was just commenting on that. I think especially the concept of a network is something that hasn't been really familiar. Even last year we were thinking about this from a disclosure or report perspective, and this year we really shifted gears to describe more of a network of activity. That's something I think even the platforms are still getting to, and it's one of our limitations in extracting broader ideas, like cross-platform activity, for example. So I think the industry will have to co-evolve with our project as it goes on. Got it. I see you've appeared; I don't know if you want to say something real quick? Oh no, it was just a mistake. Although I do have a wonderful job in this space, so I could speak here. Absolutely. Thank you. Thank you all. Well, I think many of us were struck by the Heather Burns observation: if you add digital to a thing that is broken, you have a digital broken thing. Of course, it's faster and cheaper.
It kind of gets to an observation that was just made, I think by Newman, that this isn't just a matter of official corporate policies we can try to get the companies to subscribe to, or a uniform standard that would require all the companies to meet a certain baseline. There's also the role of bottom-up work: the people within the companies who are building things that are digital, and that may have started out broken, and how much they see it as within the scope of what they're doing, whether or not they've been asked to by their bosses, to shine a path toward: hey, I didn't just make it digital, I either fixed it or made apparent the flaws that were already there. There is a way of respecting the uses to which this will be put, and the people's lives that will be touched by it. And the growing awareness among tech workers of the power they may have, and the role they might play, is a way of seeing a latent professionalization of computer science, data science, and modeling, of a kind the lawyers and the doctors have too long thought they had a corner on. To see these kinds of bottom-up solutions offered and contemplated at all levels, by everyone who might be able to make use of them, is extremely gratifying and inspiring. So I just want to thank you all, this team but more generally everybody from the cohort this year, for not being automatons, for saying: we're people, this is technology that's affecting people; for seeing that there are problems and being ready to work on them and shine a path forward for the rest of us. So thank you again for all the work you've done this year.
Thanks again to our staff and advisors for augmenting it. I really expect this will be just a way station on the path toward improvement of our digital world, and a model for how we might assess and improve it alongside all the other models we have for trying to put some guardrails, and even a direction change, on some of this stuff. So thank you all again. Hillary, why don't I turn it over to you in case there are any logistics I've forgotten, but thank you all for a terrific session and for all the work that's gone into it; really amazing. Yeah, just echoing you, Jonathan: thank you so much, and thanks, everyone, for joining us. This was recorded, so it will live on the event page on our site; if you want to share it with anyone, or watch it or go through the slides again, it will be there as a resource. Thank you so much, and have a good evening, or good afternoon or morning, wherever you are.