Everyone, hope you enjoyed your lovely tea break. The next session is hosted virtually by Ashley Boyd from the Mozilla Foundation. So welcome, Ashley, and thank you for joining us.

Hello everyone, thank you so much for having me. I'm Ashley Boyd, Mozilla's Senior Vice President of Global Advocacy, and I'm so delighted to join you on this last day of Wikimania. I'm pretty sad that I'm not there with you in person in Singapore, but here's hoping I can join you in person next time. I'm actually joining from Montreal, because Mozillians from around the world will be gathering here soon for our all-hands. One of the reasons I'm so excited to join you is that Mozilla and Wikimedia have so much in common: at the core, our mission and our work are centered on community. And a hint: I'm going to be talking about community and AI today. All right, let's go to the next slide and get started.

The official title of my session is "AI Systems and Transparency." One of these is a really hot topic, and the other is not getting the attention it deserves. Of course, you know that AI is the hot topic. It's so hot that conversations and debates about AI seem to be everywhere. You might have seen headlines like these on TV or in your social media feeds, or stories like these in your news outlets. I'll be honest: the explosion of interest in AI has taken many people, including myself, by surprise. Until recently, AI was a distant technology solving big problems like finding cancer or mapping stars in the galaxies. But now AI is readily available in consumer technology, ready to write memos and papers for us. And even though few people have actually used generative AI yet, it's already shaping our thinking about AI and our future. So we've entered the deep end of AI, whether we're ready or not.

So now what? Is this technology good or bad for humanity and for individuals? Of course, AI can be a force for good. One early example is how AI is advancing scientific developments in healthcare. As this headline shows, a recent research study found that AI could detect breast cancer in mammograms as accurately as a radiologist. This translates into more cancer detection and better care. And as someone who has breast cancer in my family, I feel relieved and hopeful about an advancement like this.

On the other hand, we're also aware that AI can cause harm. We're seeing more and more examples of harm as AI systems draw more media coverage and scrutiny. For example, AI systems can deny entry to people desperately in need of refuge, and can even threaten everyday security tools like passwords, a vulnerability that could have far-reaching economic consequences.

So clearly AI has arrived, and it's not going anywhere. We also know that AI can have both positive and negative impacts on people. In this new era, we're going to have to work really hard to ensure the positive impacts of AI outweigh the negative. For this reason, we at Mozilla have been investing in a trustworthy AI strategy since 2019. To advance a trustworthy AI future, Mozilla is investing in new technologies, pushing companies to improve their AI-enabled products, and funding people who have innovative ideas in this space. I always get asked: what does trustworthy AI mean? For Mozilla and for our work, trustworthy AI is a future in which people have agency within AI systems.
Agency can be as simple as knowing when you're reading AI-generated content, or understanding how an AI system decided whether or not you're eligible for a bank loan. It's also a future where the people who create AI systems are held accountable for the impact of their technology. And this is where transparency comes in. Transparency is a necessary precondition for accountability: how can we hold AI builders accountable if we don't know how their tools work or what impact they're having?

I'm going to stop and linger on a phrase that's really important to me: the people who make AI. Don't ever fall into the trap of humanizing AI, because that's where we lose our fight for accountability right from the start. People build and deploy AI systems, and it's people who must be accountable, along with their companies or organizations.

I'm going to be honest: advocating for and winning transparency in AI systems is really difficult. Even getting transparency measures like nutrition labeling on food products was difficult in the US decades ago, and what went into food was pretty straightforward by comparison. Food companies fought accurate nutrition labeling because they worried that if consumers knew what was in their food, they would make different choices. They were right to be concerned: studies have shown that nutrition labeling has changed people's behavior. Transparency in the food industry also happens at the systemic level, through regulation of how food is sourced, stored, and packaged.

So let's consider how transparency within AI systems can be meaningfully applied. The information we want made transparent will drive the accountability measures we can pursue. Right now there's a significant lack of information about how AI systems are built and what impacts they're having; in the current context of AI, there's little transparency at each of these layers. In some cases, the builders of AI have this information and are actively trying to keep it hidden from others. Even more concerning, some of this information, particularly about impact, is likely unknown even to the builders themselves. Without more information, we can't improve these systems or stop AI tools that are fundamentally harmful. That's why transparency is so fundamental, and a key element of Mozilla's work.

All right, this all feels a little overwhelming, even to me, even though I work on this every day. But I want to talk about the really exciting part of this story, which is where I promised we'd end up. The hope lies in working together as a global community. It's something Mozilla and Wikimedia know how to do: inspire and enable people to create knowledge in every corner of the world. In short, people are the answer to the question, how will we hold AI systems accountable? AI is here to stay, but so are people. People must always sit alongside AI systems to ensure they improve our lives, not diminish them.

But what does that really look like in practice, especially since I just shared how difficult this transparency game really is? I'm going to share two examples of how Mozilla is mobilizing our community to advance transparency in AI. The first is a Mozilla project called RegretsReporter, and I'll tell you a little about how this project got started and where it ended up. In 2019, we were growing increasingly concerned about YouTube's role in spreading disinformation globally.
It goes without saying that YouTube has a huge impact on the way people consume and think about information. But YouTube's role in spreading disinformation wasn't getting much scrutiny at that time compared to Facebook. To validate our concerns, we asked our community to tell us about their experiences with YouTube's recommendation system, including whether or not it led them down rabbit holes. Content rabbit holes are those experiences where you search for one type of content and keep getting more and more extreme versions of it. Some people recognize what's happening and get out of rabbit holes, but they can be very difficult to climb out of, and some people go deeper into them.

The stories our community shared with us were highly concerning. They included examples of being served misleading and irrelevant content on YouTube, including conspiracy theories, violence, hate, and body-harming practices. These unwanted recommendations couldn't be easily controlled, and the effects were toxic, if not dangerous.

With these stories in hand, we reached out to YouTube's executives directly and asked them a really simple question: will you let independent researchers study how your algorithm promotes disinformation? They declined, and seemed quite upset that we were even asking. In response, they issued statements that amounted to "trust us, we've got this." But as I like to say, the era of grading your own homework is over. Clearly, external scrutiny was needed. How exactly we would get information from a global platform without its consent was less clear.

In the end, we created what you see here: a tool called RegretsReporter. It's a browser extension that mobilized over 30,000 volunteers globally. The way it worked is that volunteers donated their YouTube viewing history, with their consent of course, so we could study how the recommendation algorithm works. This study was groundbreaking, and it developed a new crowdsourced method for holding YouTube, and other platforms like it, to account.

Our findings showed that YouTube recommendations promoted misinformation about COVID-19 and other dangerous topics. They also revealed that non-English content was most affected: in short, people who used YouTube in languages other than English were more likely to get recommendations containing misinformation. This probably isn't a surprise to you, and it wasn't to us, but having it validated through this large crowdsourced project was fundamental and groundbreaking. To make matters worse, much of this content violated YouTube's own guidelines. In short, their own system was recommending content that was not allowed under their own platform policies. This all painted a dire picture: YouTube's AI systems had no effective human oversight.

I'm pleased to share that this work had a significant impact in a couple of different ways. First, scrutiny of YouTube's policies and practices has increased significantly since 2019. YouTube has pledged to make changes, and some studies show that these changes have had positive impacts. Additionally, the RegretsReporter research had an impact in regulatory spaces. For example, it was referenced in the EU Commission's Digital Services Act proposal; that proposal has since passed, and the EU Digital Services Act now ensures that researchers have access to platform data.
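To make the crowdsourced method a bit more concrete: once volunteers donate their recommendation data, the heart of the analysis is simple aggregation across languages. Here is a minimal Python sketch of that idea, assuming a hypothetical CSV of donated reports; the column names (ui_language, was_regret) are illustrative inventions, not RegretsReporter's actual schema.

```python
# Minimal sketch: aggregate crowdsourced recommendation reports by language.
# The CSV schema here (ui_language, was_regret) is hypothetical, for
# illustration only; it is not RegretsReporter's real data format.
import csv
from collections import defaultdict

def regret_rate_by_language(report_path: str) -> dict[str, float]:
    """Share of donated recommendations flagged as 'regrets', per UI language."""
    flagged: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    with open(report_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            lang = row["ui_language"]      # language the volunteer used YouTube in
            total[lang] += 1
            if row["was_regret"] == "1":   # volunteer flagged this recommendation
                flagged[lang] += 1
    return {lang: flagged[lang] / total[lang] for lang in total}

if __name__ == "__main__":
    rates = regret_rate_by_language("donated_reports.csv")
    for lang, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
        print(f"{lang}: {rate:.1%} of recommendations flagged")
```

Even a toy like this shows why language-level comparisons became possible once enough volunteers contributed: the hard part is the consented collection at scale, not the arithmetic.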
Researcher access under the Digital Services Act will help grow the amount of public-interest research into algorithms and their impact, and help drive changes in these products. We're really proud of what our community uncovered, but of course we had more questions. One key question was: what is YouTube doing within the platform to respond to people's feedback on content? YouTube provides a thumbs-down button for reacting to content and telling the platform whether you like it or not. So why do we keep getting the same content even after we use the thumbs-down dislike button? We used a similar community-powered approach to study whether YouTube's user-control features actually work. Again, tens of thousands of community members shared their YouTube browsing history and actions with us. Unfortunately, we found that the dislike button did not work consistently at all. In fact, using the dislike button is like pushing a button in an elevator and going nowhere. This study put pressure on YouTube to give people real agency on their platform. It's also attracting the attention of regulators who are interested in the area of deceptive design. And we're not stopping yet: as a next step, we're going to develop a crowdsourced research tool to study TikTok's algorithm, so stay tuned, as we'll be recruiting contributors sometime in early 2024. This growing body of work illustrates how people can hold AI systems accountable by acting as independent watchdogs who scrutinize AI, steer regulation, and inform product changes.

Now I'm going to show another example of people creating pressure for AI transparency and accountability. This example shows impact from a different angle: the power of creating an alternative product that is transparent from the start and, in that way, changes industry norms. And you all know a thing or two about these kinds of projects. So I'm going to introduce you to Common Voice.

Common Voice is one of Mozilla's most beloved projects because it's developed and maintained by our community, much like Wikimedia projects. Common Voice is the world's largest open-source voice dataset. Mozilla developed this platform for a simple reason: the growing language inequity in voice-enabled products. Most voice-enabled AI applications recognize only a handful of mostly European languages, leaving dozens of underrepresented languages behind. This means that people who speak most of the world's languages won't have full access to information. Since launching seven years ago, Common Voice has grown to include voice data in 128 languages, and it provides the dataset for free to small developers and communities who want to build technology for their communities. But we're not stopping there.

This is a relevant case study about transparency because Common Voice audits and transparently reports on our own data limitations, and on how the data may introduce bias into models trained on it. This is not a common industry practice. Unfortunately, our audits demonstrate a lack of gender parity in our voice contributions. And because the sentences read aloud by voice contributors are drawn from openly licensed sources, they tend to be outdated and not fit for many modern conversations. We're not satisfied with stopping at our own audits of Common Voice's data: through our Voices awards competition, we supported people and projects building auditing tools that can spot and stop bias. For example, Common Voice is used to power public-interest projects like the Mbaza AI chatbot.
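To give a sense of what such an audit can look like in practice, here is a rough sketch that counts validated clips by self-reported gender in a Common Voice language release. Common Voice releases ship per-clip metadata as TSV files, but the exact columns can vary between dataset versions, so treat the "gender" field and the file path below as assumptions to check against the release you download.

```python
# Rough sketch of a gender-parity audit over a Common Voice language release.
# Releases include per-clip TSV metadata; column names vary by version, so
# the "gender" field used here is an assumption to verify.
import csv
from collections import Counter

def gender_breakdown(tsv_path: str) -> Counter:
    """Count validated clips per self-reported gender label."""
    counts: Counter = Counter()
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            counts[row.get("gender") or "unreported"] += 1  # blank means unreported
    return counts

counts = gender_breakdown("cv-corpus/rw/validated.tsv")  # illustrative path
total = sum(counts.values())
for label, n in counts.most_common():
    print(f"{label}: {n} clips ({n / total:.1%})")
```

A skew in counts like these is exactly the kind of limitation the project reports openly, so that downstream model builders can weight or augment the data accordingly.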
The Mbaza chatbot provided vital information about COVID-19 in local languages in Rwanda. It was trained using the Common Voice dataset, and those voice contributions were gathered by local community members. And I've got great news: everyone can contribute. I have a short video from my colleague Jessica Rose of the Common Voice team to tell you more about how you can set up a new language in Common Voice. After that, I'm happy to take questions from the audience, both about Common Voice and about other aspects of my talk. So let me make sure I can get this going for you.

Hello, everyone. My name is Jessica Rose, from the Mozilla Common Voice team. Today I'm going to give you a short tutorial on how you can get a new language into Common Voice. Right now, Common Voice has 128 languages live and in collection. We also have 21 new languages in the sentence-collection and platform-localization phases. While 128 sounds like a lot, it really isn't; it's a drop in the ocean of so many other languages. And that's why we need knowledge creators like Wikimedia volunteers, and all of you at Wikimania, to help us get as many languages as possible onto the platform.

Let's look briefly at how a new language is introduced. Are y'all ready? There are three steps to add a new language. First, request one by filling out a simple form. Then, localize the Common Voice website using our localization platform, Pontoon. And lastly, collect or write copyright-free sentences for our voice contributors to read.

For this tutorial, let's look at how we might add Javanese to the Common Voice platform and datasets. First, let's check to make sure that Javanese isn't already on the Common Voice platform. I don't see it, so let's request it using that simple form. You'll need to send our team a message telling us about your language, and remember to include an email address so we can get in touch if we have any questions. Let's just use the Common Voice email address for now. Include an ISO code if available, and any other links that might help us better understand aspects of your language. The ISO code for Javanese is jv; I know this because I got it from the Javanese Wikipedia page, so I'll include that as well. Once this is sent off, a member of the Common Voice team will help add Javanese to our system.

The next step is to localize, or translate, the Common Voice website for Javanese users. Then let's get some sentences into the system. I'm doing this in English, because I wouldn't be able to write good Javanese sentences. We're using the Sentence Collector tool; we can use it to write new sentences or import sentences from copyright-free sources. Each sentence should take roughly 10 seconds to read out loud. Once other community members have reviewed and validated a sentence, it goes into the text corpus for voice donors to read. Once Javanese has 5,000 validated sentences, voice contributors can begin sharing voice data by reading the text prompts presented. This data will be shared freely under a CC0 license to help digital products, research, and services better understand Javanese. After samples are collected, they're verified by other community members and then sent out into the world via the dataset. Thank you. Together, we've taken our first steps toward teaching computers to better understand Javanese and making voice technology more inclusive. Can't wait to see you on the platform.
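One small, practical detail from the tutorial is worth making concrete: the rule of thumb that each sentence should take roughly 10 seconds to read aloud. Here is a toy pre-filter for candidate sentences, assuming a made-up average speaking rate; real read times vary by language and speaker, and the platform's own review steps are what actually gate sentences.

```python
# Toy pre-filter for candidate Common Voice sentences, based on the
# tutorial's rule of thumb that a sentence should take about 10 seconds
# to read aloud. The speaking rate below is an assumed average, not a
# platform constant; validation by other contributors is the real check.
MAX_SECONDS = 10
WORDS_PER_SECOND = 2.0  # assumption; varies by language and speaker

def fits_reading_limit(sentence: str) -> bool:
    """Estimate read-aloud time from word count and enforce the limit."""
    estimated_seconds = len(sentence.split()) / WORDS_PER_SECOND
    return estimated_seconds <= MAX_SECONDS

candidates = [
    "Aku seneng maca buku.",     # a short Javanese sentence: passes
    " ".join(["tembung"] * 40),  # far too long to read in ten seconds
]
for s in candidates:
    print(fits_reading_limit(s), s[:40])
```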
I hope you appreciated that quick tutorial about Common Voice and feel inspired to add Javanese, or other languages you'd love to see, to the Common Voice dataset. You can use this QR code or visit commonvoice.mozilla.org to get started. I think you got a sense from Jessica that there's a lot of support and a lot of community connection on the Common Voice platform to help you along the way. So we hope you'll join us. And I will take any questions that you might have for me. I'm going to hand over to the facilitators in the room to let me know whether there are questions and where to look for them.

Hi, Ashley. At the moment, we don't have any questions here in the room.

All right, well, I can wrap up then. Thank you so much for having me. I hope anyone will reach out if they have questions about any of the projects I shared, or about other transparency and AI initiatives at Mozilla.

Thank you so much, Ashley, for joining us. I know it's late where you are. Thank you so much. Thank you. Bye-bye.