The camera, even my mic, turns off, so yeah. Okay, I'll start. Thank you for having me. It's so awesome to be here at FOSSASIA again. My name is Bob Reyes, and I'm from the Philippines. I'm a Mozilla Reps mentor and a Mozilla Tech Speaker. If at any point during my presentation my audio gets choppy, please let me know in the chat.

Just a short note on how we started. For those who do not know, for the kids out there listening tonight: Mozilla is an offshoot of Netscape Communications, so it started decades ago. Mozilla.org came from Netscape. So if you are using Firefox, thank you; in one way or another, you are using Netscape.

As an open source, nonprofit organization, Mozilla has a mission: we want to build a better internet. We want to ensure that the internet remains a global public resource that is open and accessible to all, an internet that truly puts people first, so that individuals like you and me can shape our own experiences and be empowered, safe, and independent when we go online.

Here in the Philippines, we have been operating as an online community since 2009, with a leadership composed of Mozilla Reps, regional coordinators, and core members. What keeps the community busy is localizing Firefox, both for desktop and mobile, into different languages of the Philippines such as Tagalog, Cebuano, and Hiligaynon. We also advocate for free and open source software, online security, and privacy through social media, online events, participation in lawmaking processes, and many more. Right now we are populating a learning management system with content on localization, privacy, web development, and Rust.

When we talk about Mozilla, we are best known for Firefox. To those who do not know, this may come as a surprise, but when we talk about Firefox, we're talking about a family of
products: the browser, Firefox Monitor, and a whole lot more. Firefox also has versions for mobile devices, both Android and iOS. We also have a service called Relay, an email relay service or forwarder: if you do not want to give websites you do not trust your real email address, you can use Relay to remain somewhat private, and there's a way for you to stop those websites from spamming you. We are on social media; just go to linktr.ee/mozillaph to find where to touch base with us.

Okay, and now for my talk: Mozilla Common Voice. It tries to answer one particular problem. Common Voice is our initiative to help teach machines how real people speak. Some may say that there are datasets that are readily available. Yes, but most of them are not free. Large companies, especially internet companies, collect and produce datasets, but the majority of them will not be available to us common people. If you have a machine learning or AI project related to voice, you may try to ask these companies for permission to use their datasets, but it will be hard. We at Mozilla think that this stifles innovation. So Common Voice is a project to help make voice recognition open and accessible to everyone. The website is at commonvoice.mozilla.org.

The datasets produced by the Common Voice project are publicly available and free. They are powered by the voices of volunteer contributors from around the world, and people like you and me who want to build voice applications can use them to train machine learning models for free. Why does this matter? Most of the datasets available right now are biased: they overrepresent white, English-speaking males. That's why most voice-enabled technology will not work at all for many languages, and even where it does work, it may not perform equally well for everyone.

When we try to create awesome apps that deal with voice, we need to determine how much data is needed. If you are building an application that uses command-based models, for limited or fully known vocabularies, you will only need somewhere between 1 and 300 hours of data. For example, a voice assistant without general queries, such as car infotainment controls ("play next song") or simple media and navigation commands, needs a voice dataset of somewhere between 1 and 300 hours. If you are going to create something that needs limited-vocabulary continuous speech recognition, such as apps that handle technical speech, you will need somewhere between 300 and 1,000 hours. If you need near-human accuracy for automatic speech recognition (ASR), you will need somewhere between 1,000 and 2,000 hours, depending on the language. And lastly, for a very high-quality, general, large-vocabulary continuous speech recognition model, you will need at least 2,000 and up to something like 10,000 hours. There are only so many hours in a day, and if your resources are limited, it will be hard to collect all of this data. That's why we set out to crowdsource it.

So how does it work? As I mentioned, Mozilla Common Voice is a crowdsourced project. It works like a conveyor belt with two parts: the text corpus and the voice corpus. If one process fails, the production rate or quality drops, so monitoring, dedication, time, and a crowd are all needed for this project to work. For the text corpus, we add sentences and then we QA them, that is, we review the sentences for quality. For the voice corpus, we record sentences.
We read whatever was submitted to the text corpus, and we also listen to validate the recordings. Later I will have a demo of this.

Common Voice uses conversational text and speech, and it is general purpose: it is not specific to an application, method, or model. Just a quick note: the dataset produced by Common Voice is not a clean dataset yet. For the text corpus, anyone can suggest sentences to be used for the Common Voice project. They should be released under Creative Commons CC0 (no rights reserved), that is, in the public domain. A quick requirement: each text corpus entry must be a maximum of 14 words, with up to 100 characters. For the voice corpus, each recording must be a minimum of 1.5 seconds and a maximum of 14 seconds long.

Right now, what we're doing in the Philippines is trying to get Tagalog listed as one of the languages in Common Voice; we don't have it yet. Thankfully, Firefox, the web browser, is already translated into Tagalog. Tagalog is one of the languages we have here in the Philippines, and it is widely spoken. Common Voice in Tagalog is currently under development, with only 821 of the 2,000 required sentences. For a language to launch, there must be at least 2,000 accepted sentences, which those who wish to contribute can then record and validate. This is how it looks from the dashboard: the Common Voice Tagalog website is 45% localized, and we have 821 of the 2,000 required sentences.
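The figures quoted so far (hours of data per use case, the sentence and clip limits, and the 2,000-sentence threshold for a new language) can be collected into a few small checks. This is a minimal Python sketch: the numbers come from this talk rather than from an official Common Voice specification, the use-case keys and function names are hypothetical, and real sentence review may apply more rules than word and character counts.

```python
# Rough figures quoted in this talk; not an official Common Voice specification.
# Hours of speech per use case as (min, max). Keys are hypothetical labels.
REQUIRED_HOURS = {
    "command_based": (1, 300),                  # e.g. car infotainment commands
    "limited_vocab_continuous": (300, 1_000),   # e.g. technical speech
    "near_human_asr": (1_000, 2_000),           # depends on the language
    "large_vocab_continuous": (2_000, 10_000),  # high-quality general model
}

MAX_WORDS = 14            # text corpus: words per sentence
MAX_CHARS = 100           # text corpus: characters per sentence
MIN_CLIP_SECONDS = 1.5    # voice corpus: shortest accepted clip
MAX_CLIP_SECONDS = 14.0   # voice corpus: longest accepted clip
SENTENCES_TO_LAUNCH = 2_000


def hours_needed(use_case: str) -> tuple[int, int]:
    """Return the (min, max) hours of recorded speech quoted for a use case."""
    if use_case not in REQUIRED_HOURS:
        raise ValueError(f"unknown use case: {use_case!r}")
    return REQUIRED_HOURS[use_case]


def sentence_ok(sentence: str) -> bool:
    """Non-empty, at most 14 words and at most 100 characters."""
    text = sentence.strip()
    return 0 < len(text) <= MAX_CHARS and len(text.split()) <= MAX_WORDS


def clip_ok(duration_seconds: float) -> bool:
    """Clip length between 1.5 and 14 seconds (bounds assumed inclusive)."""
    return MIN_CLIP_SECONDS <= duration_seconds <= MAX_CLIP_SECONDS


def launch_progress(accepted_sentences: int) -> float:
    """Fraction of the 2,000 accepted sentences a new language needs, capped at 1."""
    return min(accepted_sentences / SENTENCES_TO_LAUNCH, 1.0)


if __name__ == "__main__":
    low, high = hours_needed("large_vocab_continuous")
    print(f"{low}-{high} hours, up to {high / 24:.0f} days of nonstop audio")
    print(sentence_ok("Magandang umaga sa inyong lahat."))  # True
    print(clip_ok(0.9))                                      # False: too short
    print(f"Tagalog: {launch_progress(821):.0%}")            # Tagalog: 41%
```

The last line shows why the Tagalog effort needs help: 821 accepted sentences is only about 41% of the way to launch.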
There is a Sentence Collector website where people can contribute and suggest texts to be validated and recorded later on. So we need your help, whatever language you speak. If your language is not yet listed in the Common Voice project, feel free to request that it be added. If your language is already on the Common Voice website, then this demo is for you. So I'm going to conduct a short demo. Okay, I'll switch to my other screen.

When you go to commonvoice.mozilla.org, you will see this. You'll be presented with two tasks: you can either speak or listen. Let's go with Listen first. When you click Listen, you help us validate voices. You will be presented with a sentence (I'm using English here), and you can click the play button to hear the voice recorded by a volunteer. If you heard them read the sentence correctly, just tell us yes, it's correct. That is how we train the machine to know that the sentence was properly spoken by whoever volunteered to record their voice. That's why we keep telling people that we need their help: we need you to donate your voice to us through the Common Voice project. So it's something like this. I heard it: "...arrived." So, yes. Per task you will be given five clips. And take note, I haven't logged in. You are not required to log in to be part of the project, but if you want to track your contributions, for example how many sentences you have already validated, we suggest that you sign up on the Common Voice website. You can sign up and log in here, and you may use your Firefox account to do so.
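The Listen step boils down to collecting yes/no votes per clip. Common Voice's actual acceptance rule is not described in this talk, so the Python sketch below uses a hypothetical one (at least two votes in agreement, forming a majority) purely to illustrate how volunteer votes could be turned into a clip status. It also splits a queue of clips into tasks of five, matching the task size mentioned in the demo; `clip_status`, `tasks`, and the thresholds are illustrative names, not Common Voice code.

```python
from collections import Counter
from collections.abc import Iterator, Sequence

# Hypothetical threshold, for illustration only; not Common Voice's documented rule.
MIN_VOTES = 2
TASK_SIZE = 5  # clips per Listen task, as described in the demo


def clip_status(votes: Sequence[bool]) -> str:
    """Return 'valid', 'invalid' or 'pending' from yes (True) / no (False) votes."""
    tally = Counter(votes)
    yes, no = tally[True], tally[False]
    if yes >= MIN_VOTES and yes > no:
        return "valid"
    if no >= MIN_VOTES and no > yes:
        return "invalid"
    return "pending"  # not enough agreement yet; send the clip to more listeners


def tasks(clips: Sequence[str], size: int = TASK_SIZE) -> Iterator[list[str]]:
    """Split a queue of clips into Listen tasks of `size` clips each."""
    for start in range(0, len(clips), size):
        yield list(clips[start:start + size])


if __name__ == "__main__":
    print(clip_status([True]))                # pending
    print(clip_status([True, True]))          # valid
    print(clip_status([True, False, False]))  # invalid
    print([len(t) for t in tasks([f"clip_{i}.mp3" for i in range(12)])])  # [5, 5, 2]
```

Requiring more than one vote means a single careless or mistaken listener cannot accept or reject a clip alone.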
That is the Listen portion. In the Speak portion, we ask you, our volunteer contributors, to record your voice. You simply hit the microphone button and record yourself reading the sentence presented on screen. Once you're done, you can move on; per task you will be given about five items. Again, I did not log in to do this particular task, so logging in is optional, but if you want to track your participation, please do log in.

For the datasets, click Datasets on the Common Voice website and you will be given options for the language and the version. We are currently at version 8, and these are the different languages currently available for download. This is English: the last update was on January 19, 2022, the file size is about 70 gigabytes, and it has a total of 2,886 hours, of which 2,185 hours are validated. The format is MP3. In terms of demographics, 24% of speakers are 19 to 29 years old and 13% are 30 to 39; 46% are male and 16% are female. Each dataset entry consists of a unique MP3 file and its corresponding text.
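Since each dataset entry pairs an MP3 clip with its text, a first step in a project is usually to map clips to sentences. A minimal Python sketch, assuming the download includes a tab-separated transcript file with a header row containing `path` and `sentence` columns (an assumption about the release layout; check the files you actually get). The file location `en/validated.tsv` is hypothetical, and the validated share uses the English version 8 figures quoted above.

```python
import csv
from collections.abc import Iterable
from pathlib import Path


def parse_transcripts(lines: Iterable[str]) -> dict[str, str]:
    """Map each clip's MP3 file name to the sentence read aloud in it.

    Assumes tab-separated rows with a header containing 'path' and 'sentence'
    columns; verify this against the release you download.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row["path"]: row["sentence"] for row in reader}


def load_transcripts(tsv_path: Path) -> dict[str, str]:
    """Read a transcript TSV from disk (UTF-8) and parse it."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        return parse_transcripts(f)


def validated_share(total_hours: float, validated_hours: float) -> float:
    """Fraction of recorded hours that volunteers have validated."""
    return validated_hours / total_hours


if __name__ == "__main__":
    # English, Common Voice version 8, figures as quoted in the talk.
    print(f"Validated: {validated_share(2886, 2185):.1%}")  # Validated: 75.7%
    tsv = Path("en/validated.tsv")  # hypothetical location of the extracted TSV
    if tsv.exists():
        clips = load_transcripts(tsv)
        print(f"{len(clips)} clips with transcripts")
```

Separating parsing from file I/O keeps `parse_transcripts` easy to test with a few in-memory lines before pointing it at a multi-gigabyte download.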
So if you have projects that require something like this, you may want to try it. Hold on, I'll go back to my slides for Common Voice resources and links. You may go to commonvoice.mozilla.org. For the datasets, go straight to commonvoice.mozilla.org/datasets. For the Sentence Collector, if you want to contribute sentences that are in the public domain under Creative Commons CC0, just go to commonvoice.mozilla.org/sentence-collector.

If you speak any of the Filipino languages listed here, we ask for your help localizing different Mozilla products, including Firefox on desktop and Android. If you speak a different language and want to be a part of Mozilla localization, just go to pontoon.mozilla.org and search for your language. Anyone can join and contribute to this awesome effort. And if you have questions, now is the time.