Hello everybody. I'm WikiLucas00. I'm an administrator and bureaucrat on Lingua Libre. I'm also a Wikimedian in residence on this project, and I'm happy to be presenting Lingua Libre at this online Wikimania. It's my first Wikimania event and I'm really glad to be here. I'll start sharing my screen. So we will ask ourselves what Lingua Libre is and how to take part in it. First of all, I will talk about the project and explain it in a nutshell. Then I will demonstrate how it works. We will look at its impact, and then it will be your turn, if you like.
The goal of Lingua Libre is to document oral linguistic diversity, in order to feed the Wikimedia projects with audio content and to bring new linguistic communities to the Wikimedia projects. It originates from the French chapter, Wikimédia France, and was made with the help of contributors from the Wiktionnaire, some linguists and speakers of French regional languages, and Nicolas Vion, the creator of Shtooka, which was a PC software made for recording pronunciations. Lingua Libre was funded thanks to a project grant from the Foundation provided to 0x010C, and some funds from the French Ministry of Culture.
It's a wiki like other wikis, so it runs on MediaWiki, and it has its own custom skin, which is quite fancy, BlueLL. Wikibase is installed on Lingua Libre, which means we have one item for each speaker, one item for each language, and one for each recording, and it allows us to make SPARQL queries on Lingua Libre. The main component of Lingua Libre is a MediaWiki extension based on voice activity detection. It's called the Record Wizard. We'll talk about it later. After that, all the files are sent automatically to Commons via OAuth, on behalf of the user. The datasets can be downloaded as zip files, but that's a work in progress; they're not quite up to date. The advantage is to reduce the steps in the classic workflow for recording audio.
So as you can see, usually you have to record words in a dedicated software, then clean them in another software, then upload them one by one to Commons, and then edit each wiki you want to put them on. On Lingua Libre, you just have to start the recording and pronounce the words one after the other. The software, based on voice activity detection, will cut between your words, then trim them, clean them, and upload them automatically to Commons. And then there's a bot, I'll talk about it later, that will edit local wikis to add your pronunciations to them. This bot is called Lingua Libre Bot. It's coded in Python and you can see the code on GitHub. If you want it to work on your wiki, you just have to fill in a request on this page on Lingua Libre. Of course, it will need the bot status, so you will first have to start a discussion on your local wiki and vote for its status.
I will now do a little demonstration of Lingua Libre. You need a Wikimedia account for this, so if you're not logged in, it will redirect you to Commons, and via the OAuth token you will allow Lingua Libre to interact with Commons on your behalf. Then you authorize your microphone. You can check if it's working: test. I don't think you heard it, but I got feedback telling me it's okay. Then you can fill in a speaker profile. As you can see, I have several speakers. That's very useful when you want to record several people without having them create a Wikimedia account. I will use my main speaker, which is the default for my account. Here you can enter the languages you speak. Now I'll be using English, and for each language you can select a level of proficiency. English is not my native language, so I just said I was, I don't know, an average or good speaker. Then you can select your place of residence and the license that will be used on Commons for your files.
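As a rough illustration of the automatic cutting described above, here is a minimal sketch in Python of splitting a recording at pauses. It is not the Record Wizard's actual code: the function name, the amplitude level, and the minimum pause length are all assumptions made for this example, and a real voice activity detector is more elaborate.

```python
from typing import List, Sequence, Tuple


def split_on_silence(
    samples: Sequence[float],
    rate: int,
    level: float = 0.05,          # assumed loudness threshold (hypothetical)
    min_silence_s: float = 0.5,   # assumed pause length that ends a word
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indexes of each voiced segment.

    A segment ends once the signal has stayed below `level` for at
    least `min_silence_s` seconds. The end index is exclusive and sits
    just after the last loud sample, so trailing silence is trimmed.
    """
    min_gap = int(min_silence_s * rate)
    segments: List[Tuple[int, int]] = []
    start = None       # index where the current segment began
    last_loud = None   # index of the most recent loud sample
    for i, sample in enumerate(samples):
        if abs(sample) >= level:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_gap:
            segments.append((start, last_loud + 1))
            start = None
    if start is not None:
        # The recording stopped before a long enough pause was seen.
        segments.append((start, last_loud + 1))
    return segments


if __name__ == "__main__":
    # Synthetic signal at 10 samples per second: two "words" of one
    # second each, separated by 0.8 seconds of silence.
    signal = [0.0] * 5 + [1.0] * 10 + [0.0] * 8 + [1.0] * 10 + [0.0] * 8
    print(split_on_silence(signal, rate=10))                     # [(5, 15), (23, 33)]
    print(split_on_silence(signal, rate=10, min_silence_s=2.0))  # [(5, 33)]
```

With a short pause setting, each word becomes its own segment. With a longer one, the same signal stays in one piece, which is the behaviour you would want when reading a full sentence rather than a list of words.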
The next step is choosing the words you want to record. There are several ways. You can just enter them like this. So let's say dog, cat. If you just want to record a few words, it's okay to do it like this. But we also recommend using the other options here. You can use a local list. I don't know if there's a list named fruit. For example, there's a list made by Olaf, a Polish Wiktionary contributor, which is edited by a bot. For more than 70 languages, it ranks the words that are described in the largest number of Wiktionaries and that don't have audio pronunciations on Commons. So here you see all the words. It's more than 300 words, and once I've recorded all of these words, like four or five hours later, the list will be reset with other words that don't have audio yet. So that's just too many words for now.
You could also use the nearby option. You can enter your coordinates and it will fetch the items around you from Wikidata. For example a church, a hospital or a park, and you can pronounce them and the recording will go onto the item on Wikidata, thanks to Lingua Libre Bot. You can also use a Wikimedia category, for example linguistics from Wikipedia. You can also change the source. I'll just take 20 words and then you can record them. You can also choose a Wiktionary. But for now I'll just enter some words manually. So, for example, fruits: apple, cherry, orange, lemon, lime, pineapple. There's the possibility to shuffle the list. For example, if the list is in alphabetical order, you risk having a list intonation, which is not desirable. So shuffling the list may help to have better audio in the end. And make sure you're recording in the correct language. When you have made recordings in several languages, it's quite easy to make this mistake.
Let's say you record words that are actually in English, and pronounce them in English, but with the wrong language set. That would be a pity. By doing this I just erased the words. Let's go to the next step, the recording studio. You just have to press once to start the recording, and then you will see it will cut between the words, and it's really easy to understand. So I'll press once and I will stop when I'm at the end of the list. Orange, cherry, grape, apple. So I did nothing with my hands; it just went through the items of the list. I'll go to the next step. I have the feedback in my headset and I can decide to untick a recording if I don't like it, before sending them to Commons. I could also go to the previous step if I want to re-record one recording. If the upload of one of the recordings failed, you just have to retry, and usually that's enough to get it through.
Okay, so let's check the recordings on Commons. As you can see, they're here and they have all the information: the language, that it comes from Lingua Libre, and there's a link to my item (QID) on Lingua Libre. They're also put in categories on Commons. So here you have all the recordings from Lingua Libre in English, and you also have them by speaker. For example, you have all my recordings here. Okay, I think that's enough for the demonstration. Let's go back to the main presentation.
We'll look at the impact of Lingua Libre. We'll talk first about the Lingua Libre community. It's a worldwide community. As you can see, we have speakers on every continent, mainly in Europe. That's because it started in France. We had many workshops that took place, unfortunately, before the COVID crisis, all around the world, like in Canada with Atikamekw at Wikimania, and some recordings in Cameroon for WikiIndaba. And on the right you can see the team of Wikimédia France and some contributors of Lingua Libre. Excuse me.
Looking at the statistics of Lingua Libre, out of more than 500,000 recordings and more than 120 languages, we can see that French still accounts for more than a third of all the recordings, and that more than two thirds of our speakers are male. So the road to diversity is still long, but we have hope and we want everyone's voice to be heard. Looking at the total number of recordings, we passed 500,000 recordings last month. And as you can see, there's almost a 100% growth rate each year, so the total is almost doubling every year.
Then we will look at how the recordings are used in and out of Wikimedia projects. On Wikimedia projects, they can be used to illustrate words or names on any project. So, for example, names on Wikipedia, but also words on Wiktionaries. They're used on lexemes on Wikidata, anything you could think of. For example, on the left, it's a dictionary of Kaqchikel from the Summer Institute of Linguistics, and some contributors added recordings made with Lingua Libre to this page of the dictionary. In the middle, you have the French Wiktionary entry for one word, so there are plenty of pronunciations there. And on the right, you have an example of a lexeme with the pronunciation by one of our most prolific contributors, Tito Dutta.
Outside of Wikimedia projects, the recordings are used, for example, by Mozilla's speech-to-text engine, which is called DeepSpeech. I'm sure they are used as part of the French version of DeepSpeech, among other datasets like Common Voice. They are also used in some dictionaries like Le Dico des Ados, which is a libre dictionary for young people. Some contributors are recording words on Lingua Libre, and then they are using them in this dictionary.
Also, the Dictionnaire des francophones, which I'm working on as part of my residency, will soon be including some recordings from Lingua Libre to highlight the accents of the French-speaking world. On the right, you have what it should look like. And there are also plenty of prospects, like language learning apps, flashcards, for example, anything you could link up with our recordings. Do you have questions about this presentation before we go to an interactive session? I'll just go back to the stream.
How does the application handle regional dialects in different languages? Actually, there's the possibility to add any new language as long as it has a Wikidata QID. For example, let's take Occitan, which is a language spoken mainly in some parts of France. There are several dialects of Occitan. For example, Languedocien, sorry, I just clicked on the wrong one. So here's the Wikibase of Lingua Libre. If we look at the Languedocien dialect, there's a QID, like for other languages. Its instance of is just language slash dialect, so anything can work. And it has its own ISO 639-3 code. The recordings are categorized in this category, which is also part of the Occitan category. So indeed, there's the possibility to have information about dialects.
Does the bot do any kind of pre-processing? I'm not sure I understand the question exactly. I'm sorry. The bot works from the Wikibase. It fetches the latest recordings and checks if they are already on, for example, the Wiktionary page. If not, it will add them. It runs twice a day. Its maintainer is Poslovitch, who presented Lingua Libre yesterday during the Hackathon.
How accessible is the project for someone without a high quality mic? We noticed that even some recent phones, like smartphones, can really give good quality. As Lingua Libre is accessible in a mobile version, you could record from your cell phone if you don't have a high quality mic.
The most important thing is to have a quiet room: no noise from the street or from around you, no television turned on in the background. But then there's no problem with using a cell phone or just your built-in laptop microphone. So I think that answers how to record from a mobile phone. You just have to go to lingualibre.org on your smartphone. There's no mobile app, but there's a mobile version of the website, which is quite accessible.
There's a question in French, so I will take the time to read it in French and translate it. So what's our experience with other linguistic communities from minority languages? There are recordings that are made by linguists, I think. We talked about the project to some linguists, some fieldwork linguists, but I'm not sure they did fieldwork with really minority languages. But when we work with these communities, indeed, we mainly use, for the moment, short lists, like Swadesh lists, and not full stories, as you mentioned. Even though Lingua Libre can record texts, it's mainly used for short words, phrases, and maybe sentences. For the moment it's more like lists of words, and it's really efficient for reading lists of words. So the Swadesh list, or, I don't know, if you have a lexicon of fishing in a specific language, you can help the community by reading it. But of course you need someone to rely on to make this list first, because sometimes speakers don't know how to write in this language, so that's kind of a problem we're facing with Lingua Libre.
So that answers another question. You can record wiki articles with Lingua Libre; you just have to set the threshold of silence that triggers the voice activity detection. It's set really short, as we're recording lists of words, so you can put it to five seconds or ten seconds when you're reading full text, so that it will not cut in the middle of your text.
Are there checks for trolling? What if a malicious agent records a non-existent word?
Actually, that's also a difficult question, and we are just beginning to deal with this, with all the types of issues that we can encounter with Lingua Libre, so we're trying to set up a patrol on Lingua Libre. We don't have much vandalism on Lingua Libre for the moment, fortunately, but we have people who misuse the software. For example, as I mentioned, they record words in the wrong language. It's quite easy to make the mistake. I actually did it one time: I recorded 100 or 1,000 words with the language set to Finnish instead of French. That causes some problems, like you have to rename all the files on Commons, and we're figuring out how to handle that.
Actually, yes, it also deals with sign languages. So French Sign Language, any sign language you can think of. If you select a sign language, the Record Wizard changes, and it will ask permission for your camera, and you will be able to record words in sign language, and I think it works. There's no detection, so you have to cut between each sign you're making, but then the videos are uploaded to Commons just like audio.
What if a word is already recorded? You mean by the same speaker or by another? If you have already recorded it and you still enter it in Lingua Libre, it will replace the file on Commons with a new version, so that can be useful. If you mispronounced a word, you just have to put it back into Lingua Libre, type it again, record it again, and it will update on Commons. If it has been recorded by another speaker, well, the main goal of Lingua Libre is to show diversity, so it will add it to the Wiktionary or any project, at least on Commons, and you will have a diversity of pronunciations for the same word. There is an interest in atypical recordings, indeed.
For example, say you're French but you live in a German-speaking part of Switzerland, so you probably have an accent, like everyone, and it's interesting. You just have to enter the place where you learned your language, or your languages, and it will be interesting to hear how you pronounce words.
Yes, there's an interesting question about words that are written the same, but pronounced differently with different meanings. We handle that with parentheses. So you type your word, for example the letters L-I-V-E, which can be pronounced "live" in two ways, depending on the word. You just have to type live, and you have to specify the meaning between the parentheses, and the bot, Lingua Libre Bot, will still be able to manage and to put the file on the wiki.
The bot that adds pronunciations, I don't think it's broken, but if you have precise questions about the bot, I invite you to ask them on the Lingua Libre Bot page on our website or to ask Poslovitch, who is the botmaster of Lingua Libre Bot. I think it's working, but it's only adding recordings on Wikidata if you used a Wikidata query. I forgot to mention that during the demonstration, but you can also use Wikidata queries and PetScan queries to generate lists of words to record. If you used a Wikidata query, it will automatically add the recordings to the corresponding lexeme.
I don't know if there are new questions. So, a pre-processing follow-up. There are several easy ways to improve audio, such as denoising. Does the bot perform any of those? Actually, it's the software that processes the audio. Sorry, I misunderstood your question before. It's the software that does that. The bot only modifies wiki pages: it takes the recordings from Commons and adds links on any wiki you want.
How can we reach Lingua Libre? The address is lingualibre.org and it works like any wiki. And if you want to contact us, there's the chat room.
But I'll go back to the presentation soon to talk about how you can be involved. Some last questions.
Is any pre-work needed to add a pronunciation file to Wiktionary? If you want to add it manually, you just have to make sure your local Wiktionary uses a complete template for audio recordings. So a template that indicates at least the country and the place where the speaker is from, and maybe the phonetic transcription. But if you want the bot to edit, you will indeed have to describe to us how it is supposed to edit, because it works differently on every wiki. And then we'll try our best to make it work on your wiki. You can also submit a commit on GitHub if you know how to code in Python and you want to do it yourself, and we'll review the code and put it on Toolforge, where the bot runs. It could work on Incubator wikis, but for the moment it only works on a few wikis, like French Wiktionary, Occitan Wiktionary, and it works on Wikidata. We will be adding new wikis during the summer.
Can we use non-Wikimedia sources for words? I don't think there is copyright on just a short list of words. So if there's a list like the one you mentioned on Twitter, you could use it as a source, but you will have to prepare it for the Record Wizard.
So I'll go back to the presentation to talk about how you could help us by participating in Lingua Libre. I'll share my screen again. The whole content of Lingua Libre can be translated. You can learn more on the translation help page on Lingua Libre. There's a part that can be translated on translatewiki.net, for the interface of the Record Wizard, etc. There are also MediaWiki messages that can be translated, but there you have to go through the administrators, so you can just send us the text that we have to put on Lingua Libre. You can also translate, for example, the welcome messages, etc., that we use for newcomers. You can help by creating lists in your own language.
There's a help page that explains how to create lists. So, for example, if you want to create a list of fruits in English, it can be useful because, if you ask for the category of fruits in English from the Wiktionary, it will fetch the whole category, with fruits that you don't even know. That can be a problem when you want to record a list. When the list is made by humans, it's just the fruits that everybody will think of. But if you want to be exhaustive, using the Wiktionary category will be better.
If you want to make suggestions, the chat room is yours. And if you want to start technical discussions, or just read about technical discussions, there's a technical board on Lingua Libre. You can click on all the links in the PDF version of this presentation, which is already on Commons. There's a link in the etherpad. As I said, you can help us improve the code, not only of Lingua Libre Bot: all our code is on GitHub. And you can submit bug reports and feature requests on Phabricator. It's quite a full Phabricator board, as many people are submitting requests and bug reports, so we try to keep it tidy. There's the link to the form you have to fill in to request the bot on your wiki. And yes, if you want to get involved, the best way is to spread the word about Lingua Libre around you and on your local wikis, and to record your voice or other people's voices.
So that's the next step: some advice about recording voices. Lingua Libre works best on Firefox for the moment. We're having issues on Chrome. It's not impossible to use it, but you might encounter some slight audio clicks in your files. We're trying to figure it out. So please use Firefox for the moment to have the best chance of good quality audio. As I said, mind the noises around you. Turn off, for example, the sound of this session if you want to record in parallel.
And you just have to follow the steps of the Record Wizard, or read the demonstration slides from the PDF version, which explain step by step what I just did in the demonstration. And please check, listen to your audio before uploading it, to be sure to have good quality audio. If there's a problem, you just untick the recording, and that prevents sending bad quality audio or a mispronounced word to Commons. Once you've sent them, you can listen to them on Commons, and they should appear on the main page of Lingua Libre.
Are there any new questions? I'll just stop sharing my screen. I don't think there are new questions. Would you like to record a few words on Lingua Libre for the last minutes of this session? If you have any questions while you're recording, you can ask them on the etherpad, and I'll try to give you advice.
Would it be possible to have a Wikidata lexeme generator by just adding the language? I don't really see what you mean with this question. Like a generator that fetches the lexemes from Wikidata to create lists of words to record?
Yes, that's a good question I see there. If your language is not currently on Lingua Libre, you can ask the administrators, either in the chat room or on the administrators' noticeboard, to add it. It will just take a few minutes for us to add your language to Lingua Libre, as long as there's a Wikidata item for it. You can also add it yourself, but it will take a few steps to edit it manually, basically just like when you create an item on Wikidata. And there's a gadget for administrators to quickly add languages just using the Wikidata QID of the language. So if you have the names of languages you want to add to Lingua Libre, you're welcome to put them in the etherpad and we'll try to add them as quickly as possible. It can be right now, actually. For example, Hebrew: I think Hebrew is available on Lingua Libre. Yes, it is.
The local QID for Hebrew on Lingua Libre is Q397.
Are there plans to have Lingua Libre pronunciations auto-added to the English Wiktionary, like on the French-speaking one? Actually, I think we tried a few years ago, but some English Wiktionary contributors weren't happy about the possibility of people recording words in a language even if they're not native speakers. But that's actually something we can work with, because the speaker mentions their level of proficiency in a language. If the community of a Wiktionary only wants native speakers, for example, we can change the code for this specific wiki to only integrate native speakers' pronunciations on their wiki. Or we could also change the local template for audio recordings to display the level; that's actually what we did on the French Wiktionary a few months ago. We added a parameter to specify the level of proficiency. For example, if it's just a beginner in this language, it can still be better than nothing to have their recording, if the word didn't have any recording before. Or if you have plenty of recordings for the word, once you have heard many different accents, it can be interesting to hear how learners' accents differ in different parts of the world.
So the answer to the question about words spelled the same way but pronounced differently is parentheses. For example, in French, we have the letters F-I-L-S. It can be pronounced "fis", which translates as son, so let's say child. So you just specify the meaning in parentheses. And when you record another one, it can be "fil", the plural of the word fil, which is thread. You can do it like this, in the same language, of course. That's just for the example.
So I think we're getting to the end of the presentation. I'll take last questions, but I think there are no questions left.
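The parentheses convention for homographs described above can be pictured with a small Python sketch. This is only an assumed illustration of how a typed label could be split into a word and a meaning. The function name and the exact rules are hypothetical and are not taken from Lingua Libre Bot's code.

```python
import re
from typing import Optional, Tuple

# A label is a word, optionally followed by a meaning in parentheses,
# for example "fils (son)". The pattern is an assumption made for
# this example only.
_LABEL = re.compile(r"^\s*(?P<word>.*?)\s*(?:\((?P<sense>[^()]*)\))?\s*$")


def parse_label(label: str) -> Tuple[str, Optional[str]]:
    """Split a typed label into (word, meaning or None)."""
    match = _LABEL.match(label)
    word = match.group("word")
    sense = match.group("sense")
    # Empty or whitespace-only parentheses are treated as no meaning.
    sense = sense.strip() if sense else None
    return word, sense or None


if __name__ == "__main__":
    print(parse_label("fils (son)"))     # ('fils', 'son')
    print(parse_label("fils (thread)"))  # ('fils', 'thread')
    print(parse_label("apple"))          # ('apple', None)
```

Both labels keep the same written word, so the two recordings can be attached to the same entry while staying distinguishable by their meaning.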
So I'll just say thank you very much for having attended this talk and I hope to see you soon on Lingua Libre. Goodbye.