Hello everyone, I will start my presentation. My presentation today is about recording voices and languages with Lingua Libre. Lingua Libre is a project within the Wikimedia movement. I am Hugo Lopez, Wikimedian in Residence at the University of Toulouse, in France. I have been a professional for 10 years and I have been active in the Wikimedia movement since 2004. I have experience in creating educational resources. And even if you know me as a French person, by my ancestry I am from the Occitan nation. That is relevant to my talk today because I do not speak Occitan. And that is why I got involved in Lingua Libre: the language of my ancestors was not transmitted to me, for political and historical reasons.
So today we will talk a bit about language diversity, then about Lingua Libre. I will do a demo, and then I will discuss the issues and limitations we have.
So for language diversity: at first glance you have large language families, but you also have about 7,000 languages across the world. We have 350 Wikipedias, so we have just 5% of these languages on Wikipedia. And if you zoom in, you can see the languages: not just 15 families, but 7,000 languages. In some countries, Indonesia in particular, each valley can have its own language. So this linguistic and cultural diversity should be better documented.
So the dimensions we are interested in with Lingua Libre are these. We want to document languages, but within a language you also have accents: regional accents from different places, with different pronunciations, like British English, American English and Indian English. The English language should not always be represented only by British and American speakers. We should give visibility to every community that speaks the language.
You also have different voices. In France, my case study, older people and younger people pronounce things differently, because for the older people French was not their native language. So they have a strong influence from their background language, like Occitan, Breton or Alsatian. We also want to document the diversity of genders, so we want different voices. So for language diversity, the diversity of our languages, their words, expressions and voices should be properly documented and accessible. What we want to do with Lingua Libre is to record, share and make visible these expressions at large scale, in an easy and quick fashion. That is why we developed Lingua Libre.
As a short introduction, Lingua Libre has its own background history. It comes from France: for the past 150 years, the government has prevented the local languages from developing and from being spoken. So now most of them, 20 out of 25, are nearly dead, and five are still spoken on a daily basis. Then the French government, under the influence of European laws, started to support our local languages. And we created Lingua Libre through a collaboration between a French university in Strasbourg and Wikimedia France. We created the recording tools first to document Alsatian, by linguists, by academics, for e-learning purposes. Now Lingua Libre is of course available for all languages supported by Wikidata, so we can record all the languages of the world. And we even have a Wikipedia page in about 10 languages, so you can find more information on Wikipedia as well.
For the concept, the idea behind Lingua Libre is this. We have languages, with their pronunciations and their voices, and we want to document them. We have the speakers, who speak these languages. We put them in front of a single-page app, Lingua Libre, where they can record their language. This creates data: we quickly create a set of audio files that we can use later.
And this audio data can be used by e-services like e-learning, Wiktionary and so on.
I will now make a demonstration of Lingua Libre. Basically, the interface looks like this when you enter the recording studio. When you go on Lingua Libre and you are not connected, you connect here with your Wikimedia Commons account. So here you log in, and here you have access to the actual tool: when you click on record, you access the tool. Here you can change the interface language, so you can put it in Chinese, Spanish or any of the whole set of languages. It is accessible on translatewiki.net, so you can also translate it into your minority language.
Once you access the tool, you first do a test. So I start the test. Adishatz. Adishatz. Adishatz. Yes, when we unplug that, it works. Yeah. Adishatz. Adishatz. We will try like this.
So when we get here we are at step 2, the speaker. I will do my best to make the demonstration and explain it to you. Here we have five steps, and this one defines the speaker. This application was built by linguists, so we want to define who I am. I can use my Wikipedia username. I say I am male, for statistics and that kind of purpose. And here I can add a language, so I add Occitan. I am a beginner and my learning place is Toulouse. This is the place where I live, and I am okay to share this data, so I go to the next step. You can add a whole bunch of languages, but if your language is not in here you can contact the administrators.
Here I can add a word. For example, I add one that means hello and goodbye. Or, since I am on Occitan, I can load a list of words: this one, which is already in the system and was provided by other contributors. Okay. So now I have a list of words to record, so I can speed up my work. I go to the recording phase; now we are in the actual recording studio. We have some shortcuts if I want to move faster or skip some words.
I can return to a word if I want to delete it, or just skip some words, so I use the arrows to go faster and the space bar to stop the recording. So now I will record this. Okay. I press this button and I will start the recording. Aliskera Atardera Bola Kakasbar Consolament Desaaba Desaiba Divagator Engolidor Espira Pinter Goteja Javeliza Marginal Navigatriz Particularitat Polidor Raspaudia Raspaudia Rimadis Cirmenthejar Tolerable Aliskera Atardivada Bola Cascanthejar Consolant. So you see the speed of the recording, and if we are fluent in the language we can go faster. I am not fluent. Oops. Let's go to the next step. I press pause.
This is the review phase. Okay. This one is okay. If one is not okay, I can uncheck it, so I will not send it to Commons. And then I can send my 24 audio files to Commons. Okay. And you can see here, they are sent to Commons. So this is sent directly to Wikimedia Commons. Okay. So that's done. That will be on Wikimedia Commons.
I turn back to my presentation. This is the idea: we want to generate data so we can send it to Commons, and then it will be automatically added to Wiktionaries, to several Wiktionaries, French and others. And we can also download this data. For example, if I go here, I can look for Occitan, and here I have the Occitan language and I can download it. There are already 14,000 words in Occitan. And there are also some statistics about Occitan. This page is accessible here. Okay. The dataset page. I will go back. Oops. So now we have data to feed some web apps. We already have some web apps.
And now I still have a few minutes to talk about the limits. In the past five years, we have made over 900,000 recordings in various kinds of languages. We have large languages like French. We also have Filipino and that kind of language. And we also have smaller languages like Surui, which has just 2,000 speakers. We also have sign languages. That's a new project I will talk about tomorrow.
And so the questions are different with these languages. For example, the smallest languages sometimes don't have a writing system, so they are harder to record and document.
On the statistics side, we made some data visualizations of our progress. We can see that most of the languages, of course, are larger languages. But now we have to, and we want to, move to the smaller languages. And of course we still have a bias toward Europe, which is very strong, especially France. India also has a strong community, and so does West Africa; those are the three most active.
Another shortcoming we have is when we send the audio automatically to Wiktionary. Right now it is mostly sent to the French Wiktionary and to the Kurdish, Shawiya and Odia Wiktionaries. But we need to make more bots to send it to other Wiktionaries, like the Chinese Wiktionary and so on.
In terms of the data, we have a gap between the large languages, with larger populations, and the minority languages. We also have a huge bias, with 90% of our audio still recorded by male speakers. And of course the speakers are mostly younger people, but we want elders to contribute because they have different accents.
We also have statistics for the coverage of major languages, large languages and minority languages. The largest languages are covered at about two thirds, so Polish, American and so on. For smaller languages, with about 2 million people, we are starting to have a third of them. But when it comes to marginalized languages, where there are over 6,000 smaller languages, we have just 1% of them. So we have to do better there, because that is where the linguistic diversity is.
So one of the objectives I have, at least, is to nurture more services like Wiktionary, and the revitalization of languages like Occitan and other minority languages. We have some friends from Taiwan; there is a great project in Taiwan for revitalization, and we have to learn from this team in Taiwan.
And I think the Wikimedia movement is a key place if you want to document the diversity of languages, because, as you can see outside with the map of the attendees, we are a global network and we are focused on culture and so on. I cannot see. We are focused on culture, so the global community of Wikipedians is ideal to reach every corner of the globe. And that's it for my presentation. You can contact me later for questions and answers. We have time for questions and answers.