Hi, I'm Sebastian. I'm a developer at Wikimedia Sverige, the Swedish Wikimedia chapter, and I'm going to talk a bit about creating a tool that lets people listen to Wikipedia, which is something we've been working on over the last few years.

There are two main Wikispeech projects that we've had recently. The initial project started in March 2016 and finished in September 2017. It was funded by the Swedish Post and Telecom Authority, PTS, and the goal was to create a text-to-speech solution for Wikipedia; that is to say, a tool that automatically reads the text out loud for you, sometimes called TTS, which is the term I will be using in this presentation, or speech synthesis, which is hard to say. The second project was a follow-up, though it was somewhat separate. It started in September 2019 and finished earlier this year, in March 2021, also funded by PTS. Here the goal was both to create a tool for collecting speech data and to finish up the work that we hadn't quite finished in the first Wikispeech project. I'm going to start by talking about the text-to-speech component, which is the main subject of this presentation, and then I'll come back to the Speech Data Collector towards the end.

The goal for this tool was to create a text-to-speech tool for Wikipedia, so it's geared toward Wikipedia, but we tried to keep it open for the other Wikimedia projects as well, and to have as many parts as possible reusable in a wider context. It should be easily available for everyone: you shouldn't need to install anything or have a powerful device to use it, and it should of course be open source, like everything else for Wikipedia. We wanted it to be extendable to multiple languages, so we made it modular, so you can add different languages and different text-to-speech software when you find them or when they get made. And we also wanted the community to be able to improve it, just like Wikipedia and the other projects.
The motivation for the project was that there is a significant number of potential consumers of Wikipedia who prefer or require listening for different reasons, such as visual or cognitive impairment or lack of education, which means that reading is either hard or, in some cases, impossible. We wanted to make a freely available solution: of course it shouldn't cost anything, being on Wikipedia, but it also shouldn't do the other things that a lot of solutions that don't cost money do, like collecting your data. And it should be tailored for Wikipedia, that is to say, its format: long text articles, most of the time.

The implementation we went for was an extension for MediaWiki, which is the platform that runs Wikipedia and most of the other Wikimedia projects. An extension adds extra functionality in some way, and there are quite a few of them running; I checked earlier on English Wikipedia, for instance, and there are more than 100 of them currently running for all users, improving the experience in one way or another. The heavy work should be done server-side and not on the client; as I mentioned earlier, you shouldn't need a powerful device. A modern browser that's enough to run Wikipedia should also be enough to run Wikispeech.

We wanted the community involved as well, of course, and the way we wanted to do that was with a pronunciation lexicon, which is a part of the tool that holds transcriptions for words. These help the text-to-speech pronounce things correctly, which is especially useful for foreign words, loan words, names, and such, and community members should be able to edit them to improve how the output sounds. And we also wanted a fair amount of caching at various stages.
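The pronunciation lexicon described above can be thought of as a community-edited mapping from words to phonetic transcriptions that the synthesizer consults before falling back to its own letter-to-sound rules. A minimal sketch, assuming a simple dictionary lookup; the function names and the IPA strings are illustrative, not Wikispeech's actual API or data:

```python
# Sketch of a pronunciation lexicon: community-edited transcriptions
# override the TTS engine's default rules. Entries are illustrative.
lexicon = {
    # word -> phonetic transcription (here written as IPA)
    "Göteborg": "jœtəˈbɔrj",
    "Wikispeech": "ˈwɪkiˌspiːtʃ",
}

def naive_rules(word: str) -> str:
    """Trivial stand-in for the engine's own letter-to-sound rules."""
    return word.lower()

def transcribe(word: str, fallback=naive_rules) -> str:
    """Use the lexicon entry if the community has provided one,
    otherwise fall back to the engine's rule-based guess."""
    if word in lexicon:
        return lexicon[word]
    return fallback(word)

print(transcribe("Göteborg"))  # served from the lexicon
print(transcribe("hello"))     # no entry, so the rules are used
```

Editing the lexicon then means editing that mapping: fixing one entry improves the pronunciation for every future listener without touching the engine itself.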
The heaviest thing we do is the actual generation of the audio from text, so we save the results so that the next time you, or someone else, comes to listen to a page, they are already there. They don't need to be generated again; they can just be sent over the network. That means it's quicker for the user, and it saves workload on the server side.

We ran into some difficulties. One was finding good open text-to-speech engines, which we needed to actually generate the audio. There was also a lack of documentation when it comes to deploying things on Wikipedia, which was our goal from the very start: to have this as a feature on Wikipedia after, of course, it had gone through the various testing, beta features, and whatnot. And unfortunately, in the end, the Wikimedia Foundation was not able to support adding Wikispeech to Wikipedia.

The result after these two projects is, as I mentioned, an extension for MediaWiki that anyone who has a MediaWiki installation can download and install. It also requires a text-to-speech service called Speechoid, which does the heavy work of generating the audio. That service is fairly decoupled from Wikipedia, so it can be used in a wider context as well. And since we ended up not being able, at least yet, to install the extension on Wikipedia, we had to make a few last-minute changes to figure out how we could have it run anyway. We ended up with a gadget solution: nothing needs to be installed on Wikipedia's servers; it's just things that run in the web browser, on the client. I will show a bit of that in the next slide. We also have the lexicon editing interface I mentioned earlier. It is usable, and we use it a bit during development, but it needs a bit of love and polish to be ready for general use.
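The caching idea above (synthesize a segment once, reuse the audio for every later listener) can be sketched as a small content-addressed cache, keyed by a hash of the language, voice, and segment text. Everything here is a simplified model, not the extension's actual storage layer:

```python
import hashlib

# Illustrative audio cache: key by a hash of (language, voice, text),
# so each segment is synthesized at most once.
_cache: dict[str, bytes] = {}

def synthesize(text: str) -> bytes:
    """Stand-in for the expensive call to the TTS service."""
    return f"AUDIO({text})".encode()

def get_audio(lang: str, voice: str, text: str) -> bytes:
    key = hashlib.sha256(f"{lang}|{voice}|{text}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = synthesize(text)  # the heavy work happens only here
    return _cache[key]

first = get_audio("en", "default", "The ring ouzel is a thrush.")
second = get_audio("en", "default", "The ring ouzel is a thrush.")
assert first is second  # the second request was served from the cache
```

Including the voice in the key matters: the same sentence rendered with a different voice or language is different audio and must not collide.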
So, the gadget I talked about. I'm going to briefly go over the workflow, both with the extension in the intended way and with the gadget we ended up with. This is, very simply put, how the extension would work: you have the web browser that talks to the wiki, let's say English Wikipedia, and if the Wikispeech extension were installed there, it would communicate with Speechoid to get the audio, send it back to the user, present it, and all that. But since we could not do that, we ended up doing something like this instead. As you can see, there are now two wikis in the picture. The wiki from earlier, English Wikipedia, is in this diagram called the consumer wiki, and then we have the producer wiki. It's the producer wiki that has the extension installed, not the consumer wiki. What this means is that the web browser gets a tiny bit of gadget code from the consumer wiki and then fetches most of the resources from the producer wiki, which then communicates with Speechoid to get the audio, just like the single wiki did in the previous example, and presents it. There are a few extra steps, but it's not wildly different, from a user-experience point of view at least. A few things work a bit differently, and a few things we haven't quite gotten to work yet, but as a whole it mostly works. And luckily we were able to reuse most of the code, so we didn't have to do that much work to get it to this stage.

So what we have now is a producer wiki and the Speechoid service, hosted by us at Wikimedia Sverige. It's hosted on a pretty small server, which is enough to run it for now at least, but we'll see how long that lasts now that we have started testing. We have done that on Swedish Wikipedia, which allows us to get some user feedback, find more bugs, and test the system under heavier load than we've been able to before.
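The consumer/producer split described above can be modeled as three layers: the browser-side gadget (served by the consumer wiki), the producer wiki that actually runs the extension, and the Speechoid service behind it. The sketch below is a toy model of that request flow; the function names are hypothetical and stand in for HTTP calls between the real components:

```python
# Toy model of the gadget architecture. Each function stands in for
# a network hop; names are illustrative, not real endpoints.

def speechoid_synthesize(text: str) -> bytes:
    """Stand-in for the Speechoid TTS service doing the heavy work."""
    return f"AUDIO({text})".encode()

def producer_wiki_listen(segment_text: str) -> bytes:
    """The producer wiki runs the Wikispeech extension; it is the only
    component that talks to Speechoid (and can cache the result)."""
    return speechoid_synthesize(segment_text)

def browser_gadget(segment_text: str) -> bytes:
    """The browser loads only a tiny bootstrap from the consumer wiki
    (not shown), then fetches player resources and audio cross-wiki
    from the producer wiki."""
    return producer_wiki_listen(segment_text)

audio = browser_gadget("The ring ouzel is a mainly European thrush.")
```

The point of the indirection is that the consumer wiki (e.g. English Wikipedia) needs no server-side changes at all; everything heavier lives on the producer wiki and Speechoid, which Wikimedia Sverige can host itself.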
Currently we don't have a project for developing Wikispeech further. We have an internal maintenance project, which is a bit smaller, but it's enough to at least keep track of the bugs, recording them so we know what to work on later, and hopefully fix the worst things, things that may break that we hadn't encountered before. For the future, we're looking for project opportunities to improve on what we currently have.

Before I go to the Speech Data Collector, I'm going to give a very quick demo, just to let you see and hear what it's like today. So I'll go over here, and here we have English Wikipedia. As you can see, down here there's an extra little panel. It has the player buttons, so you have play, you can skip back and forth by word and by sentence, and there are a few buttons for help, feedback, and settings. I'm just going to take today's featured article, about the ring ouzel, a bird found mainly in Europe, and we'll listen to what that sounds like. I'll just press the button there: "The ring ouzel (Turdus torquatus) is a mainly European member of the thrush family." Notice that it highlights what's being read. "It is a medium-sized thrush. The male is predominantly..." I can skip by sentence: "In all but the northernmost..." and so on, if I want. And if I want to listen to a specific part of the text, I can select it like so, get a little pop-up there, and click that, and it loads that sentence. That's very briefly what it's like. If you want to listen to a bit more, there's a link to a demo wiki at the end; I'll show you.

So that's that part. Let's quickly go over the second component, the Wikispeech Speech Data Collector, which was part of the second project. The goal here was to create a tool for collecting speech data, and we wanted it to be easy to use by the Wikimedia community.
So, it should be integratable with MediaWiki wikis, but not necessarily running on, say, Wikipedia itself, since we had trouble with that in the past. We also wanted more tools to enhance and enrich the data, like annotations and such. The motivation was that for speech technology applications, such as text-to-speech but also speech recognition and other things, you usually need lots of recorded speech to get a good result, and there is not that much available. There are some projects and initiatives working on that, but this kind of data is expensive to collect and produce. Hopefully, if we could make it easy enough that the community would be willing to help out with crowdsourcing, that should go a long way, and it could in turn improve the text-to-speech part of Wikispeech.

The implementation here was again a MediaWiki extension, and we have some manuscript generation, which makes the recordings more efficient: you don't need to read as much text to have something you can use, specifically for things like text-to-speech. We have the basic implementation in place, like the recording, the storage, and the interfaces, but it's unfortunately not quite ready to be usable as it is.

So, what we're looking at for the future, because we want to continue improving Wikispeech, is improving the text-to-speech. There are plenty of features that we would have liked to implement but didn't have time for, so we want to do those, handle the bug reports that may come in during testing, and expand into more languages. We currently have Swedish, English, Arabic, and, I believe, Basque. I'm not entirely sure how well they all work, but those are the languages we have today, and we would like to expand to other languages, especially ones that have poor text-to-speech support today.
We also want to finish the implementation of the Speech Data Collector, which could then help collect speech data and go at least a bit of the way toward enabling more languages for the text-to-speech as well. And that is what I had, so thank you for listening. I'll leave the links up here if you're interested in visiting any of the pages; there's the documentation and the demo I talked about. If there are any questions, there are a few minutes left. Okay, if there are no questions, then thank you again.