Hang on just a sec, let me share my screen. So in Discovery we've been looking at language identification as a way to do cross-wiki searching. In particular, if we have a query that doesn't do well in terms of the number of results it generates, fewer than three results, which is the threshold we've been going with, then we want to look and see if it happens to be in another language. We have looked at some of the data, and this happens relatively frequently. We have a tool called TextCat which does the language detection, and we can use that to search another wiki: if we determine that the query is in French and the user is on the English Wikipedia, we can get results from the French Wikipedia and present those if there are any. We wanted to make a tool so people could mess around with TextCat, do the language detection, see what's happening, and try it out for themselves, without necessarily having to find a particular query that gets no results on one wiki but is in another language and so on. The link is in the etherpad. I'm going to be using a local copy, which is a little bit snappier than the one in Tools. Basically you can go in and choose the languages that you want to consider; there's a default set here. Then you type in a query, and it gives you results. In this case it's unambiguously English according to TextCat. The scores are relative to each other. They're actually cost functions, which means that a lower score is better, and they vary wildly depending on the length of the string whose language you're trying to identify, so we normalize them here for demo purposes. Anyway, the demo itself is relatively easy to use. You can select the different languages that you want to consider. So if we turn off English here, we can run it again, and if it's not in English, it looks more like French and a little bit like German.
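For anyone curious what's under the hood here: TextCat is based on ranked character n-gram profiles (the classic Cavnar and Trenkle approach). Each language model is a ranked list of that language's most frequent n-grams, and a query is scored by how far the ranks of its own n-grams are "out of place" relative to the model, which is why the scores are costs and lower is better, and why they grow with query length. A minimal, illustrative sketch follows; this is not the actual TextCat code, and the function names and parameters are made up for this example:

```python
from collections import Counter

def ngram_profile(text, max_n=5, top_k=400):
    """Rank the most frequent character n-grams (1..max_n) of a text."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries, as n-gram profiles typically do
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return {gram: rank for rank, (gram, _) in enumerate(counts.most_common(top_k))}

def out_of_place_cost(query_profile, model_profile):
    """Lower is better: sum of rank differences; unseen n-grams pay a max penalty."""
    penalty = len(model_profile)
    return sum(abs(rank - model_profile.get(gram, penalty))
               for gram, rank in query_profile.items())

def detect(query, models):
    """Return (cost, language) pairs sorted so the cheapest (best) guess comes first."""
    query_profile = ngram_profile(query)
    return sorted((out_of_place_cost(query_profile, profile), lang)
                  for lang, profile in models.items())
```

With models trained on even a few sentences per language, `detect("the quick brown fox", models)` will typically rank English first, mirroring the unambiguous case shown in the demo.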
We have a number of models that are available. We have models made specifically for detecting the languages of queries, because the language that people use in queries is different from normal text. We also have some models that are based on wiki text, which is more formal. We have a small initial set of languages here, but there are a lot more that people can use and play around with. There are more for the wiki-text language models than for the query models, because the query language models are harder to generate: it requires some manual review and work to filter the training queries down to ones that are actually in the language you're trying to model. For the wiki text, we just pull lots of text off of the wiki in question. In addition, once you've actually run the query, you can just click and see the results that would come up on that particular wiki. So there's French, and there's German. We know that this is actually, let me get rid of these, it's actually in English. This is not a particularly useful query, but another thing we have that I really like is some built-in demos that sort of walk you through the features and the limitations of TextCat; they select languages and run queries as they go. This one is Europanto, which is a weird sort of made-up language. It's just a mix of any European languages you'd like, so it's really just random words in different languages. The demo walks you through that, and you can learn more about how TextCat works and try some different variations there. I'm just going through this really quickly to sort of demonstrate how the walkthrough itself works. We're hoping this will allow people to mess around with TextCat, try it out for themselves, and give us more feedback about how we can use this language detection to improve search results. Cool. That's it. Cool. Hey everybody.
I'm going to show you something that we have planned for the Wikipedia.org portal page. This is going to be fun. I'm sharing the entire screen; you guys can see my screen? I'm going to assume that's a yes. Yep. Okay. So what's different here? First you might notice we took away the slogans below each of the links around the globe; compare that with what we've got right now. Yep. No slogan here, one slogan over there. Now why did we do that? Well, this is the first part of our effort to get more translations, to get the page translated at all. What we're going to do is try to detect the user's browser language client side and then rearrange these links around the globe to reflect the preferred language of the user's browser. Right now I just have the default setup here, just English, but if I go into, let's say, Chrome settings and, I don't know, let's try. Kazakh is cool. So let's say we just add it. Nobody really does this in real life, I know, but people around the world download browsers in different languages, so the first language would already be set up for them. And we reload the page, then, oh, we got a link here. Now we have the Kazakh Wikipedia as the first link over there. This also works if people go crazy, go nuts, and have multiple languages set up. Let's move to the top just so we can see the logo if it has one. Oh, yep. There we go. Looking pretty good. And yeah, right now we pulled these translations from the Wikipedias' official subtitles, but we will get them into translatewiki.net so that the community can translate them and offer slightly more appropriate translations if they're needed. So yeah, that's what we have planned for the portal page for now. Oh yeah. Thanks, John. Dimitri. Hey there, everyone. So this is a 10% toy project I've been working on. This is a web app, which is a bit of a departure for me. I've been pretty entrenched in the native app world.
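The reordering described above boils down to a stable sort keyed on the user's browser preference list (on the real page this runs client side in JavaScript against something like `navigator.languages`; the Python sketch below is just to illustrate the logic, and all names in it are made up):

```python
def reorder_wiki_links(browser_langs, wiki_links):
    """Move wikis matching the user's browser languages to the front.

    browser_langs: preference-ordered codes, e.g. ["kk", "en-US"]
                   (hypothetical stand-in for navigator.languages)
    wiki_links:    (lang_code, label) pairs in their default page order
    """
    def priority(link):
        code = link[0]
        for i, pref in enumerate(browser_langs):
            # compare only the primary subtag: "en-US" matches "en"
            if pref.lower().split("-")[0] == code:
                return i
        return len(browser_langs)  # non-matching links keep their default order

    return sorted(wiki_links, key=priority)  # sorted() is stable
```

Because the sort is stable, users with no matching preference, or with only some languages set, see the remaining links in their usual default order, which matches the "default setup" behavior shown in the demo.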
I feel like it's time I caught up a little on web development. Anyway, one thing that I'm interested in is digital preservation. Part of that means being able to read and process file formats that may have gone obsolete, or files that were created by software that's no longer maintained, things like that. So what this does is provide a framework to read local files on the user's PC, break down the format of the file, and do whatever we want with it. I'll show some examples of what I mean. This supports drag and drop via HTML5, so I can take a file from my desktop and drag it onto here. I'll drag a JPEG file to begin. I dropped it on here, and the first thing you see is that it displays the actual graphical contents of the file, then some vital information about the file, and a Wikipedia link to the JPEG file format. But then you get this detailed breakdown of the actual file format, which in the case of a JPEG file includes the EXIF metadata. This is all done locally using JavaScript. We have the camera make and camera model, and both of these are automatically wiki-linked, so if you click on this, you get the Wikipedia page for that model. Among all the attributes in the metadata, when you get to the thumbnail, the thumbnail is processed as its own image, and the JPEG structure of the thumbnail is recursively passed into the JPEG format module, so you get the structure of the thumbnail itself. So that's kind of what I'm talking about. This also supports things like raw camera formats. I have a raw file from my camera here; I'll drop that here. Because raw formats are usually TIFF-formatted, this uses pretty much the same processing as the EXIF data from a JPEG file. So you get the thumbnail and all kinds of metadata along with that. Let's see, we can drop an MP3 file onto here. That's an MP3 file, and you get the album art from its metadata. And there's ID3 information: you get the album name and the artist name.
All of these are also wiki-linked automatically. And of course, there's a JPEG structure breakdown done recursively on the thumbnail image. So what about file formats that are obsolete? For example, in the older days of UNIX, there was a graphics format called PPM, Portable Pixmap. Not used at all anymore, but I have a few examples of PPM files, so I'll drop one on here. This is a PPM file, and this can process it. It builds a bitmap from the file and loads it into this HTML5 canvas element. This way, you're able to load old graphics formats in the browser, even though the browser doesn't support them. Another obsolete format is PCX, from the older days of DOS in the 90s. There's a PCX file. Then there are files that are not obsolete, but still not supported by the browser. For example, in medical imaging, they use a format called DICOM. I think I have an example DICOM file here. There you go, it's from an MRI image, and we've got some metadata from that. Let's see, recently someone came up with a graphics format called BPG that's supposed to offer better compression than JPEG, but it's still not supported by the browser. We can load a BPG file, and there it is. So anyway, this is all done on the client side. The way this works is that when you drop a file here, it begins by reading the first few bytes of the file. Based on that, it can guess the format of the file from the signature bytes at the beginning. When it knows the format, it loads the correct JavaScript module for reading that format and passes the file through that module. The output of the format module includes this kind of detailed breakdown at the bottom and any kind of graphical representation that we can load into a canvas element. And so this could have applications in not only preservation, but also something like steganography or forensics; you know, if someone hides a message in the metadata of some file format, this can reveal it.
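The signature-sniffing step just described is straightforward to sketch. The table below is a tiny illustrative subset (real tools carry far more signatures, some at nonzero offsets, and the names here are hypothetical, not taken from the actual app):

```python
# (signature bytes at offset 0, format name) pairs; checked in order.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"II*\x00", "tiff-le"),   # little-endian TIFF; many camera raw formats too
    (b"MM\x00*", "tiff-be"),   # big-endian TIFF
    (b"ID3", "mp3-id3"),       # MP3 with an ID3v2 tag
    (b"P6", "ppm"),            # binary Portable Pixmap
    (b"BPG\xfb", "bpg"),
]

def sniff_format(first_bytes):
    """Guess a file format from its leading signature ("magic") bytes.

    Returns the format name, or None when no signature matches; the app
    would then load the matching format module to parse the rest of the file.
    """
    for magic, name in SIGNATURES:
        if first_bytes.startswith(magic):
            return name
    return None
```

Note the order matters when signatures share a prefix, and formats without a distinctive header (like some text-based ones) need fallback heuristics rather than a fixed table.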
And I tried to make this extensible, meaning it should be easy for someone to create a new format module, just like that, to be able to support some file format right there in the browser. And that's it. Hi, Steven. Hey, everyone. I'm Steven. I'm an Android developer. Hopefully, everyone can see my screen, right? So back in the day, there was a program called eSpeak, and there still is a program called eSpeak. It's about 10 years old, it's written in C, and it does text-to-speech for you. So I'm going to unplug my headphone jack for just a moment, and I apologize if there's about five seconds of echo, but I'm going to try to test it. Potatoes. Potatoes. So that's the kind of, I hope folks were able to hear that, that's the kind of thing that eSpeak does. It takes a word like potatoes and says it out loud. eSpeak also has support for IPA, the International Phonetic Alphabet, but it takes kind of an odd ASCII-fied format, not the conventional IPA format that you'd see on Wikipedia pages. The other problem is, as I mentioned, eSpeak is written in C. Well, there is a project called Emscripten that takes C code and converts it to JavaScript code, and folks who play around with this sort of thing have done that with Quake, for example, the video game Quake. There is also an Emscripten port of eSpeak that is kind of in development. I have a local copy that I had to make some changes to; probably I'm just not using it right. But long story short, you can actually run eSpeak in your browser, or if you're using a WebView like the Android app does, you can run it in the app. So I started working on providing text-to-speech for IPA pronunciations, and I wanted to demo a couple of those. It's still very early in development. I did a little work on this recently at a team get-together, and I'm hoping to do some more work on it in the future to make the mapping more robust and the overall work commit-worthy.
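The mapping step he mentions (conventional IPA on the wiki page, ASCII-fied phonemes on the eSpeak side, roughly Kirshenbaum-style notation) can be sketched like this. The table is a tiny illustrative subset, and the pass-through fallback is a simplification; as the talk notes, the real prototype still fails on some unmapped symbols:

```python
# A small, illustrative subset of an IPA -> ASCII (Kirshenbaum-style) table;
# the full mapping eSpeak needs is much larger and partly context-sensitive.
IPA_TO_ASCII = {
    "ə": "@",   # schwa
    "ɑ": "A",   # open back unrounded vowel
    "ʃ": "S",   # "sh" sound
    "ŋ": "N",   # "ng" sound
    "ˈ": "'",   # primary stress mark
    "ˌ": ",",   # secondary stress mark
    "ː": ":",   # length mark
}

def ipa_to_espeak(ipa):
    """Convert an IPA string to ASCII phoneme notation, symbol by symbol.

    Unknown symbols are passed through unchanged here; a robust version would
    report them instead, since eSpeak chokes on input it doesn't understand
    (the "subtle gasp" failure mode from the demo).
    """
    return "".join(IPA_TO_ASCII.get(ch, ch) for ch in ipa)
```

A character-by-character lookup like this is only a first approximation: some IPA sequences (affricates, diacritics) need multi-character rules, which is part of what makes the mapping hard to get "commit-worthy."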
So here's a few demos of that. I hope folks can kind of see I've got a window into my Android device looking at Barack Obama, and if I click on the little IPA icon here... audio doesn't come through, unfortunately, on this program I'm using, but I'm going to crank my phone way up. Let's hear that again. So I'm going to tap the little IPA icon again. Hopefully folks could hear that; otherwise this demo is going to be really boring. If it didn't come across, there was text-to-speech performed to say Barack Hussein Obama. I'm going to do a few more examples, and this next example is good because it demonstrates something that doesn't work yet. One of the mappings from the conventional IPA to this ASCII-fied version isn't quite working. So when I tap on this, I don't think you can hear it, but there's like a subtle gasp: eSpeak just doesn't understand what I sent it, so I have to fix that mapping. The second pronunciation, I think, works for this page. Honolulu. Honolulu. I'll do one more. So I guess places in Hawaii are good for this sort of thing. I'll do. Hopefully that was correct. If not, then I'll have to work on the translation logic or maybe fix a few pages. Anyway, I hope that this will be coming soon. Thanks, everyone. Thanks. Baha? Hello. Let me share my screen. Can you guys see it? Yes. Okay, great. So today I'm going to talk to you about my experimentation-time project. It's a little plug-in for a program called Liferea. Liferea is a Linux feed reader; I think it works on Windows too, but I don't know. It's basically an aggregator of RSS and Atom feeds that lets you read feeds offline when you're not connected, like an offline version of Google Reader, which is not functional anymore. And I made a plug-in for this program that works with another program called ownCloud. ownCloud is a combination of different well-known services such as Dropbox, Google Calendar, and Google Contacts, for everything that is personal to you.
You can set it up in your local environment, on your own server, and you can keep all your data for yourself without sharing it with anyone else. ownCloud has an app called News, which is the RSS reader for ownCloud, so you can save your RSS feeds into ownCloud and read them in the browser like this. Here you can see I created some feeds: I have a GNOME folder, a Wiki folder, and I'm tracking some news about the Barack Obama article, which I got from this link. This is the Wikipedia link that gives me the history of edits for the Obama article. The problem with this News app is that you have to be online every time you want to read some news, but sometimes that's not the case. So I plugged this News app into the Liferea offline app, and now I'm able to read articles offline. Let me show you how this works. This is the Liferea app, and as you can see, there are some default feeds that you can click on and read. Now we're going to add ownCloud. Oh, by the way, in case you didn't see it, I have set up this app on my localhost, so I'm going to add this URL to Liferea. We're going to create a new source; it's called ownCloud News, with a username, password, and the URL. So it started getting the data: same folders, GNOME, Wiki folder, and the Obama article is here. It's the Obama feed, I mean, there you go. I think something's missing, because these don't look great. Let me restart ownCloud one second. Yeah, something's wrong; I was testing something else earlier. But anyway, for ownCloud right now, you can see that there are no starred items, and here you can star some of them, like three items, then go back and refresh.
You see three starred items, so it syncs back with the ownCloud app. Same thing with read items: "ownCloud" here is bold-faced, which means it's not read yet, so I'm going to click on it and mark it read. I've read it, it disappeared, so I'm going to go back here and refresh, and you can see it's not bold anymore, so it's read. There are many improvements that I need to make, and the code base doesn't look good yet, but it's a good start, and I think this will allow me and others to read articles offline. So that's my demo. Thanks, Baha. Does anybody else have a demo that they'd like to share? Okay, I think that is it for today. Thanks, everybody, for your demos. Have a nice day. Thanks, Sam.
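For what it's worth, the read/starred syncing shown in that last demo boils down to a two-way flag reconciliation between the offline client and the ownCloud News server. A simplified sketch of that logic follows; the dict shapes and the "dirty" convention are invented for illustration (the real plug-in talks to the News app's REST API, and a production sync would also need conflict timestamps):

```python
def compute_sync_ops(local_items, remote_items):
    """Decide which read/starred flags to push to the server and which to pull.

    local_items / remote_items: dicts mapping item id -> {"read": bool, "starred": bool}.
    A local entry may also carry "read_dirty" / "starred_dirty" marking flags the
    user changed while offline; those win and get pushed, everything else is pulled.
    """
    push, pull = [], []
    for item_id, local in local_items.items():
        remote = remote_items.get(item_id)
        if remote is None:
            continue  # item not on the server (yet); nothing to reconcile
        for flag in ("read", "starred"):
            if local[flag] != remote[flag]:
                if local.get(flag + "_dirty"):
                    push.append((item_id, flag, local[flag]))   # offline change wins
                else:
                    pull.append((item_id, flag, remote[flag]))  # server state wins
    return push, pull
```

This matches the behavior in the demo: starring three items locally produces three pushes, and refreshing pulls down any read/starred changes made in the browser.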