All right, we're going to get started here in just a minute. There's pizza in the back, and yeah.

All right, so I guess let me show you guys the Discernatron. So I'm Eric. I work on search relevance and other things related to that. One of the difficulties with search relevance is doing offline analysis of changes to search results: figuring out if a search result is better or worse, and if changes to the search engine are improving things. And so one of the things that we're using to improve search relevance is human-scored queries, where humans have gone through query results, the results that are combined from us, from Google, from DuckDuckGo, and from Bing, and are grading which results should be at the top, which results are useful, and which results are completely useless, like your search is bad if you return these results.

So this is going to be at discernatron.wmflabs.org. It's just a standard WMF Labs thing. When you go there, you'll see basically this screen. We give you a little information about the queries. We choose an incredibly small fraction of queries, a couple hundred a month, to import into this. These are audited by engineers at WMF to make sure that they don't include PII. It turns out that a reasonable percentage, something around 20% of queries, actually have some form of PII, whether it's somebody searching for their own name or whatever else.

So you go here. If you click Get Started, it'll log you in. If you've logged in previously, it's not going to ask you anything. It's just going to send you right to the instruction pages. If you have never logged into this previously, it'll drop you to meta.wikimedia.org, and it'll give you just a little thing that says allow Discernatron. And basically, that's our login process.

It gives you some brief instructions here. We have four levels of grading: irrelevant, maybe relevant, probably relevant, and relevant.
The differences here: irrelevant results are basically ones that just have no relationship to the query. You don't even know why these are being shown here. It does turn up quite often, actually. Maybe relevant are things that have some relationship to the query, but you don't think that they're good answers in any way. Like if you searched for San Francisco and you got an article about some random thing that just happened to have the words San Francisco in it, that would either be irrelevant or maybe relevant, depending on how tenuous the link to San Francisco is. But for these, just make a judgment call. You don't have to try particularly hard to figure out which one it's going to be. Just something that makes sense to you is fine.

Probably relevant are results that you would expect to find on the first page of results. These are things that seem to have a reasonable relationship to the query. They seem like they might have an answer to the query. And then relevant is specifically items that you would expect to see as the top couple of results, like the ones at the very top of the result page. So if I were to search San Francisco, the page for San Francisco is a relevant page. That's an exact match. It's good. Other things might be relevant too: the Golden Gate Bridge might be a relevant match for San Francisco. But again, use your judgment. There are a lot of different queries.

So anyways, let me switch over to the scoring interface. At the very top of the instructions page, the button just says continue to scoring. That'll drop you over to here. This tells you what the query is. So this particular one, the query that the user had was induction. Now feel free, copy the query, send it to your favorite search engine, figure out what it is. Like induction cooking, you might know what it is. But if you're not familiar with the query, go ahead and search the web for it.
Get familiar with what this query is asking about. This happens all the time. There's another query in here that's a road built by Marcos and Aquino. And there's a reasonable chance you don't know what that is. But if you were to do a Google search for it, or a DuckDuckGo search, and you look around, you might get a better understanding. And that gives you enough information to start rating results.

Rating is fairly simple. We have two interfaces. This is the classic interface. This is kind of an engineering-based interface. You have the different quality levels there, and you get a page title and a snippet for each result. So like this one was induction cooking. This is Talk:Induction heating. It's up to you whether you think it's a relevant thing. Most of the time, though, I consider talk pages to be irrelevant unless the query was actually asking about talk pages. Because talk pages don't usually have the answer to your query, and the answer is why you would use Wikipedia. Talk pages are more about discussing the article itself.

So you can click actually anywhere on the thing. Clicking multiple times cycles it between the different ratings. So clicking it once to choose irrelevant. The next one, induction heater. Again, it's hard to say, but for induction cooking, the induction heater is actually a significant part of it. So we could probably call that a probably relevant result. Or it may be a relevant result. There's a little link right next to it. It's the word link, and that'll take you to the page. You can read the Wikipedia page, or maybe do a find on page and look around, and get an idea of whether this thing is particularly relevant.

The other interface that we've put together for today: here you have a button that says switch to card interface. We tried to gamify it a little bit, so this is kind of a solitaire for rating things. Essentially, you're going to have, so again, this is induction cooking.
The electric stove is a relevant result. So you just click relevant over here. It'll drop the card over there. Induction heater, inductive heater, is a piece of the thing, so that could be maybe or probably relevant. So you can just click around and the cards will move to wherever they go. If you're not sure, you can drag cards back up to the top, yada, yada. You can switch back and forth between this and classic, and it'll remember what you had scored so far.

When you submit the results, for this interface, the button is all the way at the bottom, because we would like you to score at least 80% of the results. Actually, if I were to submit this now with less than 80%, you will get an error at the top here. It says, received 20 ratings, at least 28 must be rated. That's because we're requiring at least 80%. If you don't know what 80% of them would be, or you're not sure, or the query's too hard, there's a skip this query button. So you just click skip this query and we will give you a different query to rate. We'll never show you that one again. Some queries are hard to rate. If it's hard and it seems like it's going to be too much effort, feel free to skip the query and somebody else will be given that query to rate.

The code behind the scenes is going to give out each query to two people. And if you don't finish it, it'll just give it out to somebody else. So our goal is to get two ratings for each query that we put in here. Actually, we would like to get five, but we're starting off with two. And if we can get everything scored, we might do more. The idea there is that we want to be able to average the ratings together, because people are going to have different ideas of what's relevant and what's not relevant. Having two judgments on the relevance of things helps with our process.

So that's Discernatron. That's basically what it is. Again, the URL is discernatron.wmflabs.org. I can go back to the first page so it's easier to read.
Well, it's not much bigger there, but the spelling is right up here at the top, Discernatron. Are there any questions?

Also, what we're asking for the rest of this thing is if you guys would try, either on your phones or on your laptops, rating things. I didn't show you, but this actually does work on mobile. So the card-based interface doesn't work too great on mobile, but this one does, if I know how to make it. There we go. So when you get this one on mobile, it changes the interface a little bit. You stop getting all the things. You can still just click on it and it'll rotate through the different scoring levels. So irrelevant, maybe, probably, relevant.

So we're hoping for the rest of this, people can try rating things. Myself, Dan, and Deborah are going to be around here for any questions you might have. We'd also love to get feedback on how easy or hard it is to use the interface. If you have ideas on what could be done better, what could make it easier to use, what things got in your way. Our goal here is to get feedback on how to make this a better tool, how to make it a tool that people might be willing to use, something that maybe on your train ride into work you could score one or two queries or something like that.

Yeah, so that's all I've got for the presentation. Enjoy the pizza and feel free to ask questions. We've also got someone here remote. I don't know if anybody remote showed up, but Trey is on the Google Hangout. He's a relevance engineer in Discovery, and he'll be answering questions for anybody either in the Wikimedia Office channel or on the Hangout itself, whatever's comfortable for you.

That's awesome. Thank you.

Yes. Yeah, please.

Have you thought about making it a ranking interface, to drag the results into the order that you think would be best rather than selecting one of the four categories? Because just having done some, it does seem like it might be a bit easier rather than saying, is this result probably relevant or maybe relevant?
Might be easier to say, is this result more relevant than these other ones?

No, actually, I haven't thought about that. That's not a bad idea. We based this very loosely on some guidelines that are out there. So Bing and Google and other places, they do something similar to this, although they pay people to do it for them. We don't really have a budget to pay people to do this. But there are documents out there. Google has a 150-page PDF about how to use their scoring interface, with lots of examples. This isn't nearly as complex, but it's the same general concept. But that's a good idea. That's something that we can look into and think about, definitely.

Anybody else have questions? All right, awesome. Well, enjoy the pizza, and if you'll try it out and ask us any questions you have, I'll leave this up here in case it helps anybody see how to spell Discernatron.

Brendan. That'll be the WMF-guest network. Oh, it's "one edit can make a difference", but the one is the number one, and then the rest of it is all lower case with no spaces.

OK, we're going to stop the recording. So thanks everyone for coming. If you've got any questions, feel free to get in touch.
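
A few of the rules described in the talk (the four grading levels, clicking to cycle through them, the 80% minimum before submitting, and averaging the ratings from different people) can be sketched in Python. This is only an illustrative sketch and not the tool's actual code. The numeric values 0 to 3, the function names, the cycle order, the idea that a result starts out unrated, and the example total of 35 results are all assumptions made for the example.

```python
from enum import IntEnum
from statistics import mean


class Grade(IntEnum):
    # The four levels named in the talk; the numeric values are an assumption.
    IRRELEVANT = 0
    MAYBE_RELEVANT = 1
    PROBABLY_RELEVANT = 2
    RELEVANT = 3


def next_grade(current):
    """Return the grade after one more click (hypothetical cycle order).

    None stands for a result that has not been rated yet.
    """
    if current is None:
        return Grade.IRRELEVANT
    return Grade((current + 1) % len(Grade))


def min_required(total_results, percent=80):
    """Smallest number of ratings that covers `percent` of the results.

    Integer arithmetic is used so the result is rounded up exactly.
    """
    return (total_results * percent + 99) // 100


def can_submit(ratings, total_results):
    """True when at least 80% of the results for a query have been rated."""
    return len(ratings) >= min_required(total_results)


def average_grades(per_rater):
    """Average the grades each result received across raters.

    per_rater is a list of dicts mapping a result title to a Grade.
    Results that a rater left unrated are simply absent from that dict.
    """
    collected = {}
    for ratings in per_rater:
        for title, grade in ratings.items():
            collected.setdefault(title, []).append(int(grade))
    return {title: mean(grades) for title, grades in collected.items()}


if __name__ == "__main__":
    # A hypothetical query with 35 results needs 28 ratings to reach 80%.
    print(min_required(35))
    first = {"Induction cooking": Grade.RELEVANT,
             "Induction heater": Grade.PROBABLY_RELEVANT}
    second = {"Induction cooking": Grade.RELEVANT,
              "Induction heater": Grade.RELEVANT}
    print(average_grades([first, second]))
```

With two raters who disagree on one result, the averaged score lands between the two grades, which is the point of collecting more than one judgment per query.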