Okay. Hello. So my name is Tilman Bayer. I'm an analyst on the Wikimedia Foundation's Reading team. The title of the talk is New Readership Data, and it's a couple of things, with an emphasis on things we've learned recently about how Wikipedia is being read. So it's a bit of a grab bag, to be honest. It's not one overarching topic. A lot of these things are by-products of things we're doing on the Reading team in product development. For example, we might add instrumentation to look at how various interface elements are being used, so that we can change them, but at the same time we're also learning how readers interact with the content. I've also included some work by other people at the Foundation and by some academic researchers.
One last thing as a caveat: it's not about page view data. I just presented at the metrics meeting last month about our general readership trends based on page view data, and there's been a lot of academic research about that. So it's not about that. But to quickly summarize where we are there, also so you can place the different instrumentation we're talking about: desktop is still a bit more than half of our page views, but it's decreasing. Mobile web is a bit less than half of our page views, but since we have fewer page views per user on mobile, it's actually more people already on mobile. And our apps, Android and iOS, are about 1.2% of page views. This talk is also not about new metrics that we're going to present soon, on uniques for example. So, just to get that out of the way.
Okay, so after the introduction I'm going to talk a bit about load times, which is work with the performance team. Then we look at which parts of an article are actually being read, so what happens after the page view: first by position in the page, geometrically, and then also a bit by topic. And there are some other data sources and also other research that's going on in the Foundation right now.
Surveys and video interviews.
Okay, so first topic: how long do readers have to wait? And this is a chart of... so, we're publishing the median or average load times a lot and watch how they develop.
I'm so sorry, all of a sudden we're having problems switching to your slide deck right now on the AV bridge. Do you mind pausing for just like one minute while I figure this out, for the sake of remote viewers?
Okay, so I hope everyone can see the slide now. To repeat for the remote viewers: it shows the loading time until the whole page has completed loading. It's a bit compressed here; it's a log scale. I just invite you to look at this and contemplate that the people on the left, who wait less than one second or even half a second, are in a much different reading situation than the people on the right, who wait for 10 seconds or 20. There are even some outliers with several minutes of waiting, and then there's a timeout. This obviously depends on your connection speed, but also on page size; longer pages take longer to load, obviously.
But one thing we are interested in is, of course, whether it depends on geographic location. So this is the same chart, the blue one is the same as before, but with some of the fastest countries. You see Switzerland over here; yellow is the USA. On the right-hand side now are the slowest countries: India, Ghana and Nepal. And you see it's not all black and white. There are some very slow people in the US and there are some people on very fast connections in Nepal, but not a lot. Overall, these countries differ a lot. Again, we are yellow here, in the US, and there are other countries which are in quite different zones.
You can also plot this on a map. So this is a map by connection speed, and I'm really sorry about the colors. I'll update it on Commons with some nicer colors, but yellow or green is the fastest. Here it's two seconds.
That's some European countries, not all: the UK, Scandinavia, and you see a little dot for Switzerland, which we saw on the previous chart. You can hopefully also see that the Netherlands is yellow, and that's of course because we have our caching data center in Amsterdam. So European countries benefit from Amsterdam. And you see it's pretty much a map of the global south, with some especially slow countries in Africa. There are some caveats: this data doesn't show up if the page doesn't finish loading, if the user waits too long and moves to another page, or if the connection times out, which can happen too. But you see the bigger picture, and this kind of thing really informs a lot of current work to improve things for people on slow connections in the global south. There's a big tracking bug on Phabricator: make Wikipedia more accessible to 2G connections. 2G means slow mobile connections. So that's things like lazy loading images, or that we recently stopped serving very high-definition HiDPI images, which are really slow to load on slow connections. Things like that. Okay, so much about performance.
The next part will be which parts of an article are actually being read. We have been looking at page views all the time, but we really don't know what happens after a page view, or what a page view even means: how much of an article is being read? For one or two years we have had some new data sources on this, and that's what I'm going to talk about in the next 15 minutes or so.
The first data source is something we did with the mobile web team. On the mobile web version of Wikipedia, as you may have seen, sections are collapsed below the lead. That means you get to read the lead section, all the top-level sections below that are collapsed, and you tap on a section heading if you're interested in that section.
And then we can ask the question: how many people actually opened a section? The result is that in over 60% of cases, 61%, only the lead section is looked at, and it decreases rapidly, as you can see in this chart. It's like 20% for exactly one section and 25 or so for one or two sections, and it decreases rapidly from there.
We have some other instrumentation that was also done because the product managers wanted to make some decisions about what to add or not. For example, in 2014 we were looking at the placement of some buttons at the bottom of the page in the app, I think it was Android, and I found that during one week only 25% of app users ever saw the end of a page. So that's not per page view, that's per user, and that's not a lot, right? It also means that a lot of people just stay at the top. A more recent one is for related pages. These are links that are shown at the bottom of the page, and there we had 40% of page views on desktop and 32% of page views on mobile web getting to the bottom of the page, with the caveat that these are mobile web users who actually opted in, so they're probably more engaged readers, and on desktop they opted into this as well.
Another data source that is really interesting is click actions. There's a really interesting data set published about a year ago by the analytics team which maps clickstreams, meaning how people click from one page to another inside Wikipedia via internal links, wikilinks. It's on English Wikipedia and desktop. One finding is shown in the chart on the right, which is basically rotated by 90 degrees: the left-hand side is the top of the page, the right-hand side is the bottom of the page. You see that the click rate decreases considerably from top to bottom, although a bit less rapidly than you might expect from what we'll see later. This was done by the Foundation together with Stanford, by Bob and by Leila at the Foundation.
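As a rough illustration of how a figure like "61% of page views only look at the lead section" could be tallied, here is a minimal sketch. It assumes a hypothetical list with one entry per page view, giving the number of collapsed sections the reader opened; the function name and the sample numbers are made up for illustration and are not the team's actual instrumentation or data.

```python
from collections import Counter


def sections_opened_distribution(opened_counts):
    """Map 'number of sections opened' to its share of all page views.

    opened_counts: one integer per page view (hypothetical input format);
    0 means only the lead section was looked at.
    """
    total = len(opened_counts)
    if total == 0:
        return {}
    counts = Counter(opened_counts)
    # Sort by number of sections so the result reads like the chart's x-axis.
    return {n: counts[n] / total for n in sorted(counts)}


# Made-up sample of ten page views, not real data.
sample = [0, 0, 0, 1, 2, 0, 1, 0, 0, 5]
dist = sections_opened_distribution(sample)
for n_sections, share in dist.items():
    print(f"{n_sections} sections opened: {share:.0%} of page views")
```

The same tally could be done per user instead of per page view, which is the distinction that matters when comparing numbers like the 25% of app users who ever reached the end of a page.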
Another team of academic researchers looked at the percentage of clicks that go to links in the lead section and found that it's between 26 and 43 percent. It's not an exact science, because links might repeat, but that's the estimate. That's another data point showing us that the lead section is really important, and by the way, this is all really interesting for editors. I'm a long-term editor myself, and it matters so much which part of an article you edit, not just which article you edit. On English Wikipedia there was recently an edit contest called Take the Lead, which specifically focused on improving the lead section, also driven by this insight about the audience: it's what most people read, so it's really important to get the lead right.
The third result here is a bit funny, because normally you can't predict where a link ends up, depending on your screen size, but they kind of assumed that you have a fixed browser width with a standard resolution, and then they found that links on the left-hand side are clicked more often. By the way, there is a lot of standard research on web pages with heatmaps or eye tracking, which I'll talk about later, and it's a pretty standard insight that the left-hand side gets a bit more attention. They could verify that using this data set.
Another piece of instrumentation, the last one, comes from the Android app. In the Android app we looked at the lowest point of a page that people see, and this is a really nice plot showing you how it decreases. Of course 100% of people see the top, and about, I think it was 7 or 8 percent of page views on the Android app see the bottom. Again, it differs a bit from the higher numbers we saw for these opt-in readers on desktop and mobile web, but that's the app for you. And again it shows that the top gets much more attention, so it's also much more important to get it right from an editor's perspective.
Okay, this is another study, a small study that studied scroll actions.
So usually we don't track users in this kind of detail. This was a study done with Amazon Mechanical Turk users who basically had every action recorded in great detail. They were given tasks to look up the answers to certain questions about diabetes and other medical content, I think. You can look it up on Meta-Wiki, but this is one example, and I'll quickly explain what it means. The horizontal axis is the time axis; you see the whole thing took like 12 minutes. The vertical axis is the position in the article, and the gray thing is the viewport, so the part of the article that people see. This person started out on the main page, that's violet, at the top, then went to the article on diabetes and jumped down to the middle of the page, stayed there for a while, then jumped up and down a lot. Then they read another part a bit more, went to another article and just read this one section (one could look up what it was, actually), then went back to the previous article, to about the same part they were reading beforehand, moved up a bit, and moved on to two other articles. And yeah, this is just one case, not a systematic conclusion, but what we like about it is that it drives home the point that articles are maybe not being read linearly, from top to bottom, and this is probably a pretty typical use case.
Okay, so in the next part we can also ask what kind of content people are interested in. This is somewhat limited data, but with the same data set we had earlier, the one that tracked the section expansions, I also looked at some examples, at some specific articles. And that's kind of funny, because it tells you which part is most interesting, here in the article about Barack Obama.
You see the attention differs a lot between different sections. Not a lot of people are interested in his legislative career before becoming president or in past campaigns, but there's a lot of interest in early life and career, and in family and personal life. Maybe that's a bit disillusioning. I wouldn't say it's tabloid interest, but there's definitely some difference between these. By the way, it's not surprising that notes and references don't get a lot of clicks, because you can click on references within the article, but it's kind of surprising, and I've also seen this in other articles, that external links don't get a lot of interest on the mobile web. I have a few more examples, including Donald Trump for instance. And by the way, this needs a lot of traffic to work, so if you have articles that you are interested in, I'm happy to run it; it just needs a lot of page views per month to make this statistically significant.
So let's take the article about the Second World War. We see again that the kind of clerical sections at the bottom don't get a lot of attention: external links, references, citations, notes here too. You can see that chronology and course of the war, so the timelines, seem a bit more popular than the analytical parts like background and impact. But again, this is just one example. There are others, and this would need a more systematic analysis, but this is one of the first times we can really see which parts of an article are most interesting for readers.
Like I said before, there are some standard methods if you want to study that kind of thing elsewhere on the web, and one method is eye tracking. A person puts on some glasses, or cameras are pointed at their pupils, and you see which part of the screen they're looking at. I mean, it's standard, but it's pretty tricky to do: there's expensive equipment, and you need trained people administering it.
We haven't been using it at the Foundation. I think Abbey said that she's done it in a previous job, but we don't have the capacity to do that here. But there was one German team of academic researchers who used it a bit on the German Wikipedia. This is just an excerpt, but one category they defined was lookup tasks. I'll talk about it more later, but we assume that readers come to Wikipedia with different needs: some want to read whole articles, some want to just look up one specific fact and look for that within the article. In that category, readers spend a lot of time, according to this eye-tracking study, scanning the table of contents and the lists that you have in the article. The other category they used was learning, and in that category they saw a big difference: people spend much less time, less than 10%, on the table of contents and on lists.
Another paper they published was again about the eye movements, and I'm not going to go deep into this, but usually you look at fixation points and the movements between them, the term for which is saccades. They looked at which points readers are focusing on and in which sequence, and classified them, for example as images or table of contents.
Tilman, on your World War 2 slide, what was on the x-axis? People on IRC want to know.
Okay, that's a frequency, how often these are clicked. I didn't put in absolute numbers because it's just a sample, but for every section you see how many taps expanded that section, and that tells you how popular it was, how much interest there was in the content of that section. And mind you, it's just about the heading, not about the content, because readers only see that after opening it.
Okay, let's go back to the second point here, where they looked at the eye movements between these various image and text features. 39% of contact points, so that's movements, led from image to image, i.e.
images are really important for navigation, right? Readers move from one image to the subsequent one. Text contact points were 37%, so also important. And in the introduction, I don't have time to get into it, but I think it means that at the beginning readers tend to focus on an image. Yeah, I should also watch the time, because I'm already at 40. There's the research newsletter that we're publishing every month.
So for a few moments I just want to mention other research we're doing. Abbey just came in for her part. One thing was to actually interview real-life readers in the global south, in Mexico. The team went to Mexico and interviewed a lot of them, and you can read the details on mediawiki.org. That's qualitative research, right? It's complementary to the quantitative research we're doing here, and it helps you to maybe find things you would overlook in quantitative research, which you can then also test again later.
Similarly, there is a series of reader surveys using the QuickSurveys feature, a new extension with which you can put little banners inside articles. The goal of that is to build a taxonomy, a classification of Wikipedia readers and of articles based on how they are being used by different kinds of readers. So what does that mean? This is the classification I was talking about: you might come to Wikipedia and want to learn about something in depth.
I'm going to Paris and I want to take some time to read about the Eiffel Tower. Then again there's overview, which is here on the chart, and that is kind of a dictionary definition. If you have never heard about it: what is this Eiffel Tower thing? Oh okay, it's a tower built in 18-something and a famous site or whatever. So that's also a quick lookup, basically. And then they looked at isolated fact lookups: how high is the Eiffel Tower, when was it built? It's a bit like what we saw earlier in the diabetes example, where people were asked a specific question and had to find the answer in the article, or in the eye-tracking study, where they were looking at the table of contents for so long to see where in the article the specific information they're looking for might be hiding.
So you see the first result on the right: the in-depth reading is important, but it's in the minority, like 25%, and the two quicker tasks, overview and fact lookup, are about the same.
Just to mention what they're going to do: they are also interested in whether, if somebody tells you in what situation they are reading Wikipedia, at school for example, you can tell which of these three tasks they are most likely doing, or vice versa. And they also look at page view sessions. That's the interesting thing. I mean, there have been various reader surveys before, there's a list on Meta, and what's new here is that we can actually tie that to what these people are actually reading and how they're reading it. So that's ongoing work, but I think it's going to be really interesting. That's the page view session features, meaning how long the session is, how many articles they're reading, whether they're reading a lot of related articles. And can we also predict which articles are related? That's a bit similar to the clickstream data set I mentioned earlier, where I forgot to mention that the real purpose, like for the other instrumentation, of the clickstream data set
was not to study these navigation things but to provide link recommendations: what kind of links are we perhaps missing? If a reader clicks through five articles and then ends up on one, maybe you should link directly to the fifth article. And similarly here: if you see that a lot of people are reading this group of articles, maybe you can provide direct recommendations to a person in that kind of task. Again, there's more on that on Meta. I just wanted to mention it because it's a pretty big project going on.
Yeah, so again, it's been a bit of a grab bag. The takeaway is really that we're only beginning to understand what people are doing inside articles. One big takeaway especially is that the lead section is much more important, right? And also that section headings matter, since they decide whether people actually open a section to read what you wrote or don't, on mobile web and also in the table of contents. By the way, I mentioned the performance realization that we have readers with really, really differing connection speeds, and needs as well. And then the three different tasks we're beginning to distinguish: readers come to Wikipedia with very different needs and tasks they want to accomplish. And that's what a lot of people are working on right now. Cool. Yeah, any questions?
So, when we were in Mexico, one of the things that we learned (we only talked to 15 people) was that probably about half said they use Wikipedia for an overview rather than deep learning. And I think three people who are in higher education said, I don't trust Wikipedia for deeper learning. Like, one of them was a microbiologist, she's studying biology, and she said: I can use it to understand a definition, but when it comes to applying that definition or understanding the details, I don't necessarily trust it, because there are inconsistencies in the information. And she was talking more about Spanish Wikipedia. And I think the reader segmentation, the little
survey that went out was in English. I'm wondering if we're going to reproduce that experiment in other languages also, because I'd love to see the results on Spanish wiki.
Yeah, I need to pass that question on to Leila and the others who are working on that. I think they've targeted Persian Wikipedia at least, but whether it was also Spanish, I'm not sure. Yeah, but it's related, definitely.
Indeed, yeah, coming from different angles.
Other questions? If not, I'd invite you to also look at the monthly research newsletter and the Twitter feed that Dario and I are publishing. We cover a lot of academic research there, sometimes also data analysis from the Foundation, and you can learn a lot about various aspects of how Wikipedia is being used. That's on Meta and on Twitter. Also, the Wikimedia Foundation's Reading team has a page on mediawiki.org with ongoing work and quarterly goals: what's coming up in the next quarter, what has been happening this quarter. And the third page I linked collects part of the things I was talking about, which parts of an article are being read. You'll find some of the charts, the papers and such there, and I'm also going to add some things from this talk that I just did recently.
Okay, any other questions? So, cool, thank you.