So, welcome everyone to the research showcase. My name is Ellery. I'm a data scientist here on the research team at the Wikimedia Foundation. Today we have two exciting talks lined up. The first is on emergent roles in Wikipedia by Ofer Arazy from the University of Haifa. The second talk is from Charlie Kritschmar, a UX intern at Wikimedia Deutschland. We'll have a five-minute Q&A after the first talk and an extended Q&A at the end of the second talk. For those of you who are remote, you can send your questions via IRC using the channel #wikimedia-research. And with that, Ofer, please take it away.

Okay, so I guess I'm going first. I'll start by briefly introducing myself. First, thank you very much for inviting me to speak at the Wikimedia research showcase. I'm going to talk about emergent roles in Wikipedia, and I'll share my screen so you can see the slides. Okay, you're seeing the slides. All right, so the title of the presentation is Emergent Roles in Wikipedia. This is joint work with colleagues from Darmstadt in Germany, Johannes Daxenberger and Iryna Gurevych, as well as colleagues from NYU, Hila Lifshitz-Assaf and Oded Nov.

This research is part of a broader research program into peer production. Some of the communities that we've been looking into, aside from Wikipedia, are various open-source software development projects, as well as citizen science projects such as iNaturalist. When I talk about peer production, I'm referring to communities of volunteers that self-organize the co-production of knowledge-based products, basically any product that you can turn into ones and zeros, such as software code, Wikipedia, and other examples.

Our objective is to enhance the understanding of how self-organized co-production work is coordinated. My youngest boy is almost eight years old, and he asked me, "Dad, what are you doing, preparing for this presentation? What is the presentation about?"
I told him it's about how Wikipedia works, so my boy said, "Why don't you just look it up in Wikipedia?" So maybe after this presentation we can update the Wikipedia entry as well.

All right, why Wikipedia? Aside from Wikipedia's success and its being the exemplar of peer production, for organizational scholars Wikipedia is an exciting setting because of its radical openness. When I say openness, I mean that anybody, even someone who is not a community member, can go and change the product, the product being an encyclopedic entry on a wiki page, and publish it instantly. That is aside, of course, from a few restrictions, page protections and so on. But generally speaking, the co-production model is far more open than any of the other examples of peer production that I've given. And Wikipedia's co-production is largely free from workflow constraints, in the sense that there's no predetermined order for the activities. You can initially add content, then reorganize it, and then add hyperlinks, or do it in a different order. And tasks are not assigned to particular individuals. Anybody can pretty much do anything, again aside from a few exceptions like reverts and so on. So this is why Wikipedia is so interesting.

Now, if I asked the audience how you would explain coordination in Wikipedia, and generally in other peer production projects, some of the explanations that I'm sure you'd give, and that other scholars have given in the past, point for example to Wikipedia's talk pages, where coordination takes place. Other scholars have pointed to Wikipedia's norms and policies as a key coordination mechanism; you see a couple of papers in this area. Yet others have stressed the importance of quality work, quality assurance, including members of this research team. And alternative explanations have looked at functional roles, or special access privileges, as a key coordination mechanism.
Now, all of these explanations are valid, yet in this talk I want to look at a different line of explanation. What you see in the slide are these previous explanations.

Wikipedia has an extensive set of norms and policies, and the talk pages are there, and the special access privileges. But look at the number of participants who are likely not familiar, or not very familiar, with the norms and policies and are likely not active on the talk pages: these are roughly 50% of the editors. In our data set, 52% are non-members of the community, identified through their IP address, and I would estimate that even some of the registered members are not fully aware of the norms and policies, are not very active, maybe not even reading the talk page, and often do not have special access privileges. In terms of the number of edits, or the amount of activity, roughly 40% of the activity in Wikipedia is performed by IP addresses. And I will argue that, to a large extent, this work performed by IP-address contributors cannot really be explained by the other lines of explanation that I pointed to earlier.

So how is this portion of the production work in Wikipedia coordinated? When I say production work, our focus here is on what takes place in the main namespace, the co-production of encyclopedic entries. We're aware of the vast activity in other namespaces, but that's outside the scope of this particular study.

Recently, organizational scholars such as Faraj and others have described co-production work as highly emergent and enacted in the moment. Let me read you a short excerpt. The co-production process that evolves the knowledge artifact, then, is less predefined by the organizational structure and instead is one of generative response to proposals that change the content over time. An approach that emphasizes organizational structures may provide general rules and context for this evolution.
It does not capture the highly individualized, transient, and unstructured responses of individuals to maintain co-production in a fluid environment.

Okay, so if this is really how work takes place, then how can people arrive at anything that is coherent? How is work coordinated in such environments? These are exactly our research questions: can order organically emerge, independent of the coordination mechanisms that I described earlier, and what could explain the emergence of that order? To be more precise, when we speak of order in this particular study, we're talking about robust and stable prototypical activity patterns: basically, bundles of activities that contributors tend to perform together. For example, one contributor may focus on copy-editing tasks, and another may focus on restructuring articles. We refer to those prototypical activity patterns as emergent roles.

This presentation is largely based on a forthcoming paper in Information Systems Research, and also a follow-up at CSCW 2017, but I'm not sure I'll have time to get to that second part, so I'll concentrate on this paper. The figure that you're seeing now is actually where this presentation will arrive at the very end. I want to use it now as a teaser, just to give you a sense of what this work is about. The title talks about turbulent stability, an oxymoron: we're seeing turbulence at some levels and stability at others. Okay, so I'll redo these slides; let me know if there's a problem with the voice.
So the title of the paper is "Turbulent Stability of Emergent Roles". Turbulent stability is an oxymoron: we see both turbulence and stability, in different respects. The figure that you see is the illustration with which I'll end this presentation, but I want to give it here as a teaser. On the bottom left, in orange, is an arrow representing time, and we're seeing two surfaces, one in period one and the second in period two. Notice that the surfaces are quite similar in shape. The surface in period one is created by inflows that push the surface, forming mountains, and then puncture the surface, with the outflow afterwards. The same holds for the surface in period two. The inflows and outflows are different flows, yet the surfaces are very similar. So this is the gist of the paper, and I'll explain the details as we go along.

In terms of research method, we looked at a thousand Wikipedia articles, from the time of their creation until our cutoff in 2014, using double-stratified sampling that sampled articles from different topical domains as well as from different maturity levels in terms of the number of revisions: 250 articles with up to 10 revisions, 250 articles with 11 to 100, another 250 with 101 to 1,000, and the last category with over 1,000 edits. Altogether there are over 222,000 distinct contributors, and when I say contributors I include those that made only a single edit to articles in the sample, and altogether over 700,000 revisions.

The data extraction procedure focuses on the type of activities, and we tried to tag the activities. Here you're seeing an illustration for the article on Chicago, with different content elements created at different revisions. The second part shows the revision history; assume that each one of the revisions was tagged with one or more edit types. Once we tagged the revisions, we were able to create profiles for contributors, and later we clustered those to arrive at what we refer to as emergent roles.
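
The profile-building step described here can be illustrated with a minimal Python sketch. It assumes each tagged revision is available as a (contributor, article, edit types) tuple; the edit-type labels, the sample data, and the function names are hypothetical and are not taken from the study, which used a 13-category typology and a machine-learned tagger. The second function is a plain k-means loop with Euclidean distance (the clustering method named later in the talk), started from hand-picked centroids, and is included only to show how profiles might be grouped into prototypical activity patterns.

```python
from collections import Counter, defaultdict
from math import dist

# Hypothetical edit-type labels; the typology used in the talk has 13 categories.
EDIT_TYPES = ["add_content", "delete_content", "fix_typo", "vandalism", "revert"]


def build_profiles(revisions):
    """Return {(contributor, article): [proportion of each edit type]}.

    `revisions` is an iterable of (contributor, article, edit_types) tuples,
    one tuple per tagged revision; a revision may carry several edit types.
    """
    counts = defaultdict(Counter)
    for contributor, article, edit_types in revisions:
        counts[(contributor, article)].update(edit_types)

    profiles = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        profiles[key] = [counter[t] / total for t in EDIT_TYPES]
    return profiles


def kmeans(vectors, initial_centroids, max_iterations=100):
    """Plain k-means (Lloyd's algorithm) with Euclidean distance."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iterations):
        # Assign every vector to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda i: dist(v, centroids[i]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Move each centroid to the mean of the vectors assigned to it.
        for i in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up tagged revisions, purely for illustration.
revisions = [
    ("alice", "Chicago", ["add_content", "fix_typo"]),
    ("alice", "Chicago", ["add_content"]),
    ("bob", "Chicago", ["vandalism"]),
    ("carol", "Chicago", ["revert"]),
    ("carol", "Berlin", ["revert", "fix_typo"]),
    ("192.0.2.1", "Berlin", ["vandalism"]),
]

profiles = build_profiles(revisions)
vectors = list(profiles.values())
# Assumption: seed the clusters with the first three profiles.
labels, centroids = kmeans(vectors, initial_centroids=vectors[:3])

for key, label in zip(profiles, labels):
    print(key, "-> cluster", label)
```

Note that the same contributor gets a separate profile in each article, so one person can land in different clusters for different articles. In practice the number of clusters would be chosen with cluster-quality measures rather than fixed by hand as it is here.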
So this is a brief illustration. Some additional details: the typology of wiki work that we've used lists 13 categories, from creating new articles to adding content, deleting content, fixing typos, rephrasing text, and so on. This is based on prior work; we did not create this typology. In terms of annotation, we started with a different set of 90 articles that initially had over 34,000 revisions. Multiple annotators, students basically, helped us in annotating this small data set, and we arrived at over 13,000 revisions with reliable annotation, reliable meaning full agreement by multiple raters. We used this manually annotated set as the input for machine learning algorithms. Again, I'll go through this very briefly without getting too much into the details, but the actual algorithm was based on the prior work of two of the co-authors, Daxenberger and Gurevych, using the RAkEL algorithm with a random forest classifier. Performance is very similar to human performance.

After we were able to tag the 700,000 revisions in our data set, we created profiles for contributors. We assumed that a contributor can play multiple roles across articles, thus creating over 300,000 distinct contributor-article activity vectors. This is an illustration: each activity vector is represented in terms of the proportion of each of the edit types. Assume that the colors here represent different edit types; this is a profile for one of the contributors. We used the k-means clustering algorithm with a Euclidean distance measure, and we used standard techniques to determine the optimal number of clusters, in this case compactness, separation, and optimal cluster quality, arriving at seven clusters.

Okay, so once we've tagged the 700,000 revisions in the data set, we then created profiles for each of the contributors, with a distinct profile for each contributor in each of the articles they were active in. The profile is represented in terms of the proportion of each of the activity types, as you see
in this illustration, with different colors representing different edit types. This is a profile for one contributor in a particular article. Next, we clustered those profiles using the k-means clustering algorithm with K between 2 and 10, determining the optimal number of clusters using standard techniques (compactness, separation, optimal cluster quality), as well as using Lange's machine learning technique for verifying that our clustering solution captures the natural clustering of the data.

The seven centroids of the clusters represent seven emergent roles. In what you see on the left-hand side, the colors represent the 13 types of activities, and each one of these roles, shown with the hats, represents a different combination, or bundle, of activities: all-round contributors, vandals, watchdogs, and so on. Some of them are based on multiple activities, such as the all-round contributors, and some are one-dimensional: for example, most of what vandals do is vandalism, and the watchdogs mostly correct vandalism. I'm not going to get too much into the details of these roles, because I want to talk more about the general structure of things rather than the individual characteristics of each role.

Once we identified the prototypical activity patterns, we moved to analyzing individuals and their mobility. What we found is that individuals are highly mobile, in fact I'd say volatile, in their movement in and out of articles. Analysis on a year-by-year basis shows that about 90 percent of those active within an article in a particular year were not active before, so they're new, and only about 10 to 15 percent of them persist to the next year. So the attrition rate is extremely high. Even the few that do persist in their activity within a particular article over the years usually change their pattern of activities, moving from one emergent role to another. I won't get too much into the numbers, but by and large we're seeing an extremely high level of mobility. This
is an illustration of different types of users; these are real Wikipedia users. The hat represents the type of emergent role, and the numbers represent the articles that they've worked on. The first one, with the blue hat, was active in a single year in a single article. The second one, with the orange hat, was active in the first year on eight different articles, playing the exact same role in all of them, and then in the second year moved to yet a different article, playing the exact same role. The third category represents people that persist in their activity within one article yet switch roles, and the fourth category is people that move between roles and articles.

So this is what we've seen at the individual level. Now, in terms of the overall organizational level, the overall structure: when I refer to structure, I'm talking about the nature of emergent roles. What we looked at is different stages in Wikipedia's evolution. You can look at this through many different lenses, but Wikipedia until the end of 2006 and Wikipedia from 2007 onwards are pretty different organizations, in terms of the special access privileges that were added, the graph of newcomers that we see here, and many other angles. So we split the data set into the first period and the second period, and then used the exact same technique, identifying contributors' profiles and then clustering them, to arrive at the prototypical activity patterns. We asked ourselves: will we get the same set of emergent roles? Really, we expected to see a very different solution: first, because Wikipedia is very different in these two periods; second, because those active in the first period are not the same users that were active in the second period; and third, if you've played around with clustering, you know how sensitive it is to the algorithm's parameters. So we did not expect to see the same solution, yet this is what we found. Here we're seeing the list of 13 activities and the seven
prototypical roles for the first period, in the red border, and then the set of prototypical roles that we got for the second period. You see that it is very difficult to distinguish the two clustering solutions. This is a striking result in terms of the level of similarity.

So altogether, what are we seeing? Turbulent stability: highly turbulent individual-level mobility, and a highly stable organizational structure. Now I can explain what these surfaces and the inflows and outflows represent. The first surface represents the clustering solution, or the set of emergent roles, for the first period. The mountains are the clusters, and the pins in the center represent the cluster centroids, or the emergent roles. The same goes for the second period, and you can see that they are very similar. The inflows are the people that were active in this period; they push the surface and create it, and then the vast majority flow out of that surface at the end of the period. They do not continue. Only a few participants continue to the second period; these are the dotted lines in different colors, blue, red, green. And even those that continue from the first period to the second change their position, meaning they play a different emergent role.

So how can we explain this? If a large portion of the work is not coordinated by the mechanisms that we've seen, how do we get this structure? For the structure to emerge, there has to be some form of coordination, even implicit, that makes sure these emergent roles persist over time. This is the puzzle that we tried to explain. Others have suggested that the artifact serves as a key coordination mechanism. This is Kevin Crowston and his colleagues, looking at open-source software development. Recently they coined the term stigmergic coordination, a term borrowed from the insect world, to describe this type of implicit coordination. Actors are leaving traces of their actions in the code in software
development, or here, in the article, as they read and reflect on the code written by others in order to take coordinated action.

So if this is the explanation, we tried to find evidence for it in Wikipedia. How do you find evidence for artifact-centric coordination? We're looking for comments where a contributor would say, "I see that the article is in a certain state; it is missing references, so I'm coming in and adding references." We looked at the comments that people can make when saving changes to a page. This is a qualitative study, and we were able to find some evidence for this artifact-centric coordination. I won't go too much into the details, because I know that we lost some time.

A second analysis that we performed looked at articles in different maturity strata. If articles at different maturity levels need different types of work, you would assume that the types of emergent roles they attract would be different, and this is exactly what we're seeing: at different stages, a different set of emergent roles is enacted in those articles, providing yet more evidence for artifact-centric coordination.

So this is pretty much it in terms of the findings. What are the key discussion points? First, why do people enact a role at a given moment, looking at people's motivations? This is what we tried to gauge in the follow-up CSCW paper; I won't have time to get into this. Second, is this type of emergent order good? We believe that it is. We believe that if you look at other, less successful communities, you would not see this type of stable structure. We believe that it entails effectiveness, but this is yet to be proven. Other interesting discussion points relate to the conditions under which stable emergent roles can emerge. We have some ideas about what these conditions might be, in terms of clear goals and visibility of activity. Again, I'll skip this for the benefit of time.

Ofer, due to the technical
difficulties we're running a bit over time, so if you could maybe wrap up in the next two minutes, we could move on to questions.

Okay, I will. I'm just at the end of the presentation. The key insight that we get out of this study speaks to the balance between openness and closedness of a system. We've seen Wikipedia, with a very open philosophy, adding more and more restrictions as the years go by, with the growth of the community and the threats to the content. At least our interpretation of the results is that, given the right mechanisms, you can allow for openness and freedom, and order will organically emerge. As for practical implications, there are some insights for people like yourselves working on the design of the community and the design of the platform, and also for business organizations looking to adopt open principles and holacracies, the term that is now used by business people. So this is pretty much it. Thank you for listening, and I'm open for questions.

Okay, thank you, Ofer. Since we're running a little bit short on time, let's save the questions for the end and give Charlie a chance to present. So Charlie, if you're ready. You're actually muted, Charlie.

Can you hear me now?

Yes.

Great. Okay, hi, I'm Charlie. I did my bachelor's thesis at Wikimedia Germany, and now I'm doing an internship there in UX design and research, and I would like to present the research that I did for my thesis. I will be sharing my screen now; hopefully that'll work. One second. Can you all see it? Wonderful.

So I did human-centered design to figure out how to use and edit the structured data from Wikidata in Wikipedia, primarily in Wikipedia infoboxes, and I developed a concept for that. What we wanted to do was to provide structured data. Just like Commons, for example, provides images for Wikipedia and its sister projects in one centralized space, Wikidata is
supposed to provide structured data for Wikipedia projects, for example in infoboxes, where the data is structured. At the same time, we also wanted to make it possible for editors on Wikipedia or other projects to edit the data on Wikidata directly from there, without having to go to Wikidata, so it's more comfortable for them. This would also increase the number of editors on Wikidata, which would increase the data.

The reason we wanted to do this was that at the moment the data, and in particular I was looking at infoboxes, is inconsistent. For example, I looked at all the featured articles on Berlin and compared the data they used for population, and it was inconsistent across the language Wikipedias, across articles. This is something that could be resolved if there were one centralized place where you would update and correct the data. It would also make the inaccurate data accurate, because often it's not just outdated but entirely wrong, or even missing. And it would especially help small-language Wikipedias, which can't maintain every single article the way a big-language one like the English or German Wikipedia can; it would be easier for them to have more quality data in their articles this way.

What I did at first was get feedback from Wikidata editors and Wikipedians on the Wikidata project chat, which is like Wikidata's village pump. I asked some questions there: what exactly would they need if we made Wikidata editable from Wikipedia or any other project, what I should be careful about, what I should avoid, and what they would really want to see in there. The responses I got were mainly that I should focus on Wikipedia at first, concentrate on the infobox integration of the editing, and also make it work with the VisualEditor. And so,
based on that, I did some research on what already exists. What I found was that some communities, or some projects, have already changed their templates in a way that lets them import data from Wikidata into the infoboxes. As you can see here, for example, this is the template for telescopes on the English Wikipedia. And this is how it then looks in the wikitext: in the end you just need one line, instead of all the lines from before, which you can see down here, where I took an old revision. So if those templates were updated throughout the different languages, the infobox would only ever need this one line, and it would always have new and up-to-date data from Wikidata. This is how the infobox looks on the English Wikipedia, instead of every language having to input the data this way.

Another thing I found was something that the Russian Wikipedia came up with, which is a way not to integrate the data but to be able to edit Wikidata. This is a gadget they made where you can edit the corresponding item of the article you're on. You can't integrate the data in any way, but you can at least change entire statements, or pretty much anything you want. This tool is very powerful.

Since everything we do is community-driven, I wanted to do user-centered design, because the community is obviously the most important thing. So what I did was try to come up with personas that I could use, and I did this with Lydia, the product manager of Wikidata. We created pragmatic personas, which are personas based on experience and not research, because at the time I didn't have the resources or the time, within the scope of my thesis, to do that research. Based on that, I created scenarios for these personas, and together with Lydia we then formed requirements that my concept, or the tool in the end, would need: requirements
not only for Wikipedia but also for Wikidata, because there were also things to consider for Wikidata. We obviously didn't want to increase vandalism on Wikidata through this, for example. Based on that, I made some wireframes, and we decided on one. From that I made a prototype, which I tested in a usability test with 10 people from our Berlin open editing session, and I made sure that I had various levels of Wikipedia and Wikidata editing experience.

How this worked was that I first let them explore the prototype a bit on their own, and then I asked them two separate sets of questions. The first was actually solving tasks, so I could see how they use the mockup and how they find certain functions. The second was after the fact: I asked them questions like what went wrong for them, what worked well, what needs improvement, and so forth, to find out more. After that, I also let them fill out a questionnaire, which asked them about their level of editing experience, their comfort with technology, and so forth, so I could better evaluate the data that I got.

So now I'm quickly going to show you the prototype, just so you have an idea of how it looked. On the left you can see the actual infobox in the VisualEditor when you're editing it, and on the right is my prototype. I wanted to keep it as toned down as possible, because I didn't want to interfere with people who would not be interested in the Wikidata functionality. So it is entirely possible to not make any use of it at all and just continue working as before. There are two main things that I did. The first is the function of importing the data from Wikidata, as I said, and the other is editing: editing the respective item of the article you're on, that is, editing the
actual data on Wikidata. Those two things are completely separate from each other. They do interact, sorry, but they don't interfere with each other.

So the little triangle in the corner here is what you click to import the data from Wikidata. Then this field, which is usually wikitext, turns into a normal string, and that's the data. But if you want to go to the edit mode, you need to go to this corner here, the Wikidata logo. So we will do that, and what happens then is that the sidebar opens, and the sidebar has information about the item. But first I'll explain a bit of what you see here. The field surrounded in blue is a field that already has the data from Wikidata. As you can see, it's not in wikitext anymore, the field name on top is grayed out, and the corner is now colorful, so the Wikidata corner is active, in a way. On the side you can then edit the references, the qualifiers, and the rank, most of the things you could also do on Wikidata. If you want the changes actually saved to the item, you need to save them in an extra step, instead of just going to "apply changes" as you normally would. This was also one of the measures we took to not make vandalism too easy, because if people could change the item label, for example, just by going into the field, that could cause voluntary but probably often also involuntary vandalism, with people maybe not realizing to what extent they're actually changing things.

If this gets implemented one day, we're hoping that the quality of the data in the infoboxes will increase, and that through this the consistency of the data on Wikipedia can become better, because it doesn't need to be maintained in so many different places at the same time,
and the other advantage is that it'll bring editors to Wikidata. This will obviously make the data on Wikidata better, which in return will make the data in the infoboxes better again. There's way more research that I need to do, and I'm actually currently working on it. In the prototype that you saw, I have not yet integrated the feedback I've already gotten from the user test; this was the prototype that I tested with. So there's loads of feedback that I need to implement and then test again to finalize the concept, so we have something to work with when we want to implement this tool, which will be a gadget or beta feature. Yeah, that's it. Are there any questions? I need to not do this anymore, one second. There we go.

Hello. Hi, Charlie, I have a question.

Yes?

So since we have a little bit of time, and you said at the end of your presentation that you have not yet integrated feedback from the user tests, I wondered if you could tell us a little bit about the kinds of feedback: what was the most important feedback, or the most surprising feedback, you got from your user tests?

Some of the most surprising was probably the stuff where afterwards you think, wow, that's so obvious, why didn't I see this? For example, with those little triangles in the corner, I think 30 or 40 percent of the users thought they were those things you can click to drag the field to make it bigger, because that's actually exactly how they look. While coming up with it, I never thought of that. So I had people sitting there trying to drag the field and not actually clicking it, which is super obvious when you think about it, but I didn't come up with that at the time. Another thing was the Wikidata logo in the
corner. Many users thought it would take them to wikidata.org, or maybe to the item, but they were actually kind of too scared to click it, and it didn't really indicate at all what could happen if you click it. That's something I need to change. There probably needs to be a better symbol, or maybe a short word or something that explains what happens. So those kinds of things, for example. But some positive feedback that I got was that most of the users were very clear on whether they were currently still in their old infobox, in a way, being able to manually input wikitext, or whether they were actually using the data from Wikidata. What I called it in my thesis was the locality: they were aware of what's going to happen to the infobox, in a way. So that was communicated clearly, at least. Those are three examples.

Thank you. Does anyone else have any questions? Otherwise, maybe Ofer wants to answer some from his talk.

So I do have a question from IRC for Ofer. This one's from Pine. He asks: does the research provide any insight into, one, how users get the information they need to be productive from the very first edit, or two, how to increase editor retention?

Okay, two interesting questions. I'll start with the latter, about editor retention. From this particular study we have less information about why editors choose to sustain their participation over time.
There has been quite a bit of research on editors' motivations, and you're probably aware of it, and other people as well. In the follow-up CSCW paper, we looked at a different data set where we had both information about people's motivations at the very early stages, just as they join Wikipedia as newcomers, and linked that to information about these emergent roles. And what we're seeing is that people that tend to remain active within a particular article over long periods of time are more motivated by intrinsic motivations; the ones that move around articles and emergent roles are more motivated by extrinsic factors such as reputation and peer pressure and so on. So the insights are more from the follow-up study, not from this first one. And then the first question was about... just remind me. It's how users get the information they need to be productive from the very first edit. Okay, so interesting. We don't know how people get the information they need. What we know is that a lot of people do not look at Wikipedia's norms and policies and do not look at the talk page, but have a sense of what needs to be done. I think that the reason is that, as readers, we have a very good idea of what the end product should look like, and that is an encyclopedic entry; that is one factor that contributes to knowing what to do. The second one is the visibility of the artifact itself: you can look at the article, and if you like you can go to the history and compare revisions, and you have a very good sense of what the current state of the article is. So it's like if I walked into a class and told the students, why don't you arrange the tables in a U-shape? I don't need to tell them exactly what they need to do. They know, because they know what the end state should be and they see what each other is doing; they know what it should look like, they see some deficiencies, and they just go on to correct them. I'm hoping this answers it. Okay, so I have one more
for Ofer and one more for Charlie. Since the one for Ofer came in first: this is from pjz. "I may have missed something, but is there any indication that classes have changed over time?" In other words, using your data set, when you look at the data over time, would it be possible to look into whether new roles have emerged over time? Have you looked into this, or are you aware of studies of that? So there are very few studies that have looked at these emergent roles, or structural signatures, or prototypical activity patterns, and I think ours is the first study that looked at them over time. The evidence that we have that they don't change over time comes from performing the same analysis independently for one period and then for another period. So the vectors that went into the clustering for the first period are not the same vectors that went into the clustering in the second period, yet the results bear a striking resemblance. And again, this is not something that we anticipated. Okay, so this question is for Charlie, and it's from Inali: were you looking into how to handle edit conflicts or potential edit conflicts? That's actually something I didn't look into, but that's a really good point, which I will write down for the upcoming months when I will continue my work. But the good thing is there are way fewer edit conflicts on Wikidata than on Wikipedia, so that's something. Yeah, it shouldn't happen as often, I assume. Yeah, thanks for the question; it's a good point. So Ellery, I have a couple of my own questions, but we've taken a lot from IRC land. Are there any questions from the room, maybe some of your own? I suppose the room is just you so far, right? Yeah, not from mine. We have three with us. So I saw that you had some questions in the IRC, if you want to take a stab at them. Sure. So this question is from me. Ofer, when you were looking at article maturity level, I
thought it was interesting that you decided to use the number of edits in the history, and I was curious why you didn't instead look at a measure of quality for its maturity level. Well, okay, so this is yet another valid approach. If I wanted to use the quality metrics in Wikipedia's internal rating system, I would have to have articles where there are a number of different quality ratings over time, and I'm not sure how many articles have this series of quality ratings that have changed over time. I think that other studies in the past have also used the number of edits as a proxy for maturity, and this allowed us to sample articles from different categories and so on. I mean, do you have an idea of how you would create a sample of articles with many different quality scores? So, well, I didn't intend to be here, but now I'm pitching a data set that I just published. We built a high-quality classifier of article quality in Wikipedia and then used that quality classifier to make assessments for every article on a month-by-month basis. I just threw a link for that data set in IRC; I'll actually throw it in the chat here too so that you can find that later. That seems like it might be a useful way to do this. But I definitely understand this problem of, like, you don't know when the assessment is coming, so you don't know exactly when the article quality changed. This data set should hopefully help people to get past that. But if I can narrow down into my question a little bit, the reason why I wanted to ask is because I was curious what's the foundation of using edits as a notion of maturity, how that fits into this abstract notion of maturity. And maybe we can take that offline, because I've already spent a long time on this question. But yeah, I'd like to talk about that more later if we
could. For sure. So I think that we're at time. Thank you, Ofer and Charlie, for presenting, and thanks everyone on IRC for participating and for so many of your questions. We'll see you, probably, next month. Okay, thanks. Thanks. Thank you, Ofer, Charlie. Thanks, Ellery. Bye. Thank you.