Hello, everyone. I'm Daniel Kinzler. I work for the Wikimedia Foundation on the Core Platform Team, and today I would like to talk to you about untangling MediaWiki. This is really awkward, right? Kind of talking to myself. I don't even know whether you can hear me. I sincerely hope that someone will tell me if it's not working. I don't know if there are three people still having coffee, or 30 people eagerly awaiting my words. So this is a little bit odd, but I'll do my best, and we'll get started. This presentation, I think, is useful mostly for people who are working on MediaWiki extensions or contributing to MediaWiki core, but also for people who are interested in what the Foundation is up to, and maybe also in understanding why some things take as long as they do. Over the years, MediaWiki has grown into a big ball of mud, a big tangle of code, of classes that all refer to each other. I have a nice story somewhere about how we got to where we are. It involves a winged elephant with a hat on. Unfortunately, I think we won't have time for that today, but I'm happy to tell it some other time. So yeah, it kind of looks like this, except it's way worse. There are just hundreds and hundreds of classes, and there is no rhyme or reason to what depends on what. That makes things hard to understand, hard to fix, and hard to predict what effects a change will have. So we want to break this up. We want to break it into components: maybe separate out the things that are responsible for the user interface or for search, something for managing page content, something else for managing users, and so on. Having things broken into components is good in general. It makes things easier to understand and easier to change. It allows individual parts to be reused without depending on the rest, and it also provides a way to have clear ownership of parts of the code. So part of the Platform Engineering team set out on an expedition to untangle MediaWiki core. We call it an expedition because we know where we want to get to, we know the rough direction, and we know the first few steps, but after that, all the details are unknown. We are sure we will encounter obstacles and surprises. Maybe we'll get snowed in or hit a canyon and have to backtrack. That's just how it is. If you want to know more about this, you can find more information about the expedition team on the MediaWiki wiki, of course. The very first step is to break apart the big classes that straddle the boundaries between the different components. Everything depends on these classes, and they are big themselves and depend on a lot of things, so they tie everything together into this big knot. They're kind of like the knottiest bit of the knot. If you have ever worked with MediaWiki code, you know these classes. They are called Title, User, Language, and so on. They represent the core concepts we have, but because of that, they kind of encompass everything. Everything uses them, and they use everything. They are also kind of unclear. I just said that they represent our core concepts, but these concepts have become blurry over time or have grown in scope. So for instance, a Title has a page ID, unless the page doesn't exist yet. It has a namespace, unless it's an interwiki link. And it has a page name, unless it's a relative section jump on the same page.
It has an associated talk page, unless it's a special page. So there's really no part of a Title that you can be sure will be there. Similarly, a title can be viewed, unless it's an interwiki link; then we can maybe redirect you, but we can't show it. It can be watched on your watchlist, unless it's a section jump. You can rename a title, unless it's a special page. So there's really no operation that you can be sure you can perform on a Title object either. That makes things really unclear and confusing, and unclear and confusing vocabulary leads to frustration and bugs. So our mission is to replace these monster classes that bind the code into a big knot with a lightweight vocabulary that we can use to communicate between components, without these components depending on each other or knowing about each other. So we are fighting the monster classes. The first monster class that got slain was Revision; that actually happened quite a while ago. Currently we are working on the User class and the Title class, and after that we will look at the Language class. For a rough roadmap: in MediaWiki 1.36 we have introduced a number of new interfaces and value objects to represent concepts that were previously covered by User and Title. In 1.37, we will see a lot of old methods and interfaces be deprecated. In particular, in 1.37 we will start to migrate hook interfaces away from using the old types, which typically means replacing the hook with a new one, unfortunately. And in 1.38, I hope that we can start to delete the old stuff, so we can actually see the benefits of replacing it. As long as we keep all the old stuff around for compatibility, we have added a nice alternative to the old tangle, but the tangle is still there, right? So for the Title class: a long time ago we introduced the LinkTarget interface, which basically represents whatever you can put between square brackets as a wiki link. And now, in addition to that, we have introduced the ideas of a page reference, a page identity, and a page record. (Not a "page page record"; what happened there? Okay, I'll fix that later.) A page reference is kind of like what you can link to with a wiki link, but not an interwiki link, not a section jump, not all of that; just a simple link that takes you to a different page on the same wiki. A page identity is the same, but only for pages in the database, so not special pages; just pages that have a page ID and a talk page. And a page record includes all the meta information, like when the page was last touched, what the latest revision is, and so on. User, we are breaking apart into a UserIdentity, which is basically just a user ID and name, and an Authority object that represents everything a user can do in the current request, including any restrictions on the IP address, or any restrictions imposed by OAuth grants, and so on. In the future, we will probably introduce something representing the user account, with all the metadata about when the user joined, how many edits they have, their email address, and so on. Over the last few months, we have made quite a bit of progress. The yellow and red lines that start out at the very top of this graph represent User and Title, the biggest monster classes, and you can see their usage slowly declining over time, while the usage of the alternatives has been picking up.
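To give a feel for this vocabulary, here is a rough sketch of the shape of the new interfaces. This is illustrative only; the actual MediaWiki interfaces have more methods and live in namespaces not shown here:

```php
<?php
// Illustrative sketch of the new vocabulary, not the exact core interfaces.

interface PageReference {
	// Anything you can reach with a plain wiki link on this wiki.
	public function getNamespace(): int;
	public function getDBkey(): string; // the normalized page name
}

interface PageIdentity extends PageReference {
	// A page that lives (or can live) in the database.
	public function getId(): int; // 0 if the page does not exist yet
}

interface UserIdentity {
	// Just enough to say *who*; nothing about what they may do.
	public function getId(): int;
	public function getName(): string;
}
```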
You can watch this progress using the link on the slide; it's the "monster monitor" running on Toolforge. This has been a very useful compass, I'd say. Like every measurement, it only gives a rough idea of what's happening, but it shows that we are moving in the right direction. The art here is to move slowly and avoid breaking things, because MediaWiki is not just the software that powers Wikipedia; it's, of course, a framework that others build on, and so we have to provide a stable interface that people can rely on. We have worked to refine the stable interface policy for MediaWiki quite a bit over the last months and years, and the idea was to give people more certainty about what they can rely on, while at the same time giving core developers more freedom and more clarity about what they can change. In general, if you're developing an extension, calling things in core is fine: if it's public, you can call it. However, you should in general not instantiate classes, you should not extend classes, and you should not directly override methods or implement interfaces, unless they are marked as safe for that purpose. Another thing worth noting is that the implementation of the stable interface policy relies a lot on the code search tool, which allows us to search not only through core and all the extensions maintained by the Foundation, but everything else on Gerrit as well, plus a number of repositories hosted elsewhere. The MediaWiki Stakeholders' Group maintains a list of non-Wikimedia-Foundation extensions, most of them hosted on GitHub; this repository just has a list of submodules naming all these extensions, and everything in there is reachable by code search, so we know about it when we do refactorings. So if you develop an extension that is not on Gerrit, and you want us to be aware of how you use core and whether we need to be careful not to break your extension when making changes, you should make sure that you get yourself onto that list, which I believe is as simple as making a pull request on GitHub. Now, how do we change things without breaking them? There's a little compatibility dance that we came up with; it's explained in detail on the expedition page as well. We basically define new interfaces, make the old class implement these interfaces, change the method signatures to accept the new types, then replace the methods that return the old classes, and eventually we can remove the old class. This is very abstract; I just wanted it here as a summary. I've tried to visualize the idea. Here, the grayish blocks are methods, with a header that defines the method signature and a body that represents the implementation. The circles are the expected types, and the dots are whatever is actually used inside the implementation. We start out with just the old types being used everywhere, right? An attractive idea would be to just look at one class and replace the old stuff with the new stuff. But then, if we only have the new type and we try to use it to access some other component, some other class that still expects the old type, things clash, right? That won't work. So you would have to find an order where you do the basic things first, the ones that don't depend on anything, and then build on top of that. But the very problem we are trying to solve is that everything is tangled in cycles of dependencies.
So there is no such order that would be safe. What we do instead is modify only the signatures at first, and internally, wherever we need it, we convert from the new type to the old type, which in most cases is just a cast, not an actual conversion, so it's quick. That way, any calls to other components still work as before, right? Eventually, all the signatures accept the new type, even if the implementations have not been changed. But at that point, we can very easily change the implementations one by one, or all at once, because we know everything already accepts the new types. And this also works across component boundaries and extension boundaries. So what we're doing is replacing the monster classes that bind the code into a big knot with a lightweight vocabulary for communicating between components, as I said before, without breaking things. A little time check here, because I have more information about how exactly we do this with respect to extensions, and about how the new classes and interfaces relate to each other, but I only have five minutes left to speak, and I would like to know if there are already questions or comments. If there are, let me know. (There are questions; I think it's going to take three minutes to answer them, so you have about seven minutes more to speak.) Okay, I think I will skip over the part about how we do this in extensions. If there are questions that relate to that, we can come back to these slides. I will talk about the type system a little bit, because if you are observing the changes being made to core, it might be a little confusing what we are up to there. As I said, we are trying to replace Title, or let's say one aspect of Title, with the idea of a page reference. A PageReference is basically anything that you can visit as a page on the wiki, right? Something that will show up as a page title in the URL. Internally, it just has the DB key, that is, the normalized name, and the namespace ID. Derived from that, we have the PageIdentity, which also has a page ID. The page doesn't have to exist, but it would need to be something that can be created, right? So not a special page, for instance. And on top of that, we have a PageRecord, which is always an existing page that has all the metadata. For these three interfaces, we have an implementation that is just a value object that doesn't depend on any services and is not mutable; kind of the thing that we want. The problem is: how do we get code that uses Title and WikiPage to actually use the new types? To allow for the transition in the way that I described earlier, we need the old classes to be compatible with the new interfaces. For instance, Title implements PageIdentity, because a Title can be a page identity, right? It often is. Unfortunately, it's not actually always a page identity. A Title object can also just be a relative section jump or an interwiki link. So making it a page identity is, well, a little bit cheeky, right? Kind of a lie. So what we did is we made Title implement the PageIdentity interface, and then we made another interface, ProperPageIdentity, that actually provides all the guarantees of the page actually being creatable. A PageIdentity looks like something that can be created as a page, but it could still happen that the object you have actually can't be created. And if you try, something will explode.
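As a concrete sketch of the dance: a method that used to take a Title first widens its signature to the new type, and internally converts back to Title wherever it still talks to unmigrated code. Class, collaborator and helper names here are assumptions, not the actual core code:

```php
<?php
// A sketch of the compatibility dance; all names are assumed.
use MediaWiki\Page\PageIdentity;

class PageMover {
	/** @var object some collaborator that still wants a Title */
	private $otherComponent;

	// Before: public function move( Title $title ) { ... }
	// Step 1: widen only the signature. Callers that pass a Title keep
	// working, because Title implements PageIdentity.
	public function move( PageIdentity $page ): void {
		// Step 2 (temporary): this collaborator still expects a Title,
		// so convert back; if $page already is a Title, this is
		// effectively just a cast.
		$title = Title::castFromPageIdentity( $page ); // helper name assumed
		$this->otherComponent->legacyCall( $title );
	}
}

// Step 3, later: once every signature accepts PageIdentity, the
// implementations can be migrated one by one and the conversion disappears.
```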
So you will have to check on the page identity, and at this stage, things are just a little bit confusing and you have a lot of different interfaces to juggle. We have a similar situation with WikiPage and PageRecord, right? A WikiPage can represent a page that does not yet exist and is about to be created. So the page record in that case has all the metadata set to null, and if you try to access it, things go wrong, which is kind of unfortunate. So we have the ExistingPageRecord interface that gives all the guarantees, and of course, the new value objects implement the interfaces that give more guarantees. But keep in mind that this is temporary. Once all the code is migrated away from the old types, we can fix this. The idea is that the only implementation of PageIdentity that is not at the same time a ProperPageIdentity is the old Title class. Similarly, the only implementation of PageRecord that is not an ExistingPageRecord is WikiPage. So once all the code is migrated away from using the old types, we can make them aliases, right? PageIdentity and ProperPageIdentity will just be the same thing, and ExistingPageRecord and PageRecord will just be the same thing, because we got rid of the old classes that violate these guarantees. We will again end up with a clean type system, and we can retire the additional names eventually. So this is how... (We have five minutes, by the way.) Five-minute warning? Okay, thank you. Yeah, so this is how we go about doing this compatibility dance. We are about to release 1.36, and we will be doing this for quite a while yet. I would expect to get to this clean situation in perhaps 1.38. So we have only just started on this expedition, and you'll have to bear with us; it'll take a while, but I really hope it will be worth it. Thank you, that's it from me. So, what are the questions? (Thanks a lot. Let me ask them very fast. Will TitleValue die with this refactoring?) No, TitleValue is the value object implementation of LinkTarget. The name is a bit unfortunate, and I have been thinking about renaming it. The name could be changed to something like LinkTargetValue, but TitleValue as an idea will stay. (Okay, we have three more questions left. Is there a specific place that discusses the vocabulary to use in the API?) Not really. I'm trying to think. In the API, or do you mean the web API, in terms of just the terms, like using "page" instead of "title" or something like that? All this refactoring does not really touch the API, but I can see that it can be confusing to use the word "title" in one place and "page" in another when it means the same thing. And sometimes it doesn't, right? (Another question: what is the best way to follow this? By looking at the expedition page updates?) Maybe. The best way to follow this is probably the Phabricator board; there's an expedition tag on Phabricator. I have to admit that we are not super great at tracking every change, but you can follow the overall progress and the overall epics there. I can add a link. Where should I post the link? (You can put it on Telegram later, maybe, because I don't think we have a stream chat here.) Okay, I will do that. (And will the presentation be published? This was another question.) Yes, I would very much like the recording to be published, and I can also put the slides online.
I think the slides by themselves are not super helpful, but I can put them up on Commons anyway. (Okay, and the last question. Would an internationalized, localizable API be something practicable? Roughly speaking, the ability to alias classes or methods?) I did not quite understand the question. An internationalized, sorry, what? A localizable API, is it something practicable? I'm trying to think what kind of API is meant here. I'm not quite sure I understand. Is the idea that we have a web API where the module names and parameters are not in English? Maybe that is the idea. That's really unrelated to this refactoring, but I'm not quite sure I understand. We can perhaps discuss it on IRC or Telegram. (Okay, and another question came in: where is the expedition page?) As we already said, I will send it to Telegram. Yeah, I will put links into the chat. For the expedition page, you can just search on mediawiki.org for "expedition" and you should find it. (Okay, now we have to finish the session. Thanks a lot, Daniel.) Thank you. Bye. Bye.

(The next session is from Luca, about Wikibase, starting from scratch.) Hello, good afternoon. Thank you for having me. So welcome to this presentation. Yes, as you can see from the title, we are going to discuss setting up a Wikibase instance from scratch. Just a few words about me: I work as an IT manager for companies unrelated to Wikimedia, but I am the administrator of a couple of MediaWiki sites for personal projects. At a certain point on my journey with MediaWiki, I discovered that for my personal project I needed to use the data from a Wikibase instance; originally, I thought about using the data from Wikidata. So when I started looking into how to access the data on Wikidata from my own MediaWiki instance, I discovered what you can read in the right part of the slide: that access to Wikidata is actually restricted to the Wikimedia projects. From that, I understood that there was no way for me to get the data from there. In the last sentence, as you can see, the text invites the user to set up their own Wikibase instance. So the purpose of this presentation is actually to guide you and... oh, sorry, you can't see the slides. I can try to read the presentation. Yeah, let's do that; for some reason you can't see the slides. Okay, let's try again. Can you see them now? You should see the first title slide. Okay, very good. Well, you didn't miss anything very important; it was just a basic presentation of the text on Wikidata that suggests the user go and set up a personal Wikibase instance to get access to the data. So the goal of this presentation is to tell you what I did, to help other users set up a Wikibase instance as I originally did. First of all, a little introduction to Wikibase, in case people here are not aware of it: Wikibase is the extension, or the set of extensions, that makes Wikidata run. Wikidata, for the people who don't know it, is a MediaWiki instance with the Wikibase extension installed on it. So while Wikidata is, of course, the most important and best-known Wikibase instance, everyone can set up his or her own Wikibase.
As Wikibase is not very well understood and not very well known outside a small community, I think that sharing the experience I had setting it up might benefit other users, and seeing the difficulties I ran into might help other people avoid them. What we are going to do is start from scratch, meaning we will start from two existing MediaWiki installations, which I take for granted you know how to set up, and we will adapt the two of them in basically the same way that Wikidata and the Wikipedias are set up. So we will set up one MediaWiki instance with the two extensions, Wikibase Repository and Wikibase Client, enabled; for the sake of brevity, we will call this one the "server" for the rest of the presentation. The second wiki will only have the client part enabled, so we will simply call it the "client". At the end, we will have two MediaWiki instances set up similarly to Wikidata and Wikipedia. The very first thing to understand is that Wikibase is, as usual, packaged as an extension that you can download from the site like every other extension. It doesn't work on very old MediaWiki versions; for instance, the 1.29 that I was using in the past was not compatible. But in the past couple of years I have tested it on several versions, like 1.31, 1.34 and 1.35, which is the current one I am running on, and they all work quite well. So, as usual, nothing difficult here: the extension needs to be uncompressed into the extensions folder. But from here on, the interesting part starts. The next steps are, first of all, to use Composer, and second, to do the basic configuration of Wikibase. What is Composer, for the people who don't know it (like me, when I started to work with Wikibase)? Composer is a so-called application-level package manager; it's a PHP package manager that takes care of managing all the libraries within the MediaWiki installation. It is of paramount importance for continuing with the Wikibase installation, and along the way I discovered that it is also very useful for other tasks, for instance automatic extension installation. The key to operating Composer for extensions is to edit a file called composer.local.json. There is extensive documentation, which I link here; over the course of the presentation we can only discuss it briefly, but for reference, you can find the link to the full documentation there. Here on the right, you can see a basic example of composer.local.json. Please note that this file is separate from the composer.json that manages the MediaWiki installation itself; this one is for locally-installed components. You see that the most important parts are the two strings that I highlighted in blue. The first one instructs Composer to automatically install a package called monolog, which is a prerequisite for Wikibase. In the second one, you can see there is a merge plugin that includes a JSON configuration file from the Wikibase extension folder, so that Composer takes the data from there and also installs all the libraries listed in that JSON file. So Composer automatically cascades down through the files and downloads what is needed. Just as an example, and this is not strictly related to Wikibase, I'll show you how you can, for instance, automatically install another extension.
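A sketch of the basic composer.local.json just described, with the monolog prerequisite and the merge-plugin include; the paths follow the Wikibase installation docs, and the version constraint is only an example:

```json
{
	"require": {
		"monolog/monolog": "~1.22"
	},
	"extra": {
		"merge-plugin": {
			"include": [
				"extensions/Wikibase/composer.json"
			]
		}
	}
}
```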
This is an extension I wrote myself; when I need to install it on my MediaWiki setup, I find it very easy to simply put it into composer.local.json. You don't need to manually download anything or put anything in any folder: a line like this in composer.local.json will do it automatically, downloading the packages and installing them. Remember that normally the source of the packages is the Packagist repository (packagist.org). After you have edited composer.local.json, you need to run the Composer executable, which takes care of installing all the packages and libraries needed. After this is done, you need to configure Wikibase like any other extension. Of course, it needs to be enabled. As I mentioned before, there are two main parts of the extension: one is the repository, the server part, and the other one is the client. On the MediaWiki instance that we are calling the server, I enabled both of them, as you can see on the right, while on the MediaWiki instance that should work as a client, only the part named "client" is necessary. The two require_once statements tell LocalSettings.php to import the basic settings that you can see in the Wikibase example settings files. Those are the very basic settings; some of them we are going to override later in the course of the presentation, but you might want to have a look at them, because you can see the full extent of the options available and how they are set by default. After that, you can immediately work on the server instance; it will work right away. If you go to the special pages, you will find a whole new section with the Wikibase things, and you can start creating items and properties exactly like you do on Wikidata, for instance. The client instance will not work immediately, because it needs some configuration that we are going to see in a second. Please take note that the settings are named "client settings" and "repo settings", so you can immediately understand, when you're setting things up, whether you are acting on settings that affect the server or the client. So, let's start with the client. This is a very basic configuration; of course, there are a lot of options to configure. For your reference, I put here the link to the list of all the settings, but we will go through the easiest ones. These need to be added, as usual, to LocalSettings.php, after you have loaded the extension as in the previous slide. The first few parameters tell the client where to go on the internet to reach the endpoint of the server; this is basically the location of the server MediaWiki instance. If you did the installation, you should know all this data already, but you can also find it on the Special:Version page of the server. So you take the data from the server and put it here; as you can see, these are client settings, but the options are called "repo...", because you are telling the client what the data regarding the repository is. This is, I think, pretty straightforward. Then there is the configuration of the database of the repository: you are telling the client what the repository database is. Normally it's simply the short name of the Wikibase installation, and as usual, it is configured in LocalSettings.php on the server side. After that, there are two very important settings here.
These settings define the name of the client wiki and the group of the client wiki. Please make a mental note of these two, because they are very, very important; we will get back to them in the server configuration to see the full extent of their importance. So, again, these identify this client wiki and the group this client wiki is part of. Now that we have configured the things you saw in the previous slide, you can basically use the client instance and get data from the server using what is called "arbitrary access". Arbitrary access means that, as you would normally do with Wikidata, you use a statement, a magic word, to query for a property, but you also need to say which item to take it from. You see that in the first example here, I have a "from" field telling the client what item I need to read from. This is okay, this is versatile, but normally the way you are used to getting data from Wikidata on a Wikipedia site, for instance, is "direct access", meaning a page on the wiki is linked to an item on the Wikibase installation, and you simply need to ask for the property; the system already knows what item is linked to what page. So, how is this achieved? You might be surprised by the fact that until the spring of 2019, this was not clearly explained in the public documentation of Wikibase. I link here a few discussions that I, and of course many other users, had on how to improve the documentation to clearly explain this. Bottom line, the point is, and be very careful, this is the most important topic in the configuration of Wikibase: the configuration of the site links. Site links you might know; I put a very simple example on the right, from Wikidata, of the list of links from Wikidata to the linked pages on other wikis. From a user perspective, this might seem to be their only use, a list of links to this item on other wikis, but in fact, this is the underlying way that the server links an item to a page on a wiki. You absolutely need to set the site links up in order for direct access to work. So now we are going to see how to set this up. First of all, we need to set up the table named "sites". The sites table is the table that MediaWiki uses to know the other sites that this MediaWiki instance relates to. It serves a lot of purposes, but for this presentation, we are only looking at it as a way to use site links. So first of all, you need to populate the sites table with the sites you need to link, the sites you need to create site links for. There's a script called populateSitesTable that automatically populates the sites table, but with the sites known to the Wikimedia Foundation; for your personal Wikibase setup, you of course need to import a custom list of sites. There are several ways to do this. The easiest way that I found is to use an XML file; you can see an example on the right, and you can find the full documentation on the sites table in the MediaWiki manual. This XML file, which you can easily create by hand or export from other sources, can be imported into the sites table to automatically generate the proper rows.
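A rough sketch of such a sites XML file, with invented site identifiers; the element names here follow the exported format as I understand it, so treat them as assumptions and check the sites table documentation for the exact schema:

```xml
<sites version="1.0">
  <site>
    <globalid>myclientwiki</globalid>
    <group>mygroup</group>
    <!-- paths the server uses to build links to this site -->
    <path type="page_path">https://client.example.org/wiki/$1</path>
    <path type="file_path">https://client.example.org/w/$1</path>
  </site>
</sites>
```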
As you can see here, there is an importSites script that takes the XML file. Incidentally, there also exists an exportSites script doing the reverse, the exporting, of course. The importSites script takes the XML file and creates the rows in the database with the proper information. Here on the bottom part of the slide, you can see an example of how the data from the XML file shown here is actually imported into the sites table. So this is how to populate the sites table; this is the way for MediaWiki to know the sites we are going to connect to. Now, let's proceed with the server configuration. (Time is running short, and we have one question as far as I can see, so you can arrange your time.) Okay, I will cut it short and then we will go to the questions. So, the server configuration: again, there are a lot of options, referenced here, but the most important thing, as before, is how to set up the group that needs to be included in the site links column. You remember that before, I told you about the name of the wiki and the group the wiki is part of. The group the wiki is part of needs to be written here, in the repo settings. This setting is for the groups that are shown directly, and this one is for the groups that are shown together in the "special" site links group; they are just shown in a different way, but they mean the same thing. So: the group is configured, you remember, on the client; it is written in the sites table here; and the name of the site is written in the client configuration, as we saw before. When you tell the server side to list a certain group, the server takes all the sites from that group in the sites table and reads the name that is configured on the client. This way, we are telling the server what sites need to be included in the site links on the right part of the screen, and then you see the links directly. This is the way for the server to know how to create the direct link, and the direct access, to the client. So everything runs on the name of the site, the group to which the site belongs, and the configuration of the sites table, as you see here. This is, again, the most important setting in the configuration of a Wikibase server and client, because it is the configuration that allows the exchange of data with direct access. Just for your information, if you want to go deeper into the configuration of site links and interwiki, which in my tests I discovered is worth treating as a single concept, you can read a blog post I wrote on Dev that I reference here. I will not spend too much time on it because we have just a few minutes left, but if you're interested, you can simply go and read that. Another very interesting server configuration is the formatter URL. The formatter URL is a parameter that identifies a property that can be used as a template to construct URLs. It's easier to see an example; this is an example from Wikidata again, for easier understanding. You know that Wikidata has, for instance, a Netflix ID property that contains a string. This Netflix property has a formatter URL, which is defined via the server configuration, and that formatter URL contains a string with a placeholder. The combination of the formatter URL and the data here creates the URL to go to directly. When you click here, you don't see the full URL, but it is constructed behind the scenes.
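Pulling the site-link configuration from the last few slides together, here is a sketch of the relevant LocalSettings.php pieces on both wikis. The setting names follow the Wikibase documentation for recent versions; the hostnames, database name, global ID and group name are invented examples:

```php
<?php
// --- Server (repository + client) LocalSettings.php ---
wfLoadExtension( 'WikibaseRepository', "$IP/extensions/Wikibase/extension-repo.json" );
require_once "$IP/extensions/Wikibase/repo/ExampleSettings.php";
wfLoadExtension( 'WikibaseClient', "$IP/extensions/Wikibase/extension-client.json" );
require_once "$IP/extensions/Wikibase/client/ExampleSettings.php";

// Which site-link groups are shown on item pages:
$wgWBRepoSettings['siteLinkGroups'] = [ 'mygroup' ];

// --- Client LocalSettings.php ---
wfLoadExtension( 'WikibaseClient', "$IP/extensions/Wikibase/extension-client.json" );
require_once "$IP/extensions/Wikibase/client/ExampleSettings.php";

// Where the repository ("server") lives:
$wgWBClientSettings['repoUrl'] = 'https://server.example.org';
$wgWBClientSettings['repoScriptPath'] = '/w';
$wgWBClientSettings['repoArticlePath'] = '/wiki/$1';
$wgWBClientSettings['repoDatabase'] = 'serverwiki';

// The two "very important" settings: this wiki's global ID and its
// group, which must match the rows imported into the sites table.
$wgWBClientSettings['siteGlobalID'] = 'myclientwiki';
$wgWBClientSettings['siteGroup'] = 'mygroup';
```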
Another interesting topic to discuss is where to point when you use /entity/ followed by the name of the entity, or the entity name with .json or .xml or whatever: what is the page the server actually needs to go to, to return the data you asked for? So here there's a mapping of /entity/ to, of course, the Special:EntityData page. Again, if you are interested in looking deeper into this topic, I have a blog post. (You don't have to; it's one minute left.) Okay, let me just see if we have any other quick topic. This is how to set up the sections, which is something you might have seen in MediaWiki. And okay, these are just miscellaneous thoughts, but basically the presentation is done. Sorry for the wild ride; it was a lot of things in a little bit of time. (It was an amazing session, thanks. And there is a huge request for you to share the slides.) Yes, the slides will be uploaded on Commons; I will load them on Commons. (Amazing. So maybe we can answer questions in the chat, so we can finish the session here, because we are out of time.) Out of time, okay. So thank you very much for having me, and have a nice afternoon. Thank you, bye-bye. Bye-bye.

(And the next session is from Tohom: Lua modules training.) Okay, hello. I am user Tohom, active mostly on Ukrainian Wikipedia, Wikidata and Wikimedia Commons; I also occasionally make edits to English Wikipedia. In real life, I am a game developer, and I develop games in, you guessed it, Lua. And I also write modules in Lua for Wikimedia. So, what are modules, and how are they used? Wikimedia uses a lot of templates, which are either a standardized form of representing information, or some black-box function where you give it some input and it gives some output, or some often-repeated text, or anything else. There are two approaches to creating templates in Wikipedia: parser language and Lua modules. Parser language is the classical way of creating templates, and it has parser functions, which allow you to use basic logical functions, basic Wikidata queries, and so on. You have probably seen them: they have a lot of curly brackets. I would say way too many curly brackets. For example, can you tell me at first glance what is written here? What does this code mean? Well, maybe you can't. Maybe you can work it out after reading it, but you still need some time to understand what is written here, and here comes the first problem with parser language: it's not human-readable. If your template consists of several lines, that will not be a big problem, but if it is as big as here, with several hundred lines, it will be a problem, and changing or just fixing something here becomes a big challenge. If you drop just one curly bracket, or put in one extra curly bracket, it will break completely, and you will not even know where exactly it broke, where exactly you missed a curly bracket or added an extra one. It will not point to the exact line where you did it, and that renders templates written in parser language almost impossible to edit or extend with new features once they are bigger than several dozen lines. Also, parser language is limited in its possibilities. It provides only the most basic control structures, #if, #ifexist, and so on. And it allows only the most basic calls to Wikidata, which get only the first value of some property. You can't get labels, descriptions, the second value of a property; you can't get qualifiers, site links, and so on. But with Lua, you can do this.
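Two invented snippets to make this concrete (not the ones from the slides): the first shows the sort of nested parser-function markup being complained about, using ParserFunctions' #if and #ifeq; the second shows the parser-language route to Wikidata, via the #property function that the Wikibase client provides, which yields the rendered value of a property and nothing more:

```wikitext
{{#if: {{{image|}}}
 | {{#ifeq: {{{border|}}} | yes
   | [[File:{{{image}}}|{{{size|200px}}}|border]]
   | [[File:{{{image}}}|{{{size|200px}}}]] }}
 | {{{placeholder|}}} }}

{{#property:P664}}
{{#property:P664|from=Q12345}}
```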
Lua is a full-scale programming language, which has such things as control structures, functions, variables, arrays, and everything that a decent programming language should have. Of course, it has some things that are not like other languages. For example, arrays in Lua are numbered from one, not starting from zero like in other programming languages. Also, everything in Lua is a table, absolutely everything. Global variables are just values in a global table, as are functions, and so on. Lua has many similarities with JavaScript in the way it functions, and many similarities with Python in syntax. However, Lua has the disadvantage of a higher entry barrier than parser language. Parser language can be used even by non-programming contributors, because it is much easier; and if some template is written in Lua and there is some mistake, editors who are not programmers, who don't have a technical background, tend to turn a blind eye to problems with modules, because it is some high-tech dark magic and they don't want to get involved in it, because it's hard. Or they will not do it themselves and will ask for help from some editor who is fluent in this; and if you are the only editor, or one of the few, in some project who can edit Lua modules, you will have a stream of questions and suggestions and bug reports pouring in on you concerning the Lua modules. Also, Lua modules allow you to access Wikidata, and you can get any part of a Wikidata item with Lua. Wikidata items, behind the scenes, are JSON: tables inside of tables, inside of tables again, and so on. For example, let's take this item on Wikidata, which is about the Wikimedia Hackathon 2019. This is how it looks on the front end, for the ordinary users who go to Wikidata to see it. But how does it look behind the scenes; what is actually the code of this page? This is the source code of this item. It has a master table with the keys type, id, labels, descriptions, aliases, claims, and, somewhere at the end, sitelinks. These are tables, and inside of them are other tables, and other tables. With Lua, you can get at these values, which you cannot do with parser language; and in Lua, you can also iterate over those values, which you cannot do in parser language either. Is everything clear so far? Are there any questions? After I answer the questions, I will continue. (There are no questions yet.) Okay, then I will go on. So, how does it work? Let's start with my demonstration on test wiki. Imagine we have a page, an article, where we want to use a template based on a Lua module. In this page, we just call a template, and the call of the template is the same irrespective of whether it's written in classical parser language or in Lua. And let's pass this template a string, "hello world". Save. Next, we have the template page; this is the template that we actually call from this page. It will then call a module, which is done with {{#invoke:}}: in double curly brackets, you write a hash symbol, "invoke", a colon, and the name of your module, then a vertical pipe and the name of the function which you want to call from Lua. Usually it is "main": if it is a module for one purpose, for one infobox or navbox, it is usually called main, but it can be other things. For example, if you write a module that is a sort of library of functions, then you just call whatever function you want.
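As a sketch of where the walkthrough below ends up (page and module names invented): the article calls the template, the template invokes the module, and the module reverses the string it was given.

```lua
-- Module:Demo (a sketch of the module built step by step below)
local p = {}

function p.main( frame )
	-- frame:getParent() is the template's frame, so .args are the
	-- arguments the *article* passed to the template; [1] is the first.
	local str = frame:getParent().args[1]
	-- reverse the string: last character first, and so on
	return string.reverse( str )
end

return p
```

Here the article would contain {{Demo|hello world}}, and the page Template:Demo would contain just {{#invoke:Demo|main}}.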
And now to the module itself, which starts with the prefix "Module:", a separate namespace. Okay, let's create it. Since Lua is an object-oriented language, there should be an object; let's create it. In Lua, everything is a table, including objects: objects in Lua are also tables. These curly brackets in Lua mean that we create a table, which can be an array, if there are only numeric keys, or a hash table, if there are non-numeric keys. So we created an object here, which is also a table. We attach a function to this object: we create a function "main", which takes the frame, which is, sort of, what we called in our article. In this object, there will be the arguments passed from the article to the template, and from the template here to the module. And we will return this object. Now, let's do something meaningful in here. First of all, let's get the parameters which were passed from the article to the template, and from the template here to the module. This is frame:getParent().args, and we need the first value; remember that Lua numbers arrays from one. So here, in the variable str, we will have this string. And now, what will we return? Let's reverse the string: put the last character in the place of the first, and so on, and return this string. (Please increase the font size or zoom a little bit more, if you can.) Now let's refresh this page. And yes, it reversed the text that we passed there. In this return value, there can be any kind of wiki markup, but if it is not simple text, you should preprocess it so that it is treated as wiki markup. For example, imagine that we put our answer into a table. So the table is not visible; I'd better add the class "wikitable". Something is wrong. Okay, it should be this other way. Okay, I don't know why it doesn't work. So imagine that in this result value, you have not just text, but some wiki markup, which can be tables, function calls, markup like italics or bold text, or references, or anything else. Without this, it will be returned as plain text, and the references, tables, and other things will not be treated as such. Without preprocessing, this will not be a table: it will literally write an opening curly bracket, a pipe, a line break, another pipe, a pipe, a closing curly bracket, and not a table, if we write it without frame:preprocess. But with frame:preprocess, it will be treated correctly. Also, this editing interface needs some explanation. Here you have a text editing window, much like the usual one for articles; however, there are different buttons, and also there is line numbering. This button is used to indent or outdent.
Here are the summary, publish and show changes buttons, like in the usual article editing window. Here is "preview page with this template": this allows you to view how a page will look if you apply the changes here, like show preview, and it just shows us this. This was supposed to be a table, but something went wrong. Well, oh yeah, I forgot this, right. Let's apply some changes here and preview how the page will look if we save them. We wrote the name of the page we want to preview, we clicked show preview, and now we see how this page will look with the edits that were made. Also, we have here a console, which is a kind of sandbox where you can test things. Lua modules and the Lua programming language allow you to do some complicated logic. It has control structures like if, else, elseif; it has the for control structure; it has while. But that's not its only plus: it can also work with Wikidata items, which are stored in the form of JSON, and let's see how that works. Before I forget: here is the documentation for Lua as it is used in Wikimedia projects. There is a short description of the language itself, and then there are the functions that were added in MediaWiki to make the language more convenient for working with Wikipedia articles and items and so on. I will give a link to it. Also, there is a library which allows you to work with titles, get labels, get ID numbers. For example, this function, if it is given a Q-ID, gives you the item itself. This function does the same, but for the current page, the page from which the module is called; this one does it for a title: you give the title of a page, and it gives you the corresponding Wikidata item, and so on. It is also very useful, and here is the documentation of the Lua programming language itself. You can access the source code of Wikidata items through the API, like here, which will give you the content of the page; or, in the console, let's access this item to see its JSON. Well, something doesn't work here, but this function gets the item itself, and this is a console function that will write out its content. So if it is applied right, you will see JSON with tables inside of tables inside of tables, and so on. And let's see how to access those values. First, we get the item itself; oh, not here, it should be here. And here we should have... so let's get the organizer, P664. We get the item, we get the subtable "claims" and its value for P664; then let's get the list of organizers and iterate over this list. So we get the table for this item, property P664, here it is; we need the first value's "mainsnak" (it is actually the first value's mainsnak), then "datavalue", the "value" which is inside the datavalue, and "id", the ID number of the organizer. Let's now get the item for this ID, to get its title, and add it here to this table. Here we'll add labels, English, and value. And now let's concatenate this table, save it in the very same variable, and return it, just like this. Publish. And if I wrote everything right, here should be a list. Yes; did I forget the module sandbox or not? Well, when it comes to the demonstration, everything goes wrong, as always. It says I have an error here with labels. You see, now we got these two values from this item and successfully added them to a page with the template. So we used our template to add something from Wikidata which is somewhere deep in this Wikidata item, and unlike parser language, we added not one value, but two values. So, questions: does anyone have questions? (Yes, do you hear me? Not yet.)
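To recap the Wikidata part of the demo in one place, here is a sketch of the whole chain. The item ID is a placeholder; P664 (organizer) is the property used in the demo, and mw.wikibase is the Scribunto Wikibase client library:

```lua
local p = {}

function p.organizers( frame )
	-- fetch the item; 'Q12345' stands in for the demo's hackathon item
	local entity = mw.wikibase.getEntity( 'Q12345' )
	local names = {}
	-- iterate over *all* P664 (organizer) statements, not just the first
	for _, statement in ipairs( entity.claims['P664'] or {} ) do
		-- drill down: statement -> mainsnak -> datavalue -> value -> id
		local id = statement.mainsnak.datavalue.value.id
		local organizer = mw.wikibase.getEntity( id )
		-- use the English label of the linked item
		table.insert( names, organizer.labels.en.value )
	end
	return table.concat( names, ', ' )
end

return p
```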
(I pinged them to let them know we are going to start the Q&A, so let's see if they ask any questions. There are no questions so far. Just to let you know, I forwarded the links that you shared to the other channels, so they're all shared.) Okay, then I will leave my contacts in this chat, so it will be recorded, and you can write to me if you have any questions and need any help. I am user Tohom in Wikimedia projects; you can contact me in Facebook Messenger or in Telegram, or write to my mail. So if you have any questions, you can contact me, and here we will end, I guess. (Okay, thanks a lot for the session.) Okay, goodbye.

(Let me record it. This session is from Daniel Kinzler, about converting an extension to the new hook system.) Hello everyone. Yeah, doing a workshop with this setup is a little bit odd. I would really like to just have everyone in the room and have a conversation while working on code. I hope we can make it work anyway. If you have any questions or comments, ideally write on Telegram. You can also write on IRC, just in the main hackathon channel, and I'll see it. If you write in the YouTube chat, I should also get your message, but it will be copied over manually. So let me take a minute, just for me to get a sense of who is here and how many people are attending: please just say hi using one of the channels that we have available within the next few minutes, so I get a sense of whether there are new people, or people I know, or people who are just dropping by, and what brings you here. That would be helpful for me. This session is intended for people who maintain extensions and who are interested in converting their extension to the new hook system. The new hook system was introduced in MediaWiki 1.35, but we have not yet really started to convert extensions to it. We are getting to a point where this is becoming more interesting and more urgent, and the idea is to deprecate the old system and remove it in the not too distant future, meaning in one or two years, I suppose, it will be entirely gone. Okay, so far I'm not seeing any messages from anyone, so I guess I'm just talking to myself. I will keep talking, because this is recorded, and perhaps it's going to be useful for someone in the future. If you have been working on extensions in the last, oh, I don't know how many years, you have been using the extension.json system, and you will probably know that the way to register hook callbacks is to use the "Hooks" section in extension.json. The main idea of the new hook system is really to get away from static methods here, to non-static methods on an object instance. This has several advantages: it allows for dependency injection into the hook handler, it allows hook handlers to share state in a clean way on the hook handler object, and it allows us to use proper interfaces to represent the signatures of hooks. Oh yeah, I'm seeing the first hellos in the chat. That's nice, okay. So I'm not entirely alone, cool. Please do ask questions and interrupt me at any time. Yeah, I'll try to convert this. Okay, the example I want to work on today is CategoryTree. CategoryTree is an extension that I originally wrote before I joined Wikimedia; I think there's still an author notice from 2006 or something on there. I have not actually touched this extension in a long time. It has been maintained by other people since, and I don't actually know the code in detail anymore.
So this is really me coming in from the outside, just with the mindset of looking at how the hooks are defined and how they should be changed to use the new hook system, without having deep knowledge of the code itself. And there are kind of two phases to this. In the first phase, we can kind of blindly convert to the new system, which allows us to use the new system but does not really let us benefit from it much. Once we have done that, we can start to see what converting away from static callbacks allows us to change about how the extension is written in general, moving it away from relying on global state. That's the idea. Okay, so for the first phase, for just converting to the new system: well, first of all, if we move away from static methods towards instance methods, we have to create an instance. And in order to do that, we introduce a new section into the extension.json file, which is called "HookHandlers". In most cases, we only have one hook handler, maybe two, maybe three, but not a huge amount; a different one for every hook is not necessary. And we have to name our hook handlers. Since we're just starting out with one, we call it "main" or "default" or "my hooks". You can also call it "Frank", nobody cares, right? So let's call it "default". The hook handler is defined as an object spec, so you give it a class name; that's the minimum we have to do. And the class name will just be the name of the place where all the hooks currently are. Typically, in an extension, all the hook handlers, all the static methods for handling hooks, would be in a single class. Maybe they're distributed over two classes; it really doesn't matter much. Okay, once we have done this, we are ready to convert our first hook function. The first hook function is articleFromTitle, and it implements the ArticleFromTitle hook. So what we do here is implement the respective hook handler interface, which is just the name of the hook with the word "Hook" attached to the end. And now my IDE is telling me, okay, you can do that, but now there's something missing, right? It's saying: you're not actually implementing the methods that are in this interface. So let's look at what the ArticleFromTitleHook interface looks like. It has a single method, which is called onArticleFromTitle. The name of the hook interface and the name of the method on that interface are always derived from the hook name, and always in exactly the same way: for the interface, you attach the word "Hook" at the end, and for the method name, you attach "on" at the front. The only exception is hook names that contain colons; sometimes the hook names contain colons, and then those are replaced by underscores. We will see that in a minute. Okay, so the method we have to implement is onArticleFromTitle, and I will copy the entire signature. Oh, maybe not yet; I will do that in a minute. So I know the method name now, and the name that we actually had is articleFromTitle, without the "on", right? So I will have to find that articleFromTitle. Then I'll just replace the name, right? Okay, so now the IDE is complaining again. Why is the IDE complaining? It is complaining because this method signature here is much like the method signature here, but not exactly: here we have three parameters with no type hints, and here we have two parameters with type hints. The method signatures have to match exactly.
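For reference, here is a sketch of the end state of this first conversion, on both the registration side and the handler side. The handler name "default" is from the walkthrough; the class name and namespace are assumptions:

```json
{
	"HookHandlers": {
		"default": {
			"class": "MediaWiki\\Extension\\CategoryTree\\Hooks"
		}
	},
	"Hooks": {
		"ArticleFromTitle": "default"
	}
}
```

And the handler class implements the generated hook interface; the parameter list must match the interface exactly, including the absence of type hints:

```php
<?php
// A sketch; the real extension code differs in detail.
use MediaWiki\Page\Hook\ArticleFromTitleHook;

class Hooks implements ArticleFromTitleHook {
	/**
	 * @param Title $title
	 * @param Article|null &$article
	 * @param IContextSource $context
	 */
	public function onArticleFromTitle( $title, &$article, $context ) {
		// ...the former static handler body goes here, unchanged...
	}
}
```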
So, oops, I will just copy what is here over, of course. Oh yeah, it's still complaining that it's static, right? So it should no longer be static. Okay, are there any questions so far? The reason that there are no type hints here is that, as I explained in the presentation I gave two hours ago, we are trying to move away from things like Title, and we are also trying to retire other old classes like Article, which just expose too much of the internals of MediaWiki. We want to convert hooks to only getting what they absolutely need, which makes it easier to change when and where they are called, and easier to keep compatibility without having to replace the hook. Sometimes this means replacing the hook entirely, but if we don't have strict type hints, and we know exactly how these hooks are used, we can sometimes get away with supplying a fake object here that just implements the methods that are actually used by extensions. That, of course, wouldn't work if we had a strict type hint here. Having the type declaration just in the doc block still means the type is enforced by IDEs, and it is also enforced by our continuous integration tests via the Phan utility; these will complain if you're using the wrong type. So there's still type safety, enforced when merging patches, but the types are not enforced at runtime, which gives us the flexibility to supply fakes for backwards compatibility in the future. Okay, all that being said, this is actually done, right? We are now properly implementing this interface, and now, instead of the method here, we just specify which hook handler to use: we just give the name of the hook handler, and then the new hook system will look at this hook handler and look for the appropriate method on it, and will call it. The method name will just be this name with "on" on the front. And all the hooks will use the same instance, right? The instance that is specified here will be created exactly once. Okay, so first hook down, next hook. Now, here we have a hook that has colons in the name. We did that for some hooks in the past, when we wanted to kind of have the class name in the hook name. That turns out to be a pretty bad idea, for various reasons, because it means that if we want to split the functionality of the class into multiple parts, and the hook now gets called from somewhere else, the name is completely misleading. Okay, anyway, we have to put in the underscores here. We look at the hook interface, we have the method signature, we go back and look for the old method name, the special-page one. Okay, so this is the old signature; we put in the new one, remove the static. When changing the method signature in this way, of course, we have to make sure that we're not changing the parameter names, so everything is still in place. The IDE isn't complaining, so I guess I didn't misreplace anything. Okay, done, right? Next. This is a pretty boring process; there's not that much to do here. I think I'll skip forward to a more interesting example, so we are sure to get to it. Copy the signature, go back to the onSpecial one; that's the one we already did, it's the other one we used in the other signature. Remove the static modifier. We're done. Oh, I see, the IDE is complaining about these method names. This is not new; that was also the case before. They used the wrong casing.
This spelling, getDbKey, is kind of sane, but it's wrong. We have to change it to the less sane but correct spelling, getDBkey, with capital DB and a lowercase k in "key". I think I can just do that while I'm here. Trying to read the backlog on the chat: am I reading this correctly that there are no comments so far? "Yes, as far as I can see." "No, not here." Okay, thank you. Everything is crystal clear. Yeah, so far it's kind of embarrassingly simple, right? It's not very complicated. Okay, let's skip forward, because I want to actually demonstrate something. Let's look at the BeforePageDisplay one. So: BeforePageDisplayHook, the method signature is there. Note that this one actually has the return type declared, and we also have to declare that return type. Until some time in the past, hooks were generally expected to return a boolean; that changed a couple of years ago. Only some hooks are allowed to return false, to abort the operation and prevent other handlers of the hook from executing. That is only supported for some hooks, and not for this one. So this declares the return type to be void, and any handler trying to return a value will fail. Okay, what's the old name? The old name here is addHeaders, completely unrelated to the actual hook. And I'm noticing that there are actually two hooks using that handler. In the old system, I could have two hooks using the same handler function; that is no longer possible. So what I do is create a new handler function that just calls self::addHeaders, and I keep addHeaders static for now. Maybe it can also become an instance method in the future; for now I'll keep it static, just to avoid any more complexity in this initial patch. Okay. Oh, I forgot to replace the name up here. So, okay, we have now also done BeforePageDisplay. Now we move on to the next hook, and we try to do the same thing as before: we add the interface at the end of the implements list, and now the IDE doesn't find it. It's not there. And if I go to my code search and try to find where this hook is actually called in core: it's not there. What happened? Well, this hook is not defined by core. It's defined by another extension, and that extension is MobileFrontend. And there are actually two cases here, though unfortunately I don't have an example for the second one. So, where am I? If I look at MobileFrontend, I can see that this hook name is mentioned once, in the place where the hook is called, and the hook call is the old-style way, with Hooks::run. There is no hook interface. So what now? And the answer is: simply nothing. Just keep it a static method on the same class. No problem. You can still use static functions as hook handlers. It's not preferred, but since MobileFrontend is not offering a hook interface yet, we will just keep it the old way. And that's it: we do nothing, and it keeps working. We can have static handlers and instance-method handlers on the same class; that doesn't hurt. Now, something I can't demonstrate right now: what if the hook interface is defined by another extension, but that extension isn't there? What happens? In that case, well, let's pretend that is the case for a minute.
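A minimal sketch of that delegation, assuming the old static method keeps its addHeaders name; the void return type matches what the core BeforePageDisplay interface declares:

```php
public function onBeforePageDisplay( $out, $skin ): void {
	// The old static addHeaders() was registered for two different hooks,
	// which the new system no longer allows, so this thin instance method
	// just delegates to it for now.
	self::addHeaders( $out, $skin );
}
```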
I will undo this in a second. You have to create a separate handler class; let's call it MobileFrontendHookHandler, and we'll name the class that as well. And then, in this other handler, you can use the hook handler interface defined by the other extension. The reason this has to be in a separate class is that if that other extension isn't installed, the interface will not be found. If you put it on the main handler, instantiating the main handler is going to fail, because that interface is missing, because the extension isn't there. But if you have it separate, the hook system will only try to create the hook handler when the hook is called. And since the extension isn't there, the hook is never called, so it's safe. I suppose that since most extensions are not converted to the new hook handler system yet, I'm not aware of any that is exposing handlers the new way. If someone can tell me about an extension that does, perhaps we can look at that example as well. But this is just to say that with the new system, it is possible to register a handler for another extension that may or may not be there. Okay, we just found that we don't have to do anything here; this will just stay static. Since JSON doesn't allow comments, I can't make a comment here. I've always found this a bit annoying, but okay. So let's do OutputPageParserOutput as we did before. I can do this quite quickly: import, look at the signature, find the old implementation, which is just called parserOutput, and replace the signature. Note that we renamed the parameter, which is confusing. And now it's good. Yeah: "What happens when the extension providing the hook is not installed?" Well, if the extension that provides the hook is not installed, nothing will call the hook. And if nothing calls the hook, the handler system does not try to instantiate the hook handler, so nothing happens. Does that answer your question? I'm just looking through the chat for any other questions I missed. "Any warnings that could help you see the missing piece?" I'm not sure what you mean by "missing piece". There are two cases here, I suppose. One is that the other extension is an optional requirement. Well, if it's optional, there's no missing piece: if it's not there, it's not used, done. If it is a requirement, then that dependency should be declared, I think, in extension.json, somewhere up here, and then the extension registry would complain. So if the dependency is required, you declare it, and the extension registry will complain when it's missing. The other way to do it, of course, is to just put the interface from the other extension on your main handler, and if the other extension isn't there, that will just explode. Now we're getting unrelated chatter about the WorkAdventure in the chat. Okay, cool. Anyway, while I was looking for this method, I stumbled across the initialize method. There is an initialize function up here on the handler, and it is not mentioned here; it is not a hook handler. It is, let me find it, an extension function. Extension functions are like the ancient version of hook handlers, the hook handler from before hooks even existed: a callback function that is invoked after the setup phase is complete. And I will go on a little tangent here, because this is something that might bite people. It is unrelated to the new hook system, but perhaps interesting anyway.
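Purely as a hypothetical sketch of that pretend scenario (MobileFrontend does not actually ship hook interfaces yet, so the interface and method names below are invented for illustration):

```php
<?php
namespace MediaWiki\Extension\Example;

// Hypothetical: this interface would be provided by MobileFrontend.
use MobileFrontend\Hook\SomeMobileFrontendHook;

// Kept in its own class so the main handler can still be instantiated when
// MobileFrontend is absent. This class is only loaded and instantiated if
// the hook actually fires, which it never does without MobileFrontend.
class MobileFrontendHookHandler implements SomeMobileFrontendHook {
	public function onSomeMobileFrontendHook( $special ) {
		// ...
	}
}
```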
One thing that is frequently done there is manipulating settings: filling in defaults that depend on other extensions or on other settings, dynamically. And this is problematic, because this function is called when service instances in the main service container may already have been created. So it is basically too late; some service objects may not pick up the changes you make there. Not a problem for CategoryTree in particular, but it has turned out to be a problem for, if I recall correctly, CentralAuth. So one way to fix this is to make this a handler for the MediaWikiServices hook. We implement the MediaWikiServices hook, which is invoked immediately after the service container is created, before nearly all service instances exist. There are two exceptions: two services will already have been created, namely the hook runner itself and the configuration factory. Everything else will pick up changes you make here. Okay, so: MediaWikiServicesHook is our hook. Copy the signature back, and we'll replace this. Well, let me see if this is called anywhere directly. Nope. So I can just change the signature here to the new thing. And this means we can remove the extension function up here and instead register a handler for the MediaWikiServices hook. We'll do, okay, "services"... Wow, I've not seen that before. Okay, my IDE just crashed. We will be back in a second with the regularly scheduled program. Sorry about that. Any questions in the meantime? Comments, thoughts? "No questions, but I have comments. You should read your notifications on MediaWiki, because you have tons of notifications; I saw them when you switched windows. It's not a question, it's just something that I recognized." Sorry, notifications on MediaWiki? Oh, yeah. Yeah. We have a big backlog. "Otherwise there are no questions on either channel. Be comfortable, I am following the channels. There are no questions yet." Okay, cool, thank you. Yeah, I would actually be happy about questions, right? To me, a workshop works best when it's actually a conversation, rather than me talking all the time. But I realize that is difficult if we are not in a room together, or at least in a shared video chat. Okay, how much did I lose? What happened? No, my IDE just crashed and I lost my last few edits, but that's not terrible. I should still have like 20 minutes, right? "Less than 20, but we are close." I will put this in; copy the entire signature. That will work. In the future, we could look into a better system for manipulating configuration; this is under discussion. But for now, it's important to at least do it at the correct time, and no longer in the extension functions mechanism. So we can remove this, and that should still work. OutputPageParserOutput: we did that one, so this can also be "default". We have one handler that stays static, and we still have three to go. I will do these quite quickly, if I can. Signature, SkinBuildSidebar, fixed. Maybe a quick side note about these reference parameters. Some of them are actual out-parameters, where you can modify the parameter: add to a list, for instance, or replace an instance. Others used to be declared as reference parameters purely for historical reasons: in PHP 3, objects would be copied if you didn't pass them by reference, and that was absolutely terrible.
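A hedged sketch of such a handler, assuming the core interface lives under MediaWiki\Hook like the other generated hook interfaces; the setting name is made up, and the handler would be registered under "MediaWikiServices" in the Hooks section of extension.json:

```php
use MediaWiki\Hook\MediaWikiServicesHook;

class Hooks implements MediaWikiServicesHook {
	public function onMediaWikiServices( $services ) {
		// Runs right after the service container is created, before
		// (nearly) all services are instantiated, so configuration
		// defaults filled in here are actually picked up.
		global $wgExampleDefault; // made-up setting name
		if ( $wgExampleDefault === null ) {
			$wgExampleDefault = true;
		}
	}
}
```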
But that was fixed a long, long time ago, so we are no longer passing objects by reference just as a matter of course. With the new hook interfaces, we fixed this in quite a few places, so that's another change you may see when you convert to the signature of the function in the hook interface. Okay, onSkinBuildSidebar is done. ParserFirstCallInit is next; it's also somewhat interesting, and we'll look at it in a second. setHooks, okay. So we'll replace this. Now, what this does is set up our handlers for parser tags and parser functions. These, again, are static methods on this class. And we want to move away from global state, away from the static functions; we want to be able to use dependency injection on the instance and share state on the instance. So I will just make them non-static. I can just do this, and then I will look at where they are defined. The function is here; I'll just remove the static modifier. Here's the other one. When I do this, I should search for direct callers elsewhere in this code, and okay, it turns out there are some, but they're in the same class. Since we are converting everything to be non-static, we can actually do this. I'm getting a lot of things here. Okay, yeah, Lukas likes the new system, cool. Very nice. "Nestle, when you're copying things, could you copy them into the main channel? That way I don't have to switch channels. Thanks." That was me, okay. So this is still red, because we have not converted this method to non-static yet. I think it's actually the last one left. Yes, this goes next. The parser function: we already have it, I can just do this. Do we have direct callers to this function? All right. So the next one will be the last hook. Okay, let's go: the OutputPageMakeCategoryLinks hook. I'm not finding it. Why am I not finding it? And this one I genuinely don't know; I expected to find it. Maybe I'm confused somehow, because I think it is in core and it should have an interface. Let's see: it does exist. So either I got confused or my IDE got confused. Oh yeah, I mistyped it, that was all. Okay, fine. All right. This is something I noticed working with PhpStorm in this file: PhpStorm is very smart about working with JSON, and it will add and remove commas when you copy lines and insert stuff, which is quite convenient. But for some reason it wants to add a comma at the very end of the file, breaking everything. I have no idea why that happens; I noticed it yesterday. Okay, so this should still work. We have converted everything, I think. Nearly; wait: default, default. Okay, let's see if this still works. Of course, that is cached. And if we just do edit and preview: we still have the category tree working on the page. If we go to a category page and navigate up, I have the categories here; zoom in; that seems to work, no exceptions. So this is basically the first step. Let me quickly make a commit, sorry. And I will push this commit later and clean it up. Well, actually, I will push it now, and if it fails any tests or anything, I'll clean it up a little. And I can actually give you the link right now, because why not? I will paste it into the Telegram workshop chat, whoops, right. No guarantees that this is 100% complete and correct, but it seemed to work. Now, we have like 10 minutes left.
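A small sketch of the converted registration; the tag name and the handler method name are assumptions for illustration. Passing [ $this, 'parserHook' ] instead of a static callable is what lets the tag handler share instance state:

```php
public function onParserFirstCallInit( $parser ) {
	// Register the tag callback as an instance method rather than a
	// static one, so it can use services injected into this handler.
	$parser->setHook( 'categorytree', [ $this, 'parserHook' ] );
}
```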
I would be happy to have a conversation about this, or I could push on a little and maybe give you an idea of what this just freed us up to do. What would you like to do with the last 10 minutes? Oh yeah, good question, good question, Alex: "Is there a situation when I would want to use multiple hook handlers rather than one?" Well, one situation, as I explained earlier, is when I have a hook interface that comes from an optional dependency that may not always be there; that has to be separated out. Another situation is, and we didn't really get into this: now that all my hooks here are instance methods, I could look at no longer using globals, and instead have the config injected into the constructor. Similarly, I think we get a database connection somewhere. Yeah, so here, instead, we could inject a load balancer. So we would have this load balancer. And, oh, actually, that is a good point; I did not talk about this yet. So this is a load balancer, and we will want a constructor that initializes it. But we have to somehow tell the hook system to also supply this parameter. So we have to specify which services to pass to the constructor, in the HookHandlers specification in extension.json. And I will come back to your question, Alex, in just a second, because there is a connection. Oh yeah, with the load balancer: typically the name of the service is just the class name, but here it's "DBLoadBalancer", because "LoadBalancer" by itself is a very generic name. So this makes sure that we get the load balancer from the service container injected into the constructor. Now, the more things I inject for the different handlers, the more cluttered the constructor gets. And if I notice that some services are only used by one set of hook handlers, and other services only by another set, it makes sense to split the handler. Maybe I have one handler that does storage-related stuff, with a database connection and whatever storage manager injected, and another one for skin stuff, with localization-related things injected. So I have all the skin-related handlers grouped into one object and all the storage-related handlers grouped into another. Just looking at which services and which configuration variables are used by which handler function gives you good guidance on what to group together. Does that answer your question? Okay, I'm reading through the comments, but I only see unrelated chatter. Did I miss any questions? "No. You didn't miss any." Good. Yeah, it's very good that this came up, so I could actually show how dependency injection works here. We still have five minutes or so; I'm trying to think what else I could get into. One thing we could look at is how to inject configuration variables. There are actually two ways. Well, we can ask for the config object: we just ask for it to be injected and remember it in a local variable. Add it here; this is the main config. Just having the config here and accessing it directly is the easiest way. So for something like onParserFirstCallInit here, I want to know whether the CategoryTreeAllowTag setting is set, so I would do something like this, right? That allows me to access configuration variables without relying on globals.
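Putting those pieces together as a hedged sketch: the "services" key in the handler spec is the mechanism described here, "DBLoadBalancer" and "MainConfig" are the service names as I understand them from the talk, and the config variable name is my reading of the audio:

```json
{
	"HookHandlers": {
		"default": {
			"class": "MediaWiki\\Extension\\Example\\Hooks",
			"services": [ "DBLoadBalancer", "MainConfig" ]
		}
	}
}
```

With a matching constructor on the handler class:

```php
namespace MediaWiki\Extension\Example;

use Config;
use Wikimedia\Rdbms\ILoadBalancer;

class Hooks /* implements ...hook interfaces... */ {
	/** @var ILoadBalancer */
	private $loadBalancer;

	/** @var Config */
	private $config;

	public function __construct( ILoadBalancer $loadBalancer, Config $config ) {
		$this->loadBalancer = $loadBalancer;
		$this->config = $config;
	}

	public function onParserFirstCallInit( $parser ) {
		// Only register the tag if the (assumed) setting allows it.
		if ( $this->config->get( 'CategoryTreeAllowTag' ) ) {
			$parser->setHook( 'categorytree', [ $this, 'parserHook' ] );
		}
	}
}
```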
Having the full config object hanging around here is not super nice; using a ServiceOptions object, as we do in other places, is somewhat nicer. But getting into that here would be a bit of overkill. Using the config object instead of global variables is already a big step forward. Same thing here, of course: we could now go through and convert all this access to global variables into config access. Any other questions? Oh yeah, I'm just looking at a bug, by the way. You may appreciate that this is called "sidebar" up here and "sidebar" down here, but this one is called "bar". So the bug would be that this no longer works, because this got renamed. Bug, fixed. That's the kind of small thing we'll have to pay attention to. It was especially tricky in this case, and I only noticed because of syntax highlighting: the IDE was telling me that the variable was unused. So sometimes these kinds of hints can help. You know, if you can work with a full-fledged IDE, do it. It has so many advantages; it makes things so much faster and so much safer. Just give it a try for a week or so. I don't know about other tools, but PhpStorm, for instance, has a mode that can emulate vi; if your fingers are totally stuck in the world of vi, you can keep using all the weird key combos there. So, highly recommended. It doesn't have to be PhpStorm; it's probably the best I have used, but of course it's not open source, and there are open source options. But yeah, it is way faster and safer than using just an editor. And I'm not getting into editor-war territory; I didn't recommend Emacs. Actually, I still use vi for some stuff. For instance, I use it to write my commit messages, and it's still my go-to thing for logging in somewhere and quickly editing a file or two. For doing refactoring in an environment with 15,000 files, not so much. Okay, any other questions? Bryan says whatever editor works best for you is the best one. That is true, but it requires you to actually give new things a chance and take the time to try them out properly. And that's coming from me; I hate giving new things a chance, change is just horrible and throws me off. But, you know, I've never looked back, to be honest. Ooh, I see chatter about PAWS in the chat. I never got into using it properly, but I think PAWS is absolutely awesome. I agree, I really need to play with it more. If there are no more questions, then I think I'll just stop here. Thank you all very much. "Thank you. Now we have the break." Great, bye. Bye-bye. "Can I stop the recording? Now we will have a one-hour break, and then we will continue with the session 'An Introduction to Wikimedia Cloud Services' from Andrew." Hello, everyone. "Oh, go ahead." Welcome, everyone. So we have Andrew Bogott and Brooke Storm here with us to talk about Wikimedia Cloud Services. So let's get started. Hello, yes. As already introduced, I am Andrew Bogott, and I have with me Brooke Storm. We are SREs on the Cloud Services team. This talk will be an overview of the services that Cloud Services can provide for you. For the most part, we'll cover what you can do, but not so much how you can do it. So once you know what you want to do, feel free to follow up at the end of the session, ask us questions, find us on IRC, or ask fellow attendees about how to actually do things. Explaining the how would turn this 20-minute session into a 20-hour session.
So Wikimedia Cloud Services provides hosting, storage, and data access for projects that are involved with the Wikimedia movement but are not themselves the wikis. Bots, tools, websites: anything that's not under an actual MediaWiki install is most likely running on Cloud Services. All of the resources we provide are freely available for any work associated with the movement, and all of our systems are maintained in public: they use open source software, and our code is in public Git repos. To start with just a few numbers: the percentages and counts on this chart go up and down a lot month to month, just like page views on the wikis, but you can see that the numbers are big. A lot of people participate in our projects, and a great deal of the content contributed to the wikis passes through and makes use of Cloud Services. If you've spent any time at all using Wikimedia's wikis, you're probably already using, or have seen, things hosted on Cloud Services. A few quick examples: if you use an offline reader, the content is generated by Kiwix, and Kiwix builds its files using Cloud Services. If you've used Wikisource, there's a download link in the sidebar which relies on Wikisource Export, used to create about 20,000 EPUB and PDF files every week for people who want to read Wikisource texts. There's the Wikidata Game, used to quickly insert properties into Wikidata. And there's deployment-prep, the staging zone for edits, changes, and code contributions to MediaWiki: things get rolled out to deployment-prep and tested before they are released to production wikis. Here is a quick rundown of some of the execution platforms we provide. I'm going to present these in order, starting with tools you can start using today without any setup, and then make my way up to the more complicated, more powerful options. Anyone with a Wikimedia account and MySQL skills can sign in to Quarry and query the database replicas. The database replicas are stripped of personal editor information and contain the page history for most of the wikis. "Not the content, usually; just the metadata." Yes, sorry, right: the content you can get through the APIs. "Which you can do with PAWS." A segue! Another platform that you can log into right now, using your Wikimedia account, is PAWS. PAWS provides an interactive, web-based notebook to run your code. You can write snippets of Python with ready-made PyWikibot and account integration, which is great for research or one-off batch jobs. PAWS also supports R and bash besides Python, but the PyWikibot integration makes Python the most effective choice. It's also possible, if you really need to, to run a command-line shell in your browser in PAWS. If you want to write persistent software, then you probably want a Toolforge account; that gets you a login shell on a shared platform. The sign-up process is pretty quick. Once you have an account, you can write your software in whatever language you like. Getting set up is more complicated than PAWS, and you need to be a little familiar with the Linux shell, but you can do a lot more on Toolforge than in PAWS. There are a couple of very common Toolforge use cases. The first is writing a bot, which is just a piece of software that runs all the time and either accepts messages, or interacts, or connects to IRC, or whatnot.
We have some automation frameworks that will keep your program running basically forever, or run a job every day at four o'clock, or something like that. They also have ready-made access to the database replicas, as in Quarry or PAWS, and access to a tool-specific database account. The other major use of Toolforge is web services. This is another thing we provide a lot of support for: you can write your little Flask app or simple web service, and we automatically provide you with a URL and SSL termination. This is usually done by launching your web service in a Kubernetes container. And if a login on a shared system isn't enough, we also have the VPS service, where you can apply to have an entire cluster dedicated to your project. This is Cloud VPS, where VPS stands for Virtual Private Server. The upside is that if you have a thing that isn't easily containerized and you want to run it on a VM, or a fleet of VMs, you can do that. The downside is that once you have your own VM, you are also the sysadmin of that VM. So this is the solution that's sort of like having an AWS or Google Cloud account, where you just get a VM, you get a login to it, and you set it up as you like. Okay, that is the code execution side. Now I'm going to turn it over to Brooke to talk about what data is available for you to access from your tools and projects. Okay, so yeah: we provide a number of data sources that you can access from within Cloud Services to make your projects a lot more useful. The main data systems with special uses in the cloud are the Wiki Replicas, ToolsDB, and the Dumps. The Wiki Replicas are our real-time system, in the sense that they're a live database replication coming directly out of the production MySQL and MariaDB databases. They're sanitized for public use over the course of a couple of different steps, and at that point they should be appropriate for it. They're the queryable database behind Quarry; that's what you're hitting there. And MySQL client access is automatically granted to all users of Toolforge and PAWS. Slide. ToolsDB is where, using the same credentials as for the Wiki Replicas in Toolforge, you're given access to a shared database instance that lets you create your own database and write to it from your web service, which can be awfully useful at times. It's sometimes difficult to just query data, process it, and put it in an app; sometimes you do need to share state. Next slide. And we also help administer the Dumps, a service you may be familiar with already, because it's on the internet and you can access it from there; there are also mirrors in other places around the internet. It provides several formats, like HTML, XML, and JSON, and it includes the data and the content as well as the metadata. The thing is, inside Cloud Services we also provide direct NFS access, so you can interact with the dump tarballs directly on the file system, without having to download them over the web first. That can be very useful, and it gives you a lot of data to work with if you don't need to query the wikis in real time. Yeah, so that is a very quick rundown of the most commonly used services we offer. To get access to them... well, for PAWS and Quarry, you don't really need an account; you can just go ahead and use your standard login account, like for wikipedia.org or whatnot.
To use Toolforge or Cloud VPS, you will need a developer account, which you can create at toolsadmin.wikimedia.org. Once you have an account, you can join an existing tool or an existing project, or create a new tool. If you want to create a new VPS project, you need to apply, and we have a Phabricator workflow for that, where we can discuss what you need and what you're using it for and so forth. Ordinarily, our turnaround time is maybe two or three days at best and maybe a week at worst for approving these account requests. But this weekend, we're trying to be available and on call all the time. So if you want to apply for resources or an account, please just ping us, and we'll do our best to get that set up right away so you don't waste your hackathon. Yeah, and anybody who doesn't have these things already might want to screenshot this slide, because it's got some good links. And the next slide as well, which is contact information. We have several communication channels you can use to get in touch. We have mailing lists; we have IRC, which is mostly on Libera starting yesterday, though we're still lurking on Freenode as well. Our user docs are on Wikitech, which is a wiki, so you can contribute to those docs as well as read them. There's also administration documentation, which is what we use to actually maintain Cloud Services. And this is an unobvious point: you are invited and encouraged to use Wikimedia Cloud Services, but you are also invited and encouraged to contribute to the Cloud Services infrastructure itself. There are several volunteers that have many of the same root privileges that we, the staff, have. So if you're interested in helping, please talk to us. And again, everything is done in the open, so you can see what we're doing, the code, the Git repositories, and so forth. That is our talk. I think we're set up to field questions in the YouTube chat, so we're just going to sit back and answer questions. I'm also going to back up to the how-to-contact-us slide, because that seems like the most useful persistent info, and then we'll give you a flash of credits at the end. Thank you very much, all, for being here. Thank you. So we do have a question, and that is: can we host a Wikibase instance on Wikimedia Cloud Services? Is that the place to do so? Wouldn't an instance hosted this way possibly be cut off from other projects? Do you want to answer, Brooke? Wikibase: I don't know exactly what Wikibase requires in order to run, but you can definitely host a MediaWiki instance and any of its necessary plugins. As a convenience, we provide a MediaWiki-Vagrant class that you can use on your instance, or you can build it from scratch. And that is a place you can host it, as long as it's something that is valuable to the movement, not something that's only useful to you. Yeah, I agree. The answer is almost certainly yes. The only reason it would be no is if somebody were trying to host something that would be best done on Fandom; then we would send you to Fandom. But if you're using content that is consumed by, or relevant to, the general mission of the movement, then you would be welcome. That sounds like it would be a Cloud VPS project. And as Brooke said, it doesn't sound like something we have ready-made setups for, so we would almost certainly provide you with the resources, but you would probably be on your own for setup and administration. Awesome.
So the next question is: what is the difference between Toolforge and VPS? Should one just be chosen over the other? "Oh, thank you. I was just going to ask you to paste it, because I'm having a little bit of audio trouble." Okay, well, go ahead. Oh, okay, I'll do this one. Toolforge is intended as something of a platform-as-a-service, in the sense that you can go on there and there's a set of tools that let you take your code and put it in a web app, or run a bot or a cron job, with as little effort as we can manage. Right now you still have to log into a shell and do things like that, and there are tutorials online on how to get your services up. But for the most part, the things that run it, like the actual web servers and the actual database, are all maintained by our team. On the other hand, on Cloud VPS you get a VM, or a set of VMs: you get a quota within which you can make VMs and instances, and then you do what you need to do on them. So if you feel confident that your project is going to need, and that you're capable of, something like setting up your own web server, your own instances, your own cron jobs and things of that nature, you have a somewhat freer hand and more options in Cloud VPS, because you control the servers. On the other hand, you have to control the servers, and we don't support them directly; we just help where we can. Yeah, but the vast majority of tools and projects that people propose can be done on Toolforge, and that's almost always a better user experience. Often what happens is that somebody opens a request for a Cloud VPS project and then we discuss it, and either figure out a way to have them move to Toolforge or discuss with them whether it's appropriate. So feel free to engage if you have a particular puzzle in mind, and we can help you figure out the proper platform. Definitely. Thanks. Moving on to the next question: what about hosting a bot on Wikimedia Cloud Services? Does that require requesting an account for Toolforge, and is that account different from the Wikimedia developer account? I'll answer the second part and then let Brooke answer the first part. A Toolforge account is effectively the same thing as a developer account. Typically, people apply for a Toolforge account on toolsadmin, and then they get both, and then you're off to the races. Of course, if your bot is going to talk to the wikis, then you need permission, and whatever flag is necessary for your bot to talk to the wikis. That's a thing we aren't very involved with, and I don't know a lot about it, but that's the flip side of the bot, if it's an editing bot. I said I was going to let Brooke answer the first part, but now I'm not sure: is there a more specific question there, or did that answer it already? Then I'll let the person who asked the question reply as to whether that answered it. Let's move on to the next one: any plans for providing an environment to host static-file projects or Docker containers directly? We already do provide the ability to host static files, which is nice. In Toolforge, generally, we have something called tools-static, which I don't have a link to at the moment, but in Toolforge there is a public_html folder where you can basically just put HTML in your tool account, and that puts it online under yourtoolname.toolforge.org. As far as Docker containers go:
Interestingly enough, when you're running a Toolforge web service, you're typically running it in Kubernetes. So you are running in a Docker container, but it is a Docker container that we created, with an NFS volume served into it. We are not currently providing the ability to run user-created containers as a service. And for services that are not suited to Toolforge, we don't have an alternative right now in terms of a serverless-type deployment or things like that; not at this time. We're working on other ideas, but not yet. Yeah, for literally just file hosting, we have, I think, budgeted hardware for the next year to support some sort of object store, Swift or S3; it will probably be Swift. So that will at least provide you with a place to put files that the public can see. But it's very early days, so I can't say a lot about what the user experience will be like. Yeah, I think the best we have is: stay tuned. But at least it's not "someday", because we have already ordered the hardware; I think maybe around the middle of the year, maybe come Christmas, is when we're getting it. I think we've answered all the questions in the chat. We have a few more minutes, so if folks want to ask questions, please do. Yeah, if there aren't any more immediate questions, I just want to put in another plug for our IRC channels. We, the staff, are in there during our working hours and outside of them, but there really are a lot of helpful volunteers that know as much about the services as we do. So that's really the place to go; it's generally very friendly and helpful, and I would encourage you to use it as your first place for support when getting started with these things. The cloud channel is actually linked to a Telegram channel as well; they're mirrored. How does that work? We don't have that on our slide, though. Yeah, if you just visit wikitech.wikimedia.org, the front page has a bunch of links right there to Cloud Services docs and communication tools and such. So that's another good place to start: if you're not already on IRC, you can start there, and that should get you going the right way. I know they can't see this, but that's the link I see. But yeah, it's always a good idea to go to Wikitech first for questions. Obitra, do you want to send that back into the YouTube chat? We are in our own chat and video, using Obitra to pass things back and forth. So: done, let's put the link down there. Okay, well, thank you so much, everyone, for attending. I think we can stop the recording and go to a break before the next session. Thanks, everyone. Thank you so much, that was really nice. Hey folks, my name is Bryan Davis. I'm a software engineer at the Wikimedia Foundation, working on the Technical Engagement team. My pronouns are he/him, and I'm here today to talk to you about the Toolhub project that I've been working on, along with Srishti Sethi, for about the last five or six months. I hope you folks will leave the session today with a general understanding of what gaps Toolhub is hoping to fill for the movement, how to get information about your own tools into the catalog, and where to look for discussion of future features and how to get involved in the project. So: there's a really rich ecosystem of tools built by volunteers and staff within the Wikimedia movement to help fill workflow gaps on the wikis.
There are thousands of bots, user scripts, web services, gadgets, desktop apps, and phone apps out there; maybe even one that makes the exact thing you're trying to do easier, or possible. But how do you find them? The wikitech-l discussion that I took the pull quote on the previous slide from inspired user Ricordisamoa to create a Phabricator task that collected links to existing partial solutions. I discovered this task in early 2016, while I was researching ways to help volunteer developers working in what we now call Toolforge. I think this was, and still is, a brilliant idea, and a thing that many, many people have asked for over the years. So it led to the Toolhub 1.0 goals. Our target minimum viable product for Toolhub's first launch is focused on the list of goals on this slide. We're working towards a core product that makes collecting and reusing information about tools as open as we can. Rather than creating yet another one-time list of tools, we want to make a platform that makes it possible to extend and remix the catalog. And I think really critical to this openness is our choice of what is called an API-first design. A web API is just a fancy way of saying that the web application has features that can be used by other software rather than just humans. In our case, we're hoping that Wikimedia volunteers will be able to build tools that interact with the data stored in Toolhub in many ways. In thinking through and designing this, we've come up with several personas and use cases that we're hoping to support. One of the first is on-wiki editors. We want them to be able to search for templates, modules, gadgets, and other tools that make specific editing-related tasks on-wiki easier. We want them to be able to make and view public lists of tools in a category, to learn about things for those kinds of specific tasks. We want editors to be able to contribute information back about the tools they use. And, as one of the wishlisty features, we'd really like it to be possible to write Lua modules that query Toolhub for certain types of tools and then use that information on the wikis in an automated way. Another one of our personas is developers. Some of you, maybe all of you, are developers. We want developers to be able to build subsets of the catalog using the API, for their personal use or community use: basically, to build tools outside of Toolhub that gather Toolhub's data and show it in some different, better, unique way. We'd like developers to be able to write a gadget or a user script that makes it easy to register a tool in Toolhub. We'd like developers to learn about the tools that are available in the programming languages they might be good at contributing to. And we want developers to be able to connect with their users, with each other, and with other resources, like documentation. A third persona is researchers: we want researchers to be able to use Toolhub as an entry point to learn about specific tools, like bots, that are available in the Wikimedia ecosystem. Movement organizers are another persona we're hoping to reach, especially to let them search for lists of tools that help with organizing programs and events, maybe lists created by existing organized groups like Wiki Loves Monuments or Women in Red. And readers of the wikis are a persona we care about.
We want readers to be able to find tools that recommend new articles to them and give them new ways to experience Wikimedia content. How are we doing all of this right now? Well, we've got a small team and a tech stack working, and we're actually running some social and technical experiments in how we're doing this project. We formed what we're calling an advisory board, to get input from people in key roles throughout the movement during our development. Our advisors currently include Foundation staff as well as community volunteers. These folks provide us with feedback on our design and implementation ideas, especially in areas where they're subject-matter experts, and we use this feedback to iterate, collectively thinking about the project from a broader set of perspectives. So we don't end up just making the thing that Bryan wants the most, but hopefully we make something that all of us can agree is pretty good. We're also trying to leave behind documentation about why certain decisions were made, in a form we call the decision record. I really want Toolhub to have a life beyond the contributions of any single member of the team, and we hope that documenting why we've made some of our technical choices will help future maintainers when they need to make their own editorial decisions in the project. We're trying to keep the advisors, and anyone else interested in following along, up to date with what Srishti and I have been working on, by producing progress reports each week on Meta and posting a summary to a development-specific mailing list we have set up for the project. And on the more technical side, we're trying to keep the development environment requirements simple, to make it easier for people to contribute, and we're using some newer technologies that are being adopted in other areas of the Wikimedia movement, like Vue.js and container-based deployment tools. A little bit about the dev environment specifically: it uses Docker Compose to run a set of containers for the Django backend, the Vue frontend, a MariaDB database, and an Elasticsearch full-text search engine. And this is set up building on top of the Blubber and PipelineLib configuration tools that are used in CI and will eventually be used for our production deployment as well. We've tried to encapsulate as much as possible within the Docker container layer, so on your local machine you should really only need Docker, Git, and GNU Make; everything else should be included in the Docker containers, with Makefile targets that automate things like running the tests and regenerating the localization files. All the coding standards we're using are enforced with linters that can be run locally and also vote when things pass through Jenkins in the CI environment. And this is a cool thing, because it means Srishti and I don't need to quibble with each other during code review over little things like formatting: if the linters have passed, then it must be okay, or else we need to go open a bug about adding a new linter, because we don't like what's happening. So let's talk a little bit about the various features. Oh, and my screenshot isn't showing up on the screen. Interesting. So, the various features of the project: the start is a home page, a landing page with a search box on it and a paginated display of cards showing small summaries of the tools the system knows about now.
I think today, on the demo server that I'll give a link to in a little bit, there are about 762 tools; not about, there are 762 tools currently indexed. And this is in large part thanks to our compatibility with Hay's Directory and its toolinfo.json standard, which we'll talk about in a little bit. Now I've got the same slide with the screenshot; cool. Okay, the next feature is toolinfo cards. These are the little summaries about each tool: image, title, description, author, keywords, things like that. The next feature is the detail view of a toolinfo record. From the toolinfo card, there's a button you can click for more information, which takes you to a full-page display about the tool, where you get a lot more information, depending on how the toolinfo record has been created. And one of the things you can see in the screenshot, in the upper right corner when you're in a left-to-right view, is a "view history" button, which is another feature. So we have edit history; people are used to working in MediaWiki, and this should look kind of familiar. You get a detail page about the edits that have been made, and here, if you have the proper rights, you can do things like revert and undo and get diffs. Faceted search is a huge and exciting feature for me in Toolhub. Users can search through the tools using free-form text search terms and then refine those searches by selecting common values from the matched documents. Those common values are called facets. It's the sort of search navigation you've probably seen on e-commerce sites, where there's a list of departments or sizes or colors shown along with your search results, and you can click on them to refine your search to include only things with that particular attribute value. Another feature is tool registration. We need to get data into this system, so there are interactive edit forms that let you create a new tool document and then go on and edit it, giving you the kind of edit history we saw in the previous feature. You can also get your toolinfo data into Toolhub using externally hosted toolinfo.json files. The user interface screenshot here is shown in Hebrew, to show that we have some right-to-left support working in the system. So if you host, on Toolforge or in your Git repository or wherever your external tool is hosted, a JSON data file conforming to a particular format, you can then come to Toolhub, tell us the URL, and the system will go out, call your URL, and bring that data in. That toolinfo.json standard is something that was started by Hay, or Husky, depending on whether you know him from the content wikis or the development side of the world, with his tools directory project. Hay's Directory was an awesome innovation for the tools community that came out of discussions at Wikimania in 2014. And when James Hare and I were working on the initial design for Toolhub, we made a very deliberate decision to start from Hay's standard and build on it in a backwards-compatible way. This helps Toolhub by providing useful data even before we've launched the product for public use, and we think it helps the community by showing that we can build on and extend the things that each of us make.
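For flavor, a toolinfo.json record is a small JSON description of one tool that Toolhub (like Hay's Directory before it) can crawl. This sketch is from memory, so treat the exact field set and value formats as approximate rather than as the authoritative schema; all values here are made up:

```json
{
	"name": "example-tool",
	"title": "Example Tool",
	"description": "Does an example thing for editors on the wikis.",
	"url": "https://example-tool.toolforge.org/",
	"author": "Example Author",
	"repository": "https://git.example.org/example/example-tool",
	"keywords": "example, demo"
}
```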
More features in the UI: there are status pages showing what's going on with the web crawler, and detail screens for particular crawls, so you can see which URLs were crawled and whether they ended up in a completely good, green state or in some intermediate failure state. From this view, you can drill down into what happened with a particular URL, especially if it's erroring out, so you can see what the system saw: whether the file was a 404, or badly formatted, or something else happened. And we have an audit log stream, similar to Special:Log and Special:RecentChanges in the MediaWiki software, combined into one list of actions taken. We're working on adding more features to it that will eventually let you filter these logs by date range, by the user that took the action, and by the kinds of actions taken. "10 minutes." That's fine, thank you, Paritha; I will move a little faster. So: API documentation. I said API-first design. The frontend is built with JavaScript that talks to the backend APIs, and built into the frontend we have a documentation browser where you can see what the APIs look like, what parameters they expect as input, and what they give as output. You can even use a little console to do live testing of how you might use the API. Related to that, there are developer settings that let you create OAuth consumers that will work with Toolhub. That pretty much wraps up our features. As for active work: we're working on content moderation support, then we'll move on to curated lists of tools, and finally we hope to add community-added information for tools, in the form of a thing we're calling annotations. The core toolinfo records will be owned by whoever initially creates them, but there's other information that we should allow everyone to contribute to. If you want to help us make this all awesome, there's a demo server at toolhub-demo.wmcloud.org that you can check out. Our translations are done at translatewiki.net; you can go over there and help provide translations that are missing. There's a Toolhub tag in Phabricator where you can see our existing backlog and add new bugs and ideas. And there are pages on Meta under "Toolhub" where you can follow our progress reports and other things. And there we go: we got to my credits slide, which means we are now ready to take questions. Avritha, do we have any questions from the community? "Oh, that was a great talk. And I did have one question; it was about how people can contribute, and I think I would add it at the end. If you want to talk more about it, we can." Your audio broke up a little bit for me, Avritha, when you were saying what the question was. "Someone asked about contributing and how they can help." Oh, yes. "But it's towards the end. If you want to talk more, we can." Yeah. Contributions to the project go through Gerrit, and the bugs are tracked in Phabricator. There's a README that may help you get set up, but I'm BD808 on IRC. Come find me on IRC; let's talk, let's chat, let's see how we can get you involved. I'm definitely interested in getting some community members excited about this project and helping contribute patches and make things better. "So I don't think we have any more questions, but there are some kind words for you in the chat." Awesome, awesome.
Well, thanks for having me talk today, folks, and I will see you around at the rest of the hackathon. So, I'm Chico Venancio; or do you want to go first and introduce it? "Yeah, I'm happy to introduce. So we have Chico here, and he's presenting an introduction to PAWS for Python beginners. He'll also be taking questions at the end, so I can drop them into the chat; try to prefix them with 'question for the speaker' so we can track your questions. So let's get started." Cool. I'm Chico Venancio. I'm a volunteer from Brazil; I've been editing Wikipedia for the last 15 years. Somewhere along 2017, I started maintaining PAWS, and I'm here to talk about how it can be used by beginners, and hopefully about some more advanced use cases as well, if those are interesting to you. I'm also generally available for the hackathon, so if anyone wants to go into a more specific question, I can take it right now, or later in private if you want, in WorkAdventure or Telegram or IRC or whatever; I'm on most of them. Let's get into PAWS. Why was it created? Yuvi was one of the main drivers for this, and PAWS is very much oriented to easing the way Wikipedians and Wikimedians can use more advanced tools, bots, and scripts to do their edits and get things to work. We have other alternatives, like Toolforge and Cloud VPS, for that; before we had Wikimedia Cloud Services, we had the Toolserver. And you can always do the same things you do on PAWS on other cloud providers or other notebook providers. For notebook providers, there's Colab and there's MyBinder (mybinder.org), and you can always just spin something up on AWS or Azure or GCP or wherever you want. But that takes effort. It means you need to understand how these things work, you need to get used to that environment, and usually it means using the command line in at least a medium way, which is not easy for everyone. With PAWS, it's just there. You already have authentication with the Wikimedia wikis; you can just start coding right away. If you don't know how to code, you can even copy something that someone else wrote and just adapt it for your wiki, just translating the strings. So it's a lot easier. We're trying to reduce the command-line tax on getting people to use advanced tools for the wikis. Getting started on Toolforge, which is the easiest of those other alternatives, is doable for experienced programmers, but it means you have to set up a developer account on Wikitech, set up an SSH key, and upload it to Striker. You have to learn what the job grid is, or Kubernetes; one of those is fine. And for Cloud VPS, you have to do second-factor authentication, and there are still weird edge cases, though they've been improved a lot over the last few years by the Cloud team. For PAWS, you just go to the webpage and sign in with your Wikimedia account. That's it: you can start a notebook and start to code immediately. And everything you do there is already open access; you can access it under the public PAWS page. One thing that kind of shows this is that PAWS now has 3,487 users that have ever used it, while Toolforge has 2,150. Obviously, the median, the average, Toolforge user is more active on Toolforge than the average PAWS user is on PAWS, but this means that more people have at least some contact with advanced editing capabilities. And you can see the growth is a lot higher for PAWS.
The last time I did a talk like this was in 2019, and since then we've had 1,200-something new PAWS users and only a little less than 400 new Toolforge users. How does PAWS work? This is a bit advanced, but I just want to get it out of the way: PAWS is a Kubernetes-backed JupyterHub instance. Every time a user gets into PAWS, a pod is created; a server is created for them inside the PAWS Kubernetes cluster. Storage is handled by NFS, and authentication, not only to get in and establish who you are, but also to get access to the API, is done with MediaWiki OAuth. This is one of the general technical diagrams for it: users get to the proxy, the proxy sends them to the hub with authentication, and then there is a separate user pod for each user. I can go into more detail if anyone wants, but that's not the main focus here. So what can PAWS do? The use cases are endless, because it's a server and you can do anything you want with it, but the usual ones are these. Quick data exploration: this is something I do a lot myself. Even before writing tools, I'll go into PAWS and write something to understand what's actually possible. That means I don't have to get into Toolforge and try to query the databases, or use PyWikibot on Toolforge; I can do it a lot more easily in the web interface with PAWS. Creating dashboards: this is a very cool use case for PAWS, because the dashboards can be updated. You just create a notebook, and instead of something you copy and run on Toolforge every time, you have a web page with the results, which is a lot easier to share and show to other people. And making bot edits on a wiki: that's a very common use case; you can do that both from the terminal and from the notebook. The current limitation of PAWS is that we don't yet have a way to schedule execution, so it's just something that runs while the browser is open. After an hour, the server there will be killed. You can get away from your computer while the browser is open, that's fine, it will still run; but if you shut down the internet or your browser, after one hour it will be killed and won't run anymore. Using PAWS is easy; this is a 30-second demonstration of that. I just open the page there, sign in with MediaWiki, which already gets my username, I allow it, and I'm in my server, and I can create a Python notebook from there, or a terminal. And that's it; I mean, those were the 30 seconds. Forking in PAWS is almost as easy; hopefully we can make it a little bit better at some point. But as I said, and as can hopefully be seen here at the end of the video, once we create a notebook, we immediately get this public link. This is just a link to the PAWS public instance with this notebook; even this empty notebook will be there. And forking means you can go to this public link for other users' notebooks, download the notebook, and upload it into your own PAWS server. So this is the workflow: you go to the public link, you add ?format=raw at the end of the URL, and then you can save it. And I have it almost exactly here, because it's not as easy as we wanted to make it. So, choosing one notebook here, "my first notebook", I can show that: just adding the raw format, and you can save. Yeah, Windows messes with the extension, so you need to "save as all files" there. And then we have the notebook locally, and I can upload it to my PAWS server.
Using PAWS is very easy. I mean, we have notebooks. And the three things that I like to teach people about notebooks are these: we're using cells, and the notebook itself is not the unit of execution, you execute each cell; each cell can be code or it can be Markdown; and cells can be run in any order. This is a screenshot of that "My first notebook", and it shows that this cell was the seventh thing that was run, and the first one, the import, was run as the 18th thing. This usually happens because you change something in a cell at the top and run it again, and then you go down to another cell and run that. So it's a very interactive way of running code. But in the end, the order does matter in terms of presentation. You can also have these Markdown cells, like this heading here, which is a Markdown cell. Let me show you that live on "My first notebook" over here. So this is something that I forked; you saw the video of me forking this a few minutes ago. As you can see, this is a Markdown cell, and I can double-click it to see it as the raw Markdown. And I could, for example, make this a level-two heading instead of a level one, and as soon as I run it, it will be a level-two heading instead of a level one. And the same if I run this; this is from the forked instance, and I have never run any of these cells, so if I run it again, it will now be a level one. And if I continue down, it will be level two, level three, level four. Let me show you this other notebook that has more interesting examples of how you can use Python notebooks to do things on wikis. This notebook is very comprehensive. It has getting started with APIs and all kinds of ways to interact with the different MediaWiki APIs, and even with outside APIs, apparently. It's interesting how you can use requests, which is a popular Python library, to get the page that you want, and here's an interesting way to interact with it. And this, since it's not even authenticated, should work out of the box, and we can get the response from the REST API here. Pywikibot is very interesting because we already insert authentication for it. So this is not an authenticated request, because we're just getting the text, we're not saving. But we could then say page.text equals something else, and if I then call page.save and run it, it will actually make the edit for me, as my user that has logged in. I'm not going to run this, because vandalism is not nice. But as you can see, it's a very interesting way to work, and it's the most common way that I use PAWS and Pywikibot. This notebook is very interesting; it has lots and lots of instructions in Markdown, and it's a very good practice to not only have your notebooks but also have documentation inside the notebooks. And let me go back to that table of contents: it also has the same instructions on how to fork a notebook. These links don't work inside here, but they do work on the public version. And you can also use outside APIs, because again, this is an open server that you can use anything on. You can run anything.
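To pin down the two access patterns from that demo notebook, an unauthenticated read through the MediaWiki REST API with requests and an authenticated edit through Pywikibot, here is a minimal sketch of what the cells might look like. The page titles are just sandbox placeholders, and the Pywikibot part only behaves this way inside PAWS, where the OAuth credentials are already set up.

```python
import requests
import pywikibot

# 1. Unauthenticated read via the MediaWiki REST API; works anywhere.
resp = requests.get("https://en.wikipedia.org/w/rest.php/v1/page/Wikipedia:Sandbox")
resp.raise_for_status()
print(resp.json()["source"][:200])   # first 200 characters of the wikitext

# 2. Authenticated editing via Pywikibot; on PAWS this runs as your own user.
site = pywikibot.Site("en", "wikipedia")
page = pywikibot.Page(site, "Wikipedia:Sandbox")
print(page.text[:200])               # reading needs no login
page.text += "\n<!-- PAWS demo -->"
page.save(summary="PAWS demo edit")  # this line really edits the wiki
```

The save call is exactly the page.save step described above; leave it out, or point it at a sandbox as here, unless you actually mean to edit.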
One thing that I want to stress also is that most of the uses of PAWS are Python notebooks, but they don't have to be Python notebooks. This one is running as a Python 3 kernel, and what we already have installed in PAWS is Python and R and bash kernels. And soon we'll install a Julia kernel, because that was asked for. But we could have several other things here. We could have Node; I mean, the possibilities really are vast. So if anyone wants a new kernel running on PAWS, please ask and we can take a look at how to make that possible. Where can we take PAWS? JupyterHub is a project that is being developed very quickly, and the whole Jupyter notebook ecosystem also develops quickly and has lots of releases and new features. So we can incorporate that into PAWS. One of the interesting ones that is already there is JupyterLab. And if we, that's not what I wanted, I'm sorry. "We have 10 minutes left, Chico, just letting you know." Thank you, that was a perfect moment to say that. So we're here on the traditional interface, but if we change it here, we have a secret interface that's called JupyterLab. It should be a little bit nicer, and this will be fixed soon, but it has a very different, more modern interface, I think. And this is what's actually currently maintained by the Jupyter team, a lot more than the classic interface. One of the reasons we don't have this as the default yet is that if we open the same notebook here (as you can see, it's already running because I was running it over there), we don't have the same public link. That's one of the extensions we need to figure out how to do for JupyterLab. But it's a very interesting interface for people to use. And the public link can be figured out just by changing this to "public", or changing the link over here, we can copy the shareable link, and this, if we just go public, should... yeah. Yeah, it did not. There's some URL hacking there to get it. That's the main reason we haven't switched yet, but it's a nicer interface overall. So, as I mentioned, adding new languages is very possible. We'll probably add Julia, hopefully by the end of the hackathon. And I mean, there's a very long list of possible kernels over here. We're not adding all of them, because maintaining this is not free of effort. But whenever we have a request, we'll probably add it. We also need some extensions that could be used to solve these issues. I mean, scheduling of executions has been a dream of mine for a few years. Drafts and publishing would be nice as well. Notebooks have, by default, a versioning system with checkpoints, and we have all the previous versions of the notebook stored as well, but a way of presenting that to users would be very nice. So, kind of like having drafts and publishing, so that the public could have a more polished view of the notebooks. And having some kind of source control would be very nice as well, because right now it's kind of hard, even for advanced users, to understand what's going on with the versions of their notebooks. Are there any questions for me? This is the end. Not yet, I don't see any questions, but people are free to ask any in the YouTube chat. And I also have the YouTube chat open, so I can respond to anything. And, well, I'll be available, as I said in the beginning, for anyone that hasn't tested PAWS yet and wants to use it. I mean, I'm not the only one who knows how to use it, so you can ask other people in Telegram, IRC or WorkAdventure; there are lots of people that can help, and I'm available as well. I can do a session to help with your particular notebook or script. Susanna is asking if there are any challenges with uploading images, for example.
I've done a few image uploads from PAWS, but the bigger challenge is using this interface to get images into PAWS in the first place. What I usually do when uploading images to Commons from PAWS (I don't know if I have an example; yeah, I'm not sure I'll have one here) is get the images from some other source, because sending images to PAWS is not easy in this interface. So: scraping a website into PAWS and then sending the files on to Commons. I think I have some here... no, this was just Excel and XML. Sending data directly to Commons is something that I've done in the past, but doing it directly with images in your notebook is not as easy. It wouldn't be my tool of choice. Any other questions? There's only three minutes left, I think. Let's wait a minute and then we can wrap up. Yes, I think so. No other questions. You do have one question, or two. First is: can you import a module you wrote into the file system? Yes, yes, you can. Actually, let me share my screen again for this. Yuvi made some magic so that we can import things that other users wrote in the file system, and you can definitely import things that you yourself wrote. I mean, this is just Python; there's a Python kernel running in the back, so you can really import anything that you have over here. I don't normally use this, so I don't quite remember the syntax that Yuvi created for it, but I can look it up and send it to people. What's the... it's more of a call-out, right? Susanna's mentioning that we're meeting up in WorkAdventure after this workshop for Jupyter notebooks. And I'll be there as well. I'm not sure how useful I'll be, because there'll be plenty of people to help, but let's go over there as well. Awesome. So I think we're done and we can wrap up. Thank you very much, Chico. Thank you. And we'll see you in the next talk. I hope so. Next up, we have a pre-recorded session on documentation by Dan. Since it's pre-recorded, he won't be answering questions live. But right after the talk, we have a Q&A session in a hacking room, so if you have questions, you can go there and discuss with all the other writers. All right then, we'll begin. My name's Dan Shick. I'm a technical writer for Wikimedia Deutschland, and in this presentation I want to share some of my thoughts about writing documentation well, especially in a context where documentation work is getting done in a community and not an organization. I basically have a few things to say about writing good documentation in general, and then some further observations about doing it when essentially nobody's in charge. So without further ado, let's talk a little bit about planting trees. Now, plenty of people know the old quotation about planting trees. It's cheesy and nobody knows exactly who said it, but it's still true, even in the context of documentation: the best time to document your project was when it started, and the second best time is now. So before we begin, I have a question for you. Does documentation even matter? Naturally, I hope you think it does, but it would be naive of me to claim that everyone feels the same way, especially when other priorities come into play. So let's presume for a second that it doesn't. I'm gonna cover some common arguments for lowering the priority of documentation that I've encountered in my years doing this job. Does documentation matter? Here are some arguments against. One: our stuff is easy to understand if you're smart enough.
Number two: it'll become obsolete in an eye blink. Number three: it's a waste of time better spent on raising product quality. Okay, so one by one, I will deal with these. Number one: our stuff is easy to understand if you're smart enough. Well, of course, this is an incredibly elitist claim, but besides that, it's dead wrong. In practice, nothing is self-documenting, often not even to its own authors, and definitely not to end users. Now, it's especially easy to buy into this when you're creating something for a very select audience. But not only are most projects not that narrow, it's also way too easy to overestimate how obvious a certain functionality is, and how obvious it is how to use it, when you didn't make it yourself. Or when you did, but now it's six months later, you've turned to another project, and you find yourself asking: how did that even work? Also, needing a new topic explained is not a sign of ignorance. It shows curiosity, intelligence, and humanity. And you know, everybody benefits from having new things explained to them by someone in the know. So, argument number two: it'll become obsolete in an eye blink. Well: obsolete, yes; eye blink, no. When you do it right, documentation is not a single act. It's an ongoing process and part of a product's development cycle, which, if it's healthy, is not just aware of change but plans for it. Yeah, people who claim that documentation is not worth writing just because it needs to be maintained: that's the worst kind of fatalism. Don't hold with that. So, argument number three: it's a waste of time better spent raising product quality. Well, I hate to break it to you, but documentation is already an essential part of your product's quality. That quality plummets when users can't find the information they need to use it correctly, or the way they want. So, I'm glad that's settled. I hope you agree. And now a preview of coming attractions. Part one of this talk is about writing documentation in general: how to get started, how to scope your documentation, understanding and empathizing with your audience. And part two is about documentation in your organization, especially if that organization isn't hierarchical. In the wiki community, people volunteer their time, and getting documentation to happen in that context is sometimes not as easy as in a hierarchy. Not that it's easy in a hierarchy. So, part one: in your own sandbox. The first piece of advice I have for you is to start early, but start. Ideally, documentation would start when a project begins, but as you probably know, this rarely happens. Lots of projects have zero documentation even at a very late stage. If you're feeling hopeless, realizing you're behind on documentation, that's not gonna help documentation get written. You just gotta dive in and swim. So, maybe you're reading this right now through tears, with an empty docs repository and a project that's near or at completion. If so, congratulations: you're way ahead of the pack. Yep, tons of projects don't even start documenting until way after version 1.0 or launch or whatever. If you're thinking about it now, you are totally a step ahead. So, when you're in that situation, the first thing to do is make a list of the vital documents you need. A great place to start: use cases. And once you have that list, you'll need to find subject matter experts who can write outlines for each document. You don't need them to write the articles, you just need them to write the outlines. But get ready to iterate, because they are gonna be trash.
Those outlines are gonna be... it doesn't matter how good your subject matter experts are, their outlines are gonna be inaccurate, incomplete, and provisional. And it'll be so embarrassing. But, as I said before, you have to start somewhere. And that starting place is gonna be substandard, to say the least. But that's actually okay, that's great. You're seeing your documents and they're inaccurate? Excellent, you found inaccuracy. Now you can replace that inaccuracy with better content. Oh, they're incomplete? That's useful. You see the gaps, and now you know where you can put in more content. Oh, they're provisional? Well, news flash: everything is provisional. Docs grow over time, and they start out as babies. Babies can't chop down a tree or drive a bus, but they can when they're adults. So anyway, be prepared to make multiple passes over your outlines. It will totally be worth it in the end. Maybe, on the other hand, you're reading this through gritted teeth because your once-current documentation is now completely out of date and incomplete. If so, welcome to the club. This happens to everyone. So, what do you do? Audit. It's time for an audit. Dive in and just audit the documentation you do have. When you're done, you wanna come out with four lists: documents that are actually in good shape (warning: this may be a short list); documents that need updating; documents that are necessary but not yet written; and documents that are obsolete and need to be archived. By the way: archived, not deleted. Always keep your obsolete documents. Just keep them out of the main area, but don't throw them away. Then go back to the previous step: find subject matter experts and create outlines. You're gonna smash lists two and three together and use that as your task list. And yeah, again, your first pass is still gonna be trash. You just gotta keep doing it and doing it and doing it. So, why do I keep talking about outlines? Because of scope. An outline sharpens your mind, it keeps you focused, it maintains the readability of your documents, and most importantly, it keeps the scope of your document clear. If you're writing a document with no outline, you run the risk of deadly and boring scope creep. I mean, come on. We've all seen it. We've all done it, probably. Certainly we've all seen wiki pages that are way, way, way too wordy. But while scope and outline are great, what you don't wanna do is let an early outline hold you back. Once you're way deep into a topic, you may actually decide that parts of the outline are meaningless, or that you'd be leaving out vital content. This is where iteration comes in again. You might feel that your document might be better as two documents, especially if you're writing to somebody else's outline, which you probably will be. I'm skipping ahead, but there it is. In this case, striking the balance between maintaining scope, having an outline, and not being held back by your initial assumptions: that's a balance you just have to strike by letting your instincts and your audience guide you. And now: your audience. Well, you gotta know your audience. Writing without the audience in mind creates a totally different problem from your docs being out of date. A doc that was written for the wrong audience can read really well, but it doesn't help the people who need it. So don't write a document until you know who you're writing it for.
I think that's a really important principle that not a lot of people necessarily put into practice. Sometimes that audience might be you alone. That is also known as note taking. And nobody wants to read your unedited notes. Emphasis on unedited, though, because you can turn your notes into a decent document; but note taking is not the same as documentation. Okay, moving on. For any given document, I invite you to consider the following questions. What groups or demographics will read it? How will they end up reading it, and can you control how they discover it? Like, what's the flow: is it gonna be linked from a website, or is it gonna be on a company intranet, or is it gonna be on their desk, or who knows? How are they gonna get there? That is how you're gonna find out who your audience is. And in these documents, what information do you want to impart? What do readers already need to know to understand your document? Here's a tip: tell them. Make sure to link prerequisite docs if there's something people need to know before they can understand what you've written. Also keep in mind your readers' attention span and their likely interest level. If they're just reading it because they have to, punch it up, make it more interesting. As much as possible, anyway. The key is to get across what you wanna get across, to the audience you wanna get it across to. Another thing to keep in mind is when you're likely to have non-native speakers of the language you're writing in as your audience. Try to use shorter declarative sentences whenever possible. Avoid slang, metaphors, and niche references that not everybody will understand, and pay extra attention to readability. Now, by the way, I have pretty much completely violated these considerations in many places in this presentation. I've tried to present all the crucial information in a straightforward way, but I've also added a layer of complexity on top to keep it entertaining. But luckily, you get to read my notes after you're done watching this presentation, and you can also look at the wiki version of this, which is in straight prose form and does indeed conform, or at least I hope it conforms, to these restrictions. Here's the link; well, I mean, that is to say, the link is here in these notes. And so, back to this last point for a moment, on the topic of maximizing readability. Here's an exercise you could try once you've written your document. Imagine you're in a room with your reader, reading your document aloud, only to them. What are the questions and concerns that you think they might have? It is actually kind of amazing. There are two benefits to this. One is that you'll be able to hear the document read out loud: it's coming out of your mouth, but it's also going into your ears, and you can't help but get a different impression of it when you actually have to articulate the words that you've written. Because if it's hard to say, it might be hard to understand as well. And of course, and this leads to my next point, when you're speaking your document out loud and you have your audience in mind, you will be engaging a facility that most people don't immediately associate with documentation, and that is the facility of empathy. Which is the technical writer's secret weapon. Once you've correctly identified your audience, you can begin writing. But now you've gotta put yourself in your audience's shoes and view your document's topic just as they will.
That's what you have to keep in mind when you're choosing the words you will use to express your ideas. You want to imagine your state of mind before you learned what you need to document. That's the act of empathy that is difficult but so, so useful. When you're reviewing the subject matter before you even write, review it in that mindset. What was it like when I didn't know anything about this stuff? It's a kind of mental time travel. Except this time around, you're gonna have the presence of mind to write everything down. So, this is pretty heady stuff. Let's take a break. And when we're done with the break, it'll be time for part two. Have a good break. Part two: playing well with others. So here's the problem. Nobody likes reading the manual. People say: it's bound to be boring, it's not gonna help me, it's gonna be easier to just dive in and figure it out myself. What could go wrong? Even those quick-start leaflets, which are the desperate attempts of technical writers to get just the bare minimum of information in front of users, so often get ignored or thrown out. It's really sad. But also: nobody likes writing the manual. Organizations dread producing documentation. They often end up postponing it, deprioritizing it, rushing it, all the while fearing that most people will never even read it. And those users who do try to read it have a harder time using the product. Communications between producer and consumer thereby collapse, and the product's image suffers, and we don't want that. You don't want a bunch of frustrated users who don't know how to use your product. How do we break out of this vicious circle? Well, I have a suggestion: create a culture of documentation. How do we do that? Well, it's a slow and ratcheting process. First, allocate more time and resources. That will improve quality. Then the higher quality of your documentation will start to amaze your readers, who came in expecting to be completely bored. That higher quality raises their expectations, and that feeds into a cycle that ends up making all of your stuff more usable: as your customers' expectations rise, your organization will see the need to add more resources to the documentation process, until it's at a level that is actually helpful for the users. So again: how? In order to achieve all that, I actually think there are just two small things you can add to your documentation process that probably aren't in it already. Number one: time. Dedicated time. Here's where the difference between hierarchical organizations and communities might come in. In organizations, resources and time are allocated by management, and time is jealously guarded by those managers. Documentation therefore has a woefully low priority and comes very late in the process. In communities, it's even worse, because time and resources are donated by the participants, and that time and those resources compete with those people's actual lives. Not just their work time, but their actual lives. In those cases, documentation is often rare and seen as optional, a nice-to-have. It's so easy for a contributor to think: I've already contributed this tool; documentation is yet another gift I'd be giving; how much do they want from me? And then they think: somebody else's problem. So here are some things to think about, depending on what your role is in a community. Are you one of the people working on the documentation tasks themselves?
Like, are you a participant, or are you the one organizing the work? If you're a participant, well, you just gotta commit some time. And you've gotta be realistic about exactly how much time you can offer. The best way to choose is to let your interest guide you, as opposed to picking a task that is just a horrible slog, if at all possible. And the community manager: their job is to create very specific and narrow tasks, so that they're more easily picked up by the participants. And this is probably obvious, but the community manager needs to just sort of announce those tasks, as opposed to trying to assign them to people, because people will need to pick them up themselves, since they're donating their time. But what the community manager can do, the pressure they can exert, is to review task activity often and show up in those tickets, or whatever process you're using to manage the tasks. Show up often and make it clear that you are paying attention to what's going on. It takes a gentle touch, but that's something that can really motivate people, in a good way, to do good work and to complete their work. So in short: participants should follow through, and community managers should follow up. So that's the first thing you need: time, and time management. The second thing you need is love. All this is a lot. It doesn't matter how much advice I throw at you, people are still gonna feel frustrated about having to write documentation. There's no doubt about it. People will say: I still hate writing manuals. I can't just magically raise the quality of my documentation. I've got no time or desire for explaining what I do. Or, the saddest of all, perhaps: why are we even doing this? But it doesn't have to be that bad. There is a simple principle that can definitely help: have folks do what they love. Subject matter experts don't actually need to be able to write an article solo, and technical writers don't need to be universal experts. Many people find this surprising, but interviews run by people whose only role is interviewer can gather information in a comparatively short timeframe. So you divide up the labor by skill set, and then you take a load off people's shoulders. People often experience very deep relief when they realize they don't have to do the whole task end to end. So: from each according to their ability. Writers write the actual documentation. They do so by digesting info provided by others, and their skill lies in addressing an audience. Interviewers are good at engaging the brains of others, namely subject matter experts. They're good at sticking to outlines and keeping scope, and they're skilled at framing questions. Meanwhile, subject matter experts, which in some shops are the people who are just forced to write all of the documentation: well, of course they're skilled in their subject area, but they're often overbooked. And you know what? They are allowed to hate the act of writing, as long as you've got these other roles to support them. And then the managers help the other three work together. They monitor for people who are overheating, who have taken on too much or who are burning out. And managers are skilled at following up, perhaps the most important thing. Yeah. When people realize they can collaborate by doing what they're good at and what they're interested in, it's often a game changer. Documentation becomes not a chore but an opportunity to share knowledge and help others.
And honestly, you know, as I conclude here, I'm gonna tell you: this really oughtn't to come as a surprise. Because after all, and this is cheesy, but it's true: you can't plant trees in sand. You need moist, fertile earth for that. Thanks for listening. Have a great day. Thanks, Dan, for recording that video. And just to reiterate: if you have any questions, we have a documentation Q&A session hosted by Sarah in Hacking Room 1. You can go there and chat with the technical writers. Thanks. Hi everyone. So next up we have Lucas, with the Wikidata live queries session. Let's just hand it over and get started. All right, thank you. Yeah. I am basically looking for suggestions from you for what kind of queries we should write here. I shared this Etherpad already before; I'll just paste the link again in the hackathon channel on Telegram. So if you have any ideas, you can put them here. And one thing we can start with is the data challenge ideas from yesterday, which were great. Maybe we can start with, someone picked out this one, sure: a list of all the COVID-19 vaccines, including the developers of those vaccines. So I started this query by looking for... I remembered that one of them was called Covaxin. So I looked for that in the search and just looked: what kind of statements does it have? Is it an instance of vaccine? Is it an instance of something else? It's a subclass of COVID-19 vaccine. It's also an instance of a vaccine type. Most importantly, it has this property, "vaccine for": COVID-19, which is great. So I decided I'm going to ignore all the instance-of or subclass-of statements and just say it's a vaccine for COVID-19. That's how we find them. So to start by listing all the vaccines, we would select anything where the vaccine is a "vaccine for", and with Ctrl+Space (if I can manage to type Ctrl+Space) it gives me the property ID, and we say it should be a vaccine for COVID-19. And if we add the label service and then select not just the vaccine but also the vaccine label, we get a list of 46 results, which is pretty good. I didn't know there were that many. I assume some of them might be experimental, or ones that didn't work out, that failed the clinical trials or something. This one is a candidate vaccine; this one also, candidate, experimental, experimental. Yeah, but I think we're interested in all of those. And then the question was to include the developers of these vaccines. I should zoom in here a bit so we can read it. Also close this. We can get the developers as another statement. It's right down here: "developer" is the property. So I assume that's the right one; we don't want "manufacturer" or something, but the developer. So we add developer, oops, and select the developer label. The issue is that now we get a lot of results, and they could in theory be in any order. I first thought we could order by vaccine, to ensure that the same vaccine is always together. But we can also combine them in a better way, which is to use something called GROUP_CONCAT. We say we group by the vaccine and the vaccine label, which means we get one result per vaccine, and then we combine all those developer labels, which we do by saying GROUP_CONCAT with a custom separator, maybe a semicolon, and call that ?developers, maybe. And now the issue is that this is actually empty, because the label service is quirky: we have to tell it that we're interested in those labels. So that's ?vaccine rdfs:label ?vaccineLabel.
And more importantly, ?developer rdfs:label ?developerLabel, because once the developer label is hidden inside the GROUP_CONCAT here, the label service can't find it automatically anymore. So we need to tell the label service we would like this label. And now we get: this one comes from the National Institute of Allergy and Infectious Diseases, from MIDI Data Solutions and from Moderna. This one just has one developer. This one has two, and so on. So that's how you can use GROUP_CONCAT to combine those developers. But someone was also asking (let's close this) how to concatenate the vaccine developers into a single value. We've just done that. And then: is it possible to show the countries the developers are from on a map? And I'm sure we can do that. So in that case, I guess we would actually discard the... we would stop selecting the vaccine, and instead get the developer with label and also the country label. Remove the labels, or the standard label service again, remove the grouping, and the developer should have a country: ?country. That's just a list of countries. And now we want them on a map somehow. I guess one option is that the country should have a geoshape, and we can select that shape and set the default view to a map, and now it will load the shapes from Commons. And then, in a moment, when it's done loading, we should see them on a map, but we will probably see some overlapping shapes, which might not look very useful. Okay, this isn't working for some reason. Let's just try that again. There we go. So those are the countries that have developed COVID-19 vaccines. This one is shaded a bit differently; I'm not sure why. But maybe one thing we could think about would be to show the number of different vaccines, so that it's not just overlaying areas. In that case, we would again group by the country and country label, we would have the shape, and then COUNT(DISTINCT ?vaccine) AS ?vaccines, or call it ?layer, maybe. And then we should see... oops, and we need to group by country, country label, shape: group by everything that doesn't have an aggregate function like the count. And then we have: from one, that's orange; two (I think my screen just froze), two in red, three in blue, four in pink. India has four vaccines, apparently, and also those other areas down there that I don't recognize, but that's still India. Or maybe that's all India, with some of those vaccines going to other countries; maybe not. And then five: the United Kingdom is partially participating in five, apparently. Seven, China. And then 15, the United States. And if we wanted to... do all countries have shapes on Commons already? As far as I know, yes. I think... possibly, I'm not sure. I think when I checked this two years ago or so, there might have been some confusion where the Netherlands had a geoshape but the Kingdom of the Netherlands didn't have one. But I think apart from that, all the UN member states had a geoshape already two years ago or so. But we can also try that out. I have a saved query for UN member states, because you need it pretty often. And let's bind EXISTS { ?state has a geoshape } AS ?hasShape and select this. Oops, this. And two countries, or two UN member states, do not have a geoshape, which are the Danish Realm and the Kingdom of the Netherlands. Yeah, okay. But then you get into the question: what is the country (the P17) of a company in the Netherlands, anyway? Did people set Kingdom of the Netherlands as the country, or did they set Netherlands as the country? Because it must be down here somewhere, right? The Netherlands, I'm pretty sure, has a geoshape; just the Kingdom doesn't. Yeah, this one does have one. And for the Danish Realm, I assume Denmark as a country also has a geoshape. Yeah, there it is. So out of the UN member states, it's only those, I guess, wider states that don't have geoshapes, because: complications. What's going on in the background? Nothing related.
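For reference, here's a sketch of that vaccine/developers GROUP_CONCAT query in runnable form, sent to the query service from Python (as you might from a PAWS notebook). The IDs are the ones I believe were used in the session (P1924 "vaccine for", Q84263196 COVID-19, P178 "developer"); treat them as assumptions and double-check on Wikidata.

```python
import requests

# Vaccines for COVID-19, with all developers combined into one value.
# Note the manual label-service patterns: a label hidden inside an
# aggregate must be requested explicitly.
query = """
SELECT ?vaccine ?vaccineLabel
       (GROUP_CONCAT(?developerLabel; separator="; ") AS ?developers)
WHERE {
  ?vaccine wdt:P1924 wd:Q84263196 .   # vaccine for: COVID-19 (IDs assumed)
  ?vaccine wdt:P178 ?developer .      # developer
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
    ?vaccine rdfs:label ?vaccineLabel .
    ?developer rdfs:label ?developerLabel .
  }
}
GROUP BY ?vaccine ?vaccineLabel
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "hackathon-example/0.1 (demo)"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["vaccineLabel"]["value"], ":", row["developers"]["value"])
```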
Okay, so should I put this query somewhere already? This is "map of countries developing COVID-19 vaccines, by number of vaccines". And if we wanted to get really fancy, we could try to color this better, because you can control the color. You can say something like "00FFDD" as ?rgb, and then all the layers are shown in this color; apparently I picked some kind of pastel blue or cyan. So if we got really fancy, we could select the right RGB according to the number of vaccines, so that we would get a nice scale that way. But that's pretty complicated and I don't think I want to do that right now. So having it like this, with the legend over here, is probably good enough. Let me just make a short URL for that and drop it in the Etherpad. I did not mean to make that into a new line. "Too experimental possibly: expand the data challenge tasks related to maps to fetch geometries from OpenStreetMap using Sophox." Or: "plot all the values of a date-time type on a timeline." Do you have some more details for that, of any property? I have to write in here, because the YouTube stream apparently has some 30-second delay or something. So if I write in the Etherpad, it's probably going to go faster. For a single item? Okay. Okay, then let's... yeah, that could be something like the population of Berlin: select the date and population where Berlin (that's Q64) has the population. No, a timeline, not the values at a certain time; I was thinking of something else. What would be a good example for that? All the values of the date... We can also change this into: times at which the population of Berlin is known. So that's pq:P585, point in time. We actually ignore the population value, and #defaultView:Timeline. And then we get it. Well, okay, I think we need the value after all: ps:P1082, population; I'll add that as well. And now, why do we not see it in the table? Oh, because I called it ?time here and ?date up there. That's why. There we go. So that's a timeline of all the times at which we know the population of Berlin. There are some big gaps here, and then towards the modern day it gets pretty crowded.
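Here's a sketch of that Berlin timeline query in runnable form (Q64 is Berlin, P1082 population, P585 the point-in-time qualifier). In the query service UI, the #defaultView:Timeline comment triggers the timeline rendering; from Python we just get the raw rows.

```python
import requests

# Times at which the population of Berlin is known, plus the values.
query = """
#defaultView:Timeline
SELECT ?time ?population WHERE {
  wd:Q64 p:P1082 ?statement .      # each population statement of Berlin
  ?statement ps:P1082 ?population ;
             pq:P585 ?time .       # qualifier: point in time
}
ORDER BY ?time
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "hackathon-example/0.1 (demo)"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["time"]["value"], row["population"]["value"])
```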
But maybe there was something else; a biography would do. Oh, "of any property", right, I missed that part. I forgot about it. For Q42, then; so let's try that. Sure: timeline of Q42. Q42, any predicate ?p with a time, and ?property wikibase:claim ?p. So this ?p is then something like p:P31. And then the other property has the statement property, or the qualifier predicate, and this can be something like ps:P31 or pq:P31. And we select the time and, let's say, the predicate so far, and make that a timeline. And then we get mainly qualifiers, but there are also some main statements. Yeah. So maybe let's turn that into a two-part query. Let's make that a UNION. We start with a statement ?ps ?time case, where the same ?property has wikibase:claim ?p and wikibase:statementProperty ?ps. UNION with a case where Q42's statement has a pq: time, where ?property has wikibase:claim ?p and the other property has wikibase:qualifier ?pq. And then we need to create a nice label out of that. Let's still include a value here. Do it like this. And then let's do something like this for now: BIND "main statement" AS ?kind, and over here BIND "qualifier" AS ?kind, and then select the property, the other property label, and the kind, and also add a label service. What does this look like? We get date of birth, date of death, and... where did all the other results go? Select the table, and for all the other ones the time is... oh, we should limit the other property to be wikibase:propertyType wikibase:Time. The same goes for the property up here, by the way. Okay, now we only get two results, which probably means something is wrong here, because I assume Douglas Adams has some time qualifier somewhere, such as where he was educated or something, right? So we would expect to see qualifiers, such as on the spouse or the child statements. Why is the date of birth a qualifier here? Never mind: novelist, start time. So we would expect to see those. Why do we not see them? Any property with a value and the pq... because I wrote "quality liar" instead of "qualifier". That's all. That's the whole reason. Okay, now the query is being pretty slow, which is strange. Let me just catch up on Telegram in the meantime. Yeah, okay, someone noticed it already in the YouTube chat, but I did not see that in time, sorry. And now someone noticed it in Telegram as well. Why is this so slow? p, ps, pq; it's probably running something in the wrong order. Did I not bind some property the right way? No, this one is definitely... Let's try disabling the optimizer. That might help in this case, because we want the query service to run all of this forwards. No; if it's still not returning quickly, then I probably did something wrong. But this part worked, so it would be something in this part. Let's put that into a separate query: SELECT * WHERE, limit a thousand, ?p, ?ps, ?pq. No, I think that should work. Let's see why it's not working. Let's open another tab, make the limit one even, and also... maybe hint:Prior hint:gearing "forward" helps. No, that is "QueryHintException, blah blah blah, statement pattern node". No, that doesn't work; that is not a pattern where we are allowed to use the gearing hint. But it doesn't work. Why doesn't it work? Let me just check that I used the right variable names everywhere: p, ps, pq, property, and then other property, but that's intentional; the statement and the qualifier can come from different properties in this case. Let's try to have this in a separate query again. Remove this part, maybe. Also limit one. I do not understand. Also, it worked earlier, didn't it? We saw some more values on the timeline. Let's reload this and remove both of these lines; in fact, let's remove this one as well. This should definitely work. If we remove the limit, we get some thousands of results, 20,000 results, but still, that's pretty fast. Now we restrict it to this, and that took half a second for 1,000 results, and the ?p is p:P108 and the time is whatever. Then we say that some other property should have the qualifier ?pq. That still works. Then if we add this propertyType Time... if that's the expensive part, we can work around it and instead say FILTER(DATATYPE(?time) = xsd:dateTime). That's an option. It still takes a while here, but if I reload this, does it help here? Now it returns in one second. This one still doesn't work, maybe because I still have the pq-based time here. Let's just reload this, also stop disabling the optimizer, and add the same FILTER(DATATYPE(?time) = xsd:dateTime). There we go: 300 milliseconds.
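Putting the pieces together, here's a sketch of the final working shape of that query: one UNION branch for main-statement times, one for qualifier times, with the DATATYPE filter standing in for the much slower wikibase:propertyType restriction.

```python
import requests

# Timeline of every date on Q42 (Douglas Adams): main statement values in
# one UNION branch, qualifier values in the other. The DATATYPE filter
# replaces "?property wikibase:propertyType wikibase:Time", which the
# optimizer handled badly in the session.
query = """
#defaultView:Timeline
SELECT ?time ?kind ?propertyLabel WHERE {
  {
    ?property wikibase:claim ?p ;
              wikibase:statementProperty ?ps .
    wd:Q42 ?p ?statement .
    ?statement ?ps ?time .
    BIND("(main statement)" AS ?kind)
  } UNION {
    ?anyProperty wikibase:claim ?p .
    wd:Q42 ?p ?statement .
    ?property wikibase:qualifier ?pq .
    ?statement ?pq ?time .
    BIND("(qualifier)" AS ?kind)
  }
  FILTER(DATATYPE(?time) = xsd:dateTime)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "hackathon-example/0.1 (demo)"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["time"]["value"], row["kind"]["value"], row["propertyLabel"]["value"])
```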
Why does the query service do that? I don't know. Also, my screen is frozen. One moment. No, it's not frozen; the scroll wheel just doesn't want to work for some reason. There we go. I thought the pq-based time here might be less of a problem, which is why I only added the filter down here, or replaced it down here first, but apparently this was the reason. This works, right? ?property wikibase:propertyType wikibase:Time: yeah, 56 properties. That's not that many, but apparently the query service picks the completely wrong direction with that. What if I add hint:Prior hint:gearing "forward"? Does that help? No. What if I write it with a dot instead of a semicolon? No. What's wrong? "Statement pattern node." Just "forwards"? Then I don't know how the gearing hint works anymore. We have something, anyway. The results look like... there's the date of birth statement. It probably makes sense to put the kind in parentheses, I think, to distinguish it a bit from the other things. What we don't select here at the moment is the value, if there is any. So let's add the value label. And then we see, we see them in a weird order, but we see, er, number 10, and the point in time of that qualifier is 13 April 2017. I don't remember if there's any way to affect the order of these. But that's kind of... oh yeah, okay, and that's unfortunate, of course: if the value is actually not an item, then using the value label is a bad idea. But yeah, we have something. Let's create a link for that and put it in the Etherpad. And then close Douglas Adams. And close... oh, I still had the query with the timeline where it worked. Too late now. Okay. Do we have any other suggestions for queries? What did I miss in the chat? "Can you share that query?" I put it in the Etherpad now, which is also linked on the schedule page. "Expand the data challenge tasks related to maps to fetch geometries from OpenStreetMap." How much time do we have left, half an hour? Not quite. Does anyone have ideas for a specific data challenge task related to maps? Let's just go through the tabs; I still have all the water bodies open. "List of all the rivers that end in the Mediterranean, and rank them in descending order of length." We could try that with Sophox, maybe: get the "ends in the Mediterranean" bit from Wikidata, and the course of the river from Sophox. So here's the user interface for the Sophox query service. I need to start with... no, there are examples. I would need to start with some kind of river. Other water reservoirs and dams. osmt:waterway. Let's try... let's just guess that something has osmt:waterway "river", LIMIT 1, and osmt:name ?name, and what's down there? osmm:loc, coordinates. Thank you. I hate this toast down there; I can never get to it in time. Okay, now we have the Chico River, with coordinates, which is this subject. So let's copy this, comment all of this out (that was with the Alt key held down, by the way), and then DESCRIBE this thing and see what else it has. It has a way... no, the way is what we're looking at, and the way is waterway: river, with a name. Okay, let's use the relation instead and see if we can find something useful there. It has ways, ways, lots of ways, and the ways have coordinate locations, but I'm not seeing a line, path or polygon there or something. So I guess I would want to look in the wiki at how the OSM data is stored. Polygons, generating polygon files... What does this do? "Sophox only provides access to centroid points, not geometries." It doesn't have the full geometry for the river. Okay, that is unfortunate. It's this. Okay.
Then maybe we have to give up on the idea of using Sophox. Or at least, I think it would be better if we go through some other queries, because trying and failing to figure out Sophox is probably not the best content for this session. So if you have any other ideas for queries, feel free to dump them in the Etherpad in the meantime. I could maybe look through the queries I wrote for the data challenge yesterday and see if there's anything interesting in there I could talk about. "Plotting the centroids shows something as a result, as rivers are split into many short parts." Okay. So maybe that would be useful after all. But we would also need the relation between something and the Wikidata item. But okay, let's go with that. Let's describe... Wikidata... this one, which I think is a major river in France, isn't it? In Switzerland and France, okay. Don't think that one ends in the Mediterranean? No, it does. Great, then that's an example. Yes, okay: osmt:wikidata. So we look for... select... we have a ?rel with osmt:wikidata this thing. And then the rel... let's look for everything those rels have. Oh, that is a lot of results. They have... what? It appears to be all the ways as the predicates, and "inner" or "outer" as the value. Okay, that's weird. But then osmm:has was the one I remember from the other results. So we have ways: osmm:has ?way. And then the way has, what were the coordinates down there, osmm:loc. And then we want the coordinates, and display those on a map. And now we have... oh yeah, that's certainly a river course. The river seems to be branching out here, which is unusual; I assume those might be tributaries of the river. Let me zoom in. Okay, we only selected the coordinates, so we don't get to see anything more. But okay: so we need a SERVICE clause, a federated query to the Wikidata query service SPARQL endpoint, where we get all of these rivers that end in the Mediterranean. So that is all of these. We're not interested in the length part. And we're only searching for rivers, not watercourses, and it should end directly, not indirectly, in the Mediterranean. And then we have a relation for that river, and the way has coordinates. And let's use the river as the layer, so that we see the different rivers in different colors. This might take a while, too. Oh, that's interesting: there are three update icons for the different parts, I guess. This is taking a long time. Let me reload that and add a limit of 1,000, maybe. 10,000 would also be okay, but if this returned like 100,000 points, then my browser would probably struggle to even display it, so I don't want that. It's taking a long time. This part should be fairly fast, right? Select, where... yeah, that's very fast and returns 60 rivers, which is not the end of the world, really. But it might be too much for Sophox already; I don't know, there might be many, many ways linked to those rivers. Anything else in the chat? Let me see if there was anything else here in the meantime. Not really. Speaking of all these data challenge queries: I don't remember if the organizers of the data challenge were going to publish the queries that were sent in; otherwise I will put them on the wiki page somewhere, if you want to look at them.
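For reference, here's a sketch of the federated query as assembled above, as it could be sent to Sophox from Python. The Wikidata IDs (Q4022 river, P403 "mouth of the watercourse", Q4918 Mediterranean Sea) and the Sophox predicates follow what was shown on screen, so treat them as assumptions to verify.

```python
import requests

# Rivers ending in the Mediterranean (fetched from Wikidata via a SERVICE
# clause) joined with the centroid points of their OSM ways (from Sophox).
query = """
SELECT ?river ?loc WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    SELECT ?river WHERE {
      ?river wdt:P31 wd:Q4022 ;     # instance of: river (IDs assumed)
             wdt:P403 wd:Q4918 .    # mouth of the watercourse: Mediterranean Sea
    }
  }
  ?rel osmt:wikidata ?river ;       # OSM relation tagged with the Wikidata item
       osmm:has ?way .              # its member ways
  ?way osmm:loc ?loc .              # centroid point of each way
}
LIMIT 1000
"""

resp = requests.get(
    "https://sophox.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "hackathon-example/0.1 (demo)"},
)
print(len(resp.json()["results"]["bindings"]), "points")
```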
But while we wait for this: for the list of all the rivers in France, ranked in descending order by length, I used normalized quantities, so length in meters, just in case anyone is specifying river lengths in miles in the birthplace of the International System of Units, or something like that. I thought that might happen, so let's use normalized units just in case; or someone might be using kilometers and someone else might be using meters. Anything in there? Yeah, this query got a bit weird. Other people had a map that also looks similar, because "all the rivers in France, highlighting the ones that end in the Mediterranean" means that you get some rivers in French Guiana and, well, whatever other overseas parts of France there are. I assume this is something in La Réunion, right? No, that is... where is this? French territories. But yeah, that's how I did the highlighting. I'm curious whether other people tried to filter out these points, these various other rivers, which are technically also in France. Oh yeah, I was confused for a second about why some rivers not on the coast here were still highlighted, but of course, if they're in Corsica and they end in the Mediterranean, then yes, they're still going to be marked like this. So that is actually correct; I did not notice that yesterday. Did this return in the meantime? No, but it also didn't time out yet, so Sophox gives us more time, apparently. Right, this one was funny: all the images of different goat species. I interpreted this as "give us all the images you can", which means, obviously, using structured data on Commons. But it has to be a Wikidata Query Service query, so I'm using the MWAPI service to make a search query against Commons, with haswbstatement:P180= for all the species items I could find. And then I also get the images from Wikidata, from the image statement, just in case, and combine all of this. And "goat species" was a bit ambiguous for me, too. I first searched for goat, and for items which have "parent taxon" of this item, Q2934, but that turned out to be no other item, only goat. So I looked a bit further up: I looked for goat items and then found Capra, and even Caprinae, in English "Caprinae" or something, which are some more general families or genera of goats, and decided to pick this one. And then you get 820 results of many, many, many pictures of goats. I assume there are goats in here; they're just very well hidden. Goats in some group... oh, petroglyphs! Oh, those things here! Those are goats, or depictions of goats. Wow! That is amazing. Okay, that's a great goat. So that's how I found my goats for this query. I thought this might be a language-dependent thing, because if we look at this item, for instance: in English that's "Capra", but in German that's "Ziegen", and Ziegen is just "goats". So if I had searched for goats in German, I would have found this item rather than this other one over here. So how you write the query might actually depend on which language you're using, I feel. And then these Caprinae are... I think the common name was "goat antelopes", for some reason, and in German they're named like goats. So that's what I picked as "all the species of goats" in the end. Yeah: using MWAPI to find all the images on Commons. Sort them by conservation status: this was the one where I realized it can't be correct that there's just the one goat item. Coordinates of all the shipwrecks on Earth. I think I can skip a few here.
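Here's a sketch of that MWAPI pattern: letting Commons' Elasticsearch-backed search find files with a "depicts" statement, since the query service itself has no fast index for that. P180 is "depicts", and Q2934 is the goat item mentioned above; the Caprinae-based version would substitute the other item.

```python
import requests

# Files on Commons whose structured data says "depicts: goat (Q2934)",
# found via the MWAPI search service from a Wikidata Query Service query.
query = """
SELECT ?title WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "commons.wikimedia.org" ;
                    wikibase:api "Generator" .
    bd:serviceParam mwapi:generator "search" ;
                    mwapi:gsrsearch "haswbstatement:P180=Q2934" ;
                    mwapi:gsrnamespace "6" .     # File: namespace
    ?title wikibase:apiOutput mwapi:title .
  }
}
LIMIT 50
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "hackathon-example/0.1 (demo)"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["title"]["value"])
```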
And we've discussed this one already, the developers of the COVID vaccines. And then a list of all the scholarly articles on COVID-19: I actually did that completely with another MWAPI search, because something like searching for "Wikipedia" in the label would be terribly inefficient in the Wikidata Query Service. You could do something like ?item rdfs:label ?itemLabel and FILTER(CONTAINS(?itemLabel, "Wikipedia")). Or, if you wanted to be more inclusive, include more items, you could lowercase it: CONTAINS(LCASE(?itemLabel), "wikipedia"). But the query service doesn't have any optimization for this kind of search, so it would just have to go through all the labels of all the items ever and check each of them for whether they contain the string "Wikipedia", which would be horrendously inefficient. That's what Elasticsearch is much better at, which is why I'm using that. And then I thought, if I'm using that already, I might as well also use it for instance-of scholarly article, and haswbstatement... what is this... I think main subject: COVID-19, because it should be an article on COVID-19. And then the Wikidata Query Service actually does nothing more than build an item ID out of the title and add the label. That's my query for this. "You did this by selecting the title property and CONTAINS, really?" Okay, that's surprising to me. Interesting. And yeah, I only found six results with that. Maybe there are actually more results. Oh yeah, title, or the label. What if I just don't limit it to the label, but search anywhere? Do I get more results? Same six results. That's interesting, that it worked with the title property; I wouldn't have thought that. A map of all the volcanoes on Earth, colored by country; and then the least common properties. But that's not that interesting, I think. Sophox is still working. Was anything else added to the Etherpad in the meantime? Doesn't look like it. We have 10 minutes left, or less than 10 minutes. This one, I guess, is kind of interesting: the query of all the shipwrecks on Earth. I looked at the Titanic item, because that was the first one I could think of, even before looking at the immediately following query, which says "RMS Titanic". And on RMS Titanic, I noticed that it had "instance of: shipwreck" and also "significant event: shipwrecking". So I figured I should include both of those in the query and say that the item could be an instance of shipwreck, with coordinates, or it could have a significant event. I put something wrong in the comment there, but I think I selected the right item, which is shipwrecking in this case, not shipwreck, and then, as a qualifier, the coordinate location of that event. Then I group that by the item and item label and select any random coordinate with SAMPLE. So if an item actually matches both of these, such as the RMS Titanic, then we just pick either coordinate, and that is fine.
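As a sketch, that shipwreck query could look roughly like this. Caution: the two item IDs below are stand-ins from memory, loudly flagged as hypothetical; look up the actual "shipwreck" and "shipwrecking" items before using it.

```python
import requests

# One coordinate per shipwreck: either from a direct "coordinate location"
# statement, or from the coordinate qualifier of a "significant event:
# shipwrecking" statement. SAMPLE picks one coordinate when both match.
# CAUTION: Q852190 and Q906512 are assumed IDs for shipwreck/shipwrecking;
# verify the real items on Wikidata.
query = """
#defaultView:Map
SELECT ?item ?itemLabel (SAMPLE(?coord) AS ?anyCoord) WHERE {
  {
    ?item wdt:P31 wd:Q852190 ;       # instance of: shipwreck (ID assumed)
          wdt:P625 ?coord .          # coordinate location
  } UNION {
    ?item p:P793 ?event .            # significant event ...
    ?event ps:P793 wd:Q906512 ;      # ... shipwrecking (ID assumed)
           pq:P625 ?coord .          # coordinate qualifier
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
    ?item rdfs:label ?itemLabel .
  }
}
GROUP BY ?item ?itemLabel
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "hackathon-example/0.1 (demo)"},
)
print(len(resp.json()["results"]["bindings"]), "shipwrecks")
```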
Okay, let me look at that query for the scholarly articles on COVID-19. Instance of scholarly article... oh right, of course, the main subject COVID-19 is going to narrow it down quite a lot. So yeah, okay, that works. But same thing, it's still six results. And this actually looks very suspicious: is that the same paper? Quantitative Science Studies. Giovanni Colavizza. Giovanni Colavizza. Publication date: 14 May... December. Maybe it's the same article published twice, or maybe it's actually something slightly different. I don't know; it's suspicious. I'll just drop it in the Telegram chat and let someone else figure it out, maybe. But yeah, okay, I didn't realize that. Of course, once you limit it to main subject COVID-19, which is probably only... let's count... okay, that is 69,983 items. So that's still a lot of items, but it's much less than the millions and millions of items of checking all the labels, like I said earlier. So yeah, okay, I didn't consider that. That makes it work rather better, of course. At this point, we could also check for any subclasses of scholarly article, but I'm pretty sure there actually aren't any subclasses of scholarly article that are in broad use. But let me check that, maybe: ?class, a count AS ?count, where the class is a subclass of scholarly article and something is an instance of the class, GROUP BY class. Okay, that is actually very, very slow. Surprising. What is this? A preprint, the other one might be; that's how we model preprints. 15 results. Okay, but this class, whatever it is, actually has two million instances, and that is "review article", I see. And then this one, "case report", has 100,000. Let me do something else: are these also instances of scholarly article? BIND(EXISTS { instance of scholarly article } AS ?alsoScholarlyArticle), group by that as well, and order by descending count. Because if all of those items are instances of both scholarly article and review article, then we would still find them with our query. Or maybe they are not, and that's what I'm trying to find out here. This one, meanwhile, timed out. Okay, so you can't include subclasses of academic article, or scholarly article, because that makes it time out already, whereas without that we get results in a few seconds. Six results. And the other one, that's still going. It is still going, yes. Anything else? I also realized, when I pasted that link, or those links, into the Telegram chat, that it was like 30 seconds before anyone could know why I was doing that. That is a timeout. Damn. Okay, then let's just bind review article as the class, limit ourselves to that one, and hope that it works quickly enough. Or I could just go through the "What links here" we have. We are almost out of time. I don't remember if there's anything in the schedule after me and I need to leave now, or if there are still maybe five minutes or something. This one is a review article, and it's... "tourism in the region"? Oh, good. Great item. "Architecture of Norway" is a review article: country Norway, counts, categories. Wait, are people interpreting "review article" as in: that's what the Wikipedia article is? Oh no. Literature review is a subclass of review article. This one sounds like an actual review: "How much can we boost IQ and scholastic achievement?", 1969. That sounds like a great time for IQ research. That's a review article and an academic journal article, but not a scientific article. So if you search only for instance of scientific article, you will miss some items, but you don't really have a choice, because including subclasses probably makes the query time out. That is not the nicest note to end on, but okay, apparently I can go until the end of the hour. Do I still have anything to talk about, other than "would someone please look at this Tourism in Lithuania item, which is not a scholarly review article"? "Systematic review of..." Okay, so this one is both a scholarly article and a review article. I mean, we can count them. Discard all of this: SELECT COUNT(*) AS ?count WHERE the item is an instance of both scholarly article and review article. That takes a while.
I guess it has to go through those 2 million review articles and check, for each one, whether it's also a scholarly article. That's the more efficient way of doing it, because the other way around would be even worse, but it's going to take a while. If you search for scholarly article, you also get the lovingly miscategorized book reviews, because they're reviews, and reviews are review articles, aren't they? So... okay, now there's a timeout. I don't know what the timeout was; it doesn't tell me. But yeah, okay, this doesn't work out, I'm afraid. You could look at the results of all the shipwrecks, which was kind of interesting, I guess. This one is still running: 24,000 results. It takes a bit to plot them on the map, but yeah. I think I mentioned this in the GSD channel at the time: there are a lot of results around, specifically, not even Great Britain, but Scotland, apparently, and also a very suspicious line here, which I now realize might be the line which points straight down through Greenwich Observatory. No? If we scroll down... where's Greenwich? So, the line is about here, and if we scroll down, we land right around London. Okay, I don't know where exactly Greenwich Observatory is, but I think it's pretty close to London. So this certainly looks like it could be around zero degrees longitude. Then again, it seems to be slightly slanted to the upper right, I feel. So maybe it's... I'm wondering if this is a bug in some import, or if it's actually real data this way. But anyway, it certainly looks like some project did a big import of shipwrecks around Scotland, and that's where all of this data comes from. I'm trying to open one of them, but I think my screen froze again. Yeah, now we have it: unnamed shipwreck, Canmore, 102-thousand-something. Does it say who imported this? Canmore is the database. QuickStatements, temporary batch. Okay, I have no idea who imported this, or when. But yeah, we have loads and loads of shipwrecks around the coast of Scotland, and then also plenty of them scattered around the rest of the ocean. But that definitely looks like a large concentration. Okay. It'd be nice if we could wrap it up. Yeah, I think with that we can be done. If you have anything else, feel free to contact me any time on Twitter at @WikidataFacts. Maybe I'll also keep looking at this Etherpad for a bit. And yeah, thanks for your attention. Thank you. Thanks for the session. All right then. So I think we're at the end of all the sessions for today, and we're also at the end of the main track. So first off, thanks to Lucas and all the speakers who gave such amazing sessions today. And thanks to everyone who joined; I mean, the chat was super engaging and super nice. So what's next? What else can you do today? I think in an hour or so, we have a community Eurovision watch party. It'll be in the kitchen, I think, in WorkAdventure. So if you haven't seen WorkAdventure yet, I encourage everyone to check it out. It's super nice and super cool. And then tomorrow, we have a showcase in the evening hours. You can register on the hackathon wiki page, slash showcase; there's an Etherpad, and you can go and register there. And we also have a group photo session tomorrow. So, you know, come get a screenshot taken. We'll put them all together in the end, so we have a nice hackathon collage. And just enjoy the rest of the hackathon. And thanks for coming today. Bye. Bye.