Hello everyone. I'm Daniel Kinzler. I work for the Wikimedia Foundation in the Core Platform Team, and today I would like to talk to you about untangling MediaWiki. This is really awkward, right? I'm kind of talking to myself, so I don't even know whether you can hear me. I sincerely hope that someone will tell me if it's not working. I don't know if there are three people still having coffee or 30 people eagerly awaiting my words. So this is a little bit odd, but I'll do my best, and we'll get started.
This presentation, I think, is useful mostly for people who are working on MediaWiki extensions or contributing to MediaWiki core, but also for people who are just interested in what the Foundation is up to, and maybe want to understand why some things take as long as they do.
Over the years, MediaWiki has grown into a big ball of mud, a big tangle of classes that all refer to each other. I have a nice story somewhere about how we got where we are. It involves a winged elephant with a hat on. Unfortunately, I think we won't have time for that today, but I'm happy to tell it some other time. So yeah, it kind of looks like this, except it's way worse, right? There are just hundreds and hundreds of classes, and there is no rhyme or reason to what depends on what. That makes things hard to understand. It makes them hard to fix. It makes it hard to predict what effects a change will have.
So we want to break this up. We want to break it into components. Maybe we want to separate out things that are responsible for the user interface or for the search layer, maybe something for managing page content, something else for managing users, and so on. Having things broken into components is good in general, right? It makes things easier to understand. It makes things easier to change. It allows individual parts to be reused without depending on the rest. And it also provides a way to have clear ownership of parts of the code.
So a part of the platform engineering team set out on an expedition to untangle MediaWiki core. We call it an expedition because we know where we want to get to. We know the rough direction. We know the first few steps. But after that, all the details are unknown, right? We are sure we will encounter obstacles and surprises. Maybe we will get snowed in, or hit a canyon and have to backtrack. That's just how it is. If you want to know more about this, you can find more information about the expedition team on mediawiki.org, the MediaWiki wiki, of course.
The very first step is to break apart the big classes that straddle the boundaries between the different components. Pretty much everything depends on these classes, and they are big themselves and depend on a lot of things. So they tie everything together into this big knot. They're kind of like the knottiest bit of the knot. If you have ever worked with MediaWiki code, you know these classes. They are called Title, User, Language, and so on. They represent the core concepts we have. But because of that, they kind of encompass everything, right? Everything uses them, and they use everything.
They are also kind of unclear. I just said that they represent our core concepts, but these concepts have become blurry over time or have grown in scope. So for instance, a title has a page ID, unless the page doesn't exist yet. It has a namespace, unless it's an interwiki link. It has a page name, unless it's a relative section jump on the same page. It has an associated talk page, unless it's a special page. There's really no part of a title that you can be sure will be there. Similarly, a title can be viewed, unless it's an interwiki link; then we can maybe redirect you, but we can't show it. It can be watched in your watchlist, unless it's a section jump, right? You can rename a title, unless it's a special page.
So there's really also no operation that you can be sure you can perform on a Title object. That makes things really unclear and confusing, and unclear and confusing vocabulary leads to frustration and bugs. So our mission is to replace these monster classes that bind the code into a big knot with a lightweight vocabulary that we can use to communicate between components, without these components depending on each other or knowing about each other.
So we are fighting the monster classes. The first monster class that got slain was Revision; that actually happened quite a while ago. Currently, we are working on the User class and the Title class. After that, we will look at the Language class.
For a rough roadmap: in MediaWiki 1.36, we have introduced a number of new interfaces and value objects to represent concepts that were previously covered by User and Title. In 1.37, we will see a lot of old methods and interfaces being deprecated. In particular, in 1.37 we will start to migrate hook interfaces away from using the old types, which typically involves replacing the hook with a new one, unfortunately. And in 1.38, I hope that we can start to delete the old stuff, so we can actually see the benefits of replacing it. As long as we have all the old stuff still around for compatibility, we are just adding a nice alternative to the old tangle, but the tangle is still there.
For the Title class, a long time ago already, we introduced the LinkTarget interface, which basically represents whatever you can put between square brackets as a wiki link. Now, in addition to that, we have introduced the idea of a page reference, a page identity, and a page record. Not a "page page record"; what happened there? Okay, I'll fix that later.
So a page reference is also kind of like what you can link to with a wiki link, but not an interwiki link, not a section jump, not all of that; just a simple link that takes you to a different page on the same wiki, right? A page identity is the same, but it's only for pages in the database, so it would not cover special pages, just pages that have a page ID in the page table. And a page record would include all the meta information, like when the page was last touched, what the latest revision is, and so on.
User we are breaking apart into a user identity, which is basically just a user ID and name, and an Authority object that represents everything that a user can do in the current request, including any restrictions on the IP address, any restrictions imposed by OAuth grants, and so on. In the future, we will probably introduce something representing the user account, with all the metadata about when the user joined, how many edits they have, their email address, and so on.
Over the last few months, we have made quite a bit of progress. The yellow and red lines that start out at the very top of this graph represent User and Title, the biggest monster classes. You can see their usage slowly declining over time, while the usage of the alternatives has been picking up. You can watch this progress using the link on the slide. It's the Monster Monitor, running on Toolforge. This has been a very useful compass, I'll say. Like every measurement, it only gives a rough idea of what's happening, but it shows that we are moving in the right direction.
So the art here is to move slowly and avoid breaking things. MediaWiki is not just the software that brings you Wikipedia; it is of course also a framework that others build on. So we have to provide a stable interface that people can rely on.
And we have worked to refine the stable interface policy for MediaWiki quite a bit over the last months and years. The idea here was to give people more certainty about what they can rely on, while at the same time giving core developers more freedom and more clarity on what they can change. In general, if you're developing an extension, calling things in core is fine. If it's public, you can call it. However, you should in general not be instantiating classes, you should not extend classes, and you should not directly override methods or implement interfaces, unless they are marked as safe for this purpose.
Another thing that is worth noting is that the implementation of the stable interface policy relies a lot on the Codesearch tool, which allows us to search not only through core and all the extensions maintained by the Foundation, but everything else on Gerrit as well, plus a number of repositories hosted elsewhere. The MediaWiki Stakeholders' Group is maintaining a list of non-Wikimedia Foundation extensions, most of them hosted on GitHub. That repository just has a list of submodules that covers all these extensions. Everything that is in there is reachable by Codesearch, so we know about it when we do refactorings. So if you develop an extension, and it's not on Gerrit, and you want us to be aware of how you use core and whether we need to be careful not to break your extension when making changes, you should make sure that you get yourself onto that list, which I believe is as simple as making a pull request on GitHub.
Now, how do we change things without breaking them? There's a little compatibility dance that we came up with. It's explained in detail on the expedition page as well. We define new interfaces. We make the old class implement these interfaces. We change the method signatures to accept the new types. Then we replace methods that return the old classes. And eventually, we can remove the old class. This is very abstract.
I just wanted this to be here as a summary. I've tried to visualize the idea. Here, the grayish blocks are methods, with a header that defines the method signature and a body that represents the implementation. The circles, of course, are the expected types, and the dots are whatever is actually used inside the implementation. We start out with just the old types being used everywhere.
An attractive idea would be to just look at one class and replace the old stuff with the new stuff. But then, if we only have the new type and we try to use it to access some other component, some other class that still expects the old type, things clash. That won't work. So you would have to find an order where you do the basic things first, the ones that don't depend on anything, and then build on top of that. But the very problem we are trying to solve is that everything is tangled in cycles of dependencies. So there is no such order that would be safe.
What we do instead is modify only the signatures at first. Internally, wherever we need it, we convert from the new type to the old type, which in most cases is just a cast, not an actual conversion, so it's quick. That way, any calls to other components will still work as before. Eventually, all the signatures accept the new type, even if the implementations have not been changed. At that point, we can very easily change the implementations, one by one or all at once, because we know everything is already accepting the new types. And this also works across component boundaries and across extension boundaries.
So what we're doing is replacing the monster classes that bind the code into a big knot with a lightweight vocabulary for communicating between components, as I said before, without breaking things.
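The compatibility dance described above can be sketched in a few lines. This is only an illustration in Python, not MediaWiki's PHP code: the snake_case method names, `PageReferenceValue`, `cast_from_page_reference`, `legacy_render`, `show_page`, and the namespace table are all assumptions made up for this example. It shows the old class satisfying the new interface, and a method whose signature already accepts the new type while its body still converts back to the old one.

```python
from dataclasses import dataclass
from typing import Protocol

# Hypothetical namespace table, for this sketch only.
NAMESPACE_NAMES = {0: "", 1: "Talk"}


class PageReference(Protocol):
    """Step 1: the new, lightweight interface (namespace plus normalized name)."""

    def get_namespace(self) -> int: ...
    def get_db_key(self) -> str: ...


@dataclass(frozen=True)
class PageReferenceValue:
    """Immutable value object satisfying PageReference; needs no services."""

    namespace: int
    db_key: str

    def get_namespace(self) -> int:
        return self.namespace

    def get_db_key(self) -> str:
        return self.db_key


class Title:
    """Stand-in for the old class. Step 2: it also satisfies PageReference."""

    def __init__(self, namespace: int, db_key: str) -> None:
        self._namespace = namespace
        self._db_key = db_key

    def get_namespace(self) -> int:
        return self._namespace

    def get_db_key(self) -> str:
        return self._db_key

    def get_prefixed_text(self) -> str:
        # Exists only on the old class, so callers of it still need a Title.
        prefix = NAMESPACE_NAMES[self._namespace]
        text = self._db_key.replace("_", " ")
        return f"{prefix}:{text}" if prefix else text

    @classmethod
    def cast_from_page_reference(cls, page: PageReference) -> "Title":
        # Usually just a cast: an object that already is a Title is returned as-is.
        if isinstance(page, Title):
            return page
        return cls(page.get_namespace(), page.get_db_key())


def legacy_render(title: Title) -> str:
    """Another component that has not been migrated and still expects Title."""
    return f"<h1>{title.get_prefixed_text()}</h1>"


def show_page(page: PageReference) -> str:
    """Step 3: the signature accepts the new type; the body converts back."""
    title = Title.cast_from_page_reference(page)
    return legacy_render(title)
```

In this sketch, `show_page` works for both old callers passing a `Title` and new callers passing a `PageReferenceValue`, so its body and `legacy_render` can be migrated later, in any order.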
So, a little time check here, because I have more information about how exactly we do this with respect to extensions, and with respect to how the new classes and interfaces relate to each other. But I only have five minutes left to speak, and I would like to know if there are already questions or comments. So if there are already questions or comments, let me know.
There are a few questions. I think it's going to take three minutes to answer them, so you have like seven minutes more to speak.
Okay. I think I will skip over the part about how we do this in extensions. If there are questions that relate to that, we can come back to these slides. I will talk about the type system a little bit, because if you are observing the changes being made to core, it might be a little bit confusing what we are up to there.
As I said, we are trying to replace Title, or let's say one aspect of Title, with the idea of a page reference. A page reference is basically anything that you can visit as a page on the wiki, something that will show up as a page title in the URL. Internally, it just has the DB key, so the normalized name, and the namespace ID. Derived from that, we have the page identity, which also has a page ID. The page doesn't have to exist, but it needs to be something that can be created, right? So it's not a special page, for instance. On top of that, we have a page record. That's always an existing page, with all the metadata. For these three interfaces, we have an implementation that is just a value object that doesn't depend on any services and is not mutable. So that's kind of the ideal that we want.
The problem is: how do we get code from using Title, and WikiPage actually, to using the new types? To allow for the transition in the way that I described earlier, we need the old classes to be compatible with the new interfaces. For instance, Title implements PageIdentity, because a Title can be a page identity, right? It often is.
Unfortunately, it's not actually always a page identity. A Title object can also just be a relative section jump or an interwiki link. So making it a PageIdentity is, well, a little bit cheeky, kind of a lie. So what we did is we made Title implement the PageIdentity interface, and then we made another interface, ProperPageIdentity, that actually provides all the guarantees of the page being able to be created. A PageIdentity looks like something that can be created as a page, but it could still happen that the object you have actually can't be created. If you try, something will explode, so you will have to check on the page identity. You know, at this stage, things are just a little bit confusing, and you have a lot of different interfaces to juggle.
We have a similar situation with WikiPage and PageRecord, right? A WikiPage can represent a page that does not yet exist and is about to be created. The page record, in that case, has all the metadata set to null, and if you try to access it, things go wrong, which is kind of unfortunate. So we have the ExistingPageRecord interface that gives all the guarantees. And of course, the new value objects implement the interfaces that give more guarantees.
But keep in mind that this is temporary. Once all the code is migrated away from the old types, we can fix this. The idea is that the only implementation of PageIdentity that is not a ProperPageIdentity is the old Title class. Similarly, the only implementation of PageRecord that is not an ExistingPageRecord is WikiPage. So once all the code is migrated away from using the old types, we can make them aliases, right? PageIdentity and ProperPageIdentity will just be the same thing.
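The temporary split between a weaker and a stronger page identity interface might look roughly like this. It is a minimal Python sketch, not MediaWiki's PHP: `can_exist`, `PageIdentityValue`, `create_page`, and the constructor arguments are hypothetical names chosen for illustration, and the page ID is left out for brevity. The point it illustrates is that the value object enforces the guarantee when it is built, while the old class can only be checked at run time.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

NS_SPECIAL = -1  # special pages live in a negative namespace


class PageIdentity(ABC):
    """Weaker interface: looks like a creatable page, but callers must check."""

    @abstractmethod
    def get_namespace(self) -> int: ...

    @abstractmethod
    def get_db_key(self) -> str: ...

    @abstractmethod
    def can_exist(self) -> bool: ...


class ProperPageIdentity(PageIdentity):
    """Stronger interface: implementations promise can_exist() is always True."""


@dataclass(frozen=True)
class PageIdentityValue(ProperPageIdentity):
    """Immutable value object; the guarantee is enforced at construction."""

    namespace: int
    db_key: str

    def __post_init__(self) -> None:
        if self.namespace < 0 or not self.db_key:
            raise ValueError("not a page that can be created")

    def get_namespace(self) -> int:
        return self.namespace

    def get_db_key(self) -> str:
        return self.db_key

    def can_exist(self) -> bool:
        return True


class Title(PageIdentity):
    """Stand-in for the old class: may be a special page or a section jump."""

    def __init__(self, namespace: int, db_key: str) -> None:
        self._namespace = namespace
        self._db_key = db_key

    def get_namespace(self) -> int:
        return self._namespace

    def get_db_key(self) -> str:
        return self._db_key

    def can_exist(self) -> bool:
        return self._namespace >= 0 and self._db_key != ""


def create_page(page: PageIdentity) -> str:
    # With only the weaker interface, the check has to happen at run time.
    if not page.can_exist():
        raise ValueError("this page identity cannot be created")
    return f"created {page.get_namespace()}:{page.get_db_key()}"
```

Once nothing but the value object implements the weaker interface, the run-time check in `create_page` becomes redundant and the two interfaces can be merged, which matches the end state described in the talk.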
Existing page record and page record will just be the same thing, because we got rid of the old classes that violate these guarantees. We will again end up with a clean type system, and we can retire the additional names eventually. So this is how...
Five minutes, by the way. Five-minute warning.
Okay, thank you. Yeah, so this is how we go about doing this compatibility dance. We are about to release 1.36, and we will be doing this for quite a while yet. I would expect to get to this clean situation here in perhaps 1.38. So we have only just got started on this expedition, and you'll have to bear with us. It'll take a while, but I really hope it will be worth it. Thank you. That's it from me. So what are the questions?
Thanks a lot. Let me ask them very fast. Will TitleValue die with this refactoring?
No, TitleValue is the value object implementation of LinkTarget. The name is a bit unfortunate, and I have been thinking about renaming it. The name could be changed to something like LinkTargetValue, but TitleValue as an idea will stay.
Okay, we have three more questions left. Is there a specific place that discusses the vocabulary used in the API?
Not really. I'm trying to think: in the API, you mean the web API, in terms of just the terms, like using "page" instead of "title" or something like that? All this refactoring does not really touch on the API, but I can see that it can be confusing to use the word "title" in one place and "page" in another when they mean the same thing. And sometimes they don't, right?
And another question: what is the best way to follow this? By looking at the expedition page updates, maybe?
The best way to follow this is probably the Phabricator board. There's an expedition tag on Phabricator. I have to admit that we are not super great at tracking every change, but you can follow the overall progress and the overall epics there. I can add a link. Where should I post the link? I can put it, sorry.
You can put it on Telegram later, maybe, because I don't think we have a stream chat here.
Okay, I will do that.
And will the presentation be published? This was another question.
Yes, I would very much like the recording to be published, and I can also put the slides online. I think the slides by themselves are not super helpful, but I can put them up on Commons anyway.
Okay, and the last question: would an internationalized, localizable API be something practicable? Roughly speaking, the ability to alias classes or methods?
I did not quite understand the question. An internationalized, sorry, what?
An internationalized, localizable API. Is it something practicable? That is the question.
A localizable API. I'm trying to think what kind of API is meant here. I'm not quite sure I understand. Is the idea that we have a web API where the module names and parameters are not in English? Maybe that is the idea. That's really unrelated to this refactoring, but I'm not quite sure I understand. We can perhaps discuss it on IRC or Telegram.
Okay, and another question came in: where is the expedition page? We already said you will send it to Telegram.
Yeah, I will put links into the chat. For the expedition page, you can just search on mediawiki.org for "expedition" and you should find it.
Okay, now we have to finish the session. Thanks a lot, Daniel.
Thank you. Bye.