Hi, I'm Robert. What I'm going to talk about is indexing articles on Wikipedia, so without further ado, I'll just start. About myself: my home wiki is English Wikipedia. I primarily edit Singapore-related articles, and I have been actively maintaining the index of Singapore-related articles for the last four years, since 2019. What I'm going to talk about next is why we want to create an index, what an index is, and how to create one. My target audience is basically those who want to create an index for the topics that they are monitoring. Unfortunately, it will require a bit of technical knowledge, because right now it's still pretty scripted. So for now, people with technical knowledge will probably get the gist of how I extract the information.
Okay, so first, what's an index, for those who do not know? It's just an alphabetical list of articles, and a system of the list itself as well. So it's a list of all the articles available on Wikipedia for any broad general topic. For example, Singapore-related articles. It can be any other country as well: it can be Malaysia, it can be Japan, or even just a thematic topic.
Okay, so why would you want to create an index? Generally it's just these three reasons. First, it's a quick way to find out whether an article on a subject exists in the context of the general topic. Let's say I want to find a topic about makerspaces in Singapore. Yes, we can use search. But on top of the search bar, the index can give you the makerspace article directly. We do have makerspaces in Singapore; they are just not listed here because they do not have an article. So if anyone wants to create an article from here onwards, they can do so immediately. The next reason is to fight vandalism in the topic that you are monitoring. The way my fellow Singaporean editors and I do so is basically by using Related changes on the index itself.
This index currently stands at 12,000 to 13,000 on-page links. So we are actually monitoring almost every Singapore topic, almost, not quite all of these articles, on English Wikipedia. Once an edit comes in, if it's a vandalism edit, the index serves the same function as a watchlist. It's just that this is already a predefined watchlist for everyone.
And the last point is discovery of new articles. One thing about putting articles on an index is that it de-orphans them immediately, so orphans are always kept at a minimum. Right now, in this maintenance report, which I put up just a few hours ago, it stands at only three links. And these three links were created in the last couple of days, because I ran the index earlier last week. So these links are the new Singapore-related articles.
Another thing: this chart here is from 1 June to 31 October 2019, and right in the middle, around August, is when I actually ran the script to programmatically create the index. The chart shows the spider visits. So mainly, I will say, your Googlebot crawlers, your Bing.com, all your search engines. Before that, we were only visited maybe 50 times per day by bot traffic. But after updating it, we are seeing more bot traffic coming in, because the index is getting updated much more frequently. So search engines, barring the noindex conditions and such, will immediately ping and get all these articles out much faster. It's easier for the search engines to discover new articles. Right?
Okay. But here comes the issue. Maintaining an index is very hard. It's very time consuming. If you do it manually, it's a boring task. You basically have to copy the links one by one. It's a lonely task as well, because no one else knows what you're doing. Right? And it's maximum effort. Right? Yeah.
So it came to a point where there was actually an AfD discussion to delete 174 indexes. Yes. So all the indexes related to country and region topics. Right? This is part of the opening statement: it's a redundant and unmaintainable system of index articles. That AfD concluded with no consensus, because I participated by saying that the Singapore list is updated. Someone else participated and said that the Vatican City list is updated. The two smallest city-states came in and said theirs were updated. Yeah. So it ended up with no consensus, because there was recognition of the effort by us, the two main editors trying to get our indexes up to date. There were also other indexes that were relatively updated, but not to the extent of our two city-states.
So now this is just a quick guide. I don't have my laptop with me to demo the crawl and so on, but this is essentially my workflow to create the index. The general steps are: download and set up the script. The script is on GitHub; you can just go and fork and clone it. Get a list of categories. My script uses categories to get all the articles out, so you need to identify the related categories first. Then you run the script, and then you manually update the result into Wikipedia. The reason I update manually is that I was, and still am, quite lazy about trying to get bot approval to automate the entire process. I was also hesitant, because I don't know how well received this would be if it's only for one index. So right now, I am still doing it manually and checking through the changes. But if I want to scale it up, there will need to be more semi-automated or even automated measures.
So first, downloading and setting up the script. The link to the GitHub repo is there. One prerequisite is Node.js. Yeah, so unlike most other people who are in data science or data engineering, my primary language is Node.js, not Python.
Yeah, so that is something that I think is more accessible for those who are familiar with JavaScript, which is, I will say, most of the web developers or script maintainers on Wikipedia. In the main file, index.js, you have to edit two groups of lines to match the current index article, if there is one: the top heading and the bottom one. And then after that, you run npm ci to get all the necessary Node modules, right?
So this is just the script that I've pulled out to show you what you need to edit. At lines 74 to 78, if you notice, it's the wiki markup of the index header code. So every time it runs, it's already loaded there. It will load into a text file, and I just copy the entire text file and paste it onto the wiki itself. I do not need to find where in the wiki article to remove the old list and insert the new one. I just do it wholesale. Same thing at the bottom: lines 95 to 102 are the bottom end of the index, where you have things like your categories and the See also sections. Okay. So this is what I need to replace. Right.
And then to get a list of categories, what I do right now is basically go to the main category page itself, for example Category:Singapore, open your browser's script console, and type this jQuery command in. It will expand the category tree one level down, and you just keep repeating this until you are satisfied that you have all of your categories opened up. One thing to note is that there might be recursive category branches. So, for example, Singapore opens up to Strait of Malacca, which then opens up to Singapore, Indonesia and Malaysia, and I think it will just go over and over again. So those who are more technically inclined can start modifying the JavaScript query to only open up those that are not on a recursive branch. Right. And there might also be unrelated categories.
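
The recursive-branch problem described above can be sketched in Node.js. This is only an illustration under stated assumptions, not the speaker's script or the jQuery command from the slides: the graph in `exampleGraph`, the function `collectCategories`, and the `excluded` set are all hypothetical names, and a real run would read subcategories from the wiki rather than from a hard-coded object. The idea is simply to remember which categories have already been opened, so a branch that loops back is never expanded twice.

```javascript
// Hypothetical category graph for illustration only. Note the loop:
// "Singapore" -> "Strait of Malacca" -> "Singapore".
const exampleGraph = {
  "Singapore": ["Geography of Singapore", "Strait of Malacca"],
  "Geography of Singapore": [],
  "Strait of Malacca": ["Singapore", "Malaysia", "Indonesia"],
  "Malaysia": ["Strait of Malacca"],
  "Indonesia": ["Strait of Malacca"],
};

// Breadth-first walk of the category tree starting at `root`.
// `getSubcategories(name)` returns an array of subcategory names.
// Categories already seen, or listed in `excluded`, are not expanded.
function collectCategories(root, getSubcategories, excluded = new Set()) {
  const visited = new Set([root]);
  const queue = [root];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const child of getSubcategories(current)) {
      if (visited.has(child) || excluded.has(child)) continue;
      visited.add(child);
      queue.push(child);
    }
  }
  return [...visited];
}

const lookup = (name) => exampleGraph[name] || [];
// Unrelated categories are excluded by hand (an assumption of this sketch).
const categories = collectCategories(
  "Singapore",
  lookup,
  new Set(["Malaysia", "Indonesia"])
);
console.log(categories.join("\n"));
```

Because every category enters the queue at most once, the walk terminates even when branches point back at each other. The excluded set stands in for the manual clean-up of unrelated categories.
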
So, as I just said, there are also Malaysia and Indonesia topic categories coming in. Those are what we remove. And then after that, you copy the entire category tree into this file, category.list, and you reformat it so that it only shows the category names. So this is basically opening up the category tree, and then category.list.
And then after that, you just run node index.js and you copy the compiled text. Right now, I basically just load it as pages 3.txt, and that is the completed index article markup. For subsequent runs, you can monitor changes with, say, the new article bot page itself. So for example, this will be the Singapore search results. But that isn't enough on its own every time you want to load a new index, because there might be changes to the category names; there may be renames or deletions and such. So that is maintenance that you need to do yourself for now.
What's next is that I plan to make it into a collaborative web application with a graphical process flow, so that the average editor will be able to generate an index on their own, and also to template the output so that it can be more usable for other kinds of index creation as well. And that's all for my talk. I'm open to any questions. Yes. All right. So questions outside.
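
The compile step described in the talk, with header markup at the top, an alphabetical list in the middle and footer markup at the bottom, could look roughly like the following Node.js sketch. It is not the contents of the speaker's index.js: the function name `buildIndexMarkup`, the section layout, the sample titles, and the header and footer strings are all assumptions made for illustration.

```javascript
// Build index wikitext from a list of article titles.
// `header` and `footer` are placeholder strings standing in for the
// hand-edited top and bottom markup of the index article.
function buildIndexMarkup(titles, header, footer) {
  // Drop duplicates (an article can sit in several categories), then sort.
  const unique = [...new Set(titles)].sort((a, b) =>
    a.localeCompare(b, "en", { sensitivity: "base" })
  );

  // Group links by first letter; anything not starting with A-Z goes to "0-9".
  const sections = new Map();
  for (const title of unique) {
    const first = title.charAt(0).toUpperCase();
    const key = /[A-Z]/.test(first) ? first : "0-9";
    if (!sections.has(key)) sections.set(key, []);
    sections.get(key).push(`* [[${title}]]`);
  }

  const body = [...sections.entries()]
    .map(([key, links]) => `== ${key} ==\n${links.join("\n")}`)
    .join("\n\n");

  return `${header}\n\n${body}\n\n${footer}\n`;
}

// Sample titles and placeholder header/footer, for illustration only.
const markup = buildIndexMarkup(
  ["Bishan", "Ang Mo Kio", "Bedok", "Ang Mo Kio", "313@Somerset"],
  "This is an index of sample articles.",
  "[[Category:Sample indexes]]"
);
console.log(markup);
```

Returning the whole page as one string matches the wholesale copy-and-paste workflow from the talk: the output could be written to a text file with `fs.writeFileSync` and pasted over the existing article in one go.
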