Hello everyone, and welcome to the "Integrating Wikidata into the Wikimedia projects" panel. We've got a good lineup for you today: there are five of us on the panel, and we're each going to give a five-minute presentation summarizing our perspectives, and then we'll move on to open questions. This is the order we'll go in: I'm Mike Peel (username Mike Peel), and I'll give an overview of the issue in general; Nirali will then talk about synchronizing Wikidata and Wikipedia; Érica will talk about the Mbabel tool; Benoît will talk about the French Wikipedia and Wikivoyage; and Sergey will talk about the Russian Wikipedia and others. Then we'll have open questions. Just as a reminder, please ask questions in the Etherpad, or you can ask them in the remote chat as well, but the Etherpad is the best place. We'll then select questions which we can discuss live, and others will be taken up on the talk page or Etherpad and followed up later. If you want to be pinged when we answer your question, please add your username after it.

So, I will start with an overview. One of the most important things about Wikidata is that it's multilingual. You can edit it in your language, you can view content in your language, and you can translate a single label and it's used everywhere that item is referred to. So it's an incredibly powerful tool for making content available in a multilingual way across all the different Wikimedia projects. It's also very multi-purpose from the Wikimedia projects' perspective: it can be used in infoboxes, in article text even, maybe, and in lists for sure; auto-categorization can be done based on Wikidata; and also lots of metadata, so the interwiki links and things like that. And it can be used in all the different projects. Wikipedia is one of the most obvious places, but also Commons, Wikivoyage, Wikisource — all the
different projects can make use of Wikidata. It's very versatile, and you can easily access Wikidata information in multiple different ways. The native way is through the {{#statements:}} parser function. You can also use Lua calls, so we can write a module in Lua which will call Wikidata, bring information in, and format it. Or you can use pre-existing Lua modules like Module:WikidataIB, which is one of the main ones — "IB" for infoboxes, but it can also be used much more generally. That will give you parameters: you can specify what information you want from Wikidata, and it will reformat it and give it back to you in a formatted way you can put directly into the article. You can also reformat it using convert or other templates as well. And it's easy to select the best statements, follow links to other items, and much, much more.

However, Wikidata is surprisingly controversial. Most websites started using a database over a decade ago now, right at the beginning of when Wikipedia was starting out: in coding websites, you changed from manually defining content to using a database and automatically generating pages.
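(As an aside on the "select the best statements" point just mentioned: tools like WikidataIB follow the Wikibase rank system, where preferred-rank statements win over normal-rank ones and deprecated statements are ignored. A minimal sketch in Python — the entity layout loosely mirrors the Wikibase JSON shape, and the sample values are invented:)

```python
def best_statements(entity, prop):
    """Pick the 'best' statements for a property, Wikibase-style:
    preferred-rank statements if any exist, otherwise normal-rank ones.
    Deprecated statements are never returned."""
    claims = entity.get("claims", {}).get(prop, [])
    preferred = [c for c in claims if c.get("rank") == "preferred"]
    if preferred:
        return preferred
    return [c for c in claims if c.get("rank") == "normal"]

# Invented sample: an item with two population (P1082) statements,
# one of them marked preferred (e.g. the most recent census).
item = {
    "claims": {
        "P1082": [
            {"rank": "normal", "value": 1000},
            {"rank": "preferred", "value": 1200},
        ]
    }
}
print([c["value"] for c in best_statements(item, "P1082")])  # [1200]
```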
We still don't do that in Wikimedia, which is amazing, and Wikidata is one way that you can do it, because you've got this structured database behind your articles. It's often seen, though, as an external wiki, and people don't like leaving their wiki to go to another one, so that can be seen as a barrier. It's something we're very used to with Commons, for example, with images: you always go to Commons to upload and manage your images, and then you include them in Wikimedia articles. But doing so with text, with data, with templates, is a different thing. Requiring references can be a big issue: on the English Wikipedia, for sure, they really want everything to be referenced, even though most information on Wikipedia still is not referenced, particularly in infoboxes — you very rarely find an infobox which has got references in it. It's something which is a good thing to do and something which should be encouraged, but it's not very well done on Wikipedia or on Wikidata at the moment: you get a lot of "imported from Wikipedia" statements, but not so many references. So that can be a major issue, or it can be a complete non-issue: Commons, for example, doesn't care, so all the infoboxes we use on Commons — I'll mention them later — don't require references. There's also the fact that a lot of language-specific Wikipedias don't view multilingualism as a priority; particularly the English Wikipedia thinks everything's in English, pretty much.
So that's not really a good thing, and it really highlights some of the problem areas. I work on the English Wikipedia a lot, but I write about things in Spain and Brazil, and the coverage of the English Wikipedia there is terrible. But Wikidata, because it's linked to the Portuguese and the Spanish Wikipedias, has a lot more in it. So there's a lot of information that can be brought in there, but it's not seen as a priority — that's a different issue. There's also the risk of vandalism as a concern, because that can be quite high-impact: if someone changes a label into vandalism, it can show up in different places. That's become much less of a problem recently, because now on Wikidata all the properties are semi-protected, and a lot of the main items being used are also protected. But it's still something that people worry about, because it's not easily in their workflow. So there's a lot which you can worry about with Wikidata, but in the end Wikidata is being used today, and it's really been successful in that. It's very widely used on the Wikipedias, as we'll see during this panel: for infoboxes; for lists (on the English Wikipedia these have to be out of main space now, which is not great, but you can still have them); all the interwiki links, of course; and authority control at the bottom of articles.
You can cross-check IDs and make sure that they're consistent between Wikidata and your Wikipedia — coordinates and everything. And my personal pet project at the moment is the infobox on Commons categories, which never had infoboxes before. We now have three and a half million of them, and it's completely multilingual: if you're browsing Commons in your language, you can find information about what's in that category in your language, which is really new. So that's my overview: Wikidata has got a lot of good things, a lot more promise, and a lot more to come, I hope. I'm going to pass over to Nirali, who will talk about synchronizing Wikidata and Wikipedia.

Okay, so hello everyone, I'm Nirali, and I— [audio drops] —Yep, you're okay, carry on. —Okay, I'm sorry for that, network issues. So hello everyone, I'm Nirali, and I have been working on the project of synchronizing Wikipedia and Wikidata, along with my mentor Mike Peel, as an Outreachy intern. Moving to the first topic, that is the importance of synchronizing Wikipedia and Wikidata in the first place: Wikidata, along with the information from Wikipedia, can act as a source of information for the other Wikimedia projects like Wikivoyage or Commons, and even for the different language Wikipedias, like the French wiki or the Russian wiki. Wikidata also acts as a collection of only the relevant information from Wikipedia articles, and that too in a very structured way. This eases our access to information and helps us in importing it into different infoboxes and different Wikimedia projects. Moving to the next topic: how does the synchronization work, and what is the process behind it?
So, on the next slide: the synchronization of Wikidata and Wikipedia occurs with the help of bots. First of all, we find the articles on Wikipedia that we are interested in, or the topic areas that we want to retrieve the information from. Then we develop the scripts for the bots which will perform the imports. But these imports need to be monitored, just to make sure that no wrong information is transferred to Wikidata. This is done by a discussion among the community members or the admins, with the help of bot requests. A bot request lists the functionality of the bot along with a code snippet or a link to the source code. Once it is assured that the functionality of the bot is relevant and it does not conflict with any existing rules or other functionalities, we are asked to make test edits. Test edits means running the script on a small number of articles — for example, let's say 25 or 50 articles. Once those test edits are approved, just to make sure that our scripts are functioning without any errors, our bot request gets finally approved as a whole, and then we can move to the live edits: the final import of information from all the articles in our chosen topic area into Wikidata.
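As a rough illustration of the kind of script involved (this is not the actual bot code — the parameter aliases, the property ID, and the batch size are invented for the example), the core of such an import might normalize differently named infobox parameters and work through articles in small batches so that a malfunction is caught early:

```python
# Hypothetical sketch of a Wikidata-import helper: map infobox
# parameter names that mean the same thing onto one property ID,
# and process articles in small batches.
PARAM_ALIASES = {
    "birth date": "P569",     # date of birth
    "date of birth": "P569",  # same property, different spelling
}

def extract_statements(infobox_params):
    """Map raw infobox parameter names to Wikidata property IDs."""
    statements = {}
    for name, value in infobox_params.items():
        prop = PARAM_ALIASES.get(name.strip().lower())
        if prop and value:
            statements[prop] = value
    return statements

def batches(articles, size=100):
    """Yield articles in fixed-size batches, as described above."""
    for i in range(0, len(articles), size):
        yield articles[i:i + size]

print(extract_statements({"Birth Date": "1975-05-01"}))
# {'P569': '1975-05-01'}
```

A real bot would do this through a framework such as Pywikibot and submit each batch for review before continuing.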
This can be done in batches of one hundred or two hundred, to ensure that our script doesn't malfunction in between, or it can be done as a whole at once. Moving to the next slide: during this entire process there are certain hardships or problems we face. The first one is knowing what to export. Wikipedia is the home of millions of articles, and choosing the correct articles or topic areas which will be of interest or relevance to us and others can be quite overwhelming, given the huge number of topics, categories and lists present on Wikipedia. The second problem is knowing how to export. This is related to the development of scripts. For example, just like I told you, Wikipedia is home to millions of articles; this also means that every article, even within the same topic area, is structured differently. For example, one article on a person may label their date of birth as "birth date", while another article on another person may label it as "date of birth". So writing or developing scripts which will cover as many structures in as many articles as possible can be quite challenging. Well, this was just a brief overview of why the synchronization of Wikidata and Wikipedia articles is important, the process that's involved, and the hardships or problems that we face during it. So, moving to the next topic: the Mbabel tool, which will be covered by Érica.

Érica seems to have frozen — let's give it a second to see if she comes back.

Thank you so much, and hi everyone, hope you're all— just a minute, can you guys hear me now? —Yeah, now it's good. —Okay, so thank you all for having me at this session. My talk today is about the Mbabel tool, which was developed to automatically generate Wikipedia entries using information from Wikidata.
So, Mike, if you can move to the next slide, please. Thank you. It was developed by me and my colleague Ederporto, inspired by a template by Richard Knipel from the Metropolitan Museum. This project started in 2018 at the Research, Innovation and Dissemination Center for Neuromathematics here in São Paulo, Brazil, during a science journalism fellowship; this was research on computational journalism and structured narratives using Wikidata. Next slide. Why did we do it? Well, in this research center we were working on how to integrate Wikidata and Wikipedia to improve knowledge dissemination, and the idea was to automate complex or even boring tasks and to provide a resource for users that allows them to focus on more analytical contributions. For example, every Wikipedia entry on a film should include its title, date of release, the main cast, who directed it, whether it is a comedy, a drama or even a documentary, etc. — and much other information that is already available in Wikidata. So while we offer this information structured in the narrative, the user can focus on elaborating the context in which the movie was set, its public repercussion, or anything else.
And this is the logic behind the Mbabel tool. This is a tool that can be used in education and GLAM projects, as we do here in Brazil, and finally it has the potential to support small language communities in creating some articles. Next slide. This tool was based on the structured narratives concept: the development of texts understandable by humans that are actually automated from pre-determined arrangements, processing information from a structured database — in this case Wikidata. So the Mbabel tool provides narrative templates in which the gaps are filled by Wikidata information, basically. Next slide. How was it developed? We used the WikidataIB module and worked a lot on elaborating the narrative templates, trying to make them sound as natural as possible for the user. The templates for works of art, museums, libraries, archives, books, films, journals and even earthquakes are quite simple and can generate the stub using a single Wikidata item, but they provide less information for the user than the complex ones that we did for the Brazilian elections, coordinating several items at once — and, of course, the work of modeling them directly on Wikidata. And we also tested it a lot.
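The gap-filling idea behind a narrative template can be sketched in a few lines (the template wording, field names, and sample film here are invented for illustration, not Mbabel's actual ones):

```python
# Hypothetical narrative template: the gaps ({title}, {genre}, ...)
# are filled from simplified Wikidata-style statements for a film.
TEMPLATE = ("{title} is a {genre} film directed by {director}, "
            "released in {year}.")

def fill_narrative(template, statements):
    """Fill a narrative template from a dict of statement values."""
    return template.format(**statements)

film = {"title": "Example Film", "genre": "drama",
        "director": "Jane Doe", "year": 1999}
print(fill_narrative(TEMPLATE, film))
# Example Film is a drama film directed by Jane Doe, released in 1999.
```

The real tool additionally has to handle missing statements, grammatical agreement, and multi-item coordination, which is where most of the effort described above went.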
So we did a lot of testing with all those narratives, so they could sound as natural as possible for the user. Next slide. How does it work for the user? We provide a button with which the user can simply combine the desired template and the Wikidata item that he or she wants to create an article for. We can also add the Mbabel templates on lists, so people can just click on the red links and start creating the entries. We even used Mbabel on Wikibooks to create a photo album for GLAM collections here in Brazil, and it was built using the resource in a way that the captions for each photograph were created with a narrative template. There are also queries in this book to show related images for the user. And the cool part is that we also invite the reader to experiment with metadata apps to collaborate on the image descriptions, so that if he or she adds more information to the depicts property through those apps, it automatically improves the quality of both the captions and the queries in the book for that image. Next slide. Finally, let me show you an example of an article on Brazilian elections created entirely with the Mbabel tool. Next slide. We have here this quite complex introduction with all the basic information you can see in a good-quality Wikipedia article, such as what were the disputed positions for that particular election, who was elected and from which party, with how many votes — a lot of data to situate that particular election, right? And it was all automatically generated.
Which is very impressive. And on the right side you can see this beautiful infobox, which is actually a work of art by my colleague Ederporto. Next slide. We also have a section for the results for each round and disputed position of the election. This table is included here in the narrative template, which is pretty cool, since people have a lot of difficulty building it manually. Then we also have the same information from the table in written form, including the percentages and all the data that is in there. Next slide. The final part of the article is also automatically generated, including categories and some of the references that we built using identifiers. Next slide. Just to wrap things up here—

I think she's frozen again. Let's give it a second to see if she comes back.

—although we started this a couple of years ago— Yeah, you're back now. —Okay. So, just to wrap things up here: although we started this a couple of years ago and we have worked on simplifying the code, we still need to work on documentation so people can actually collaborate on building more structured narratives. And I think there's still room for imagining how Mbabel could serve other language communities. That's it for now, thank you so much.

Hi everybody.
I'm going to present Wikidata usage in the French Wikipedia, with data from Wikidata, and another example in the French Wikivoyage — the usage is a bit different in Wikivoyage, because the data come from OpenStreetMap, outside. Next slide, please. So in French we have a kind of classical usage of Wikidata, for example with the biography infobox, the monument infobox, or even the sports event infobox. All the data are from Wikidata, and there is the possibility to use the multilingual and, if I can say, multi-grammar aspects of Wikidata to have a good translation. Gender, for example: I took this example where the occupation property is correctly translated in French as "rédacteur" or "rédactrice" according to the feminine or masculine gender. This kind of usage is the cause of debate sometimes, as mentioned by Mike just before; there is some controversy sometimes. On the contrary, the usage in the footer of articles is really appreciated by all the community, I think — for example, authority control and specialized database IDs for arts, politics, sports, etc.
It's really useful, because these are a bit complicated to write, so it's really easy to just put a template and automatically you get every ID. Next slide, please. For example, in this article — I took this example because Anna Kiesenhofer is a mathematician and she's a cyclist too — there are two infoboxes in her article. There is an infobox for her career as a mathematician, where for example you can see the PhD supervisor; this piece of data is coming from Wikidata. And there is another infobox with information coming from Wikidata too, but related to her career as a cyclist — she won a gold medal during the recent Olympic Games in Tokyo. At the bottom of the article you can see some research IDs and some sports IDs, and this data is automatically coming from Wikidata.

Next slide, please. In Wikivoyage we have a kind of similar infobox; all the data are coming from Wikidata — population, area, metadata like that for a specific location. But we have another really interesting aspect, with data not coming from Wikidata, but where we get this piece of data through Wikidata. Next slide, please. It's the map graphic you can see here: it's a small city in France, and you have the outline of the administrative map — the shape of the administrative boundary — and it's automatically drawn, actually, according to the relation object in OpenStreetMap. In Wikidata you have only the OpenStreetMap relation ID, and automatically we get the correct shape drawn like that. Before, it was necessary to write every point of the polygon in Wikivoyage; now it's automatic, and the data are not specifically in Wikidata — they are in OpenStreetMap. So it's a kind of usage of Wikidata where Wikidata is a kind of hub with an external data source. That's all for me.
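That "hub" pattern — Wikidata holding only the OpenStreetMap relation ID (property P402) while the geometry itself lives in OSM — can be sketched as follows. The item dict is a simplified stand-in for the real Wikidata JSON, and a real map widget would fetch the polygon through the Overpass API rather than just building a URL:

```python
# Illustrative: resolve a boundary via the OSM relation ID (P402)
# stored on a simplified Wikidata item dict.
def osm_relation_url(item):
    """Return the OSM relation URL for an item, or None if no P402."""
    claims = item.get("claims", {}).get("P402", [])
    if not claims:
        return None
    relation_id = claims[0]["value"]
    return f"https://www.openstreetmap.org/relation/{relation_id}"

# Invented sample item for a French commune.
commune = {"claims": {"P402": [{"value": "123456"}]}}
print(osm_relation_url(commune))
# https://www.openstreetmap.org/relation/123456
```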
Thank you.

Hello everybody, I'm Sergey. I want to tell you about the Russian Wikipedia and some other Wikipedias in the languages of Russia — Wikipedias smaller than the Russian one. In the Russian Wikipedia we have used Wikidata integration from the very beginning, like five or six years ago, and it's almost the same as in the French Wikipedia, but with a totally different stack of modules and templates. Next slide. For now, almost all infoboxes in the Russian Wikipedia support Wikidata, and almost all of our one and a half million articles have some data from Wikidata. The difference from other projects is mostly that we use complex logic to show every type of every property from Wikidata. We can use categories from Wikidata, show awards as icons, show maps the same way the French Wikivoyage does, and a lot more. We can also have tables for population — and not only tables, but also charts. But it's a lot of small pieces of code that show every property separately. We also tried to generate the lead section, the same as Mbabel, but it's only experimental; for some parts of the lead section, such as places and dates, we widely use code to generate those parts. We also use auto-generated stubs — not with a lot of text, but with an infobox, the main parts, and a short lead section. Yeah, and we also have source templates, a little different from what the English and French Wikipedias have, plus authority control and an external links template — I'll show you on the next slide... okay, a little later. And we have the same stack of modules and templates that can be used in different Wikipedias, because the Russian Wikipedia is the center of a big hub of smaller Wikipedias: we use the same code in probably 20 Wikipedias which are smaller than the Russian Wikipedia. The difference from the Russian Wikipedia is that the smaller Wikipedias really want to use data from Wikidata, because they don't want to fill in all this data by themselves. And there is another difference from the Russian Wikipedia.
They mostly use the universal infobox, which is another module. It's really close to another module called Databox, which is very popular in another part of the Wikipedias. It can be used to generate an infobox the same way as on Commons — for every article. They also use our modules for source templates, and because they need to create articles as fast as possible, they use some gadgets to import data from the Russian Wikipedia to Wikidata, and they use this data in the infoboxes. Next slide. So, on the left is an infobox in the Russian Wikipedia. It's totally generated from Wikidata, with all the pictures and all the awards. It's not so easy to read for most of us, but still, it can generate blocks of information, it can display awards as images, and it can show and calculate a lot of things. On the right is an infobox from the Bashkir Wikipedia, which uses the universal infobox: it simply shows every field that it can show from Wikidata. It's not as good as separate infoboxes, but it's really good for small Wikipedias that don't want to manage hundreds and thousands of infoboxes. Next slide. Yeah, and this is how we generate sources. We have a separate template which is really close to Cite Q, which was developed on the English Wikipedia; it's adapted for the Russian Wikipedia, for Russian standards of displaying information, and it's also used in a lot of Wikipedias. Next slide — probably the last one. It's authority control. It's split by topics — social networks, encyclopedias, authority control proper, and a lot more — and it's really popular in the Russian Wikipedia: it has mostly replaced all external links in articles, so it's something that can be used in many Wikipedias. Yeah, okay, thanks. Next one.
I just want to come back to what Mike mentioned in the beginning: even with all this technology, even with all this stuff that we can do with modules and templates, not everybody in the community wants to see information from Wikidata. A lot of people don't like that we can't control Wikidata and we can't check from Wikipedia whether there is some vandalism; it's still an external project that is difficult to edit, and it doesn't have enough references. The last discussion about whether we need to use Wikidata or whether we need to disable Wikidata was this month — even after all these years of using Wikidata. So when somebody wants to use Wikidata in a project, it's not as simple as it looks. Thank you. Next slide.

Okay, thank you to everyone on the panel for the presentations, and to the audience for listening to us. We'll now go on to questions. As a reminder, please ask these on the Etherpad; we'll select some questions now to answer, and other questions we'll follow up on the talk page and/or Etherpad. We can also go to a breakout room after this to have a chat, if you want.

So the first question I'm going to pick is for Érica: how long did it take to get all of that data into Wikidata before you could start auto-creating articles on the Portuguese Wikipedia?

Well, for the simple templates that only use a single Wikidata item, it didn't take too long, because we were using items that were already available on Wikidata. But for the election template it was more complex, because we had to actually model on Wikidata the whole structure of the thing, so we could coordinate the Wikidata items. We scraped our government's database on the elections — so we scraped all the information from that database, then we modeled it on Wikidata, and we did this work of coordinating the items so we could build the narrative template. So it was very complex.
So this is why it took almost a year to build, under this research scenario. But once we did it, it is possible to do it again with other teams, so I would definitely encourage you to go and try to build it yourself as well, because the hard work we already did — we have proven that it's possible. Thank you.

So the next question is for Nirali: is the work you presented also being applied to categories? And there's a comment — are the differences across Wikipedias in category structure significant?

Yes, the work that I do is applied to categories as well. In fact, most of the work that I've done over this time period has been applied to categories. For example, there was the import of Soccerway IDs from articles which use the Soccerway template but do not have that ID in Wikidata; that was based on a category. Now my focus has shifted to categories and also to lists. So yeah, my script can be applied to categories and lists altogether. Thanks.

Thank you. And I'm going to ask myself a question now, which was: when will Cite Q be integrated into Citoid? This was brought up during yesterday's sessions. It seems like the sources template on the Russian Wikipedia is very similar to Cite Q, which we talked about yesterday with Andy. So it would be great if we could integrate that into Citoid, and we need to know who has developed Citoid and who would be able to help with that, because I think everything on the template side is ready to go, but it needs some more work from Citoid.

Another question, and this one's for both Benoît and Sergey: if the functionality between the French and Russian Wikipedias is similar, can you imagine a world in which you use the same stack of templates and modules too? Of course, the strings shown to users would be translated to French and Russian. This is from Amir. So yeah, if you want to answer?

From my side, what can I say?
For some templates it can be possible, I think — for authority control, for example. On the contrary, perhaps for infoboxes there is a cultural dimension; there is something cultural, and perhaps some metadata are really important on the French Wikipedia and not on the Russian Wikipedia, and the opposite too. So I'm not sure it's really so simple. It's not only a technical question — that's what I mean.

I think it's possible to create some basic data frameworks that can be used in both Wikipedias, but some extensions of such a framework should be done for every Wikipedia separately. So yeah, we can communicate, and we can create some integration committee, I don't know, to discuss the basics of integration — but then every Wikipedia has some rules that apply only to that Wikipedia.

Okay, I think we've got five minutes left, so I want to quickly answer a question from Lillian. She tried to use the Wikidata Infobox template in an article on the Spanish Wikipedia. Now, should they use this template, or use the person infobox, which I guess is the infobox specifically for people? What's the recommendation?
So, I didn't actually realize that the Wikidata Infobox template — this is the one from Commons — was being used so much, and it's good to see it being used live on the Spanish Wikipedia. In general, that infobox has been designed so that it just pulls all the information in, so you may want to use a more specialist infobox template — like one specifically for people — to reformat and display things in a standard way. But you can still use the generic one as well. If you try to use it on the English Wikipedia, then someone will come along and remove it shortly afterwards, unfortunately; on the English Wikipedia you tend to use {{Infobox person/Wikidata}} and things like that, which people don't tend to object to so much.

And another question for Érica, this one from Booth: "This generated article is great. Are you flagging it to suggest humans might engage with it in a way that fleshes out or naturalizes the language?"

Well, we try to make the template sound as natural as possible — as in this case of the election, because we were able to coordinate several Wikidata items in this template, so it was easier to model how we wanted the phrases to sound, and it provides a better result for the user in the end. And obviously this template isn't automatically published on Wikipedia without real human revision, so people can engage in elaborating on the template that we provide, even the simpler ones that are built with a single item. The idea is actually that we can provide this Wikipedia entry structure in a way that lets people work on more complex and human sections. For example, on the elections project that we did, we worked with students from the journalism school here in São Paulo, and they created these entries. They had never edited Wikipedia before, and they were amazed by how this tool was working and how it was providing those complex tables and the text. But we required them to add
sections on the background of the election and analytical information about it, so we could provide an even better experience and a better-quality Wikipedia entry. So, to answer the question: I think, yeah, people can engage with this tool, but it also requires a lot of work behind it before we can actually ask people to start creating articles through it. And I think this is linked to a comment that we also had here on the Etherpad by DarwIn, who said that actually this requires a lot of work on Wikipedia — and yeah, that's true. But if a single user is encouraged enough to do this work, I think this can be a great resource for newcomers, or even for more experienced Wikipedians who are trying to build Wikipedia entries on similar thematic topics, right?

Okay, thank you. So we've got a few minutes left now, and there are a few more questions, but I think we should be wrapping up. Is there any last comment any of the panel would like to make — anything we think we've missed, or has everything been quite well summarized? Unfortunately we've lost Nirali earlier. Or we think we've covered everything. So, there's been a lot of information about how Wikidata is being used here, and we've talked a bit about some of the controversies and some of the strengths as well. It's going to be really interesting to see Wikidata's future over the next few years, as these tools develop and mature a lot more — because they're also very new, from the last few years, I guess — and as they kind of bed in and become better, it should be really good to see. I'm also hoping that the community will come around more to using Wikidata; particularly the English Wikipedia — it would be really nice to see it properly using Wikidata, rather than kind of having it on the edge. So that's something I'm looking forward to. Sergey, I see your microphone is unmuted — do you want to say something?
No, I just wanted to say that probably we will continue the conversation after the conference session. So, yeah.

Can you say where it will be?

So we'll be meeting after this in building six, floor nine, table B. I think it's a standard meeting room for after a session. So: building six, floor nine, table B.

Okay, we've got less than a minute left, so let's leave it there. Thank you very much for listening. This is very much an ongoing conversation, so please do come and join us at the table, and see you at future Wikimanias, hopefully, as well. Thank you so much, it was fun. Thank you. Bye-bye. Thank you. Thank you. Bye.