Oh, sorry. Apparently we started off on the wrong foot, because that wasn't the first slide. Okay, there we are. Well, thank you very much for being here. Just in case, this is the session about Web2Cit, a tool to collaboratively fix automatic citations in Wikipedia. My name is Diego de la Hera. I am from Argentina.
If you are Wikipedia editors, you will of course be aware that citations are a very important part of Wikipedia. We need to include references to support what we write. But inserting citations can be a bit problematic, and we have to invest a lot of effort in it, because you have to find the names of the authors, the publication date, the publication source. So there is a tool in the visual editor, let's just call it the automatic citations tool, that greatly simplifies inserting citations. This is a tool that already exists. If we want to cite a URL, like a newspaper article, we just get the URL of the article, paste it into the dialog that shows up, click on Generate, and then, as if by magic, the citation appears. It gets the item type, the names of the authors, the publication date, and so on.
So far, so good. This is really cool. But the problem is that this doesn't always work as expected, as you may know if you use this automatic citation tool. What most usually happens is that the item type is misidentified: we are citing a newspaper article and it says that it's just a web page. It's actually a newspaper article, so it's better if we use that citation template instead. Or it misses the name of the author, or it confuses the publication source with the name of the author, so it says that the author is, I don't know, The Times. Well, that's not the author, that's the publication source. And so on.
To understand why this fails, we need to understand how automatic citations work in Wikipedia. This is going to be just a very brief explanation. When we enter a URL in the citation dialog and click on Generate, we are actually using a service maintained by the Wikimedia Foundation called Citoid. Citoid gets the web page that we are trying to cite, processes it (we are not going to go into detail about how), and outputs the citation. To do this, Citoid relies on a third-party library that is not maintained directly by the Wikimedia community; it is part of the Zotero project. Just out of curiosity, Zotero is reference management software, and its community maintains what they call translators. Translators do exactly this: you have a web page, and these arrows here represent the translators, which translate web pages into citation metadata. This little thing here represents the citation metadata.
This works very well when the citation metadata has been embedded in the web page in a structured, standard manner. If the webmasters have followed some recommendations for embedding that metadata in the web page, then we can use what we may call generic translators to get the citation metadata out of those web pages. But that is an ideal world. In reality, many web pages do not include this metadata in a correctly structured or standardized way, or they do so only partially. The way the Zotero community has dealt with this is to write a specific translator for each of these web pages, actually for each of these domains, or sometimes for a group of websites. For example, if a publisher uses the same format for all of the newspapers it publishes, then one translator can be used for all of those newspapers.
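As a rough illustration of what a generic translator does with embedded metadata, here is a minimal Python sketch. It is not Zotero's or Citoid's code. The tag names it looks for (`citation_*` meta tags and Open Graph style properties) are common conventions picked for the example, and the output field names are made up for this sketch.

```python
from html.parser import HTMLParser

# Assumption: these meta tag names are common conventions chosen for
# illustration; a real generic translator handles many more formats.
FIELD_BY_META_KEY = {
    "citation_title": "title",
    "og:title": "title",
    "citation_author": "author",
    "citation_publication_date": "date",
    "article:published_time": "date",
    "og:site_name": "publishedIn",
}


class MetaTagCollector(HTMLParser):
    """Collects citation-like values from <meta> tags in a page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Metadata keys appear under either "name" or "property".
        key = attributes.get("name") or attributes.get("property")
        field = FIELD_BY_META_KEY.get(key)
        content = attributes.get("content")
        if field and content:
            self.metadata.setdefault(field, []).append(content)


def extract_embedded_metadata(html):
    """Return a dict of field name -> list of values found in the page."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.metadata


# Hypothetical page used only for this example.
SAMPLE_PAGE = """
<html><head>
<meta property="og:title" content="Example headline">
<meta name="citation_author" content="Jane Doe">
<meta property="article:published_time" content="2023-08-14">
</head><body><p>Article text.</p></body></html>
"""

print(extract_embedded_metadata(SAMPLE_PAGE))
```

A page without any of these tags yields an empty result, which is the situation where a site-specific translator (or a manual fix) becomes necessary.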
But the problem with this is, first, that we would need as many translators as there are web pages out there, which is a lot. And also, sometimes when a web page changes, even very slightly, the translator no longer works, and the citation metadata that is extracted is now wrong.
Are you following so far? Do you have any questions or comments? We will have time, so we can also make this a conversation. So if anyone has a comment so far, a question? Yes. Do we have a mic? Okay, we'll repeat the question. Thank you.
So you're saying that when there's not a specific translator for a web page, the Zotero community creates a translator for that page. But you're talking about sites, not individual pages, right? Just to make it clear.
Exactly. Usually it's sites or sets of sites, not individual pages. Sometimes it might happen that within one site, say, the current affairs section or the culture section of a newspaper (I'm talking about newspapers because that is the most common case here) uses a different format than the economics section. So usually it is by site, sometimes it is by set of sites, and sometimes you might need more than one per site. But usually it's by site. You're right. Thank you.
So far we have seen that we may have a problem in the web page: maybe the metadata is not well structured. Or maybe the translator, which is the arrow in the middle, is outdated because the web page changed. Either of these produces an error in the extracted citation metadata. So how do we fix this? Well, one way would be to manually fix the citation metadata. But this takes time. And not only does it take time, but we have also fixed it only for the specific citation we wanted to make.
But then if somebody else wants to cite another source from the same newspaper, they have to do it again and again and again. And there is one more problem that many of you, if you're Wikipedia editors, may have come across: the item type being misidentified. If Citoid says it is a web page but you wanted it to be a newspaper article, then you have to start from scratch. You cannot reuse the author name or the publication date that may have been correctly identified. You just have to say, I want to insert the citation manually, and you have to do everything by hand. So this takes time, and it is wasted time in a way, because we cannot help the rest of the community with it.
Another way to fix this is to go to the webmasters of the sources we want to cite and try to explain why it is so important that they use structured metadata. I think this is definitely the way to go. But it might not always work as we expect; it requires discussions, conversations and so on.
And then the other way is to fix the translator. The Zotero community is very open, and they are very willing to collaborate and to accept contributions. So it is just a matter of fixing the translator. Why not? Well, the problem is that those translators are written in JavaScript, so you need some programming knowledge. And not only programming knowledge. For those of you who are familiar with programming, the code is stored in a software repository, so you have to clone the repository, make the changes, and ask the maintainers to merge those changes into the main code base. This takes time, apart from knowledge. And after that has been done, the change still has to be pulled into the Citoid code. So this usually takes lots of knowledge and lots of time.
Those were the motivations for this project, the Web2Cit project, which was funded by a grant from the Wikimedia Foundation. The idea behind Web2Cit is not to create an alternative to Citoid, but rather an extension, like an add-on, a workaround. Where Citoid is working, we continue using Citoid. But where it is not working, until it is fixed upstream, we have a way to patch it collaboratively, so that the community can take care of it in a way that is technical, but not as technical as having to write a piece of JavaScript code. And in a way that is much more immediate, because it is completely under the control of our community, the Wikimedia community. We make the change and it's immediately live. We don't have to wait until it's merged upstream and then pulled downstream into the Citoid code base.
By the way, I will be talking about this very briefly, and this hour is not going to be enough to cover all of it. After the talk, you can learn more about the project on that Meta page. Luckily, the name was, maybe unexpectedly, wisely chosen, so if you just search for Web2Cit on Google, or whatever your search engine is, it's probably going to be the top result, and you don't have to write it down. Anyway, the slides are also available from Pretalx.
So, how does this work? Again, it doesn't replace Citoid. This is very important. It is just an add-on, a community-maintained add-on. Where Citoid is working, we just take what Citoid says, and where it is not working, we patch it with the community's input. I will now demonstrate how to use this, but before going to the demonstration, are there any questions, any comments? Go ahead.
Maybe you explained this and I missed it, but these simplified translators, can they be automatically converted into Zotero translators, or is this something that needs to be done manually?
Beautiful question.
It's a very good question. It is one of the ideas that we had in the original project, but we are not there yet. It would be nice to do that. Still, the translators, or extractors, that you can define in Web2Cit are relatively simple, because we wanted to come up with something that could be done by people who are not proficient in programming. So whatever extraction strategy you come up with in Web2Cit would be suboptimal for a Zotero translator. It could work, but it might be wiser to eventually write a proper Zotero translator, or even come up with a completely different strategy. When I present this, the question usually pops up whether we shouldn't rely on artificial intelligence to produce these citations in another way: machine learning models that are given lots of examples and learn from them, so that the output is produced more automatically. So we have this as an option. We are tracking every open task and suggestion in the project on Phabricator. This is one of the open tasks, but it is definitely not close to being addressed yet.
Any other comments or questions before I proceed to the demonstration? Okay, so now, yes. Oh, sorry.
Could you say how applicable this will be for different languages? Has the focus been on English until now, and how do you see it for all the other languages?
Yeah, sure. When we were thinking about this, the idea was that the community would collaboratively define a set of Web2Cit configurations for different domains. So we were wondering where we should store this configuration. Should we store it on Wikipedia? Should we store it in a Git repository? Should we store it on Meta?
Or maybe on Commons? In the end, we decided to store it on Meta, because we want the configuration to work across languages. We thought that it doesn't matter where you are going to use the metadata that comes out of a newspaper: whether you use it in the English Wikipedia, the Spanish Wikipedia or the Azerbaijani Wikipedia, the metadata is always going to be the same. What changes is how you present that metadata, but that is not Web2Cit's business, because it is taken care of by citation templates. So as long as citation templates are taken care of by the corresponding Wikipedia communities, citation metadata shouldn't change from one language to another. So, answering part of your question: whatever configuration is made by an English-speaking user, a Russian-speaking user or a Spanish-speaking user can be reused by users who speak other languages.
Also, when we started this project, we did some research. It is a long story, but the project included a research subproject, and our initial assumptions were not exactly supported by the research results. We initially thought that this would benefit Wikipedias in languages other than English more. We thought that because many of the translator maintainers speak English, so we were expecting more Zotero translators for English-language sources than for other sources. This wasn't supported. We cannot say from the results of our research that this is true. Of course, as usually happens in research, we cannot say it's false either; we just cannot say it's true. But we were expecting it, because it sometimes happens like this: if you're an English speaker and you have a source that doesn't work, imagine that you know how to deal with GitHub repositories.
You're pretty well prepared to ask someone on the Zotero team: hey, I'm having this problem, can you please help me here? But if you're a Spanish speaker, for example, and you do not speak English, not only do you have to know what a GitHub repository even is, but you also have to write in English, because most of the Zotero contributors are English speakers. So of course that represents a challenge.
Well, I don't know what that was, but maybe it's the bell reminding me that I should move on to the demonstration. So thank you.
Again, this is going to be a brief demonstration. The tool, as I said, is still a technical tool, so don't expect it to be very easy, unfortunately. If you want more documentation, I recommend going to the project page, where you will find lots of documentation pages. And as a disclaimer, we know that this could be simplified way, way more. We hope that if more people get interested in the project, we can justify continuing to work on it and making it simpler to use.
So, first things first. And sorry to the production team, I'm messing things up here. I'm sorry, I don't know what happened. Probably I just unplugged this. Are we okay again? Thank you. Sorry, sorry. You know, they were telling me, maybe you should do it this way, and I said no, I'm going to do it my way. And of course they were right. So sorry. Let's see if it comes back. Okay. And of course it's not full screen, but it doesn't matter.
So, the first thing is how to use it. We don't have to contribute to Web2Cit to use it. We can use the configurations that other people have created and benefit from those contributions.
To use Web2Cit, the first thing to do is to install it as a user script. You know that you can customize Wikipedia in different ways: you can change your preferences, you can install gadgets, and you can install user scripts. Web2Cit can be enabled on any Wikipedia that uses the visual editor. By the way, as an aside, if you are not a visual editor user you can also use Web2Cit, but I'm not going to focus on how to do that now. We can discuss it later. The easiest way is if your Wikipedia uses the visual editor.
Starting from the Web2Cit home page, you will find a Getting started section, and in it an Install section. Following the instructions there, or the more advanced documentation on this other page, you will install the Web2Cit user script on your Wikipedia. This is something that you only have to do once. And it really surprised me that, for example, when we presented this at the Wikimedia Hackathon in May, somebody from the Azerbaijani Wikipedia really liked it. He was actually one of the maintainers of that Wikipedia, and he even installed it by default for all users of that Wikipedia, which made me really happy.
So, how do you do this? I'm not going to go into the details, but you have to open a page on your Wikipedia, the common.js file. This is a personal file of yours, and you have to add some code to it to enable the user script. So you go to your personal common.js file, copy this code down here and paste it at the end of the file, which is this part here, probably this part in my code.
Mine looks different because I have it installed somewhere else, since I want it to work on all Wikipedias. There is also a way to install it for all Wikipedias, not just, for example, the English or Spanish Wikipedia. But just to be clear: as a first step, you install the user script following the instructions under how to install Web2Cit.
And how do you know whether the installation has worked? Well, here I'm editing a Wikipedia page, in this example my sandbox, which is this area where you can test your Wikipedia editing skills and mess things up. If I go to Edit (remember, I'm using the visual editor here) and click on Cite, you will see this little Web2Cit checkbox. If the checkbox is on, the automatic citation is going to use Web2Cit. If it's off, it's not going to use it.
Let's go with this example. I got it yesterday, just a random example from a newspaper in Singapore. I want to enter this URL as a source in Wikipedia, so I paste the URL here and click on Generate. Because I have the Web2Cit user script installed, I should get two results instead of one. The result... hello, hello, hello. Can you hear me? Okay. The result at the top is the regular result, the one that comes from Citoid. If we want to change this one, we can do it manually, we can convince the webmaster to change the page, or we can change the Zotero code. Remember what we said before. The one below is the one that comes from Web2Cit. So far, both are the same, and that's because Web2Cit has not been configured by the community for this source yet, so it is just using what Citoid returns. But the difference is that we can change this second result using Web2Cit. Okay.
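The behaviour just shown, where the second result equals Citoid's output until someone configures the domain, can be pictured with a small Python sketch. This is only an illustration of the idea, not Web2Cit's implementation: the function name, the dictionary of per-domain overrides, the field names and the example URL are all hypothetical.

```python
from urllib.parse import urlsplit


def patched_citation(url, citoid_citation, overrides_by_domain):
    """Start from Citoid's result and overlay community overrides, if any.

    overrides_by_domain is a hypothetical stand-in for the community-
    maintained configuration; a domain with no entry changes nothing.
    """
    domain = urlsplit(url).hostname
    overrides = overrides_by_domain.get(domain, {})
    return {**citoid_citation, **overrides}


# Hypothetical values standing in for what the dialog showed in the demo.
ARTICLE_URL = "https://news.example/some-article"
CITOID_OUTPUT = {
    "itemType": "webpage",
    "title": "Example headline",
    "publishedIn": "Today",
    "language": "en",
}

# No configuration yet: the second result is identical to Citoid's.
unconfigured = patched_citation(ARTICLE_URL, CITOID_OUTPUT, {})

# After the community configures the domain, only the patched field differs.
configured = patched_citation(
    ARTICLE_URL,
    CITOID_OUTPUT,
    {"news.example": {"itemType": "newspaperArticle"}},
)

print(unconfigured == CITOID_OUTPUT)
print(configured["itemType"])
```

The point of the design is that an unconfigured source costs nothing: the add-on simply passes Citoid's answer through.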
To do that, I click where it says Web2Cit. Sorry. Let me zoom in a little bit. This opens the translation summary page. It says that, for this path, for this web page on the todayonline.com domain, this is the translation output. This is what Web2Cit is returning for this specific web page. This is the item type: it is a web page. That's one of the errors we identified here. It says it's a web page; it should be a newspaper article. The title is okay. The source is also okay: it says Today, which is the name of the newspaper. But it doesn't have the publication date, and it doesn't have the author name. That is reflected, of course, in this table. It says it's a web page. The title is okay. The publication source is fine. The language has been identified as English, which is also true. But the publication date and the author names are missing.
Whenever you want to edit Web2Cit configurations, the first thing you should do is tell Web2Cit, and also the community of Web2Cit users, what the expected output is. Maybe we don't know how to get it, but the first step is to say: this is what we should have gotten. This guides the rest of the community to understand what we are expecting for this web page.
To edit that, I go to this part here, the expected output, which is currently empty because nobody has configured it yet. I click where it says Edit and add a new translation test. This translation test corresponds to a very specific web page, the one that I tried to cite. So I'm just going to copy the URL path and paste it here. Just the path, which is the part that goes after the domain.
So this is the path of the web page that we tried to cite. We will go field by field, telling the Web2Cit community what we are expecting from this web page. I add my first field, which is the item type field. Remember, we got web page, but what did we want? A newspaper article. Exactly. So we choose that from this list. Next is the title field. The Citoid result was fine for the title, so we can just copy it from there, but we can also go to the original source, select the title, and paste it into this field.
We continue with the author name. I'm not going to go into details here. We have two options. We may split first and last names, but this doesn't always make sense. Sometimes it's the name of an institution. Some languages do not make that distinction, or it's not clear. So the author last name field may also be used for the author's full name, and that's the field I'm going to use. Back to the original source: this was written by Charlene Goh.
The next field is the date field, which is the publication date. It says here it was published on the 14th of August 2023. Librarians can tell me whether I should put August 14th or August 15th, when it was updated. Please don't be mean to me. I will just choose August 14th. So 2023, August 14th. Yes? I have the mic here.
What about the caps lock in the author name?
Okay, in this case... it's a good question. When we create the procedure for extracting the data from the web page, after we select what data we want to extract, we can apply transformations to that data. There is one transformation step, a capitalization step, where you can tell it to make the text title case, all lowercase, or all uppercase. This is not implemented yet, so it's not working.
So you won't be able to deal with it at this moment, but the idea is to deal with that at the transformation stage. That will make more sense when I get to the procedure. This is just the test so far, so here we write what we want, the way we want it. If we want it in lowercase, we just write it in lowercase.
So I continue. Well, this is the date again. Sorry, I'm not respecting the format. It should be like this, as it says down there. I'm going to zoom in a little more. The next field is the published-in field, which is, as it says here, the name of the work that contains the cited resource, in this case the name of the newspaper. Again, we think Citoid was okay with that, so I'm just going to write Today, the way Citoid got it. And finally, we have the language field. We can write this in different ways, as explained here. I'll just use en for English.
This, as we said, is going to be saved as a configuration file. So we go here where it says review changes and save. This produces a file that we don't need to understand. We just go to the bottom of the page, maybe write an edit summary of what we did, like configured a test case, and publish this as a wiki page. It is published on Meta as a configuration page for Web2Cit.
If I go back to the translation summary page and refresh it, the expected output column now shows what we are expecting for this web page. We are expecting a newspaper article, not a web page. And because we have defined an expected output, we get a score for each field. For the item type field, this is failing: we're getting a 0% score because web page is not newspaper article. For the title, we're getting a 100% score because the titles match.
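The test case and the per-field scores described here can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the tool's actual scoring: it uses exact matching only (the real comparison may be more lenient), the field names are invented for the example, and the title is a placeholder because the real headline is not part of this transcript.

```python
# Expected output for one specific page, as entered field by field in the demo.
# Field names and the title value are assumptions made for this sketch.
EXPECTED_OUTPUT = {
    "itemType": "newspaperArticle",
    "title": "Example headline",
    "authorLast": "Charlene Goh",
    "date": "2023-08-14",
    "publishedIn": "Today",
    "language": "en",
}

# What Citoid returned for the page: no author and no date.
ACTUAL_OUTPUT = {
    "itemType": "webpage",
    "title": "Example headline",
    "publishedIn": "Today",
    "language": "en",
}


def score_fields(expected, actual):
    """Return a score per expected field: 100 on an exact match, else 0."""
    return {
        field: 100 if actual.get(field) == goal else 0
        for field, goal in expected.items()
    }


scores = score_fields(EXPECTED_OUTPUT, ACTUAL_OUTPUT)
for field, score in scores.items():
    print(f"{field}: {score}%")
```

Writing the expected output first gives everyone a fixed target: any later change to the extraction procedure can be judged by whether these scores go up.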
What we got for the title is what we wanted, and the same goes for the published-in field and the language. For the author and the publication date, we're getting a 0% score, because Citoid returned nothing and we wanted something.
So far, we've defined the test case. Next, I'm going to show you how we can change this column, what we actually get. But before going there, I'm wondering if we have any questions or comments. I know this can be tricky and complicated, but we do have some resources, like online workshops that we held in the past, which are recorded and available on the Web2Cit page. But if you have comments now, we can address them now. Yes, the mic is coming over. Thank you. We want to hear your voice.
If someone else edits the expected output, will that show up as a diff on that wiki page?
Yes, exactly. That's also one of the reasons why we decided to use Meta as a repository for these configuration files. We could have come up with a repository of our own, but that would have required extra work to build something that uses the same user permissions as Wikipedia and that tracks differences between one revision and another. So you can check a diff, maybe not in the friendliest way, because it's just a JSON file (we do have some ideas to make this better and easier), but as you would check a diff on any wiki page. You can see exactly who contributed to that file and exactly what they changed, from what to what. And because we are sharing the same file across the whole community, if you said that the publication date should be the 14th of August and somebody else says it should be the 15th of August, because that's the date it was updated, they can change it. If you don't agree, you can change it back again.
That would start an edit war, and you would go to the discussion page and talk about why it should be the 14th and not the 15th, or the other way around. Okay. Any other comments or questions? Yes. Thank you.
Diego, I'm falling at the first hurdle. I'm trying to follow along with a problematic website we have in New Zealand called Papers Past. Even though I've got Web2Cit enabled, it's only giving me one option, which does not include the Web2Cit link through which you accessed this page. I'm assuming there's another way to access this tool to put in the test case. You say that when the checkbox is enabled, Web2Cit is enabled, but I've only got the Citoid option. I don't have two options for some reason.
Okay. Oh, nice. Okay.
I mean, I get two options on other websites. So it's not that the tool isn't working for me. It's that Papers Past isn't working.
Okay, interesting. Well, we can definitely look into that and see what's happening. And maybe we can open a ticket on Phabricator to have that addressed, because that would be a bug. Thank you. I like this. This is maybe not the best example for a Wikipedia conference, but I always remember when, I think it was Bill Gates, was presenting I don't know what version of Windows and it crashed during the presentation. It was amazing. So if it happens to Bill Gates, it can happen to me.
Okay, so let's go. If we don't have any more questions or comments, I can show you the other part, which is how we change the output. I'm going to click on Edit here to change the translation output, and to change the translation output we have to define a translation procedure.
We are going to base this procedure on a specific web page, a template, but then the same procedure can be used on other web pages of the same domain. This is important, because if we had to define a procedure for every web page, it would be more complicated than just fixing the citation manually in the end. The idea is that we define this for just one template web page in the domain, and as long as other web pages from the same domain look like the template we have chosen, the procedure can be used for them too. We won't discuss this in detail today, but should there be different formats within the same domain, we can define multiple templates for the same website: if the page looks like this, use this procedure; if the page looks like that, use this other procedure. In this example, we will just show how to do it with one single template.
So I add a new translation template and say which web page path I'm basing this template on. It is very important to specify this, because if other Web2Cit contributors come later and want to change this, they need to understand why we defined the procedures the way we did. So it's important to tell them which web page we based this on. This is the web page we are basing the procedure on, and we will again go field by field, saying how we are going to get the expected output for that web page.
First is the item type field. It is likely that all web pages from this domain are newspaper articles, so it makes sense to tell Web2Cit to just return newspaper article for this domain, no matter which web page it is getting. For that, at the procedure stage, we have two steps.
The first step is the selection step, where we tell Web2Cit where to get the data from, and the second step is the transformation step, where we change the data. For example, remember, changing it to lowercase letters instead of capital letters. There is a selection step type called fixed selection, which always returns the same thing no matter which web page it is, and we are going to tell it to always return newspaper article. And that's it for the item type field.
We continue with the next field, which is the title. Do you remember whether Citoid was returning something good for this field? Yes. So there is a selection step called Citoid selection, which tells Web2Cit to just use what Citoid says for a field. And which field do we want from Citoid? The title field, because that's where the title information is. So we just say: for the title, get what Citoid says for the title. And we are done with the title field too.
Because of time constraints, I'm going to skip some interesting parts. I'm sorry. Let's see what happens with the time we have available. For now, I will just change the item type and reuse the rest from Citoid. If we have some time left, I'm going to show you how to change the author and date fields, which are a little more complicated.
So we continue with the published-in field, and again Citoid was okay here. When I add a new translation procedure, the default configuration is to get what Citoid returns for the published-in field. And finally, for the language, I add a new translation procedure that just uses what Citoid says for language. So this is the template that I'm going to configure for now. I'm going to save this configuration. It is saved in a separate file on Meta, the translation template file. I'm going to put here: configured translation template.
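The template just configured (a fixed selection for the item type, Citoid selections for the other fields, and an optional transformation step) can be sketched in Python as follows. All names here are hypothetical and the real configuration is a JSON file edited through the tool, so this is only a model of the idea. The capitalization transformation at the end is the one described as not yet implemented in Web2Cit; it appears only to show where such a step would sit.

```python
def fixed_selection(value):
    """Selection step that returns the same value for every page."""
    return lambda citoid_citation: value


def citoid_selection(field):
    """Selection step that reuses what Citoid returned for a field."""
    return lambda citoid_citation: citoid_citation.get(field)


def run_procedure(citoid_citation, selection, transformations=()):
    """Run the selection step, then each transformation step in order."""
    value = selection(citoid_citation)
    if value is None:
        return None  # nothing selected, so nothing to transform
    for transform in transformations:
        value = transform(value)
    return value


# One template for the whole domain: field -> (selection, transformations).
TEMPLATE = {
    "itemType": (fixed_selection("newspaperArticle"), ()),
    "title": (citoid_selection("title"), ()),
    "publishedIn": (citoid_selection("publishedIn"), ()),
    "language": (citoid_selection("language"), ()),
}


def translate(citoid_citation, template):
    """Apply every field procedure in the template to one page's data."""
    output = {}
    for field, (selection, transformations) in template.items():
        value = run_procedure(citoid_citation, selection, transformations)
        if value is not None:
            output[field] = value
    return output


# Placeholder Citoid result; the title is not the real headline.
citoid_result = {
    "itemType": "webpage",
    "title": "Example headline",
    "publishedIn": "Today",
    "language": "en",
}
print(translate(citoid_result, TEMPLATE))

# Illustrative capitalization step, using Python's built-in str.title.
print(run_procedure({"author": "CHARLENE GOH"}, citoid_selection("author"), (str.title,)))
```

Because the template is keyed by domain rather than by page, the same four procedures apply unchanged to any other article from the same source.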
So now, if I refresh the translation summary page, the item type has changed to newspaper article instead of web page. The rest remains the same, because we told Web2Cit to reuse Citoid, and author and publication date we didn't configure, so we are still getting nothing there. But at least it improved a little.
This is actually a good example of why it is important to do this collaboratively. Maybe that's the only thing I know how to do: I just know how to change the item type. I have no clue how to change the author or the publication date, but I was able to configure what was expected. So I've done a lot: I configured a test and I changed the item type. And then someone who may not speak English, so maybe for them it's difficult to understand who the author should be or what the publication date should be, but who understands how to use Web2Cit a little bit more, can come and configure the procedure for the author and the procedure for the date.
In a conference where we're all speaking English, it might sound weird that we may not know what the author or the publication date is, but I've come across pages that have characters that I don't even know what they are, and for me it is very difficult to tell whether what's there is the author name. Or maybe it says "published by John Doe": how do I know which is the "published by" part and which is the "John Doe" part? It's very difficult. So having someone say beforehand what the name is is very useful.
So if I go back now to Wikipedia and I generate the citation for this web page again, now the item type has changed in the citation that came from Web2Cit, which is the only thing that we were able to configure so far. Okay. I'm going to show just one more thing, we still have some time, but do we have any questions?
I don't understand why we have the checkbox if you are still going to see both outputs. Why not have it always enabled, and people can just choose the one that is most appropriate?
Yeah, it's a very good question, and the reason is that I'm a fearful programmer and I wasn't sure whether I would break stuff, so I wanted to make sure that users would have a way to go back to the original behavior without having to uninstall the user script, which is actually a little long to do. This is already quite stable and I haven't seen any situation where it actually breaks things. The worst thing that might happen is, as Temse found, that you don't get the result from Web2Cit, although it's a little weird, and I would like to check why, because it's an interesting situation. So we could remove it, but yeah, that's the reason why it's there. Definitely we could just remove it.
And by the way, what I wanted to show: this is one article, right? But then I said that one template should be able to be used for other web pages from the same domain. So this is another one. I'm not even reading the title, so I hope I'm not doing something wrong here. I'm just copying the URL for this different article from the same source. If I copy that URL here and I click on generate, hopefully the result from Web2Cit should say newspaper article instead of website, because it is using, for this new URL, the same template that was defined for the other URL. And well, let's just do it. I'm not going to explain in detail how I'm going to
get the author name, but let's just show it so that you know that it's possible, that I'm not lying to you, but also so that you know that it is not easy. Because we want to get this that is here, for example. So how do we get this part? An HTML file is the file that defines what the web page is going to look like, or what the content of the web page is. To refer to specific parts of a web page, there is a language called XPath that lets us, for example, pick (oh, they want me out) the element that has this specific class, or maybe pick the element that is under this other element. Of course this is technical: you need to understand how to use, for example, this panel that opens here on the side, which is the inspector of the source code. I'm going to go a little fast here because I don't want to bore anyone. So if I use this selector and ask, okay, where's the name of the author here? This is the element that has the author name, and it has this specific class. I know this is not the best way to refer to this element, but I'm going to use it anyway, just to show you that this is possible. So this is the name of the class of the element where the name of the author is. Again, this is a technical parenthesis; I promise it's not going to last more than two or three minutes, just for those who may follow, and we're going to go back to simpler stuff in a moment. So I edit the translation template once more and add a new field. This is going to be the author field, and I tell it to use the XPath selection. The way I configure the XPath selection is: get whatever a element... this is an a element, sorry for those who are not following, it's just one second. So this is an a element, so pick the a element that has a class that reads
like that. This should work; let's see. So I review the changes and save. This updates the configuration file. I'm doing this myself, but somebody else could come and do it instead. So if this worked, when I go here and update the translation summary (this may not work, by the way), it's getting the author name now. And now if I go back to Wikipedia and use the URL again, the author name is here, right? Now the last thing that I wanted to show you is... yeah?
I'm no expert, but it seems to me that this is pretty easy, so I don't understand why Citoid wasn't doing this. It seemed to me that the name was well formatted. Should it be easy for Citoid and Zotero to find the name with such a class, or is it complicated for some reason I don't understand? Because I don't really understand anything about this.
What is your intuition? Why do you think it is well formatted? What were the visuals?
Yeah, the tag was "author profile" or something.
Maybe it's not so common. That class, which the webmaster chose for the element that contains the author name, is something that they came up with.
Okay, thank you very much.
There is a structured way, by the way, which is very common lately, called JSON-LD. JSON-LD is one of the ways you can include structured metadata in your web page, and it is increasingly being used. Web2Cit does support JSON-LD. I didn't show you how to use it because it was just another way to get the data. The problem we have is that Zotero is not supporting JSON-LD. They plan to support it; they've been planning to support it for very long. So there are lots of web pages that do have structured metadata. You cannot complain to them, because they will say, yeah, we're using JSON-LD, which is a very standard format, but Zotero is not
supporting this. Probably they're not supporting it because they want to do it in a better way. With Web2Cit we came up with a good enough way to deal with it, so for some websites that do have the data in JSON-LD, you will be able to pull it with Web2Cit.
And the thing that I wanted to show you is the last part, which you can also access through the project's home page. I'm just going to go there, and you will find it on the home page; I'm sorry, because I want to have some time for questions. There is a monitor that regularly checks all Web2Cit configuration files. It checks the test files and the template files. So this list here shows all the domains for which the Web2Cit community has created configuration files. Regularly, every time there is a change in a configuration file or every 30 days, the monitor runs the tests, checks whether the output matches the tests, and returns a score accordingly. So for example, for www.cranista.com, which I think is an Argentinian newspaper, the output is matching the tests, so we have a hundred percent score. I can open this and check the details on this page. But then down below I see that, for example, for elespectador.com, which I think is a Uruguayan newspaper, the score is just 52.38. So probably either it was never configured well, or it could also have happened that one day the page changed and what was working one day is not working anymore. The good thing is that you can subscribe to this page as if it were any other page, so whenever the score changes you should get a notification telling you: hey, this web page that had a hundred percent score now has a fifty percent score, so maybe you should come and check what's happening. Maybe you should update the template, or maybe somebody vandalized, or just unintentionally broke, the test file, and maybe you have to check what happened.
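
To make the idea of a monitor score concrete, here is a minimal Python sketch. The talk does not say how the score is actually computed, so this assumes the simplest possible rule: the percentage of fields in a test whose expected values exactly match the translation output. The function names `score` and `needs_attention` and the field keys are hypothetical, and the sample data is made up.

```python
from typing import Dict, List

FieldValues = Dict[str, List[str]]


def score(expected: FieldValues, actual: FieldValues) -> float:
    """Percentage of expected fields whose values match the output exactly.

    Assumption: every field counts equally and a match is all-or-nothing.
    The real monitor may well use a finer-grained comparison.
    """
    if not expected:
        raise ValueError("the test defines no expected fields")
    hits = sum(
        1 for name, goal in expected.items() if actual.get(name, []) == goal
    )
    return round(100 * hits / len(expected), 2)


def needs_attention(previous: float, current: float) -> bool:
    """Flag a domain whose score dropped since the last monitor run."""
    return current < previous


# Made-up test file content: what editors said the citation should contain.
expected_fields: FieldValues = {
    "itemType": ["newspaperArticle"],
    "title": ["Some Headline"],
    "authorLast": ["Doe"],
    "date": ["2023-08-17"],
}

# Made-up translation output after the site changed its page layout:
# the author and date procedures no longer find anything.
current_output: FieldValues = {
    "itemType": ["newspaperArticle"],
    "title": ["Some Headline"],
    "authorLast": [],
    "date": [],
}

current_score = score(expected_fields, current_output)
alert = needs_attention(100.0, current_score)
```

A drop like this is what would prompt a contributor to look at whether the template needs updating or the test file was changed by mistake.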
And then change it again. Yeah, again, I'm saying this very fast. We do have a problem with those notifications, unfortunately, because of the way this bot is working. We should fix this soon. You wouldn't get an email, actually; you would have to check on your watchlist page, which is not ideal, but know that we're working on it. So that's it for the presentation.
I have another question. How do you deal with paywalls? Because maybe I saw the author names because I paid, or I had some free articles I could read, but the software that you use to fetch the page cannot access the article.
That's an excellent question, and there we behave just like Citoid behaves, which is: we do not deal with them. We cannot deal with them. If the Wikimedia servers do not have access to the source, then Citoid or Web2Cit won't have access either, unfortunately.
Yeah, it depends on what error is being returned by the web server. Ideally you should get 0% or whatever for a mismatch. I think there is a bug going on where in some cases you get an error instead of a 0%. An error might translate to the problem that Temse found, in which you don't see citations at all. Yeah, that's actually something that is being discussed for Citoid even.
And well, I want to continue discussing this until the very end, but before that I would like to thank my colleagues in the Web2Cit project. This was, again, a project that was financed by the Wikimedia Foundation. We had a community and communications team led by Eveline Heidel from Wikimedistas de Uruguay. We also had a developer team, where I was, along with Dennis Tobar from Wikimedia Chile. And we also had a research team; from the Web2Cit project page you can find out more about the research that they did. The people on the research team were Jimena del Rio, Nydia Hernandez and Romina de Leon. So I would like
to thank all of them, and also thank you for your attention. But we still have five minutes and 36 seconds, I think, so if anyone has any questions or comments, we can have them now. You can also contact me using any of these email, Twitter or Mastodon handles, and you can also write on the discussion page of the Web2Cit project. Also, on the Web2Cit project page you have a section about contributions, so what you can do to contribute. We can also discuss this during the conference, but if you have any questions or comments now...
Okay, thank you. Check, check. I was going to ask: you said that you are planning to further develop them into Zotero... what are they?
Translators.
Translators. So I was wondering, do you have in mind other outputs, such as creating entries in Wikidata?
Sorry, can you...?
Creating items in Wikidata for these citations, for the sources.
Okay, let's see if this answers your question. Again, we try to keep the Web2Cit project as small as possible, so that it's just an add-on for Citoid. I do know that there are discussions around Citoid: first, whether Citoid should, for example, check whether there is an item on Wikidata for the source that is being cited and replace that citation with a Cite Q template that refers to the QID of that source; or also, if there is not an item yet, to create it. And I know there are also discussions about whether there should be an easier way to create a Wikidata item out of the Citoid response. But because Web2Cit is an add-on to Citoid, whatever is addressed at the Citoid level would include Web2Cit. So it makes no sense that we address this at the Web2Cit level; we should actually focus on addressing this at the Citoid level, which is below or above, I don't know. Okay, does this answer your
question? Okay, thank you. And I think we are... but you will kick me off. Why should I kick myself off? Oh, and just one last comment, because I made this look so easy. You know when you go to, for example, a circus show, and you see the people doing acrobatics and they're smiling, and you say, oh, this is so easy, I should definitely try trapeze, and then you go back home and you cannot believe how somebody can do that? Well, you might have that feeling when you go back home and try to do this. If possible, don't get frustrated immediately; contact me and I will hopefully be able to help you sort it out. Okay, so I think we can close. Thank you very much for being here today, and I look forward to seeing you around the conference. Thank you.