Excellent. So shall we begin? Yeah. Please take a seat. Hello, everyone. I'm glad to be here to present some of my thoughts about the new trend, occurring all over the world, of using LLMs to enrich Wikipedia and Wikidata. This project is actually part of a larger project about developing the use of Wikidata in clinical practice. So this is a lecture, but it is not a very technical lecture. We are mainly focusing on the practical side of things, and we will try to explain what an LLM is in a way that is intelligible to people with no background in computer science.

As you already know, this year there was a heavy echo around LLMs. We had the ChatGPT wave, and then many competitors appeared on the scene. And as you have seen during the conference, there were many sessions about LLMs. Have you attended some of them so far? Yeah, quite a few. As you see, there are many important applications of LLMs, and many people say they will change the world. So now we have Bard, issued by Google, ChatGPT from OpenAI, and another competitor that is growing a little, called Claude.

The main assumption I have seen around is that we can let LLMs edit Wikipedia and Wikidata: that LLMs can create Wikipedia articles and Wikidata items from scratch, and that LLMs can evaluate Wikipedia or Wikidata. Well, this is not true, at least for the moment. A very simple explanation of how LLMs work will show you that they are not reliable enough to be given the lead.

So how does an LLM work? What anyone will do is formulate a question. But this question is biased: the answer you get will depend on how you formulate the question you give to the machine. Because of that, there is a whole new discipline, called prompt engineering, that is concerned specifically with how to define that question, known as the prompt.

So what does the LLM do with that prompt? It divides it into chunks of a few words, called tokens, and it uses this series of tokens to work out what the question means. Then, using embedding techniques and positional encoding, it searches through what it has learned from the internet to find what fits that question, and generates the answer.

So how is the answer actually generated? The transformer uses the tokens of your prompt as the beginning of the sequence, and then it simply uses probability to complete the answer. It takes the question and autocompletes the answer one token at a time, as you see here in blue, until it is finished. After a few moments, when this incremental approach is done, you get the whole paragraph. But this is exactly the problem: if the LLM generates a long answer, and the window of words it uses to build that answer is limited, the LLM will in a sense forget, by the end, how it started, so it will generate some hallucinations toward the end. It is also mainly trained to give the answers it has seen in its training data; if the answer does not exist there, it will not give the right answer, it will just predict or guess one.
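To make this token-by-token autocomplete concrete, here is a minimal toy sketch in Python. Everything in it is invented for illustration (the vocabulary, the probability table, the one-token context window); a real transformer conditions on thousands of tokens, but the generation loop has the same shape: look at the visible context, pick the next token by probability, repeat.

```python
import random

# Toy next-token probabilities, conditioned here on the last token only.
# An empty dict plays the role of a stop token.
NEXT_TOKEN_PROBS = {
    "plants": {"are": 0.7, "grow": 0.3},
    "are": {"living": 0.6, "green": 0.4},
    "living": {"organisms": 0.9, "things": 0.1},
    "green": {"organisms": 0.5, "things": 0.5},
    "grow": {"slowly": 0.5, "everywhere": 0.5},
    "organisms": {},
    "things": {},
    "slowly": {},
    "everywhere": {},
}

def generate(prompt_tokens, max_new_tokens=10, context_window=1):
    """Autocomplete the prompt one token at a time."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Only the last `context_window` tokens are visible; anything
        # older has effectively been "forgotten".
        context = tokens[-context_window:]
        probs = NEXT_TOKEN_PROBS.get(context[-1], {})
        if not probs:
            break  # no known continuation: stop generating
        # Sample the next token according to its probability.
        tokens.append(random.choices(list(probs), weights=probs.values())[0])
    return " ".join(tokens)

print(generate(["plants"]))  # e.g. "plants are living organisms"
```

The `context_window` parameter is the toy counterpart of the limited window I just mentioned: anything that scrolls out of it can no longer influence the next token, which is one intuition for why long answers drift and why missing knowledge gets papered over with a plausible guess.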
But the one thing that is genuinely good about LLMs is that they can give you the right shape of the answer with certainty, because they are mainly based on embeddings and positional encoding, and those make it very straightforward to work out how an answer should look in order to be credible. And this is the thing the Wikimedia community should probably benefit from.

Concerning hallucination, there was a paper that appeared two years ago, right before this whole wave around ChatGPT. It is called "On the Dangers of Stochastic Parrots", and it was co-written by Emily Bender, one of the most renowned linguists in the world. That paper shows you why LLMs hallucinate.

So now that we know how an LLM works, we can jump in and see what rules we can follow to use LLMs to edit Wikipedia and Wikidata, instead of letting them do all the work.

The first rule is quite predictable: we should not let LLMs produce full pages or QuickStatements batches, and we should not rely on Wikidata IDs generated by LLM-based chatbots. You see an example here. If you ask about a very common concept that exists in Wikidata, like a plant, you will find the right one: if we ask ChatGPT about the concept that corresponds to plants in Wikidata, it returns Q756, which is the right answer. But if you ask about a market in a random place, like this one in Akragana, you will get something random. You can verify such identifiers yourself against the Wikidata search API; see the lookup sketch after the third rule below.

So what should we do instead? We should do as usual: search on Wikipedia or on Google Scholar to find scholarly publications corresponding to the topic we would like to cover. For example, if we would like to write a Wikipedia page about Tunisian science, we can search for papers related to Tunisian science and use them to write the Wikipedia page with ChatGPT. As well, we can check whether a source is reliable using the criteria at Wikipedia:Reliable sources. It is quite simple to do and it does not cost anything.

The third rule is that, after collecting the information from different resources, you need to generate the summary yourself. You do not have to write it as a structured paragraph, like in the classical format used when editing Wikipedia; you just need to write bullet points about what each reference includes, like I have done here: point 1, point 2, point 3, point 4, point 5, point 6. If the text of the reference is short, you can even paste the whole paper, or the whole newspaper article if you have one, into ChatGPT and ask it to produce the bullet points for you. And if you compare the paragraph at the top with the generated points, you will find that they fit completely. But you need to verify that it does not generate bullet points that are not in the text, so human verification is very important. And if you write in the prompt "please rely only on the text that I have provided to you", this will help prevent the LLM from hallucinating.
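Before we come to the fourth rule, here is the lookup sketch promised under the first rule: a minimal example, assuming Python with the `requests` library, that queries the real `wbsearchentities` endpoint of the Wikidata API instead of trusting a chatbot-generated QID. The User-Agent string is a placeholder you should replace with your own tool name and contact.

```python
import requests

API = "https://www.wikidata.org/w/api.php"

def search_qids(term, language="en", limit=5):
    """Return candidate Wikidata items (id, label, description) for a term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    headers = {"User-Agent": "qid-lookup-demo/0.1 (replace with your contact)"}
    response = requests.get(API, params=params, headers=headers, timeout=10)
    response.raise_for_status()
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.json()["search"]
    ]

# "plant" should surface Q756 among the top candidates; a human still has
# to read the labels and descriptions and pick the right item.
for qid, label, description in search_qids("plant"):
    print(qid, label, "-", description)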
The fourth rule is that you need to write a precise prompt; that is the key point. You will have to experiment with the prompts and write several queries for the LLM to generate the right Wikipedia page for you. So, for example, you will say to it: please include all the references that I have provided to you, and cover all the bullet points. If you are using Wikidata IDs, be specific: say, these are the Wikidata items that I am interested in including in my QuickStatements batch, please use them, and these are the Wikidata IDs of the things I would like to include in my batch. Also ask it to use ref tags to include the reference after every new statement; this is something you need to specify in the prompt as well. So you need to tell it: please add these ref tags all over the article, so that the article is verifiable. You can also ask the LLM to include categories and infoboxes, and these generally work.

Rule number five is that you can use the LLM for language proofreading and translation. Many of the papers about ChatGPT, Bard and many other systems have shown that LLMs are better at translation and proofreading than at generating text. So if you have a text, give it to the LLM and it will translate it for you, just like state-of-the-art machine translation models. But for that, you need to check that the source and target languages were used to train the LLM, because not all multilingual LLMs are trained on all languages. That is a specificity you need to check before applying this to an LLM-based chatbot.

The sixth rule is: do not put the output of the LLM directly into a Wikipedia page. If you use it, please put it in a sandbox, so that administrators and other people can verify it before it is moved to the main namespace. The same goes for Wikidata: please check QuickStatements batches before uploading them to Wikidata; a small validation sketch for such batches follows at the very end.

So that is all I have to say here. I hope the session was a bit light; I tried, at least. If you have any questions, please jump in and ask me. These are my contacts; you can ask me any question. If you need my business card, for example, they are here; please contact me and I will give you one as well. Thank you. Any questions? Yeah, the slides are actually available on Commons. Let me show you. How do we get the slides? The slides are available on Commons, you're right. On Commons, the category is Wikimania 2023 Presentations, and you will find all my presentations there. There they are. The presentations are also streamed on YouTube, and you can find them there as well. Thank you very much.
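And here is the small validation sketch promised under the sixth rule. It assumes the tab-separated QuickStatements V1 format, and the checks are only rough sanity checks I chose for illustration: well-formed Q and P identifiers, and a source column after each claim, in the spirit of the references required by rule four. Anything it flags still needs a human eye, and anything it does not flag does too.

```python
import re

QID = re.compile(r"^(Q\d+|LAST)$")
PID = re.compile(r"^P\d+$")
SOURCE = re.compile(r"^S\d+$")

def check_batch(batch_text):
    """Flag suspicious lines in a QuickStatements V1 batch before upload."""
    problems = []
    for number, line in enumerate(batch_text.strip().splitlines(), start=1):
        if line == "CREATE":  # item-creation command, nothing to check
            continue
        columns = line.split("\t")
        if len(columns) < 3:
            problems.append((number, "fewer than 3 columns"))
            continue
        if not QID.match(columns[0]):
            problems.append((number, f"bad item id {columns[0]!r}"))
        if not PID.match(columns[1]):
            problems.append((number, f"bad property id {columns[1]!r}"))
        if not any(SOURCE.match(col) for col in columns[3:]):
            problems.append((number, "no source (S...) column for this claim"))
    return problems

# Example: the second claim has no source and should be flagged.
batch = "Q4115189\tP31\tQ5\tS143\tQ328\nQ4115189\tP569\t+1952-03-11T00:00:00Z/11"
for number, message in check_batch(batch):
    print(f"line {number}: {message}")
```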