really happy to present my work here. I'm indeed a computational linguist by training; I was trained in this field at Tilburg University in the late 1980s, when there was a small but very active group of NLP researchers at our university, and I'm really happy that this tradition continues to this day. In this talk I'm going to be talking about GPT-3, which is, I think, one of the breakthrough AI models of the last year. I'm going to be talking about the problems that these kinds of models have, but also about the implications and benefits that they have for our research. Actually, I have another recommendation to start this talk with. It's about a story by Jorge Luis Borges, the Argentinian short story writer; it's a second recommendation if you don't want to read 4,000 pages on American presidents. These are really short stories, and one of my favorites is "The Library of Babel". Maybe you know it; it's really a fantastic story about a library where all possible books of a certain length are available. You can imagine that this is a very vast library, incredibly big if you start to think about it. There's one question that I always have when I read this story, and that is: where do these books come from? There are so many of them that human authors cannot possibly have written them all. Now, I'm interested in computers that write texts, so you can imagine that a computer could have written all the volumes that are present in the Library of Babel. A couple of years ago, together with a colleague of mine, Albert Gatt from the University of Malta, I wrote a survey on computer models that can generate texts, and there we actually also start with the work by Borges. The kind of approaches that we describe in that paper are mostly data-to-text generation.
These are systems that take data as input, a database for example, and generate a coherent narrative about that data. Here's a very old example that we worked on, I think 25 years ago: it took teletext data about soccer matches and turned it into a coherent textual representation, a story that you could even run in a newspaper. The survey that I just mentioned was written in 2018, and of course, as luck would have it, one of the major breakthroughs in our field happened in 2019, just after our survey was completed: that is when OpenAI released their GPT-2 model. And just last year, in mid-2020, the GPT-3 model was released, together with the paper "Language Models are Few-Shot Learners", which I also highly recommend if you want something to read over the summer. This is of course from OpenAI, the company that Elon Musk famously co-founded. So what is the basic idea of this model, of this approach? It's actually a very simple idea, and all of us are familiar with it: everybody who uses WhatsApp knows the basic principle of GPT. The basic principle is word prediction; it just tries to predict the next word, like WhatsApp does. If you go to WhatsApp and you start typing something, say "just start typing", then WhatsApp comes up with suggestions that it thinks are likely candidates for the next word: "just start typing in", or "just start typing this", or "just start typing a". As a user you can then select the word that you wanted to type, "this" in this case, and WhatsApp makes a new prediction of what the following word might be. It's a very simple idea, we're all familiar with it, and this is also what GPT does in a nutshell. Now you might wonder why this is then one of the major breakthroughs in AI, and that is because they really took this to another level.
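The word-prediction principle described here can be sketched with a toy bigram model, a much simpler cousin of GPT and of what a phone keyboard does. The corpus and function names below are purely illustrative, not from the talk:

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus: str):
    """Count, for each word, which words follow it in the corpus."""
    words = corpus.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def predict_next(model, word: str, k: int = 3):
    """Return the k most frequent next words, like a phone keyboard's suggestions."""
    return [w for w, _ in model[word.lower()].most_common(k)]

# Tiny illustrative corpus echoing the "just start typing" example.
model = train_bigram_model(
    "just start typing in just start typing this just start typing a word"
)
print(predict_next(model, "typing"))  # candidate continuations after "typing"
print(predict_next(model, "start"))  # "typing" always follows "start" here
```

GPT replaces these raw counts with a 96-layer transformer conditioned on a long context, but the interface is the same: given what came before, rank candidate next words.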
If you think of how this is done on mobile phones, these word predictions are based on a very limited history: WhatsApp just looks back at the last one or two words that you typed, and a very simple statistical model tries to predict the next word. GPT, by contrast, is an incredibly complex model. It's a transformer model, which is what the T in GPT stands for; transformer models are the state of the art in NLP at the moment. It's a huge deep learning model with 96 layers, so it's really massive, and it doesn't just look one or two words back but takes a very big context into account. It's also specifically developed so that you can give it a few examples of a new task that you want it to carry out, and it then uses all the knowledge that it has about language in general to adapt to that particular task; this is the few-shot learning that is mentioned in the title. If you look a little closer, the main novelty of the GPT model is really its massive size. It's trained on an enormous text corpus containing about 300 billion English words, many of which come from Common Crawl. Common Crawl is a non-profit organization that collects the texts that are available on the web; there was some selection before the GPT model was trained on it, but you can think of it as a really large chunk of all the text that is available online. Then they developed and trained the model, and that took 355 GPU years. Bear in mind that these are not your average garden-variety GPUs; these are really high-end machines, and it still takes 355 years of their time to train the model. That's really insane, and if you think about it, it's also something that we couldn't replicate by combining all the GPUs in the Netherlands, is my prediction.
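The few-shot learning mentioned above works purely through the prompt: you write a few worked examples as text, append the new input, and let the model continue. A minimal sketch of how such a prompt is assembled (the task and example pairs are hypothetical, loosely echoing the soccer data-to-text setting from earlier in the talk):

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: worked examples followed by the new query.
    The model receives no gradient updates; the 'training' is just this text."""
    lines = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    lines.append(f"Input: {query}\nOutput:")  # model continues from here
    return "\n\n".join(lines)

# Hypothetical task: turning match scores into sentences.
examples = [
    ("Ajax 3 - 1 PSV", "Ajax beat PSV 3-1."),
    ("Feyenoord 0 - 0 Utrecht", "Feyenoord and Utrecht drew 0-0."),
]
prompt = build_few_shot_prompt(examples, "Twente 2 - 1 Heerenveen")
print(prompt)
```

The prompt deliberately ends at "Output:", so that the model's next-word prediction machinery fills in the answer for the unseen input.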
I already mentioned that the resulting model is very big: it has over 175 billion parameters, so you can imagine that this is really huge. It's also interesting that just training the model already cost a few million dollars in energy, so there is a concern that these models are not good for the environment. But there's also an upside, and the upside is that the output of these models is really impressive. At the same time, the output of these models and their capabilities are often misunderstood, and that is something that I want to highlight here. You see that, for instance, in the papers that report on the impact that GPT will have: there was a lot of talk halfway through last year that it was going to be the next major breakthrough, and Forbes even mentioned it as the most important AI breakthrough of 2020, specifically because it was so good at generating news reports. So let's take a closer look at that. This example actually comes from the paper that I just mentioned. It starts with a prompt; I mentioned that you can give GPT a small instruction before it starts generating text. You give it a title, in this particular case "United Methodists agree to a historic split", then a subtitle, "Those who oppose gay marriage will form their own denomination", and then you instruct GPT to write an article, and that is what you get here in boldface. They report in the paper that this is the text that people found most difficult to distinguish from human-written text. But if you look closer at it, there are some problems. For instance, I had no idea what this topic was about; I hadn't heard about the United Methodist Church and their issues with gay marriage. So I did a Google search, and then you quickly find an article like this, published on January 4, 2020, when there was apparently something going on in the United Methodist Church, and you can assume that this
kind of newspaper article, in this case from NPR, is actually part of the Common Crawl dataset on which GPT was trained. So it's sort of copying information that was already available online. And if you take a closer look at what is actually in the text, some people have done this: they did a fact check on the report generated by GPT itself, and it's clear that there are many mistakes, things that are stated as facts but are simply not true. I won't go over the details here, but if you're interested, there's an overview that Steve Schwartz collected of the mistakes in the example text that GPT generated. And mind you, this is probably the best example that they give in the paper. There's another interesting paper that came out early this year in which they specifically address this point; they liken these kinds of models, GPT and others, to parrots. These systems just produce language, and the language is really of high quality, but the system has no idea of what it is talking about, and there's no guarantee that the information in the text is actually true. It's important to contrast that with the old example that I mentioned very early in the presentation, about the soccer matches: there, the algorithm guarantees that what the text describes is actually true, and that is not something that happens with GPT-generated texts. This paper gives the nice description that these algorithms are essentially just parrots that rely on statistics, which has all sorts of problems. This is an important and interesting paper, even though in a way there's nothing in it that we didn't all know before. But it's also a notorious paper, because it reportedly led to the firing of Timnit Gebru,
one of the co-leads of the Ethical AI group at Google. And you'll notice that there is another co-author with a weird-looking name, Shmargaret Shmitchell, a pseudonym for Margaret Mitchell, who was at that time still working at Google. But when the paper appeared, everybody already knew that she wouldn't be working there much longer, and indeed she was also fired shortly after the paper was published. Okay, so applying GPT in real life has its problems, but on the other hand it's also a great resource for research, and I want to end this talk by giving a quick overview of the kinds of things that we're currently doing with these large language models. One thing we're looking at is whether we can use GPT in the way these old data-to-text systems were used. Given that GPT is a few-shot learner, you may wonder: can we not train the model to generate text that is literally true, and if so, how much training data do we need for that? That is a project that I'm currently working on together with Thiago Castro Ferreira, who you see here. Our preliminary findings are that for simple statements we can really rely on GPT, although there are a few weird glitches, for instance in the representation of numbers, but for longer texts it's really not clear whether it will work. Another thing that we're working on is using GPT in chatbots. Chatbots are currently receiving a lot of attention; these are computer systems with which you can communicate in natural language. We currently have a project in which we are developing chatbots for smoking cessation, and we are wondering whether GPT can play a role there. Of course we would never unleash this kind of model on unsuspecting users, but the question is: if we have existing protocols and scenarios, can we use GPT to come up with nice and interesting contributions to the interaction? We pay a lot of attention to monitoring the output of GPT, of course, so we
develop special-purpose offensive language detection models, and also classifiers where we have a number of different models running in parallel and see which one works best. This is work that we do together with Erkan Başar, who is in the AI group at Radboud University. Finally, a really interesting question is how you evaluate a system like GPT. In our group there's quite a bit of expertise on evaluating the output of NLP systems; I do this together with Chris van der Lee and Emiel van Miltenburg, among others. A model like GPT raises all kinds of interesting evaluation questions, simply because it is so big. In general, I think it's fair to say that syntax is really not an issue with this model: the sentences that it generates are to a very high degree grammatically correct. But semantics, the meaning that is conveyed by these sentences, can still be an issue. And then questions come up like: how can we determine, hopefully automatically, whether a text generated by GPT-3 is truthful? And are there ways in which we can determine that a text was generated by a computer and not by a human? That's really important for the application of these kinds of systems. Imagine that all our students have access to GPT-3; that would mean that assignments are going to be difficult to evaluate, because GPT-3 can probably generate a reasonable-looking essay that we would score with a six or a seven, and we might want to check whether students used GPT or not. So are there fingerprints in the texts that reveal that it was computer-generated? GPT-2 had clear fingerprints, it often started talking about unicorns in the third paragraph, but that issue has been addressed in GPT-3. Another interesting question, which also ties in a little with my general research question of how we can teach computers language, is what GPT learned about language. What is the implicit knowledge of language that it has and that it
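The monitoring setup sketched in this part of the talk, generating candidate responses and checking them before anything reaches a user, can be illustrated with a deliberately simplified stand-in. The real systems described here use trained classifiers running in parallel; the blocklist, function names, and fallback message below are hypothetical, but the control flow (generate, check, fall back) is the general idea:

```python
# Placeholder terms standing in for a real offensive-language lexicon/classifier.
BLOCKLIST = {"badword1", "badword2"}

def is_safe(text: str) -> bool:
    """Reject a candidate response if it contains any blocklisted term.
    A real deployment would use one or more trained classifiers instead."""
    tokens = {t.strip(".,!?").lower() for t in text.split()}
    return BLOCKLIST.isdisjoint(tokens)

def choose_response(candidates, fallback="Let's get back to your quit plan."):
    """Return the first generated candidate that passes the safety check,
    falling back to a scripted, protocol-approved message otherwise."""
    for c in candidates:
        if is_safe(c):
            return c
    return fallback

print(choose_response(["You badword1!", "Great job staying smoke-free today."]))
```

The design point is that the generator is never trusted directly: every candidate passes through the filter, and the scripted fallback guarantees the chatbot always has something safe to say.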
extracted from all these billions of words that it was trained on? Okay, that completes my talk. If you find this interesting, then I can highly recommend going to Hugging Face, which is really a great source for NLP models. You can find many different versions of GPT there, including a Dutch version of GPT-2 that was developed by Wietse de Vries and Malvina Nissim from Groningen. There are also many demos of transformer-based writing systems on Hugging Face that you can play with if you want to, and these days it's also relatively easy to get access to GPT-3 from OpenAI itself. That concludes my talk. Thank you very much.