So this will be a rather humble talk — simple, nothing too theoretical — a story and sometimes a cautionary tale about one company and a few software engineers trying to put a large language model into production. The trouble with speaking about AI at conferences like this is that you send your abstract in March, you speak in July, and in between all kinds of things happen. Just to give you a hint of what happened in the past month or two: we now have Generative Fill in Photoshop. If you recognize these very famous memes, Photoshop just draws everything around them, and at least to me it looks very realistic. We have roughly the same thing now in Midjourney: if you generate an image in Midjourney, you can basically zoom out, and Midjourney draws whatever it finds realistic around that image. Just yesterday I found that somebody made the Barbie Girl song with the voice of Johnny Cash — the voice cloning and lip-syncing are getting really good. Something that used to be just for experts is now taken over by free tools that anybody can use. We also now have the lawyer who was fined 5,000 bucks for using ChatGPT: his client was suing an airline, and he basically asked ChatGPT to do the filing. The problem was that it looked really good — unfortunately, all of the cases that were cited were hallucinated and not real. He was not stupid; he asked whether those cases were real or not. Unfortunately, he asked ChatGPT, and ChatGPT said yes, they are real. So I am slightly worried that we will get more stories like that in the very near future. A few smaller things: we now have more open models — Llama 2 has been publicly available for two days, as you might have seen in the lightning talks yesterday. And we have Code Interpreter, which in my opinion will be a big thing.
We have more ChatGPT competitors like Claude 2, Bard is finally available in Europe and for small languages, and new companies are starting up based on ChatGPT. So who am I? My name is Peter and I have two jobs. I am a researcher at Masaryk University doing large language models for proteins and DNA — you might have seen the talk yesterday by Eva Clementeva; that is the same group as mine. And I am also doing large language models for media monitoring. Monitora Media is a company based here in Prague. We collect everything that appears in the Czech and Slovak Republics: print, online, major TV shows, major radio shows, even podcasts. We usually have quite good relationships with our media sources, because our clients are typically PR agencies or, for large companies, the PR department — that is, the people paying for advertisement in the media. So the media are usually quite happy that we make it simple for them to evaluate the campaigns they are paying for. I started to collaborate with this company in November. As some of you might know, something else happened at the end of November: ChatGPT appeared. So from the beginning it was something we were playing with. Before I go on, let me tune my talk a bit: how many people here use ChatGPT almost daily, almost every working day? Interesting. How many of you use it for coding? Okay, a lot of people. How many of you have access to ChatGPT Plus, the paid version? Fewer people. How many of you have ever used the API, so basically from Python? Okay, more people than I would have guessed. How many of you are using Bard or Claude, the competitors of ChatGPT? Interesting, not so many. So I might talk a bit about the reasons to use them. If this were a small room, I would ask you to share some creative uses or any hiccups — for example, I'm using ChatGPT for cooking almost every week.
And the hiccups — the problems — are something I find important, because we don't want to be that lawyer who used ChatGPT at the wrong time. It's important to understand the strong points and weak points of ChatGPT. My favorite weak point is this one. If you ask ChatGPT, and even ChatGPT Plus, "Great Britain versus Madagascar, which is bigger?", it feels that Great Britain is significantly larger than Madagascar — which is probably what many people would guess too. But then, because it is smart, it knows that to make the answer more convincing it should cite the area of each. And then it realizes: oh, but Madagascar is bigger. Now, a person would go back and edit the beginning of the sentence. But as some of you might know, the text is generated sequentially, so the model cannot edit the beginning of the first sentence. Instead, it puts an extra correcting sentence at the end. The output is not identical every time, but every generated answer I have ever seen starts by claiming that Great Britain is larger than Madagascar. So this is just one of those things. Quite often people come to me with other errors. Many people think they are testing encyclopedic knowledge — "hey ChatGPT, do you know what X is?" with some Wikipedia-type term. I don't think that is a good use of ChatGPT. Also, many people come to me with problems like a simple math question that ChatGPT failed — but only the free version, only 3.5. When my friends come to me, my most frequent answer is: have you tried ChatGPT Plus? They say no. And I would say in 90 percent of cases, ChatGPT Plus will not make the mistake they are asking about. So the first take-home message is: if you use ChatGPT a lot, if you are one of those people using it for coding almost daily, you probably want to pay 20 bucks per month to get a much better tool. Okay.
We had other AI uses in the Monitora app even before I joined; one of them is news aggregation, where we try to find articles on similar topics. But when I joined, somebody suggested something that would be a nice addition: when you search for a keyword like "AI", you get a long list of articles about it — sorry, this is in Czech. How do you get through all these articles? You can read just the title, or the first few lines. But it would be really nice to get a summary, just a few sentences, at the beginning. And that's what we have used large language models for. This appeared in production on, I believe, March 2nd or 3rd, something like that. From a technical point of view, it's six or seven lines of Python — nothing really complicated. You just call OpenAI (you need an API key generated on their web page) and you basically get the output. You select the model: currently we are using ChatGPT, but we started with Curie. Why did we start with Curie and not a larger model like Davinci? Purely for financial reasons — we needed to do it on a reasonable budget, and Davinci was too expensive. Also, we didn't summarize all of the articles, only the ones somebody actually opened, so we did it on the fly. And why didn't we use the ChatGPT API? Because on March 2nd it was not available. And this happened on the first day. Actually, it happened before I was able to get to the office, so it was not a good start to the day. The problem was that when I was testing, I used real newspapers. But for some of our clients, we put anything they want into the database. For example, this is a small city in southern Bohemia, and this is from their web page — the screenshot on the left.
And this is the text translated to English on the right. It just says, you know, the city council met, and then there is a PDF with what they actually discussed. And the summary started: "At an extraordinary meeting, the city council..." — and then the model felt it should put something there, but there was no information. There is no information here. So it put there: they dismissed the mayor. And because the name of the mayor was somewhere on the page, it put in the name as well. When this small city saw that, they were really not very happy, to put it very politely. So what was the problem? The text was too short to be summarized. It was easy to solve: we just set a minimum length for summarization, because if you see a text like that, you don't want to summarize it anyway — the summary will not add any information. Also, we were lucky that two or three days after we put this into production, the ChatGPT API appeared. We ran our tests and it worked so much better for the same price, so that was a good coincidence for us. Another thing — may I ask how many people here don't have English as their primary language? Okay, almost everybody. Could you indicate with your hand how well ChatGPT works for your language? Here would be English-level, here would be not at all. For Czech, it's roughly here: not as good as English. It can't do rhymes, it sometimes can't get the tone right, but most often it's okay — many people were surprised that ChatGPT works for Czech at all. And one thing that will always be a problem if you use ChatGPT for a language other than English is the tokenization — splitting the text into small parts and then numbering those parts — because it was built primarily on English.
Because the tokenizer was built primarily on English, you can see that for English almost every word is one token, while for some other languages it's more like one character per token. The rule of thumb for English is about four characters per token on average. When I made my back-of-the-envelope calculations based on this and used Czech text instead, I was very wrong, because for Czech it's more like two characters per token. And because you usually pay per token, you are paying twice as much. Also, since there is a token limit on how large a text you can send to one of these models, you can only send about half the length you could send in English. So keep that in mind — most of the discussion about these models still happens among English-speaking people, so keep in mind how much better or worse it will be for your language. Second thing: you rely on the OpenAI servers. What happened to us is that the servers got slow. As I've said, the summary is calculated on the fly — it is of course saved and then reused, but for the first reader of an article it's calculated on the fly. When we started, it took five or six seconds to get a summary. After a month it was ten, and then it kept growing. It's not really practical to get the summary after 20 seconds, because in 20 seconds you can probably read half of the text anyway. We were really scared, because our Django app was timing out — really not the kind of problem you want to have on a Friday afternoon. So again, keep in mind that you observe the OpenAI servers in one state, but it can change, and it will change. Right now it's really good again, because they probably bought more GPU cards.
And the last thing, which I didn't catch in testing, but I know I'm not the only one because I have heard it from other people: in rare cases, the language of the summary is wrong. For Czech it's maybe one percent, but for Slovak, for example, it happens more often — you put in Slovak text and you get an English summary. It's typically from the other language into English. The workaround is very easy: just test the language of the output. It's nothing complicated; you just need to care about it. So, ChatGPT alternatives. We are still using GPT-3.5; there is now a new version from the end of June. GPT-4 is awesome, much better as I said, but the problem is it's also much more expensive. Claude — there is now a new version, Claude 2 — is awesome, but for summarization, I don't know why, it's not giving me as good results. For my personal use I'm using Claude 2 all the time, but for summarization I'm not getting such good results from the API. There's now Bard — this is very recent. When I last tested the API, it still didn't accept requests in Czech; it tests the language as well and says it will not talk to you. And there is Bing Chat. Each of these tools is special for something; I would say each of them has its own use case. ChatGPT is cheap and easy. ChatGPT Plus now has Code Interpreter, which is an awesome thing: you upload an Excel file, you say which graph you want, and you get the graph or a statistical analysis. If you are trained as a biostatistician, like me, it feels really strange, because this used to be a large part of my job, and now ChatGPT does it maybe better than I used to. Claude 2 is awesome because you can input about 100,000 tokens, so you can basically copy a whole book into it — now even multiple PDFs — and then ask questions about what is inside.
And Bard has another superpower. Most of these models were trained on a training set that is fixed at some point in the past — for ChatGPT it's September 2021, so all its data and information is basically two years old — but Bard is trained continuously. So if you ask ChatGPT who the Czech president is, you get the wrong answer; with Bard, you get the right answer. From the beginning, my colleagues pushed me: okay, this is OpenAI's model, but we want our own large language model. So we started with the Llamas and friends. LLaMA was released by Meta in February 2023. It's basically an application of something DeepMind worked out in the Chinchilla paper — the optimal ratio between the amount of training data and the number of parameters. It was less than 10 days before a torrent of the weights appeared on 4chan, and later in March the weights appeared on Hugging Face, so basically anybody could use them. I was able to get to them legally through the university, but otherwise people were experimenting with them a lot. And just two days ago a new version, Llama 2, appeared, now with a very open license — not totally open, it's not MIT or Apache, but pretty open. And why do I care about Llama? Why don't I use one of the many other models that are available? Well, I like llamas. But Falcon is also nice. The real reason is that if you only care about English, you are fine: you have retrained Llamas, you have OpenLLaMA — and because I'm using the small Llama, just 7 billion parameters, I would be fine. The trouble is that for Czech, it depends on how much Czech text was in the training corpus, and only LLaMA had a reasonable amount; everything else was much worse. So I was still hoping that Meta would open-source it, and it finally happened, so I'm happy. And a take-home message: Hugging Face, both the libraries and the hub, is awesome.
Something that used to be complicated is now a piece of cake; anybody can do it. This is the list of open models you can use. They even run them on benchmarks, so there is a leaderboard — this is an old screenshot, so it's not accurate anymore. And you might ask how I am doing this. Wasn't running large language models basically only available to Google, Meta and companies like that? Thanks to Hugging Face, it's not anymore — or more precisely, thanks to the implementation of various techniques in the Hugging Face ecosystem. Even loading a model into a consumer-size GPU card used to be impossible; now it's possible, because instead of using 32 bits per number, we now use only eight or four. There are some tricks for doing that. And if you want to train neural networks like these, it's hard, because the model has billions of parameters. The trick is that we usually don't train the whole model; we train only a very small fraction of the parameters. The trick is called LoRA, and there is a nice Hugging Face blog post on how it is used. So another take-home message: the training is easy — you collect your training set, there is a script you can get from Hugging Face, you train it on your language, on your data. The hard part is getting it into production. And that's the sad end of my story: we still don't have our own model in production, because it basically doesn't make sense. We could run it a bit cheaper than ChatGPT, but it would mean taking care of all the infrastructure and having a dedicated person for that. So make sure you are thinking about production, about running your model in production, from the very beginning, okay? And there are some other issues now. I don't know if you know this guy — this is Karl Brumann, the famous biostatistician.
Some people now have really negative emotions: there is the AI Act that made it through the EU Parliament, and it could have very negative consequences — not for ChatGPT, but for the open models. It's entirely possible that running open models will be basically illegal in the EU, or at least in production environments. I still don't know how this will end up, but it looks pretty doomed. Still, for me 2023 is the year of miracles: new models and new things appearing every month, things I would have thought impossible just a year ago. Okay, so this is our team at Monitora — we are still hiring, so if you are based in Prague, come and chat. And this is me; I hope to see some of you at the Czech PyCon that will be in September here in Prague. And if you have your own experience with running large language models in Czech or any other language, or with running large language models in production, I would love to talk to you. Otherwise, thank you for listening, and I'm open to questions.

Thank you very much, Peter. For anyone who has questions, please come to the microphone, because the sessions are recorded and we want to have your voice on the videos.

Hi, thanks for the talk, really interesting. You've mentioned you've encountered issues with hallucinations. Did you experiment with the temperature parameter in ChatGPT?

Yes. The standard temperature is something like 0.7, if I remember correctly, and we are using a lower value. We are not using 0 — I have a colleague who is pushing for 0, but I don't think that's a good idea.
So yes, this is something we are using. But to be honest, that was with the first model, and after switching to ChatGPT and tuning a bit what size of text we summarize, I have seen only one hallucination — and that's the funny part. It was an article about a Czech actor who was supposed to host some event, and the name of the actor was nowhere in the text itself; it was only in the photo that came with the article. So there was no way to do the summarization, but the model knows there should be some Czech actor in the summary, so it hallucinated one. And every time I reran it, it came up with a different name — unfortunately, never the right one. It all sounded really plausible; they were all real Czech actors who could have done the job. So that was rather innocent. It's working really well, and if you give ChatGPT the information and clear instructions, it will usually do the job. It's more often that it doesn't have the information than that it hallucinates.

Okay, yeah, that makes sense, thank you.

Thank you very much for the talk. I'm a freelance data scientist and I'm about to start a project very similar to what you have done and presented, so this is very exciting for me. If I understood you correctly, you tried the OpenAI APIs and you also fine-tuned some models from Hugging Face.

Correct.

Beyond the question of costs, what other insights did you get out of that? Because this is a design decision I'm also thinking about right now.

Some of the things that we take for granted, because we are mostly playing with ChatGPT, are not so common for open models. One of them, as I have said: not every model understands Czech and Slovak — some of them not at all. Basically, they have been trained only on English text.
The other thing is that ChatGPT is an instruction-tuned model, and many of those foundation models were not. It's better now — everybody knows they need to make an instruction-tuned version of the model. But at the beginning, I told a model "summarize the article", then I put in the text, and instead of producing a summary it just continued writing the article, because that's another way to make a plausible continuation of what I gave it. So that was another thing. And my experience is: wait a week or two with a new model. With LLaMA, my colleagues were pushing me — we want that, it looks really interesting — and I was like, oh, but this runs on 8 GPUs and we have only one. And then it really happened: the community around this is unbelievable. Llama 2 has been out for two days, it's already on Hugging Face, and people have already trained models based on Llama 2 — there's lots of experience in two days. It's too good to be true. So listen to the community; they will help you a lot.

Thank you.

Yeah, really good talk. I've heard rumors about GPT losing fidelity over time — since it first came out, there has supposedly been some degradation in performance. I wonder if you've noticed this and whether you've mitigated against it?

I have heard that about GPT-4. I'm not sure the same is true about GPT-3.5, and I didn't see it.

OK, we no longer have time for questions, but I hope, Peter, you will be around for the rest of the conference. Thank you again for the great talk. Let's give him another round of applause.

Thank you for having me.