We made it to the last session of the day. Thank you all for coming. My name is Vinny Jayaswal, and I'm here to present on one of the most talked-about topics right now. Of course, I'm traveling from the United States to Japan, so a topic about language feels naturally fitting. A little bit about myself: I have been working in open source for over a decade now, mainly on data and AI technologies, through my time at Citi, Databricks, and Bitance. I have also had the opportunity to work on some popular projects, Delta Lake, Apache Spark, and MLflow, some of which we started in the lab and grew to a large user base.

Now, since we have an interesting topic today: language plays a fundamental role in facilitating communication and shaping how we interact with people. For example, I remember a time when I was in Italy trying to talk to people there in Italian, but I didn't speak Italian, so we relied heavily on our phones to do the translation. Luckily, Google Translate existed back then, so we could somehow make it work, and it has gotten a lot better since. Human needs like that pushed us to think about how machines could go beyond the traditional boundaries of programming and communication, and that's why large language models emerged as cutting-edge artificial systems that can process text and help us communicate coherently.

So as I said, the need for LLMs grew out of our own requirements: translation, summarization, information retrieval. And think about all the developers. We are an open source community, which means we talk to developers around the world, and we have grown enormously in how we contribute to projects and how we adopt projects from other countries, because now we can communicate. When we code, or when we talk on a GitHub issue or PR, the central question is how well we communicate. And communication is not just translation: it is also how you document, how you summarize, and how you understand each other's context, because we come from different cultural backgrounds and have to navigate nuances and barriers.

Recently we have witnessed significant breakthroughs in language models, which are primarily attributable to the deep learning techniques we see today. How many of you have heard of ChatGPT? Everybody, right? It was released exactly one year ago, and it was record-breaking: this whole year went into figuring out how to make it work, how to transition to using ChatGPT and make it a viable part of our technology architectures and the businesses we are conducting. In just two months it passed 100 million users. But things actually kicked off in natural language processing way before ChatGPT and the GPTs. The possibility of automatic machine translation has long been an object of fascination for mankind, and it took concrete form in 1950, when Alan Turing began thinking about computer systems for human interaction. He devised experiments and ultimately came up with the Turing test. Have you seen The Imitation Game?
Alan Turing was a British scientist, and the subject of that 2014 film. He argued that to demonstrate intelligence, a computer should learn to imitate human interaction, to respond in a way that could pass as a human talking. Even now, the test he created back then, as an attempt to evaluate communication between a machine and a human, is a standard test that many language models are measured against; Google, for example, has evaluated its conversational models against it. It essentially judges how well a machine can convince a human judge that its output is human-generated rather than machine-generated.

If we dive deeper into these inventions, this is the timeline. Many other inventions happened along the way, but let's look at some of the key ones. The Turing test I have already covered. Then, around 1955 to 1956, John McCarthy coined the term artificial intelligence and presented the concept at the Dartmouth conference. In the late 1960s, researchers developed a robot named Shakey. Much later, Siri was developed and integrated into the iPhone 4S in 2011. Another notable milestone is IBM Watson: if you were around in 2011, IBM was running a lot of experiments on how Watson would work. Then in 2014 came Amazon Alexa, which many of us use now, and around that time Google also built chatbots for people to converse with. OpenAI was founded in 2015; it introduced the first GPT in a research paper in 2018, released GPT-3 in 2020, and then the ChatGPT model at the end of 2022.

So as I said, this year has been transformative because of what happened in November 2022. This is a survey by GitHub, and it shows how much activity happened on GitHub alone in the open source community. Total projects grew 27% year over year, to about 420 million. Developers are heavily using generative AI, and a growing share of new projects is about how AI can be used within developers' own work. Another key finding is that developers are operating cloud-native applications at large scale, because cloud-native solutions have given everyone access to more experiments and more technologies. The highest number of first-time open source contributions happened in 2023, and more projects around generative AI were created this year alone. This shows you the magnitude of how much generative AI has taken over: not just as a buzzword, but in how much work is actually being done by the open source developer community. And this is a graph of how large language models have evolved over the years. In 2023 you can see a huge number of large language models, and they're still being built as we speak.

Now, on to how LLMs work. This is a paper released by Google in 2017 about an architecture called the Transformer; I will show you the architecture a little later. The paper was titled "Attention Is All You Need," and today virtually all large language models use components of the Transformer as part of their architecture, which is what lets us interact with machines in plain natural-language text, often called prompts.
Now, sometimes you won't get the output you expect, so you have to change the prompt. How many of you have tried GPT-3.5? And how many of you compared it to GPT-4? You can see a massive difference. And if you keep refining your words and providing more context to your chat, it will develop more intelligent responses. That's partly why the paper says attention is all you need: you have to get the model to pay attention to the context, and to the details of complex prompts.

So let me demonstrate a little. For example, I ask: can you give me a summary of the book Sapiens? Now, as an adult, I can follow the output. It summarizes the book in terms of revolutions, humankind, industry, science, capitalism, and so on. What it is missing is who the audience is. If you are teaching this to a six-year-old kid, they won't understand it. So I provide a little more context, and now it does storytelling instead. You are guiding the prompt so the model explains things the way you want it explained to you. That's why, if you remember, there was a trend a few months back around prompt engineering: everybody was hiring prompt engineers so they could develop good prompts. I will show you in a later slide where those prompt engineering efforts went. But this was one example of why the model needs attention paid to your prompts.

All right, so let's talk about LLM architecture. There is a lot going on behind the scenes of simply chatting with a chatbot. As I said, the paper Google released introduced the Transformer, and the Transformer architecture has a few key components: the transformer network itself, parameters, tokens, and the context length, or context window. Let me explain in detail what each of these components means. This diagram is from the same paper, and it shows the Transformer architecture. You can see there are two parts to it: the left part is the encoder, the right part is the decoder. There are multiple layers in each; in the original paper, the encoder stack has six layers and the decoder stack has six layers too. Let me take a very easy example to explain this in simple terms. If I want to translate Japanese to English, the Japanese sentence acts as my input on the left-hand side. The encoder takes that input, passes it through its layers, with each layer focusing on a different aspect of the input, and this then feeds into the decoder layers on the right-hand side, which generate the output. An encoder and a decoder can be used independently or together to work on a task, and not every model adopts the whole architecture. Among the large language models we have seen, some adopt only the encoder side, some are decoder-only models, and some keep both the encoder and decoder. Roughly speaking, encoder-only models are suited to understanding tasks such as classification and retrieval; sequence-to-sequence tasks like translation and summarization typically use both the encoder and the decoder; and generative models like GPT-3 and GPT-4 are decoder-only. Most of the recent research has also been done on the decoder side, just to call that out.
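To make the encoder-only, encoder-decoder, and decoder-only distinction concrete, here is a minimal sketch using the Hugging Face transformers library (assuming it and a PyTorch backend are installed); the specific model choices are my own illustrative picks, not models from the talk.

```python
# A minimal sketch: one model from each Transformer family.
from transformers import pipeline

# Encoder-only (BERT-style): understanding tasks, e.g. filling in a masked word.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Tokyo is the [MASK] of Japan.")[0]["token_str"])

# Encoder-decoder: sequence-to-sequence tasks like summarization.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = ("The Transformer architecture, introduced in 2017, underpins modern "
        "large language models. Some models use only its encoder, some only "
        "its decoder, and some use both, depending on the task.")
print(summarizer(text, max_length=30, min_length=5)[0]["summary_text"])

# Decoder-only (GPT-style): open-ended text generation.
generator = pipeline("text-generation", model="gpt2")
print(generator("Large language models are", max_new_tokens=20)[0]["generated_text"])
```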
Now, the second component in our architecture is parameters. This diagram just looks like a lot of layers, right? When we talk about large language models, we always talk in terms of parameter counts: how big is it, 170 billion, 30 billion, and so on. GPT-3 is a 175 billion parameter model, and Meta's largest Llama 2 has 70 billion parameters. But what do we mean by parameters? Parameters are the variables of the neural network that the model learns during the training process. As you can see in the diagram, the network has multiple layers, and every circle represented in this diagram is a node, ending in an output layer. So how do you calculate the parameter count for a particular model? Take the one on the left-hand side, which has three input nodes. For the first part, three nodes feed into four nodes, so that is three times four weights plus four biases for hidden layer one: 3 × 4 + 4 = 16. The next section is four nodes into four nodes, so four times four weights plus four biases for hidden layer two: 16 + 4 = 20. And the last section is four nodes into one output node, so 4 × 1 + 1 = 5. In total, that is 41 parameters. When we talk about large language models, we are talking about a vast stack of such layers, but that's how you count the parameters of a fully connected network end to end.
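Here is a quick sketch of that same parameter arithmetic in Python, for the example network above with 3 inputs, two hidden layers of 4 nodes each, and 1 output node.

```python
# Each layer contributes (inputs x outputs) weights plus one bias per output node.
def count_parameters(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases
    return total

print(count_parameters([3, 4, 4, 1]))  # (3*4+4) + (4*4+4) + (4*1+1) = 41
```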
Now that we have covered parameters, the third component I showed you in the Transformer architecture, tokens, is handled by the tokenizer. This is a screenshot from OpenAI's tokenizer tool. You can see a few lines written as a prompt, and in the output every word is mapped to tokens. We tend to think GPT processes your query word by word; actually, it converts the words into tokens, and even the spaces and special characters get tokens. Sometimes many characters map to one token, and sometimes one word is split across several: "tokenizer," for example, is split into two tokens, "token" and "izer," and even full stops and spaces get tokens. The tool reports both the number of tokens and the number of characters, and that's how the counts you see are calculated.

All right. We now know about transformers, parameters, and tokenizers. The next thing in the LLM architecture is context length. (This slide was actually generated by ChatGPT; I am giving a ChatGPT talk, after all.) Context length is essentially how much of the conversation the model can remember. How many of you have played with the OpenAI Playground? In the Playground there is a selection menu on the right-hand side where you can choose the underlying model, the maximum length, how much penalty it should apply to repetition, and so on, and you can control how much context goes into each prompt. What context length means is how much context you can provide in the input sequence so that the output gets better. For example, GPT-4 32K has a context length of 32,000 tokens, so you can provide almost 50 pages of information to it and get output based on all of it. One year ago, you could not provide input that long; you had to minimize your prompt so that the model could generate output for you. But now the window has gotten much bigger, so you can feed in documents and multiple pieces of information so that it remembers the context. You are essentially providing it with knowledge and history so it can give you intelligent outputs. That's what context length means.

Now imagine what it was like around 2020. The Transformer architecture from the 2017 paper I showed you was proving better than almost anything before it, but this was a time of significant experimentation. OpenAI released a bunch of papers; do you know how many researchers they hired over the years? They hired researchers from around the world just to work on language models and those papers, and the eventual output was tremendous. As I mentioned with the Transformer architecture, some companies focused on the decoder portion and some on the encoder portion, but there were also debates about which language models would perform better. That's where scaling laws for language model performance come into play. How do we go about deciding which language model performs better? Let's take a look at some of the results. In the first plot, the y-axis shows the test loss and the x-axis the number of parameters. You can see the curve is not flat; it goes down. That means the more parameters, the lower the test loss, and the lower the test loss, the better the performance. First let me explain why I'm focusing on parameters, dataset size, and compute: these three were the candidate considerations for determining performance. People debated whether we should train models longer, increase the dataset size, or increase the parameters, so there was real argument about which factors you should use to build high-performance models. Going back to parameters: more parameters, better model. Then dataset size, which everybody agrees on: we need more data so that our models perform better. On the y-axis we again have test loss, and with a smaller dataset the test loss is higher, meaning the model will not perform well. Similarly with compute: the more compute, the lower the test loss. There are interactions, of course. If you increase the model size, you also have to increase the compute so that training finishes in time, which is why the compute curve looks a little different. What this data tells us is that language models improve when we increase the model size, the amount of training data, and the compute. Oh, by the way, compute here is measured in petaflop/s-days, if you were wondering what PF-days are.
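For reference, the three curves just described were fit as power laws in the 2020 paper "Scaling Laws for Neural Language Models"; the exponents below are approximate values as reported there, where L is the test loss.

```latex
% Approximate power-law fits from the 2020 scaling-laws paper; L is test loss.
\begin{align*}
  L(N) &\approx \left(\tfrac{N_c}{N}\right)^{\alpha_N}, & \alpha_N &\approx 0.076
    && \text{(non-embedding parameters)} \\
  L(D) &\approx \left(\tfrac{D_c}{D}\right)^{\alpha_D}, & \alpha_D &\approx 0.095
    && \text{(dataset size, tokens)} \\
  L(C_{\min}) &\approx \left(\tfrac{C_c}{C_{\min}}\right)^{\alpha_C}, & \alpha_C &\approx 0.050
    && \text{(compute, PF-days)}
\end{align*}
```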
Now, when we take these three graphs together, they are telling us that language modeling performance gets better when you increase the model size, the data used for training, and the compute used for training. So whenever you are thinking about how to train your model, those are the three factors you should consider. The OpenAI team went on to propose increasing the model size; there were also arguments for instead increasing the number of training steps, but they made their case in a paper, published in 2020, showing that with a bigger model size you also have to select a bigger dataset. You might still question why OpenAI researchers went for such a large model, given that existing models had only a couple of billion parameters. GPT-3 was trained at a range of sizes, including 1.3 billion, 13 billion, and 175 billion parameters. Now let me show you the difference scaling makes. On the y-axis is the accuracy of the model, and on the x-axis is the number of in-context examples we give to a specific model. At 1.3 billion parameters, you can see that no matter how many examples of context we give, accuracy barely improves, because the model size stays fixed. But as soon as you add parameters, the accuracy spikes. And look at the zero-shot and one-shot markers at the top: just by adding a few more in-context examples, the 175 billion parameter model jumps up dramatically. So it was a factor of both the in-context examples and the parameters that made it perform well, which is why both of these factors matter so much when you are thinking about model size.

Now, ChatGPT, as we already saw. Let's dive into a few of the offerings from OpenAI. We saw GPT-3, GPT-3.5, and ChatGPT. GPT-3 was released quite a while ago, but it only generated text and didn't follow instructions. Then came GPT-3.5, an upgrade that allows the language model to follow instructions, using supervised fine-tuning so that your requests are handled better. And when ChatGPT came in, it came with optimized conversational prompting, so you could talk to a computer and get your responses; it also handled programming languages and can give you code as output. And then GPT-4: where the previous models accepted only text and instructions, GPT-4 gives you the ability to upload files and images as well as text, so it can look at your images and give you output in text format. Of course, it's changing as we speak, because our human brains are very creative; we always wish for more capabilities from AI, so it's evolving constantly.
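Relating back to those zero-shot and few-shot curves, here is a hedged sketch of what the two settings look like in practice, assuming the openai Python package (v1 or later) and an OPENAI_API_KEY in the environment; the model name and the translation examples are illustrative.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Zero-shot: the model sees only the task itself, no examples.
zero_shot = [{"role": "user", "content": "Translate to Japanese: Good morning"}]

# Few-shot: in-context examples precede the query; this is what the
# accuracy-versus-examples curves on the slide are measuring.
few_shot = [
    {"role": "user", "content": "Translate to Japanese: Thank you"},
    {"role": "assistant", "content": "ありがとう"},
    {"role": "user", "content": "Translate to Japanese: Good night"},
    {"role": "assistant", "content": "おやすみなさい"},
    {"role": "user", "content": "Translate to Japanese: Good morning"},
]

for messages in (zero_shot, few_shot):
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    print(reply.choices[0].message.content)
```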
This is a comparison of different models, but let me show you one slide in particular. PaLM was a model released by Google in April 2022 with 540 billion parameters, and it was one of the fastest, highest-performing models released at the time. PaLM 2 evolved from it, and Google didn't release much information about the underlying data, maybe because of how much competition is going on right now. PaLM 2 was used in some medical use cases. What it does is this: as I showed you, you can now give images as input and get text as output, so if you give an X-ray as input to the large language model, it will analyze the image and give you a report on that X-ray. You don't have to read it yourself, and if you don't understand the medical terms, you don't have to go back to a doctor with questions; sometimes we don't even know what to ask for. So GPT-style models can now tell you about your health record as well.

So far we have only talked about corporate models. Even though ChatGPT sounds open, it is only open to use; the code still hasn't been made available to the open source community. That's why some researchers wanted to open source large language models, so the community could build on those models and help understand what people are asking and how they build their own applications with those APIs. There are three such open LLMs. First, Meta released OPT, Open Pre-trained Transformers: a suite of decoder-only pre-trained transformers ranging from 125 million to 66 billion parameters, which they shared with everyone. They also allowed researchers to request access to the 175 billion parameter model, because they really wanted to know what researchers are using LLMs for; not everybody is trying to build the next LLM, and there are some amazing minds out there with their own creative ideas who want to build on the model. OPT is trained primarily on English, and it provides you the code for experimenting with the models. A couple of months later, Hugging Face's research team received a grant of computing resources from the French government, and they trained the BLOOM model. What they did with the grant is work with a volunteer team of over 1,000 researchers from different countries and institutions to create a 176 billion parameter, decoder-only transformer model. You remember the architecture I showed you: when we are talking about generative AI, a lot of the work is being done on the decoder side of the Transformer. The team made everything available in the open, including the datasets they used, and you can download and run the model. What this allows is that bigger corporate organizations that don't want to pay for proprietary models can use this open source model instead. BLOOM did not take off as much as other models did, possibly because companies like Google and Meta already had a head start; they have been working on AI for many years, so they have more context. And the third one is LLaMA. Everybody knows LLaMA. It was released by Meta in February 2023, in four sizes: 7 billion, 13 billion, 30 billion, and 65 billion parameters. Now you know what parameters are, so hopefully it is clear why everybody describes large language models in those terms.
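One practical payoff of these open releases is that you can download and run the weights yourself. Here is a minimal sketch using Hugging Face transformers with a PyTorch backend; OPT-125M is the smallest checkpoint in Meta's OPT family, chosen here only because it is small enough to run on a laptop.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the open weights and tokenizer for the smallest OPT checkpoint.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Run decoder-only text generation locally, no API required.
inputs = tokenizer("Open source language models let anyone", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```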
And one of the good things about LLaMA is that its training data included many multilingual, non-English sources. If you look at GPT, it was mainly trained on English data, and multilingual knowledge was still missing, though I think it has improved a lot now; you can communicate with GPTs in different languages. This visual gives you an idea of how impactful LLaMA is within the community: as soon as it was opened up, everybody started using it. We knew from OpenAI's research that training models with instructions significantly improves performance, but since OpenAI did not release its datasets publicly and LLaMA did, LLaMA became very popular. And if OpenAI had released its pre-trained models, it would have been much easier for other companies to build smaller derivatives and maybe run them on a single GPU.

All right. We have talked about different models, so how do you go about selecting one? Which model is better? As a researcher or an open source contributor, if you want to compare models, there are two main benchmarks. One is HELM, Holistic Evaluation of Language Models. Amusingly, if you used to ask ChatGPT what HELM is, it didn't know it was a benchmark; it would assume it was a large language model. Maybe that has changed with GPT-4, but it used to be the case. The other is the Hugging Face leaderboard. This is a screenshot from Hugging Face: if you just search for the Hugging Face leaderboard, this UI comes up, and it lets you filter by model size and choose which columns to show in the report, depending on what you are trying to do. So that's the selection part, how you go about using the leaderboard. And you can see in this section that the models performing at the top have LLaMA as their underlying architecture; that's where the "high-performing model" reputation is coming from.

All right, so now that we understand how large models work behind the scenes, hopefully everybody is still with me. So now we know how it works; how do we communicate with AIs? Well, we have already established that we communicate through interfaces, but AI assistants are emerging as we speak. Text alone is no longer the question; now people want their own personal assistants powered by AI. This is a revolution happening right now, and every company is trying to work on AI assistants: a personalized chatbot to help you with shopping, or one that handles your daily reminders. For example, I'm a big fan of using Google Assistant at home. It reminds me every day what my calendar and schedule look like, and the weather if I'm going outside. Those kinds of capabilities are now coming to chatbots that you can personalize, and because you control the input, you can also constrain the output, what it should inform you about. Another thing happening with personal assistants is that you can make the input as personalized as possible without leaving your data out in the open or giving it away.
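As a hedged sketch of that personalization idea: a system prompt can inject your own context, here a made-up calendar, so the assistant's answers are constrained to the data you supplied. This again assumes the openai package (v1 or later) and an API key; every name in it is illustrative.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

my_calendar = "09:00 stand-up, 12:00 lunch with Kei, 15:00 design review"

messages = [
    {
        "role": "system",
        # Constrain the assistant to the user-supplied context only.
        "content": (
            "You are a personal assistant. Answer only from the calendar "
            f"below, and say so if the answer is not in it.\nCalendar: {my_calendar}"
        ),
    },
    {"role": "user", "content": "What does my afternoon look like?"},
]

reply = client.chat.completions.create(model="gpt-4", messages=messages)
print(reply.choices[0].message.content)
```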
Even OpenAI, when they made their announcements at DevDay in November, stated that none of your data is used for training, because that concern was preventing companies from using ChatGPT and its assistants. So now you can use the APIs knowing that your proprietary information is not shared. Another thing coming up, as you can see everywhere, is compliance. Compliance is a big thing people are talking about in ethical AI. The technology itself is good; it's good math and science. And it's good that governments are now helping us technologists, not by shutting our projects down completely, but by helping control how the technology is used, because there will always be bad actors in the community looking to do harm. These systems should also be used carefully because of a phenomenon called AI hallucination: AI systems generating content that appears to be real but is not. I have run into problems where the model just outputs machine-generated information that lacks the right context. And yet some companies, for example in marketing outreach, don't even look at the output data before using it. Then there is ChatGPT and translation: the revolution is here, but is an AI chatbot a viable option for translation projects? It hasn't reached that point yet, because it lacks a lot of the cognitive ability a human has. We still need human context because, as I showed you, the context length is finite, while humans carry a long memory of everything that has happened in the past. All of that combined is what makes AI work better. Anyway, I think my time is running out; I had a few more slides, so I'll just run through them. Multilingual translation is still a gap, although it performs better than before, and the Japanese government is trying to build its own LLMs in partnership with SoftBank; those are some of the things I wanted to call out. And this is the response from ChatGPT when I asked about its limitations with language. So, on LLM limitations: for AI to really work, it has to be inclusive and shaped by everyone, not just technologists, not just one region or one country, but everybody, with native languages and diverse perspectives. Think about doctors in different countries: they need to upload their treatments, and sometimes they only have handwritten notes. Everything combined can make AI much more beneficial for humanity. Sorry, I ran over a little bit, but thank you all for listening.