Hi everyone, I hope you had a good breakfast. I know it's a bit early in the morning, but I'm really excited for this talk. I don't think this is the first time you're hearing about ChatGPT this week, so as we dive in you'll learn more about it, about our progress, and about what we learned along the way. So, I'm Archana. I'm currently a data product manager with Women Who Code, and I'm wearing their very cool shirt as well. If you come to the event hall, you can catch our booth with Women Who Code Seoul and Singapore. Apart from that, I'm also a board member with Women in Machine Learning, which is another nonprofit, and I produce courses on LinkedIn Learning: I have one on TinyML, and another on MLOps with Vertex AI, which both of us produced, is coming out in about a month. With that, I'll let Sohum go ahead. Hi, I'm Sohum, currently the machine learning lead at a startup in Singapore called Sleek. As I said, this is probably not the first time you're hearing about ChatGPT this week, so here's a short preview of what we made; I hope it makes sense. We can't walk you through exactly how we built it in the 25 minutes we have, but we can certainly give you access to the GitHub repo and so on. Today's agenda: what we're building, a peek into GPT itself, and, most importantly, what they don't tell you about building these LLM products, and finally, ways to contribute. So let's start with the first thing. We definitely wanted to try our hands at LLMs, but more importantly, like many of you here, and like me, I speak three languages.
So in my head, I'm always translating between all of them, and one of the things I often face is an English accessibility problem. We convert text to English, in our heads or otherwise, and sometimes it just doesn't sound right. I thought this was a perfect problem for ChatGPT to help with, so I went looking on the web to see whether there are people similar to me, and it turns out there are: more than half of the world's population speaks at least two languages, and many people tend to think and work in their native language; 75% of consumers prefer to buy products in their native language, highlighting the importance of language accessibility; and 50% of website visitors leave if they can't read or understand the content, indicating a significant barrier for non-native English speakers. In short, long and complex English texts can be difficult for many people, including those with reading or learning disabilities, elderly individuals, and individuals with limited English proficiency. I think that's a great segue into talking about GPT: we're trying to make it easy for bilingual people, for people who speak more than one language, to talk and write in English, and to communicate in business English better, and we're doing that with GPT. So, I'm sure you've heard of ChatGPT by now. It's a language model created by OpenAI. GPT is the most famous one right now, but there are a lot of others that people are working on. The biggest language model released so far is, I think, GPT-4, trained on data up to September 2021. A really cool thing about these models is that they have emergent abilities: they can do tasks they were not trained on, and they can generalize across a lot of different tasks.
The way GPT was trained to do this was by taking feedback from humans: it learns how humans want it to answer questions, and it tries to emulate that. You interact with it through natural language, and it replies in natural language too. But the problem with GPT models is that they're very large. Training anything bigger than GPT-2, which has 1.5 billion parameters, is pretty much out of the realm of startups, open-source projects, and even mid-sized companies. And if you want to deploy them, you have to deploy them on GPUs, sometimes multiple GPUs, to keep latency low. But they're so good at so many tasks that these models now serve as a kind of baseline or benchmark when you're building for new tasks, which is why you can't really ignore them right now; I'll talk more about that in a bit. So we saw all this, got really excited, saw a lot of other people building products with it, and thought, okay, let's try building something. It seemed pretty simple: we'd make it open source, deploy it, and hopefully learn a lot along the way. So that was the first step. We started building, and it really is easy to use GPT to go from zero to one and build something modestly reliable that works pretty well. You do that with something called prompt engineering, where you create natural-language text prompts that tell the model what to do, and most of the time it replies with what you wanted. That's fine when you're building a pet project that you use from time to time or demo in a few places; GPT works really well for that. But when you try to take it to production, that's when other problems come up.
Problems like cost (how much it takes to run these models), latency (how long it takes to get a response from the model), and how reliable and scalable the solution is. That's the stage where we're super impressed by the capability of the model, but it also falls short in a lot of ways: it's not reliable, and it's expensive. And that's the point where you get really overwhelmed trying to fine-tune your prompts while keeping costs and latency down; it's really difficult to make it all work. Before I move on to the problems, there's another issue: because this field and these kinds of models are so new, there aren't many standards or best practices for how to even go about deploying them, so that's also something we're trying to learn while building this. Right now, our architecture is really simple: the extension connects to the extension backend, which sends a request to our Python backend, which calls OpenAI's API, and the result goes back to the frontend. Both Archana and I have worked our whole careers on Python backends; we have very little JavaScript or frontend experience. I can't even center a div in CSS. So the frontend, as well as the JavaScript backend for the extension, was actually built using ChatGPT, and it took us about an hour with almost no JavaScript experience. With that, I want to talk about some of the challenges we faced while building this. I think the biggest challenge is issues with the API. OpenAI has openly come out and said they have no SLAs for their API, no latency or uptime SLAs, and they're quite happy to deprecate models at short notice. When we started building this, we were building on the davinci-002 model.
We created a few prompts that worked really well, fine-tuned them to work even better, and then a few weeks into the project they announced they were deprecating that endpoint and we should move to a new one. Transitioning to the new endpoint is easy in itself: you just change where you call the API. But the prompts don't work as well anymore; they're not as reliable, and sometimes they don't give the kind of outputs we expect. So we had this really well-built, fleshed-out product, but we couldn't transition to the new API without a lot of effort. At the same time, when they released the new endpoint and model, I think they stopped putting resources into the previous model, the one we had built our product on, and our latency increased from a few seconds to minutes. That makes the whole thing really unreliable, and for someone trying to build a product around it, unscalable as well. Another issue is prompts and how you engineer them. Because you talk to the model in natural language, and natural language is ambiguous, prompts are ambiguous too: it's very hard to get reproducibility, very hard to get the same result from the model even with the same prompt. The model also tends to hallucinate, which means it makes up answers or gives you wrong answers, but does so really confidently. That's a problem, because how do you even serve the model's result if you're not sure how confident to be in it? And as these products get more complex, you start chaining results from one model into another, which produces a different result, and so on; you also have agents, which are models that can perform tasks, like querying a SQL database or searching something on the internet.
These are really inconsistent; they work maybe a third of the time at best. And if they fail, especially with long chains, say the chain fails somewhere in the middle, it's very hard to recover from that failure, because you often can't even tell the chain has failed in the first place. Secondly, I don't know if you've used ChatGPT this way, but if you tell it, "hey, you've made a mistake, this is what you should do," it's very hard to get it back on track. It will say something like "oh, I'm sorry" and then keep making the same error. There are also no good evaluation metrics for these errors: because it's natural language, you can't easily evaluate whether the output is correct or not. And finally, trust and security. Recently OpenAI introduced a policy where you can opt out of data collection, especially for training, which is great, because that was a big concern for a lot of companies. But the other concern is that we don't actually know what data was used to train the model. Say you're building something that does financial modeling, accounting, or works with stocks: what if OpenAI trained their model on data from the WallStreetBets subreddit? You don't want that kind of data in your model. Or maybe you're doing some kind of medical diagnosis; you want to make sure your model wasn't trained on data full of misinformation or malicious information. And then, finally, prompt injections and attacks: almost every day people are finding new ways to attack these models and make them act in malicious or adversarial ways, and it's very hard to defend against them. So those are some of the challenges, but it's not all bad; there are solutions. The first thing I want to talk about is the cost of the model.
What OpenAI has done is reduce the barrier to entry for startups and companies to build intelligent products. Previously you would have had to hire a team of data scientists and engineers to build any AI or machine-learning based product, which is expensive, and you wouldn't even know whether the product would work in the end. OpenAI has made it very easy to write a few prompts and get a product up and running quickly. But as your product grows in complexity and you get new users, your API costs start to increase exponentially. That's what you see in the first third of that chart: as your product grows in complexity, your costs skyrocket, and that can make your business unsustainable. So what you have to do is move on fairly quickly and fine-tune a model. This also lets you train on more relevant data, so you get more relevant outputs; it might reduce incorrect outputs, make your outputs more manageable, and it reduces costs, especially if you can deploy that model on premise. But costs don't drop that much, because you still have to deploy the model on a GPU, maybe multiple GPUs, and you have the same problem: even if you're not paying for an API, as your prompts grow and your application's complexity increases, your costs will increase a lot as well. At that point you have to start training a custom model. So what I think will happen in the future is that you'll go through this process of buying while you build out your own solution. It will help small companies and startups challenge incumbents, because they can build their products really quickly. That's the buy-versus-build problem and its solution. Now for some of the other solutions.
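To make that cost curve concrete, here is a rough back-of-the-envelope estimator. The request volumes, token counts, and per-1K-token price below are illustrative assumptions, not OpenAI's actual pricing:

```python
def monthly_api_cost(requests_per_day: int,
                     tokens_per_request: int,
                     price_per_1k_tokens: float) -> float:
    """Rough monthly spend for a pay-per-token LLM API."""
    tokens_per_month = requests_per_day * tokens_per_request * 30
    return tokens_per_month / 1000 * price_per_1k_tokens

# As the app grows, both factors rise: more users means more requests,
# and richer few-shot prompts mean more tokens per request.
small = monthly_api_cost(1_000, 500, 0.002)     # early pet project
large = monthly_api_cost(50_000, 2_000, 0.002)  # more users, longer prompts
print(f"${small:,.2f} -> ${large:,.2f}")  # prints "$30.00 -> $6,000.00"
```

Because user count and prompt length multiply, the bill grows much faster than either factor alone, which is the "skyrocketing" part of the chart; fine-tuning a smaller model trades this per-call fee for a roughly fixed GPU hosting cost.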
For prompt engineering, what we've seen work is few-shot prompting, where you give some context about the problem and also provide example formats for the inputs and outputs you expect from the model. That works really well to make the model's output more organized and easy to parse, and it will work for most applications. Another thing you can do is tell the model to use chain-of-thought: you ask it to write out, step by step, how it arrived at an answer, and that tends to help the model make fewer mistakes. But you're generating more output, so your latency increases and your costs increase too; that's the disadvantage. There are a lot of other techniques beyond chain-of-thought prompting, but at that point the costs get so high that it doesn't make sense anymore; you should fine-tune your model after that point. Another thing that's become very popular recently is vector databases. The outputs of the embedding models are vectors, long arrays of numbers, so instead of querying the API every time, you can save those vectors, and when someone asks a similar question, you can answer from the saved vectors instead of from the API. This works really well with documents. Say you're building a document question-answering tool: someone has a large document and asks multiple questions about it. Instead of sending that document to OpenAI multiple times, you send it once, save the embeddings, and then query the embeddings, which saves both cost and latency. And don't use chains; don't use agents at all. They're not good at the moment.
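The save-the-embeddings idea above can be sketched in a few lines. This is a toy, in-memory version: `embed` here is a stand-in for a real embedding model (in the real product you would call OpenAI's embeddings endpoint and keep the vectors in an actual vector database), and the letter-frequency "embedding" exists only so the sketch runs end to end:

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: a 26-dim letter-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha() and ch.isascii():
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product over the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class EmbeddingCache:
    """Serve repeat questions from saved vectors instead of re-calling the API."""
    def __init__(self, threshold: float = 0.95):
        self.entries: list[tuple[list[float], str]] = []  # (vector, answer)
        self.threshold = threshold

    def store(self, question: str, answer: str) -> None:
        self.entries.append((embed(question), answer))

    def lookup(self, question: str):
        q = embed(question)
        best, best_sim = None, 0.0
        for vec, answer in self.entries:
            sim = cosine(q, vec)
            if sim > best_sim:
                best, best_sim = answer, sim
        return best if best_sim >= self.threshold else None

cache = EmbeddingCache()
cache.store("What is the summary of section two?", "cached summary")
# A near-identical question is now served from the cache, with no API call:
hit = cache.lookup("what is the summary of section two")
```

Lookups above the similarity threshold come from the cache; anything below it falls through to a real API call, whose answer you then `store` for next time, so cost and latency drop as the same document gets asked about repeatedly.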
Anything where the agents are completely autonomous will probably not work right now; that's what we've seen even with very simple applications. Another thing people do to address this is to have a watcher model, another LLM that watches the original LLM to make sure it's not making mistakes. It works sometimes, but when it doesn't, you get the same snowballing effect where errors lead to more errors, and then who watches the watcher? It becomes watchers all the way down. Finally, I want to talk about some best practices we think you should follow. All of the above are best practices too, but here's another: the prompts you create while building your product are now your proprietary data. They're the equivalent of an in-house deep-learning model from the old days, like, three months ago. So save those prompts and make sure they're not leaked; treat them as carefully as your API keys. Also version your prompts and test them; treat them like you would any data. For testing, make sure that for the same prompt the answers don't drift a lot and the quality of the answers doesn't degrade over time, things like that. These are some general good practices, but I think we need more from the community and from other people building similar products, and as we build out our application, we'll share more best practices as well. So, I wanted to give a quick demo before moving on, because we've been talking a lot about LLMs and I want to go back to the product and show you how it works.
Over here you can see an input, and we tried using Chinese. When it translates it, the output is okay, but it doesn't sound that good in English, so you can take that text again and formalize it. These are the three functionalities the Chrome extension has right now: summarization, formalization, and translation. I also want to show you the GitHub repo where we've hosted the project. You can see the entire project, and we've opened issues that you can look at, pick up, and contribute to. This is completely open source, and we really want people to contribute, because that's how it will improve; most importantly, both of us have a bit of tunnel vision, looking at just one problem, and there's probably a lot more to explore. This is a short roadmap. Where we are right now is something functional that does something small, and there's a lot more to figure out. Tone, for example: right now we can only formalize, but what if someone wants to make the text fun, or change it entirely? It should work for that too. Then there's building a better user experience; as Sohum mentioned, neither of us has a background in that, and it's taking us time to understand what a customer might think, mostly around how you interact with the extension. If you're someone with that experience, we would love for you to contribute. And finally, we're depending on the community to take this forward. We're writing more about our journey in the newsletter, and we'd love for you to follow along. So, especially after our LLM rant, I'll go ahead and tell
you how you can contribute. There's backend work, helping build the product and the API; there's frontend development and, as I said, improving the frontend features; there's newsletter content, and we'll share the newsletter with you, since we're trying to document this whole Chrome-extension journey, so if you love writing, send us something and we'll gladly put it up; and finally infographics. The infographic you saw, I created; I find making infographics very soothing, so if that's you, let me know and contribute to the infographics. Do reach out to us. Finally, I also want to show you the newsletter: it's tinyml.substack.com. We actually covered the LLM conference yesterday as well; I don't know how many of you attended, it was late at night, but you can check that out, and Sohum's latest post covers what he just told you, in blog format. So that's pretty much the end, but feel free to ask us questions. I know this topic seems complicated; OpenAI has made a great interface to interact with, but what happens behind the scenes is quite complex, and we also only started out a couple of weeks ago. So don't worry about your questions sounding naive; we also started from nothing, so your questions can help us, and vice versa. Does anyone have any questions?

Q: You mentioned vector databases. In a question-answering app, how would you go about matching the query to a saved vector?

A: You convert the query into a vector as well, and then you search the database for the vector most similar to the query vector and output the answer stored with it. I'm pretty sure ChatGPT does something similar: if you ask it to summarize an arXiv paper and give it a link from, say, a few months ago, which it wasn't trained on, it will find a similar arXiv link from its training data. arXiv IDs look like 2104.1234, so it might find something like 2004.123 and summarize that instead.

Q: So something like the cosine similarity between the vectors?

A: Exactly; cosine similarity is one of the ways you can find similar things.

Q: Thanks. Two questions. GPT-3 leaned heavily towards English content, and I believe, though I haven't checked how true it is, the same holds for GPT-4. Going back to the idea of accessibility, how have you observed its performance in languages other than English and Chinese, say Vietnamese, Thai, or Indonesian?

A: That's a good question. Neither of us knows the languages you mentioned; we've tried Hindi and Bengali, and it works pretty well on those. We've also only tried fairly simple business use cases, not really long texts or articles, just small paragraphs you might write in an email or a Slack message.

Q: On that note, since the tool right now is primarily for translation, as the demo showed, was there consideration between choosing from the existing array of neural-network translation APIs out there versus using GPT?

A: Translation is definitely part of it, but the idea is also to use the way GPT was built: it can formalize and change tones really well, and that's what we wanted to capture, because when you move from your native language to English, a lot gets lost in translation, and we want to make sure those users don't get left out compared with people who usually speak English. So the core is translation, but with formalization added on top, and I don't think there's a single API that does all of that currently. You could imagine splitting it, using a cheap translation API for the translation and GPT only for the formalization, but honestly we haven't really measured the cost yet: we use GPT Plus, and they give you 18 dollars of credit, so we haven't reached a scale with enough users to actually understand the cost.

Great questions. Any other questions from the audience? We've got two experts here who've been playing with GPTs "since the old days," which is a nice quote. Any other questions before we break for tea? Yes, please.

Q: Hello, I'm from China, and as you probably know, we can't easily access the OpenAI API or ChatGPT from China; sometimes we can get through via an agent. I tried asking ChatGPT to provide some images, but I couldn't get what I wanted. Could you explain the principles of image generation in ChatGPT?

A: Thanks. We know the latest version of ChatGPT can use images, combining images and text, but they haven't released that API yet; they've only shown demos of it. There are other models that can work with images and extract text from them, but we haven't tried those yet. Regarding not being able to access the API: right now there are a few open-source language models, but the problem is that it's very hard to host them, because you need a GPU, and it's pretty expensive if you want low latencies. That's something we want to work on eventually; we're just figuring out the logistics.

I think one of the big learnings, as you mentioned on cost, is that it's easy and quick to play with GPT, but as soon as you want to go beyond experimentation, you have to take these other factors into consideration. Still open for questions; otherwise, of course, the speakers will be around outside and over tea.

A: Yes, we'll be in the event hall, at or near the Women Who Code booth, so if you have questions you can always come by. Okay, thank you so much, everyone.