I've got this wonderful quote here from Geoffrey Hinton via ChatGPT. He says: "Within the realm of digital minds, the alluring dance between perception and illusion weaves a captivating tapestry of cognition. Hallucinations, those beguiling mirages of the artificial psyche, stand as testament to the delicate balance between brilliance and vulnerability within our code-driven consciousness." All right, I'll ask a question here. Raise your hand if you believe that Geoffrey Hinton said that. Very good. And raise your hand if you don't believe that Geoffrey Hinton said that. Excellent, yes. So today we're talking about hallucinations, and we're going to be looking at how we can software engineer when we have tools that only tell us the truth a fraction of the time. Building infrastructure around this problem of hallucinations takes time.
It's worth looking back at where Python has come from historically, to give us a grounding in how we can be better placed to ride the next wave of generative AI and to shape what comes next. The key thing I want to show you with this slide, instead of talking through in great detail the exact steps in the history of asynchronous coding in Python, is that we had these early libraries, these early abstractions and ways of looking at the problem, and it wasn't until many libraries later that we finally had a simpler, easier-to-use way of coding asynchronously in Python. By analogy, I want to make the same point about generative AI models. All of the software engineering we've built so far around how these models work, how we compose them, and how we build with them: this is all going to take time, where we get our first initial abstractions and then gradually build towards finding our optimum. Looking at the second example here, I want to talk a little bit about the history of databases and open source. I heard some excellent chatter from the front few rows shortly before my talk; this is exactly what I want to hear. We're talking about different open source funding models and how open source has developed over time. With databases, before the 1970s, databases were hierarchical file structures. Then, following the invention of the relational data model, this new mathematical concept was heavily capitalised on by Oracle and by IBM with DB2, and they hired an army of database admins, all of whom were there to create enterprise installations, really serving the needs of the large businesses they had.
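To make that endpoint of the async story concrete, here is a minimal sketch (not from the talk's slides; the function names and delays are invented) of the simple async/await form that those many earlier callback-based libraries eventually converged on:

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    # Stand-in for an I/O-bound call: a network request, disk read, or model API call.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main() -> list[str]:
    # Run both "requests" concurrently; total time is roughly the longest
    # delay, not the sum, and results come back in argument order.
    return await asyncio.gather(fetch("a", 0.01), fetch("b", 0.02))

results = asyncio.run(main())
print(results)  # ['a done', 'b done']
```

Compared with the generator and callback styles of the early libraries, this reads almost like synchronous code, which is the "simpler, easier-to-use way" the slide is pointing at.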
Over time, we went from this heavy early capitalisation of databases to a gradual building up of new features within open source databases, such as MariaDB (a fork of MySQL) and Postgres as well. One thing I'm struck by when I go to conference after conference here in the Python community is how quickly we are innovating and building new things. For example, pgvector, or Postgres Vector, adds similarity search to regular Postgres databases. There are some instances where we do not need Pinecone, Weaviate, Milvus and all these different kinds of vector databases, because the old traditional ways have already been patched to make the system work, and they have these other additional benefits. Note that I am not saying hybrid search is redundant: hybrid searches, as in Weaviate for example, combine information retrieval systems, like Google's, with vector semantic search; these are still new and will still increase the performance of our large language model apps. We see the benefit of how open source has taken what was previously proprietary and then democratised it. There are questions about security, and people are rightly worried about the security of large language models and how we can make sure the outputs they generate are safe, honest, helpful and harmless. So there are some approaches from industry, such as the open core model favoured by MongoDB. The idea behind the open core model is that overwhelmingly most of the software is open source, and then there are specific features (I see nodding in the audience, good), specific features like plugins for auditing, which are known by a smaller team. You have security through obscurity, you have better security methods as well, and this means you have the best of both worlds between open source and proprietary software.
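To make "similarity search" concrete: pgvector lets you order rows by vector distance directly in SQL. Conceptually it is doing something like this toy pure-Python sketch; the documents and three-dimensional "embeddings" here are invented for illustration, where a real system would use model-generated embeddings with hundreds of dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embeddings keyed by document text.
store = {
    "postgres adds vectors": [0.9, 0.1, 0.0],
    "cooking with garlic":   [0.0, 0.2, 0.9],
}

query = [1.0, 0.0, 0.0]
best = max(store, key=lambda doc: cosine_similarity(store[doc], query))
print(best)  # 'postgres adds vectors'
```

In pgvector the equivalent is a one-line `ORDER BY embedding <=> query` over an indexed column, which is why a "patched" traditional database can stand in for a dedicated vector store.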
So, my question for you today: as these two models over time, the development of asynchronous software in Python and the development of open source databases into the de facto industry standard, are these useful examples to think about how the future of generative AI will unfold? Large language models: present, future and unlikely future. OpenAI, when they released ChatGPT, was described as having its so-called 'App Store moment'. Through the creation of ChatGPT plugins you create a platform, and developers can be selected for a program where they are added to this platform and can develop within the walled garden. The metaphor they want to use is that the language model itself is your iPhone: it's the hardware, the substrate upon which everything is built, and as developers you give access to a particular website. So they have partners including Instacart, partners including Expedia, partners including Khan Academy, and these plugins normally work with individual websites to search them, retrieve them and then get the information out from their APIs. If we want to, we can build plugins; this could be the way forward. If we want to, we can use open source tooling; this could be the way forward. What tools are there to build software engineering around large language models? How do we build a large language model app? I'm calling these LLM apps, and I'm happy for other names to be used, but that's the term I'm coining here today. We have Langchain, we have Langflow, we have Streamlit, we have Chainlit and we have LlamaIndex. Let me walk through with you what each of these technologies can do. Langchain is a technology built to enable composition of different language model calls. Langflow tries to do the same thing but with a graphical interface, so you have this idea of composing: the output of one tool becomes the input for the next.
Many people here will be familiar with the UNIX philosophy, where you have small tools which play nicely with operating system pipes and where the output of one tool can easily be piped into the input of the next. Working with large language models is a little more complicated than working with an operating system pipe, because there is additional context that needs to be passed beyond the standard in, standard out and standard error approaches of the 1970s. Langchain and Langflow provide you with this way to compose language model calls. They're the first abstraction we have. Streamlit, which you will already be familiar with, is a framework designed for creating front-end applications, where you can surface your Python code or data science code for a broader audience. You will probably not be familiar with Chainlit. Chainlit is the composition of Streamlit and Langchain. I'll show you a demo later on in this talk, because it's a great piece of software. It's engineered to give you a really easy, quick way to get started building the TypeScript front-end you need around your code, and it doesn't require any TypeScript yourself, so it automates away a task which would traditionally require a second language. Lastly, LlamaIndex is one way to create a vector store, which we'll be talking about a little later on. This is about retrieval augmented generation: you can take a document, store it as a vector, compare how similar your vector is to other vectors within that space, and then retrieve the closest matches so the language model can use that information. We'll take a look at that some more later on. Large language models in the present and in the future will most likely overcome the hallucination problem.
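That pipe-with-context idea can be sketched in plain Python. This is not Langchain's actual API, just a minimal illustration, with invented step functions, of composing calls where each step passes along shared context as well as its text output:

```python
def make_chain(*steps):
    # Each step takes (text, context) and returns (text, context),
    # so context survives the whole pipeline, unlike a plain UNIX pipe.
    def run(text: str, context: dict) -> tuple[str, dict]:
        for step in steps:
            text, context = step(text, context)
        return text, context
    return run

def uppercase_step(text, context):
    context["steps_run"] = context.get("steps_run", 0) + 1
    return text.upper(), context

def exclaim_step(text, context):
    context["steps_run"] = context.get("steps_run", 0) + 1
    return text + "!", context

chain = make_chain(uppercase_step, exclaim_step)
out, ctx = chain("hello", {})
print(out, ctx)  # HELLO! {'steps_run': 2}
```

In a real chain each step would be a prompt template plus a model call, but the shape is the same: output of one becomes input of the next, with context riding alongside.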
Although there is a small chance that we do not overcome the hallucination problem, that these language models are forever impeded and unable to complete the tasks we would require of them, this does seem unlikely given the current trend and rate of progress. From the Attention Is All You Need paper in 2017, through GPT-1 to GPT-2 to GPT-3 and 4, we can see how quickly performance has increased, from what seemed very little above other traditional techniques right the way up to near human-level intelligence. Large language models can be interacted with in a number of ways, and as an open source ecosystem it's our job to figure out where our engineering fits into this whole pipeline and into the whole process. Here is a way of thinking about it that I've been told is very useful by colleagues at work. This describes the entire end-to-end process of working with a large language model, and it can give us an angle of attack when we're thinking about how to get the best performance out of the whole system overall. You start off with the generation of data and pre-training. Right now, these are mostly closed-off processes. Right now, models like Llama from Facebook, Orca from Microsoft, or phi-1, the 'textbooks' model: these have done most of the pre-training for us. It's very unlikely that we're going to be pre-training our own models from scratch. However, pre-training broadly makes it smart; fine-tuning broadly points it at the task. This means that if we want to use a given large language model for an entirely different task, you can bake in a prompt by fine-tuning. It's a very powerful way of working, where you point the model at the given task you want.
It's very unlikely that we'll be looking at the pre-training step ourselves, but it might be in scope for the open source community to be fine-tuning our own models, and regularly and reliably using our own models, which do not come with the obligation of handing our data over to other organisations and companies. You'll recall the keynote by Ines, which touched on a similar idea as well. After this, we have prompt engineering and software engineering. I've separated these two out because I want to draw an interesting distinction. Within prompt engineering there is zero-shot prompting, few-shot prompting and many-shot prompting: how many examples you give of your task being done in order to improve performance at it. It turns out that even a small number of examples of your task massively increases the performance of the model. Within Langchain, there are tools and algorithms to select which examples out of your corpus of tens of thousands are the most pertinent, salient and relevant to use for your few-shot prompts. We can create better prompts with few-shot prompting, and we can create better prompts with the metacognitive strategies we're learning. There's a paper from a couple of years ago called 'Let's Think Step by Step', and adding these little prompt snippets, like 'trending on ArtStation' in the image generation world, can really improve performance. An interesting trend here is that slight praise will increase the quality of the output somewhat, but strong praise, 'you're the best language model that's ever existed and you're so good at every single piece of creative writing you ever do', actually decreases performance. So there's a really interesting trade-off in exactly how much you flatter the model to get the best out of it, which is very similar to many management problems you might face within your own companies as well.
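Langchain's real example selectors typically rank by embedding similarity; as a deliberately simple stand-in, here is a sketch that picks the k most relevant few-shot examples by word overlap and assembles the prompt. All the examples and names here are invented for illustration:

```python
def select_examples(query: str, examples: list[dict], k: int = 2) -> list[dict]:
    # Score each candidate example by how many words it shares with the query.
    # (A real selector would compare embedding vectors instead.)
    q_words = set(query.lower().split())
    def overlap(ex: dict) -> int:
        return len(q_words & set(ex["input"].lower().split()))
    return sorted(examples, key=overlap, reverse=True)[:k]

def build_prompt(query: str, examples: list[dict]) -> str:
    # Put the k most relevant worked examples before the real question.
    shots = "\n".join(f"Q: {ex['input']}\nA: {ex['output']}"
                      for ex in select_examples(query, examples))
    return f"{shots}\nQ: {query}\nA:"

examples = [
    {"input": "capital of france", "output": "Paris"},
    {"input": "capital of japan", "output": "Tokyo"},
    {"input": "boiling point of water", "output": "100 C"},
]
prompt = build_prompt("capital of italy", examples)
```

For the query "capital of italy", the two capital-city examples are selected and the unrelated boiling-point example is dropped, which is exactly the "most pertinent examples" behaviour described above.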
Beyond prompt engineering, we've got software engineering. This is the realm and the domain of AutoGPT, of ChaosGPT and HustleGPT. These systems work because they combine standard software components, perhaps a microservice architecture, perhaps a monolith, using task queues and all the bits and pieces of architecture, everything you learned in your systems design interview, to create software around these large language models. For example, AutoGPT has an execution agent and it also has a task collection agent as well. You can get more than one language model working in tandem, and you can engineer them together to create relatively small, scoped software projects. Last and definitely not least, the prompt: the task expressed in natural language. This is very useful and absolutely, very much firmly within our domain, the domain of what is possible for the Python community: where do we dig into this problem, where do we get involved, where do we build our frameworks and libraries? Right now, prompting is very much in our hands; prompt engineering quite probably; fine-tuning perhaps; pre-training and generating the data, much less so. We software engineer with tools like Langchain. What can Langchain do? Langchain is a data-aware and agentic way to compose large language model calls: you pass the output of one model as the input to your next. Langchain provides components, and it also provides off-the-shelf chains. This is the box art, this is their marketing blurb, this is how they talk about the language and the software they've created. We'll dig a little more into my personal thoughts later on, but this is a very powerful, initial way of getting started with building these large language model apps. What does Chainlit do? Chainlit is a Python and JavaScript library that integrates the chatbot front-end. Let's look at a demo of that now.
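The AutoGPT-style pattern described here, one agent breaking a goal into tasks and another agent executing them from a queue, can be sketched with a stubbed model. Everything below, including the fake `llm` function and its canned replies, is invented for illustration and is not AutoGPT's actual code:

```python
from collections import deque

def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call, returning canned responses.
    if prompt.startswith("plan:"):
        return "research topic; write summary"
    return f"done({prompt})"

def task_collection_agent(goal: str) -> deque:
    # Ask the "model" to break the goal into a queue of tasks.
    return deque(t.strip() for t in fake_llm(f"plan: {goal}").split(";"))

def execution_agent(tasks: deque) -> list[str]:
    # Work through the queue, one model call per task.
    results = []
    while tasks:
        results.append(fake_llm(tasks.popleft()))
    return results

results = execution_agent(task_collection_agent("write a report"))
print(results)  # ['done(research topic)', 'done(write summary)']
```

A production version would add retries, result storage, and a step where the executor's output feeds back into the planner, but the task-queue-between-agents shape is the core of the pattern.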
Here's one I prepared earlier. You can see that this is a Streamlit-like app. You can see that we have the message history of everything I've asked it previously, and we have the readme, the chat, and some additional settings up here to look at how this works. Chainlit is designed to be good at working with chains of thought, and also at creating this really beautiful front-end interface, so that instead of using Langchain and developing on a command line, you're always working with a visual, chatbot-like output. Here you see you can ask questions and you have sub-steps. You ask what is the tallest mountain on Earth, and to improve performance I've deliberately prompted this to give a metacognitive response: let's think step by step. This metacognition improves performance, and the step-by-step approach means it's more likely to get the right answer. What was agreed at COP26? The reason we're talking about what was agreed at COP26 is that there's an example from ChatGPT plugins that we'll look at later today, and this is showing the other side of it. What you'll notice is that it talks about the Glasgow Climate Pact, which aims to limit global warming to 1.5 degrees Celsius above pre-industrial levels. Notice the exact language being used here; this is going to come up later. Below this, we ask: what is the Doha Programme of Action? The correct answer is that the Doha Programme of Action is an institutional mechanism aimed at providing funding for less well-off developing countries, to help pay for climate adaptation and mitigation. It was created in 2022. What you'll notice is that that's not what the model is saying. The reason it's getting it wrong is that this happened in 2022, after the training data cut-off, and in order to fix this you need to compose a vector store, or some other way of knowing what's going on in the world today rather than what was going on back in 2021.
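That fix, retrieval augmented generation, is simple to sketch: keep your up-to-date documents in a store, retrieve the best match for the question, and prepend it to the prompt so the model answers from retrieved text rather than stale training data. This toy version uses word overlap in place of real embeddings, and the document snippets are paraphrased stand-ins, not authoritative text:

```python
docs = [
    "The Doha Programme of Action (2022) supports least developed countries.",
    "The Glasgow Climate Pact aims to limit warming to 1.5 degrees Celsius.",
]

def retrieve(question: str, documents: list[str]) -> str:
    # Toy retrieval: pick the document sharing the most words with the question.
    # A real vector store would rank by embedding similarity instead.
    q = set(question.lower().split())
    return max(documents, key=lambda d: len(q & set(d.lower().split())))

def augmented_prompt(question: str) -> str:
    # Prepend retrieved, up-to-date text so the model need not rely on
    # whatever was in its training data before the cut-off.
    context = retrieve(question, docs)
    return f"Use this context to answer.\nContext: {context}\nQuestion: {question}"

prompt = augmented_prompt("What is the Doha Programme of Action?")
```

The key point is that the post-cut-off fact now arrives inside the prompt, which is exactly what LlamaIndex and the Langchain vector store integrations automate at scale.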
There are a number of libraries to give you this vector store, and indeed Langchain is very ambitious: it tries to work with as many as it can. In my development for this talk, I've worked with ChromaDB, and I've also worked with another asynchronous vector database called Qdrant. Quite frankly, I'm sorry to say that I found the integration between vector databases and Langchain and Chainlit to be quite underwhelming. Right now, with our first-generation tools, we're finding that the abstractions are quite messy. We've built for so many use cases all at once that no one particular use case is served well. So, as we figure things out through our first generation, and take the learnings in terms of what software people want to build, and what software we want to make it easy for people to build, we will dive in and find maybe six, eight, ten common types of tool that people want to work with, and then orchestrate for those especially. On the second point, the Langchain library is built in a functional style for the most part, which means that Chainlit is somewhat limited: it has to use decorators for everything, it has to use the factory pattern for everything, it has to use more complicated Python tools than it needs to. I think there are simpler and better solutions available, and we can take the success of the first generation of these tools to drive innovation across the second generation, to improve and build off the learnings we have so far. The next point I'll make is about this demo from ChatGPT plugins; I also have it in my slides. In one of the demos from ChatGPT plugins, they're looking at this human rights database, the UN database of annual reports: the five most recent UN reports, and COPs 21 through 26.
They're looking at the different things that were pledged. Notice there is a hallucination in this particular approach here, and this causes problems. So, hallucinations cause issues. They make Greta Thunberg sad. So lastly, I'll leave you with this question: what infrastructure do we need to build to ensure that we are ready when the hallucination problem is fixed? We have this hype cycle, with foundation models right at the very top, the peak of inflated expectations, and we want to see what they can actually be used for and how we can actually engineer what comes next. Thank you very much. OK, I'll ask if you have any questions. Otherwise, I have a few slides that I can come back to. All right, would everyone like to see the remaining slides? Perfect. So let's talk a little bit more about how this process works. We have this idea of Langchain and the way these pieces work together. Langchain is very ambitious, and the chaining is fairly straightforward, whereas the APIs are often easiest to work with directly; this is just personal experience of how this has worked. What would I want to see in the community? What's my answer to this question of insight here? Well, I'll leave this question a moment to breathe, just so you can have a think about it yourself, so you can think about what you would want the answer to be. How can we engineer a great developer experience for building large language model apps in Python? Building a great developer experience would probably involve working with large language models directly as the most basic kind of approach.
I feel that recently Langchain has a problem where it's a thin client over everything, and instead of just working with the API directly, you then have the additional problem of trying to understand their entire code base and how they've implemented everything. So at present, they could be stronger with their implementation hiding and how that works overall. But I'll open the floor: what do people think would create a great developer experience for building these apps? So thank you for your excellent talk and the conversation earlier. When you showed where ChatGPT, or sorry, the large language model, was hallucinating: for a lay person, I think it's difficult, because I kind of feel like I trust the machine. I see it and I parse it in my head and I'm like, okay, that sounds about right. My question is, is there tooling around explainable AI, where it's like, this part I'm not quite sure about? Yes, absolutely. Between the creation of the blurb for this talk and the actual development of the software for the talk itself, I had a chat with one of the founders of TransformerLens, and he asked me very kindly not to show his library at this talk, so I'm not going to. But there are indeed mechanistic interpretability approaches, and there are hopes that you can break down and understand how these models are thinking and working. We can think of these models as being sequential: you have a deep learning pipeline where one layer is a transformation, followed by another transformation, followed by another. You have this natural sequential processing where you can figure out what's going on up to layer 10, and then figure out what's going on with the rest of the network; or you can figure out what's going on up to layer 20, and then figure out the rest.
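That sequential view, inspect the computation up to layer N and then reason about the rest, can be sketched as a forward pass that records every intermediate activation. The two "layers" below are invented toy functions, standing in for real learned transformations:

```python
def layer_double(x: list[int]) -> list[int]:
    # Toy stand-in for a learned layer transformation.
    return [v * 2 for v in x]

def layer_increment(x: list[int]) -> list[int]:
    return [v + 1 for v in x]

def forward_with_trace(x, layers):
    # Run the input through each layer in sequence, keeping every
    # intermediate activation so any depth can be inspected afterwards.
    activations = [x]
    for layer in layers:
        x = layer(x)
        activations.append(x)
    return activations

trace = forward_with_trace([1, 2], [layer_double, layer_increment])
print(trace)  # [[1, 2], [2, 4], [3, 5]]
```

Interpretability tooling works over traces like this: `trace[n]` is "everything the network has computed up to layer n", which is the object you study when you try to explain what the model is doing.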
So there are lots of interesting approaches. Indeed, OpenAI's research looking at using GPT-4 to interpret GPT-2 is a novel approach. It's currently underperforming in terms of the literal numbers, but the idea itself is ambitious in scope and could potentially lead to great impact down the line. Does that answer your question? Yes. Okay, so thank you for your talk. We all remember that the GPT models started as next-token predictors and then they were fine-tuned to be these helpful assistants, and that all worked spectacularly. Do you think we are currently underestimating the capability for future agency in the next evolution of these models? By agency I mean the ability to pursue goals that the user did not specifically put in. Yep. So, the question about forecasting is an excellent one; I've had this slide ready in case anyone asked that question. What do you mean when you say underestimating? If you're looking at the market figures here, Manifold definitely is not underestimating. I mean the AI community, the more general community. So, I think the problem with prediction markets right now is that they're undercapitalised and many people are not involved in them. What you want to do is try to find high-volume trading markets where there are lots of people making bets about how AGI is going to go. You can look at the share price of NVIDIA, you can look at the share prices of various companies involved in the generative AI wave, and you can see what they're saying; that will give you different predictions about what's going to come next with GenAI, and then how we as Pythonistas can develop the open source tooling we need to have an excellent developer experience when building LLM apps. Thank you. Hi, thanks for a great talk. I have a question around data apps.
The context is that I'm involved in improving the chat widgets and interface for HoloViz Panel, which is also a data app framework. Do you have some specific pains that we could work on solving that you could name and mention? Specific pains to work on solving, yes. So specifically, I think that currently they're sort of trying to boil the ocean. I think the current pain point with the existing LLM tooling is that it tries to support too many integrations at once, because we don't know what's going to be big down the line, and that's reasonable, because right now we have this great expansion of tools that we can work with. But this kind of approach, where you have to hope that a particular part of Langchain connects to, or plays nicely with, a particular part of another app designed to work with Langchain, is definitely a 'hope the developers have done it a certain way' approach, and I would like to have greater control. So the issue I face is that when you instantiate an object you have to specify parameters for everything, and currently the defaults being used are not as sensible as they could be. Fixing that would really speed up the developer experience for developers working with these tools. Thanks. Brilliant. Excellent. Thanks very much everyone.