Welcome to this brief introduction to private AI chatbots. AI chatbots are widely used by now, and many people improve their productivity with tools like ChatGPT. So why run your own? Firstly, there is the issue of censorship: both the model itself and the guardrails built around it mean that sometimes ChatGPT simply says no and refuses to answer your question. On top of that, you have confidentiality issues, for example if you are working with client data or personal data. And finally, there are the costs: if you want to run a large-scale application, like putting all of PubMed through ChatGPT, a single run can easily cost well in excess of US$10,000.

Today, I'll talk about some of the open models that are available from resources like Hugging Face. I'll talk about software to run them, specifically two different tools. And finally, I'll talk a little bit about the hardware requirements for running such models.

Let's start with Hugging Face. Hugging Face is an open repository that contains thousands of large language models that you can download and run locally. How large are these models, you might ask? GPT-4 is estimated to have more than 1,000 billion parameters, and GPT-3.5 has 175 billion. The largest open model currently available is Falcon 180B, which has 180 billion parameters. Other recent, very powerful models are Mixtral 8x7B, a 46.7 billion parameter mixture-of-experts model, and Zephyr 7B, a 7 billion parameter model built on top of Mistral 7B. And finally, you have small models like Microsoft's Phi-2 with only 2.7 billion parameters. Size, of course, does matter; I'm not going to lie to you. But smaller models can really surprise in terms of their performance.
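As a concrete aside: files in a Hugging Face model repository are served over plain HTTPS at predictable "resolve" URLs, so you can fetch a model with nothing but the standard library. Here is a minimal sketch; the repository and file names in the comment are illustrative examples of a quantized 7B build, not a recommendation.

```python
import urllib.request

HF_BASE = "https://huggingface.co"

def model_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Direct-download URL for a file in a Hugging Face model repository."""
    return f"{HF_BASE}/{repo_id}/resolve/{revision}/{filename}"

def download_model(repo_id: str, filename: str) -> None:
    """Fetch one model file into the current directory."""
    urllib.request.urlretrieve(model_file_url(repo_id, filename), filename)

# Example (illustrative names; a 4-bit quantized GGUF build is roughly 4 GB,
# so only run this when you actually want the file):
# download_model("TheBloke/zephyr-7B-beta-GGUF", "zephyr-7b-beta.Q4_K_M.gguf")
```

In practice, the tools discussed below handle these downloads for you; the point is only that there is no gatekeeping involved.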
For example, Falcon 180B rivals GPT-4 despite the latter having more than 5 times as many parameters, and Mixtral 8x7B rivals GPT-3.5, which has more than 3 times as many parameters. However, if you want to run models on your own desktop computer or laptop, realistically you're going to be running models of 7 billion parameters or smaller. These are nevertheless sufficient for many tasks, for example rephrasing text, transforming data, and writing code, which can be done quite well by models dedicated to that task. However, 7 billion parameter models are not going to be good at question answering, because being so much smaller, they simply know much less.

Arguably the easiest way to run these models on your own computer is LM Studio. It's a GUI application available for Mac and Windows, and there's also a beta version for Linux. It's easy to install, and once you've installed it, you'll see that it has Hugging Face integration directly in the GUI. As you can see here, you have suggested models that you can download, you have search functionality for finding more models, and it even gives you hardware guidance showing which models are likely to run on your hardware. Once you have downloaded and installed some models, you get a chat interface with lots of configuration options for how to run them. This is how it looks: the main part of the screen is taken up by a chat, on the left side you can keep multiple chats that you can switch between, and on the right-hand side you have all the configuration options. There's also API access, so once you load a model in LM Studio, you can access that model from, for example, your own Python scripts.

Unfortunately, LM Studio is closed-source, and it is for non-commercial use only. If you're getting a bit more serious about running your own private AI chatbot, I strongly suggest that you look at the Oobabooga project instead.
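To make the API access point concrete for a moment: LM Studio serves whichever model is loaded through an OpenAI-compatible HTTP endpoint, by default on localhost port 1234. A minimal sketch using only the standard library; the prompt, placeholder model name, and default URL are assumptions you may need to adjust to your setup.

```python
import json
import urllib.request

LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"  # LM Studio default

def build_chat_request(prompt: str, temperature: float = 0.7) -> dict:
    """OpenAI-style chat completion payload. Whatever model is loaded in
    LM Studio answers, so the model field is just a placeholder."""
    return {
        "model": "local-model",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask_local_model(prompt: str, url: str = LMSTUDIO_URL) -> str:
    """Send a chat request to a locally running LM Studio server
    and return the model's reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["choices"][0]["message"]["content"]

# With a model loaded and LM Studio's local server started:
# print(ask_local_model("Rephrase this: the experiment was a success."))
```

Because the endpoint mimics the OpenAI API, existing scripts can often be pointed at the local server just by changing the base URL.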
The Oobabooga project is the Text Generation Web UI. It's available for Linux, Mac, and Windows, and it's open-source. It is admittedly a bit harder to install; if you've never used a command line or cloned a GitHub repository before, this is maybe not the one to start with. It does have some Hugging Face integration: through the web interface, you can download and install models from Hugging Face, but there's no search interface or guidance. You have to find the model yourself on Hugging Face and then paste in the URL of the model you want to install. Once you've installed at least one model, you have access to the chat interface, through which you can interact with the models exactly as you would expect from ChatGPT. It also has a lot of functionality that you don't see in LM Studio: for example, you can train models, you can install extensions (for example, to get multimodal models), and it runs as a web server backend, which has some advantages, because you can run the server either directly on your desktop or laptop, or on a GPU server if you have access to one, and then access it through the web interface.

That gets me to the topic of hardware. What does it actually take to run this? Well, in CPU mode, you can run small models on a computer with 16 GB of RAM; however, they run slowly. That's why you want to use GPU acceleration if at all possible. I've done all my experiments at home on my PC, which has an NVIDIA RTX 4070 Ti with 12 GB of VRAM. This means that I can load models like Zephyr 7B and other 7 billion parameter models and run them very efficiently. If you have access to a GPU server with, for example, an NVIDIA A100 with 80 GB of VRAM, you can load models like Mixtral 8x7B, which can compete with GPT-3.5. And if you have access to even more powerful hardware, like a server with 8 A100s, you can even load the Falcon 180B model and get something comparable to GPT-4 in terms of performance.
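The hardware pairings above follow from simple back-of-the-envelope arithmetic: each parameter takes bits/8 bytes of memory, plus some headroom for the KV cache and activations. The 20% overhead factor below is my own rough assumption, so treat the numbers as estimates, not guarantees.

```python
def vram_estimate_gb(params_billion: float, bits_per_param: int = 16,
                     overhead: float = 1.2) -> float:
    """Rough GPU memory needed for a model: bytes for the weights plus
    ~20% headroom for KV cache and activations (assumed factor)."""
    weight_bytes = params_billion * 1e9 * bits_per_param / 8
    return weight_bytes * overhead / 2**30

# A 7B model quantized to 4 bits fits comfortably in 12 GB of VRAM:
print(round(vram_estimate_gb(7, bits_per_param=4), 1))     # 3.9
# Mixtral 8x7B (46.7B parameters) at 8 bits fits on one 80 GB A100:
print(round(vram_estimate_gb(46.7, bits_per_param=8), 1))  # 52.2
# Falcon 180B at 16 bits needs something like 8 x A100 80 GB:
print(round(vram_estimate_gb(180), 1))                     # 402.3
```

This is also why quantization matters so much in practice: dropping from 16 to 4 bits per parameter cuts the memory footprint by a factor of four, usually at a modest cost in quality.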
This means that for companies and institutions, it is entirely possible to have their own private AI chatbots that are as good as what you get from OpenAI.

So what are my conclusions? ChatGPT is everywhere, but we cannot ignore the issues related to censorship, confidentiality, and cost. Fortunately, open models and open software have become a viable alternative to ChatGPT. This means that you can set up a private AI chatbot where you have full control. If you are a bioinformatician or data scientist, I strongly suggest that you start using these tools to learn about them. And if you're a large company, I would say that by now there is no excuse for not having an in-house alternative that your employees can use instead of ChatGPT.

That's all I have to say about private AI chatbots. If you want to learn more about the many exciting things you can do with generative AI, I suggest you have a look at this video next. Thanks for your attention.