All right, thank you everybody for coming to the session. Today I'll be talking about the rise of vector databases: what are they, why should you care, and in particular, how do they relate to the sudden surge of interest we've seen in large language models, or LLMs? Before we dive in, a quick introduction to both myself and the company. My name is Frank, and it's a pleasure to be here giving this talk to everybody today. I do a lot of machine learning at Zilliz, and those are my socials and email down there. If you want to get in touch, or if you have any questions, comments, or concerns, I'm happy to take them. So a bit about us first, a bit about Zilliz. We are the company behind Milvus. Milvus is a Linux Foundation project, specifically an LF AI & Data Foundation project. Milvus is the world's most popular open source vector database, freely available on GitHub and coming close to 20,000 stars at this point, so we've seen a lot of great growth in the community as well. For folks who aren't quite sure what a vector database is, bear with me; I'll get to that a little later in the first section. We are headquartered in sunny San Francisco, or the San Francisco Bay Area, one of the perks of being in Silicon Valley. The managed service that we provide is called Zilliz Cloud, and it is based on Milvus. It gives you flexible, powerful storage, search, indexing, and querying for any type of embedding of any dimension, with lightning-fast queries, absolutely zero ops overhead, and very cost-efficient storage as well, which is important for any kind of database. So, a quick intro to some of the things I'll be talking about in this presentation. First I'll go over what unstructured data and embeddings are. I'll then jump into what a vector database is, which ties in very closely with a lot of the knowledge we gain in the first section.
Then I have a very special section on vector databases plus LLMs. We've seen a huge surge of interest in things like ChatGPT, Bard, Claude, and so on, and I really wanted to dedicate a section to showing you how these fit together into a single stack. Then we'll go over some key takeaways before we end the session. So without further ado, let's dive right in: unstructured data and embeddings. I always like to start these kinds of presentations with a question: what is unstructured data? Really, unstructured data is any data that does not conform to a predefined data model. What does that mean? If we look at the history, the evolution of data, way back to the 1940s when the ENIAC came out, a key motivation for having computers was the storage, search, and indexing of data. But most data in the early days was structured data that could fit into, say, an RDBMS as a table. Then by the mid-2000s we had a lot of document databases coming out, such as MongoDB, and wide-column stores, such as Cassandra. And now we are squarely in the middle of the IoT and mobile-device era, and so much of the data we generate today, whether it be graphs, images, video, geospatial data, or audio, is a great example of unstructured data: data that you can't really fit into a traditional table-based relational database or an object database. You need some special, purpose-built database to store this type of unstructured data. But it's very hard, right? If data is unstructured, how do you give it structure? How do you make it something that is indexable, something a computer can understand? The way to do that is with vectors, hence the name vector database.
Now, one of the key paradigms we see here is taking a knowledge base that we have, whether it be images, video, audio, or natural language, in this case something like documents, and using deep learning models: we perform inference on the data and turn it into high-dimensional embeddings, high-dimensional vectors or tensors. We then store those in a vector database such as Milvus or Zilliz Cloud. And this is just to visualize exactly how these embeddings work. If we go back to the previous slide, we said these embeddings are generated by deep learning models. I won't go too much into the details here, but in this picture, what you see is taking images of digits from the MNIST dataset and looking at nearest embeddings. So if I have, say, a 128-dimensional embedding, an embedding of size 128, all of my images can be turned into these fixed-length embeddings, and I can ask: given a query image of a digit, what are my nearest neighbors? Credit to Eric Byrne; I believe these were part of his slides a while back. The digits you see here are the nearest neighbors to the digits on the left, and I think there's a lot of semantic information behind all this. If we look at the bottom left-hand corner: is that a six or is that a zero? If you look at the nearest neighbors, the set of embeddings we have is a bit indicative of, hey, it could go either way. That's really the power of embeddings generated by trained neural networks, by trained machine learning models. Now, we can do the same thing for food. If you train a classifier, either in a self-supervised or a supervised way, over images of food, you can see that these are very closely related as well.
At the top, I have fries; then something along the lines of Korean barbecue on the second row; and a variety of other dishes in the third, fourth, and fifth rows, all semantically related to my query image, my input image on the left. Beyond images, we can also look at natural language, at words. This is a very, very old model, word2vec, shown here in the TensorFlow Embedding Projector. If you look, words that are more scientific are towards the top; words related to functions, processes, and mathematics are over on the right; and then you have a lot of names on the bottom. This goes to show that these embeddings really do encode some level of information: unstructured data that is more similar to other data has closer embeddings. That is really the key principle that vector databases leverage to do large-scale search, indexing, and storage of unstructured data. Now that we have a better understanding of what embeddings are, I want to talk a little bit about nearest neighbor search. As we saw in the previous three slides, nearest neighbor search is really about finding similar data. Embeddings that are closer to each other correspond to more similar data, and vice versa: embeddings that are further away from each other, by whatever distance metric I use, mean my data is less similar. Closer embeddings equals similar data, whereas further embeddings equals more dissimilar data.
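As a minimal sketch of that principle, here is a toy nearest-neighbor search over a handful of embeddings using cosine similarity with NumPy. The vectors here are made up purely for illustration; in practice they would come out of a trained model and be much higher-dimensional:

```python
import numpy as np

def nearest_neighbors(query, embeddings, k=2):
    """Return indices of the k embeddings most similar to the query,
    ranked by cosine similarity (higher = more similar)."""
    # Normalize so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q
    return np.argsort(-sims)[:k]

# Toy 4-dimensional "embeddings"; real ones might be 128- or 768-dimensional.
embeddings = np.array([
    [1.0, 0.1, 0.0, 0.0],   # 0: similar to the query
    [0.0, 1.0, 0.9, 0.0],   # 1: dissimilar
    [0.9, 0.2, 0.1, 0.0],   # 2: most similar to the query
    [0.0, 0.0, 1.0, 1.0],   # 3: dissimilar
])
query = np.array([1.0, 0.2, 0.1, 0.0])

print(nearest_neighbors(query, embeddings))  # closest indices first
```

This is exactly the "closer embeddings equals similar data" idea: ranking by a similarity (or distance) metric turns semantic similarity into a sorting problem.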
This property is great, and it allows me to do a lot of amazing things with embeddings, such as building recommender systems, deduplication, and so on. But brute-force search is incredibly slow for anything above, say, one million vectors. If I have one million vectors and I try to do brute-force search over them, it's going to take a long, long time. And you can imagine at internet scale, a billion vectors, maybe 10 billion vectors or even more, it's pretty much untenable. There's no way you can build a real-time system by doing exact nearest neighbor search over that many vectors, and that's really what brings us to approximate nearest neighbor search. The idea is that if I can build an index over my vectors, I'll take a small penalty in recall; basically, I won't get my exact nearest neighbors, but if I can get pretty close, say 95 or 99 percent, I'm okay with that penalty as long as it significantly speeds up my search process. So instead of doing brute-force search, which is O(n) over the entire dataset, maybe I can get it to O(log n), and that would be very, very good for my particular real-time application. And there are a variety of different index types possible: hash-based indexes, quantization-based indexes, graph-based indexes, tree-based indexes. These indexes are really what allow me to do very scalable approximate nearest neighbor search, and to do it very quickly as well. Again, as I mentioned, I won't go too much into the details of each index, but I'm happy to sync up with any folks offline.
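To make that recall-for-speed trade concrete, here is a toy sketch of one quantization-style idea, loosely in the spirit of an inverted-file (IVF) index: partition the vectors into buckets around centroids, then at query time scan only the bucket whose centroid is closest, instead of the whole dataset. This is an illustrative simplification under made-up data, not how any production index is actually implemented:

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 toy 8-dimensional vectors standing in for a real embedding set.
data = rng.normal(size=(10_000, 8))

# "Train" the index: pick a few centroids and assign each vector to its
# nearest centroid, forming inverted lists (buckets).
n_buckets = 16
centroids = data[rng.choice(len(data), n_buckets, replace=False)]
assignments = np.argmin(
    np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2), axis=1
)
buckets = {b: np.where(assignments == b)[0] for b in range(n_buckets)}

def ann_search(query):
    """Approximate search: scan only the bucket with the closest centroid."""
    b = np.argmin(np.linalg.norm(centroids - query, axis=1))
    ids = buckets[b]
    dists = np.linalg.norm(data[ids] - query, axis=1)
    return ids[np.argmin(dists)]

def brute_force(query):
    """Exact search: scan every vector -- O(n)."""
    return np.argmin(np.linalg.norm(data - query, axis=1))

query = rng.normal(size=8)
print(ann_search(query), brute_force(query))  # often, but not always, equal
```

The approximate search touches roughly 1/16 of the vectors here, and that is the recall penalty in miniature: if the true nearest neighbor landed in a different bucket, the approximate result is only close, not exact.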
If you have any questions, you can email me or connect with me on any of my socials; I'd be happy to chat about that. So what is a vector database? A vector database is any database that is purpose-built, keyword purpose-built, to store, index, and search or query large quantities of embeddings. And Milvus in particular is a great example of a vector database. Why did we create a purpose-built database, and what are the lessons from this particular undertaking? Really, you have these vector search libraries such as Faiss, HNSWlib, and ScaNN, and these are great libraries that give you high-performance vector search. But if you want to go beyond that, if you want a lot of traditional database features, if you want replication, failover, automatic indexing, you really have to have a vector database such as Milvus; it goes beyond a pure library such as Faiss. That is really why purpose-built is necessary, but purpose-built is complex at the same time. Oftentimes, I'm an application developer and I want an application that has high query load; maybe a couple of months or a couple of years down the road, I find I don't need that much query load, but I want to do high-rate insertion and deletion, really high edit capabilities for my database. Maybe I want full-precision recall; maybe I do want to use brute-force search rather than approximate nearest neighbor search if I have a small dataset. And maybe I want accelerator support as well. There are many, many more types of applications, query paradigms, and ways that people will use your database or your vector search library, and that is why purpose-built is very hard.
Now, in prior years I would actually go into each of the different layers of Milvus; this is the Milvus architecture. Again, Milvus is the world's most popular open source vector database, part of the LF AI & Data Foundation. I won't go too much into that today, but just know that there are these different layers inside of Milvus, with very specific types of nodes for certain tasks. I'm happy to point you to some previous presentations, work, or articles we've done that talk a little bit more about this. All of this can run on Kubernetes, so it is a distributed database. I know hearing the words "distributed database" scares some folks, but we have standalone versions as well, and we also have embedded versions: versions that you can simply pip install if you're using Python, as simple as grabbing a library, downloading it, and going from there. All these different layers are very important in that they give you the capability to support different types of applications. If I want to do a lot of querying, I simply expand the query cluster. If I want to do a lot of insertions, I expand the data cluster. And if I need to maintain a very up-to-date index at all times, I can expand the number of index nodes, the index cluster, as well. Again, I won't go too much into this, but I'm happy to point you to other presentations that dive a little deeper into the Milvus architecture. Now I want to spend a little bit of time talking about vector databases plus LLMs. We've seen a huge, and "surge" may not even be a strong enough word, surge of interest in ChatGPT and a lot of other autoregressive language models.
You have ChatGPT, you have Claude and Bard, and these are great. But oftentimes, not only do you have domain data, say internal domain data, that you want to inject into these LLMs, you also want to minimize hallucination; you want to be able to use these in production as you would any other production project. I'll give a bit of background on how these autoregressive language models work. GPT, for those who don't know, stands for generative pre-trained transformer. It is a transformer that has been pre-trained in a generative fashion using causal language modeling, and essentially GPTs are stochastic. For folks who are familiar with recurrent nets, what they do is predict future tokens. So in this case, if I have the sentence "Milvus is the world's most popular vector ___", the highest-probability completion for that particular blank would be "database". The second highest could be "search", as in "vector search engine", and the long tail might include "embedding database" or "embedding search engine". The idea is that, based on the data these GPTs, these causal transformers, are trained on, I get a distribution over the entire vocabulary at the output, and I can simply take one of the higher-probability tokens and have that be the output result. But there's a huge downside here: if GPTs are stochastic, that also means they can introduce a lot of hallucination. If you can imagine I have one word, one phrase, one filled-in token that is incorrect, then whatever other completions come after it are going to build on that error and be incorrect too. You can get these very, very plausible-sounding but factually incorrect responses.
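As a toy illustration of that "distribution over the vocabulary" idea, here is the sentence-completion example above with entirely made-up probabilities (not taken from any real model). Greedy decoding always picks the most likely next token, while sampling can pick a lower-probability, possibly wrong, one; that is the stochasticity being described:

```python
import random

# Hypothetical next-token distribution for the prompt
# "Milvus is the world's most popular vector ___"
next_token_probs = {
    "database": 0.62,   # highest probability
    "search": 0.21,     # e.g. "vector search engine"
    "embedding": 0.09,  # long tail
    "engine": 0.05,
    "graph": 0.03,      # low-probability token: picking it starts a "hallucination"
}

def greedy(probs):
    """Greedy decoding: always take the argmax token."""
    return max(probs, key=probs.get)

def sample(probs, rng):
    """Stochastic decoding: sample a token proportionally to its probability."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(42)
print(greedy(next_token_probs))       # deterministic
print(sample(next_token_probs, rng))  # varies run to run (seeded here)
```

Once a sampled token is wrong, every subsequent token is conditioned on that wrong token, which is exactly how one bad pick snowballs into a plausible but incorrect answer.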
And that is a big downside for GPTs, for ChatGPT, GPT-4, Claude, or Bard, being used in production-ready environments. A great example of hallucination is the question "How do I perform a query with Milvus?" If you ask ChatGPT, it might give you something like this; of course, it will give you a different result every single time, again because GPTs are stochastic and you get a probability distribution over all potential tokens. If I ask basic ChatGPT, or even GPT-4, "How do I perform a query using Milvus?", I'll get this kind of response, and at first glance it looks right. It doesn't really look like there's anything wrong with it: first I connect to a particular Milvus server, then I create a collection, I insert some random vectors, I build an index and perform a query. It looks right, but it's actually not correct. The reason is that interfacing with Milvus is not done via this imaginary Milvus client inside of Python; it's done with a connections object. You would call connections.connect and use that as the object with which you interface with Milvus, not this imaginary Milvus client. So how do we fix this problem? What is the solution to hallucination? Really, it's quite simple: inject domain knowledge into ChatGPT, into LLMs; try to force these LLMs, in whatever way you can, to read from a knowledge base that you provide rather than from whatever they were trained on. And the key point I want to emphasize is that this domain knowledge is stored in a vector database; a vector database really is the only way to store this domain knowledge. Think about it, right?
If I go back many, many slides to this particular slide right here, you'll understand that natural language can be represented as embeddings; in particular, it can be represented semantically as embeddings as well. That slide showed words, but this extends to sentences, paragraphs, even entire documents. Now, coming back to this slide: if the domain knowledge is stored in a vector database, that's great, but how does this solve my problem? I'll get to that in a little bit, when I get to the architecture of a demo app that we've built. But as an example: if you prompt ChatGPT or GPT-3.5 with documentation from Milvus, you'll see that in this case it actually gets the correct answer. I do connections.connect instead of having this imaginary Milvus Python client, and the rest of it is very similar: I simply create a collection, I have a query vector, and I query that vector and print the results sequentially. We call this particular framework the CVP framework, and the idea is that we can view these large language model applications as a general-purpose computer. Computers have processors, they have storage, and they have code, and this is how we break it down. C is ChatGPT, or any other autoregressive language model; this can be interpreted as the processor block in the CVP framework. V is a vector database; again, this can be any vector database, such as Milvus, and you can interpret it as the storage block. And then you have P: prompt as code. Instead of using, say, Python code, C code, C++ code, or even assembly code as the machine language, you can now use prompts.
And this is what gives you the human interface between your processor and storage blocks. So how do we implement the CVP framework in practice? I want to harken back to a slide from long ago: using vectors to represent data. The idea is that we can take our knowledge base and represent it as vectors. Why do we represent it as vectors instead of using traditional search? Because these vectors are semantically representative of your input data. That allows me to match, say, a very short query, or rather the semantic meaning of that query, with the most relevant documents. That's very important, and it's really what a vector database can help you do; that's a huge value proposition for vector databases themselves, such as Milvus and Zilliz Cloud. An example application we built using the framework we just described, the CVP framework, is called OSSChat. If I go back to this slide right here, this answer was actually generated with OSSChat. OSSChat is a project that allows you to chat with open source projects, and essentially it implements the CVP framework. It takes documentation about open source projects from GitHub and from project websites, parses those documents, and stores them as chunks, as embeddings, in Zilliz Cloud. When the user asks a question, the question is first sent to Zilliz Cloud, which retrieves the most relevant documents; those documents are then given to ChatGPT as part of the prompt, along with the question. ChatGPT is then able to give you much more precise answers. Again, going back to this slide, it gives you the right answer in this case, using connections.connect rather than hallucinating this imaginary Milvus client written in Python.
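The retrieve-then-prompt flow just described can be sketched end to end. In this toy version, a normalized bag-of-words vector stands in for a real embedding model, an in-memory list stands in for the vector database, and the assembled prompt string stands in for the LLM call; every name here (embed, retrieve, build_prompt) and every chunk is hypothetical, not OSSChat's actual code:

```python
import math
import re

# The "knowledge base": documentation chunks. In the architecture described
# above, these would be parsed from project docs and stored in Zilliz Cloud.
chunks = [
    "Use connections.connect to connect to a Milvus server.",
    "Create a collection with a schema before inserting vectors.",
    "Build an index on the vector field to speed up search.",
]

def tokenize(text):
    return re.findall(r"\w+", text.lower())

# Toy "embedding": a normalized bag-of-words over a fixed vocabulary.
# A real system would call a trained sentence-encoder model instead.
vocab = sorted({w for c in chunks for w in tokenize(c)})

def embed(text):
    counts = [tokenize(text).count(w) for w in vocab]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

store = [(embed(c), c) for c in chunks]  # stand-in for the vector database

def retrieve(question, k=1):
    """Return the k chunks whose embeddings are most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: -sum(a * b for a, b in zip(item[0], q)))
    return [text for _, text in ranked[:k]]

def build_prompt(question):
    """Prompt-as-code: splice the retrieved context into the LLM prompt."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How do I connect to Milvus?"))
```

The key design point is that the LLM never answers from its training data alone: the prompt pins it to the retrieved context, which is what suppresses the hallucinated client from the earlier example.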
So that's OSSChat. We have a blog post about this as well, and for folks who want to go and play with OSSChat yourselves, it is available online at osschat.io; again, that's just one word, osschat.io. Feel free to go online, and if you have your own open source projects that you'd like to see on OSSChat, we'd be happy to take your suggestions. We're constantly adding new open source projects every week, and we will continue to maintain it and have it readily available for folks who are interested. So this is an implementation of the CVP framework that we've created at Zilliz, and an example of how you can use a vector database such as Milvus to really build out these large language model applications. All right, coming up on the tail end of the slides, I only have a couple more left, but I wanted to go over key takeaways, key takeaways from the Milvus community. We've seen a huge number of people coming to Milvus today who are interested in using Milvus with LLMs. The first is that ML models plus vector databases are key for unstructured data management. If I go back to the previous slide: we have all this unstructured data out there, documents, images, videos; how do we manage all of that? The answer is that we have these powerful machine learning models, trained in some way to handle your input data, your input domain, and we store the embeddings they produce inside a vector database. They are key for unstructured data management. The second is that Milvus is with you every single step of the way. Whether you're a small-time developer, a solo developer, or maybe you and one or two other folks building out an application, we have versions of Milvus for you.
Then as you scale, as you decide to go into production, as you need to scale horizontally or vertically, whichever you choose to do, Milvus can do all of that for you; you can simply upgrade and migrate from an embedded or standalone instance to a cluster instance very easily. And third, vector databases are a key component of the LLM stack. If you look at LLMs, at ChatGPT, GPT-4, Claude, Bard, and any of the open source large language models as well, StableLM and so on: if you want to minimize hallucinations and you want to inject domain knowledge into those LLMs, you have to use vector databases. We had only three key takeaways from this presentation, and those are the three. I know it's coming to a bit of an abrupt end, but thank you for listening; I hope this presentation was really useful for you. If you do want to try Zilliz Cloud, which again is a managed service that uses Milvus as the underlying vector database and builds a lot of enterprise features on top of that, feel free to try it out for free at zilliz.com/cloud; there's a total of $400 in credits available to you once you sign up. Those are our socials down there as well. Feel free to get in touch with us at Zilliz Universe or on our LinkedIn, and if you need any Milvus-specific help, there's our Slack there as well; we're happy to take any questions you might have. If you're interested in contributing to Milvus, feel free to join our Slack and go from there. We also have a GitHub, where we have a variety of Milvus connectors and different ways you can visualize your data inside of Milvus too. So again, thank you.
That was, well, not really just an introduction, more of an introduction plus a deep dive into vector databases, and Milvus in particular. I hope that was useful for you. Thank you for listening, and I hope to see you sometime soon.