All right, I want to thank everybody for coming to my talk; I hope you can all hear me. I know it's getting later in the day, so I appreciate you joining me to talk about vector search and vector databases more broadly. Before I dive into the meat of this talk, a quick show of hands: how many of us are familiar with vector search and vector databases? Okay, quite a few, and I imagine the hype around LLMs and ChatGPT has certainly helped. What I hope you take away from this talk is what Milvus is and why we need a vector database that is really scalable: why do we need something that supports billion scale and high performance, with a lot of production readiness built in? How do we build that, why is it difficult, and why do we need to build something from the ground up to support vectors? There have been a lot of other great talks today that mention vector databases, and many of them dive deep into the nitty-gritty of vector search. This one is going to be more about Milvus itself: the architecture, how we built it, the evolution from Milvus 1.0 and 1.1 to 2.0, 2.1, and 2.2, where we are today, and where we see vector search and vector databases going in the future.

So without further ado, I'll get started. I'm Frank, Director of Operations and Head of AI/ML here at Zilliz. Zilliz is the company behind Milvus, the world's most widely adopted open source vector database; it has something like 25,000 stars on GitHub. You can go download it and play around with it as you see fit. We've been building vector search technology and vector databases since 2018, and Zilliz has been around since 2017. We're based in Redwood Shores, so just a 40-to-50-minute drive up 101, depending on traffic, and we maintain a variety of open source projects, the most critical and well-known of which is Milvus, which you see on the left over there.

Before I talk too much about Milvus, I'm going to give a quick recap of vector search: why is it so powerful, and why should you care about vectors and vector search? Vectors are a great way to represent unstructured data, and unstructured data is everywhere. If you look way back at the '60s, '70s, and '80s, when computers were first around, one of the key things they were built to do was store, index, and search large quantities of data. Back then a lot of data was structured: it lived in relational, tabular databases, and there was a data model associated with everything we stored. Think of an employee database, for example: ID number, date of birth, name, address, all stored in individual columns of that relational database. But as we've moved into the mobile era and the IoT era, data is coming in from a variety of different sources: image data, video data, and so on. That is when we really came to see the need for a way to store, index, and search these large quantities of unstructured data, and vectors are what unlock a lot of this unstructured data analysis for us.

The way we typically do it is this: you have a knowledge base, an internal set of, say, images, video, audio, or text that you want to index and understand, and you use embedding models, deep learning models, to turn those into vectors. Then you can store them inside the vector database. I know there are many other vector databases out there, and we're obviously partial towards Zilliz Cloud and Milvus, but there is a variety of ways you can generate those embeddings and many ways you can store them as well. What makes these vectors so incredibly powerful is that they encode the semantics of your input data, depending on how your embedding model is trained. This slide is from the ImageBind paper, from maybe about half a year ago, and one of the really interesting things you can see is that you can do more than just search for nearest-neighbor vectors: you can do cross-modality retrieval. As you see in the upper left-hand corner, if I have audio that I turn into a vector, and images and video that I turn into vectors, I can embed those into the same space, and they represent things that can be semantically very similar.
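To make that retrieval primitive concrete, here is a toy, stdlib-only sketch — not Milvus's implementation, and the three-dimensional "embeddings" are made up — of a brute-force cosine-similarity search over vectors from different modalities embedded in one shared space:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hand-made 3-d "embeddings" in one shared space; a real embedding
# model would produce hundreds or thousands of dimensions.
store = {
    "photo_of_bonfire":   [0.9, 0.1, 0.0],
    "photo_of_beach":     [0.1, 0.9, 0.1],
    "video_of_fireplace": [0.8, 0.2, 0.1],
}

def search(query, k=2):
    # Brute-force top-k: score every stored vector, rank by similarity.
    ranked = sorted(store, key=lambda name: cosine(query, store[name]), reverse=True)
    return ranked[:k]

# An audio clip of crackling fire, embedded into the same space,
# retrieves the fire-related image and video first.
fire_audio = [0.85, 0.15, 0.05]
nearest = search(fire_audio)
```

A real deployment swaps the hand-made vectors for model outputs and the brute-force scan for an ANN index, but the interface stays the same: query vector in, nearest neighbors out.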
So for the crackle of a fire, if I retrieve related images and video, I get images of, say, a bonfire or a fireplace, and so on. I can do the same across modalities from text to images, and potentially even text to molecules, which is not something you typically think of as unstructured data.

One of the other interesting things we can do with vector search is embedding-space arithmetic. In the lower left-hand corner there, we have — I'm not really sure what that is, a pelican or a swan — and then we add the sound of waves. We embed both of these, the audio and the image, in the same space, and if we add them together and retrieve the most relevant images from the vector database, all of a sudden we get that same animal, except the backdrop is now a lake, the ocean, a beach: something that has water, or the sound of waves rushing through it. That is the power of vector search. That is why I like to say vectors are the language of machines, and why I think everybody in this room should care about vectors, what they represent, and what they're good for. They're great for a lot more than just text, and a lot more than just retrieval-augmented generation, even though that is what they're predominantly used for today.

So, as I just mentioned: retrieval-augmented generation, the ability to use a vector store in conjunction with a large language model, is how vector databases are predominantly used today. For folks who are not too familiar with vector search or vector databases, I apologize; I am jumping through things a bit quickly. But if we look further into the future, maybe two, three, five years down the road, vector databases and vector search will be everywhere. To a certain extent you'll see them used alongside relational databases or NoSQL databases in equal proportion in different organizations. The reason — and I'm going to go back to this earlier slide — is that we have so much unstructured data, and vector databases are clearly the best way, perhaps even the only way, to store, index, and search all those different types of unstructured data. So that's where we're going: from a predominantly RAG use case today all the way to things like video similarity search, recommendation, fraud detection, and so on. Vector databases can be used for so much more than retrieving the most relevant documents for an input prompt to a large language model.

Now that we've done a bit of a recap of vector search and why you should care about it, I want to do a deep dive into Milvus. Recently a lot of folks have become more aware of what vector databases and vector search are, but we've been developing Milvus for a long time, since 2018 — five years now — and a lot of people come up to me and ask: hey, I can build a vector database in a weekend, or over two weeks, or maybe a month, right? What makes Milvus so special? Why should I care about what you've built? Why should I use Milvus over another vector database or some of the other options out there?
The first answer — and this one isn't written up here — is that it's open source: 100% Apache 2.0 licensed, and part of the Linux Foundation, specifically the LF AI & Data Foundation. If you google Milvus, the GitHub link is probably the first result, along with the Milvus website.

The second is that it is a distributed system, in particular a distributed database. What does that mean? Remember the title: I talked about billion scale and high performance, and if you want that, being a distributed system is a 100% necessity. It gives you exceptional flexibility and scalability, and it is the only way to scale to many, many vectors so that your application can support a variety of modalities in whatever embedding space you want. At the same time we have real-time reads and writes; it is not a pure batch-based vector search solution. On top of that, we give you the capability to add metadata to each vector — scalar fields — and in the future you will be able to build indexes over those scalar fields in addition to the indexes over your vectors. And we've done a lot of data-driven optimization inside Milvus to give you the best possible performance.

For example, we have separated the core vector indexing and vector querying layer — it's called Knowhere — from Milvus itself. That gives us the ability to add new vector indexing and search algorithms to Knowhere as they come along and have them automatically supported inside Milvus. It also lets us tailor the vector index and search algorithm to your application's needs: if you want very high throughput at the cost of higher memory, you can use something like HNSW; if you want lower memory consumption, you can use a quantization-based index, or DiskANN, which is also supported inside Milvus, among a variety of others you can pick and choose from. We also have the capability to do both batch and stream processing, if you look at bullet points two and four there. That's really what makes Milvus unique, what makes it different, and building out all these individual features is one of the reasons we've been at it for so long: we want anybody who comes to our GitHub page to be able to build confidently in production, at billion scale, with very high performance.

I also want to talk briefly about the evolution from Milvus 1.0 to Milvus 2.0 — how the architecture evolved and why we did it the way we did. If you look at Milvus 1.0, released, I want to say, in 2020, you have all these different layers running on a single machine, perhaps even as a single executable: there's a proxy layer, a storage layer, the index, and the querying, all inside one machine. The storage layer talks to object storage, so we can store the vectors, the raw data, and the metadata somewhere on S3 or blob storage. Insertions and searches are all coming through as well, all hitting a single machine. You can do replication, but there is really no shared-nothing architecture to Milvus 1.0: there's nothing you can replicate across instances consistently.

What we've done in Milvus 2.0 — hopefully this animation works; I might have to do one more; there we go — is split all of these individual components into a distributed system. The proxy layer is now split from the query nodes, and in particular we have individual nodes for each of the individual workloads inside our vector database: query nodes, a query cluster, to do the querying; index nodes, which do the indexing; and data nodes, which do the ingestion and write our data into object storage. So in Milvus 2.0 we've taken all of these components from Milvus 1.0 — this is obviously a little simplified — and really split them out into a distributed system. That's what gives you the ability to scale as you see fit, get really high performance, and take vector search to the next level.

Now let's talk about the read and write paths, or more precisely the search and insertion paths. A search request will come through and hit the proxy first, and from there go directly to the query nodes. What resides on the query nodes is simply all of the vector indexes — we'll talk more about how that works at the data level shortly, but for now: requests come through the proxy (apologies, it's spelled incorrectly up there), go to the query nodes, and the query nodes return results to the proxy, which returns a result to the user. That is the search path.

The insertion path is a little more complicated, and hopefully by the end of this talk you'll see why it needs to be, and how it gives us really superior performance. First, insertions go through a log broker. It could be Kafka, it could be Pulsar — we're working on our own as well — and the idea is that vectors are written into different channels in this log and can then be read out by query nodes or by data nodes. So for ingestion, the data nodes read the data out of the log sequence and store it — the write-ahead log, our data and indexes, the blob files — into S3, and so on. Once enough vector data has accumulated to form what is called a segment — a sealed segment — that segment gets sent to the index cluster, which is responsible for building an index across it and then writing it back to S3, back to blob storage. Meanwhile, as segments are being filled, the query nodes read the most real-time data and do a brute-force search over it — I'll get into the details of that in a second — and the built indexes also get loaded from S3, from blob storage, into the query nodes, which can then serve searches as they see fit.

Okay, so what does this give us?
This gives us a very scalable distributed system, in particular one with separated storage and compute. We can scale all of this as we see fit through Kubernetes, via microservices — each of the components you see over there is a microservice. And we have this idea inside Milvus that the log is the data: the log is the single source of truth, and it grounds a lot of our vector searches, so we can be confident in the results. That gives us the four points you see up there. Scalability. Resource optimization: if you have a very write-heavy workload, you can scale up your data nodes and index nodes and scale down your query cluster. Isolation, and pooling as well. All of these are critical, and all of these are traditional database features you'd see in relational or NoSQL databases, but we have built them for vector search. We have put the database into vector search; that is the key thing to remember.

I won't talk too much about this, but it's helpful context for some of what I was saying before: we have data structures inside Milvus that enable you to move data around and do a lot of this optimization at large scale. First, shards. Shards are specifically for the write path: if you increase the number of shards, you can boost the insertion rate. A segment is a single unit of vectors inside Milvus. You can think of it like this: as I insert more vectors into my vector database, I have a growing segment. Once that segment reaches a certain threshold, it becomes a sealed segment. That sealed segment is then sent to the index nodes, the index cluster, to build an index over it, which then gets stored in S3 and retrieved by the query nodes.

I do see some confused faces out there, so — do we have any questions before I go to the next slide? Okay, well, feel free to come up to me afterwards and ask anything you'd like.

Then there's also a process known as compaction. As we build more and more segments, and as we delete vectors from our vector database, some of these segments become very small; they shrink in size. So there's a concept known as compaction, where we take many small segments, merge them into a big one, re-index all of it, store it back in S3, and give it back to the query cluster so it can continue to serve searches and queries. Putting all of this together, this is the high-level Milvus architecture.
At the very bottom we have object storage. In the middle are the worker nodes I was talking about a little earlier: the query nodes, data nodes, and index nodes. All of that is connected to the message store, the log broker — either Kafka or Pulsar. What the log broker enables is that both the data nodes and the query nodes can read vectors as they are inserted, in real time. Query nodes need to do that because growing segments are not indexed immediately, and I still need to be able to search them. At the very top we have the coordinator service, which you can think of as the brains of our vector database: it controls all the resources you see in the worker layer. This is really the high-level architecture. If there is one takeaway from this talk, it is how we built this and why we made the design decisions we did for this architecture. I'll try to remember to leave this up towards the end of the talk; feel free to ask any questions about it afterwards.

I think I only have a couple of minutes left — I'm actually at time right now — so I'll try to breeze through this pretty quickly and take no more than five minutes. I want to talk briefly about the future of vector search, and of vector databases in particular. As I said earlier, the possibilities are endless; there are so many different applications out there. In the top left-hand corner is one we built: retrieval-augmented generation over open source documentation, called OSS Chat. The one on the right we also built: molecular search, one of the very interesting applications of vector databases in my opinion, where you can embed molecules into one standardized embedding space and search for the most similar molecules — say, if I want to tackle a particular symptom or minimize some type of side effect. And in the lower left-hand corner is a demo of reverse image search: searching for existing images in my database using images that are already out there. All of these are great examples of what you can do with vector search and a vector database, and hopefully they open your eyes to the possibilities. It is great for more than just semantic text search, more than just retrieving documents or document chunks from prompts.

But are we done? We've built this really scalable database — you can scale to a billion vectors, probably more than any organization out there needs — so do we just fix bugs, keep improving performance, and work out the kinks in the system? I think the answer is a resounding no. There's so much more to think about. Vector databases are a database at the end of the day, but I would argue they lie somewhere in between AI/ML on one side and databases and data infrastructure on the other, and we have to continue to catch up to where machine learning is today: things like multimodal models, sparse vectors, and so on.

As a quick sneak peek at some of the things we're going to support very soon: we will be adding sparse vector support. This is something a lot of other vector databases have added — it's been out there for a while elsewhere — but we have viewed sparse vector support as good to have rather than critical, because dense vectors work across multiple modalities, while sparse vectors today are used predominantly for text. We'll also have what we like to call multi-vector support: a single row in your database can be associated with multiple vectors, not just one. And then we'll have the ability to build indexes over your metadata, over your scalar fields, which is going to be important moving forward, especially as we do a lot more filtering.

So I encourage everyone out there to go and start building with Milvus. Try it — we have three different versions. The first, which you see on the left-hand side, is what I like to call our embedded or light version. You can just pip install it — `pip install milvus`, then import milvus and start it — and that's it: you've got a vector database up and running without having to worry about anything. We have a bigger version called Milvus standalone, meant to run on a single machine, a single instance — sort of like the MySQL of the good old days — that gives you really high-performance vector search on a single server. And then the big granddaddy of them all, Milvus cluster, which uses the architecture I showed a little earlier to give you really scalable vector search: billion scale at very high performance.
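To make the embedded idea concrete, here is what an in-process vector store looks like to application code. This is a stdlib toy, not the Milvus Lite API, but it shows the same insert/search shape, including the metadata (scalar-field) filtering mentioned earlier:

```python
import math

class TinyVectorStore:
    """In-process toy: rows of (vector, metadata), filtered brute-force top-k.
    Not the Milvus Lite API -- just the shape of an embedded vector store."""

    def __init__(self):
        self.rows = []

    def insert(self, vector, **metadata):
        self.rows.append((vector, metadata))

    def search(self, query, k=3, where=None):
        # Optional metadata filter first (like filtering on scalar fields),
        # then rank the remaining rows by L2 distance, nearest first.
        def dist(vec):
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(query, vec)))
        candidates = [
            (vec, meta) for vec, meta in self.rows
            if where is None or all(meta.get(f) == v for f, v in where.items())
        ]
        candidates.sort(key=lambda row: dist(row[0]))
        return [meta for _, meta in candidates[:k]]

store = TinyVectorStore()
store.insert([0.0, 0.0], doc="a", lang="en")
store.insert([0.1, 0.0], doc="b", lang="de")
store.insert([1.0, 1.0], doc="c", lang="en")

nearest = store.search([0.1, 0.05], k=1)                           # closest overall
nearest_en = store.search([0.1, 0.05], k=1, where={"lang": "en"})  # filtered
```

The embedded, standalone, and cluster versions all present roughly this interface to your application; what changes is where the vectors live and how the search is executed behind it.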
So wherever you are in your vector search journey, we have a version of Milvus that supports your use case, and we'll be there with you. If you're just starting out and want to try, say, 10,000 or 100,000 vectors, embedded Milvus — Milvus Lite — is a great option. If you have a million, maybe 5 or 10 million, Milvus standalone is something you may want to look at. And as you scale, if you need a thousand or ten thousand queries per second, or need to support billions of vectors, that's when you want to try Milvus cluster, or go to Zilliz Cloud and check out what we have there.

So that's it. I think I'm not doing too badly on time — I have about five or six minutes for questions, and I'd love to take any you have. Yes?

Okay, I'll go back to the architecture diagram here real quick; give me one sec. Yes, this architecture diagram is Milvus cluster. Correct — so the question is whether it offers a choice of Kafka or Pulsar. Yes, you can choose either Kafka or Pulsar; you don't have to use both. I think by default it comes with Kafka. The advantage you get with Pulsar is that there's less overhead when you open up a new topic, so in theory you can support more collections, but I would have to double-check that. Great question. Anything else?

Great question. The question here was: what is the typical chunk size when you create your vector embeddings? That depends on a couple of different things. The first is your application; the second is which embedding model you're using. When you talk about chunk size, you're referring specifically to retrieval-augmented generation, to indexing documents, and I'm not going to give you a very straight answer: it depends, and I would do some experimentation. I would also probably use one of the frameworks out there, either LlamaIndex or LangChain, and there are several retriever strategies as well, where you can merge chunks, split chunks, and so on. But it is 100% constrained by the context window of your embedding model. I would say a pretty good sweet spot is maybe one paragraph, and make sure you have some overlap between your chunks. Great question.

So the question here is: what is the typical delay between when data comes in and when it is searchable? This touches on a topic I didn't get into, which is that we have different levels of consistency in Milvus. There is strong consistency: if you choose that, even with a very high-performance vector index, you'll probably see delays of 100, maybe even a couple hundred, milliseconds between when you do a query and when it returns. If you use eventual consistency, things should converge within a second at most, maybe a couple of seconds, so data you insert won't be immediately searchable. But a vector database is inherently a stochastic system: vector indexes don't give you 100% recall — typically it's something like 95 to 99 percent if you're using HNSW or another approximate-nearest-neighbor index. That is why we encourage most folks to just use eventual consistency and not worry too much about those effects. Yeah, another great question.
So it depends first of all on how you define your schema. Without opening up a huge can of worms: typically a single vector will only reside on one of the query nodes, unless you increase the number of replicas. Replicas — the replica count in Milvus — are for the read path, the query path, and shards are for the write path. If you have replicas, you can boost the read performance of your vector database, and that is when you could have copies of the same vector residing on different query nodes. But if your replica number is one and you're trying to fetch a top-k of 100, the segments — excuse me, segments, not shards — are distributed among the query nodes, each segment has an index built over it, and the query nodes perform the search over each of those indexes and then aggregate the results.

So the question here is: is it distributed across regions? No, not natively, but I imagine you could build that yourself — you could distribute the object storage and the worker node layers in a way that spans multiple regions, or potentially use something like Spanner underneath. But out of the box it is not meant to support multiple colos.

I'm happy to take a couple more questions as well, but I am cognizant of time, so I do want to end the presentation here. I will leave this slide up if you want to take a picture of it, ask more questions about it, or use it as a reference. All of this information is also available in our documentation on milvus.io. I look forward to chatting with any folks who have lingering questions after this. Thank you.