The Databases for Machine Learning and Machine Learning for Databases Seminar Series at Carnegie Mellon University is recorded in front of a live studio audience. Funding for this program is made possible by Google and by contributions from viewers like you. Thank you.

Hi everyone, welcome back to another seminar talk here at Carnegie Mellon University. We're super excited today to have Jonathan Katz. He is a Principal Product Manager at Amazon working on Postgres stuff. He's a core team member and a major contributor for Postgres, so he definitely knows the internals of Postgres. The reason he's here is that we want him to talk about pgvector, where he's the number two committer on the pgvector project with a staggering four commits. So as always, as Jonathan gives his talk, if you have any questions, please unmute yourself, say who you are, and feel free to interrupt at any time. That way he's not talking to himself for an hour on Zoom. And with that, Jonathan, the floor is yours. Thank you so much for being here. We really appreciate it.

Thank you, Andy. And for the record, I'm very proud to be number two for pgvector, because Andrew Kane has done such an excellent job. And again, I'm looking for people to surpass me, particularly people who know the Postgres internals way more than me. The first thing I want to do is thank Andy for this wonderful title, because this is not the title I had actually given for the talk. I figured, okay, whatever, I'll go with it, it sounds catchy, and Andy's very good at things like that. But what's really exciting about pgvector is that if you had asked me a year ago what pgvector was, I probably would have said, what are you talking about? Why do you need an extension to store vectors in Postgres? The world has really changed a lot in the past year, particularly with the rise of generative AI, these large language models and these very big systems, and with the need to store the output from these systems and query it, quite often in rapid succession. What's powerful about Postgres is the ability to extend it and add more functionality to it, such as what pgvector does. And maybe that's a good way to dive into what we'll talk about today, which is really an overview of pgvector, the hook being how far it's come, even in just six months. But first, let's understand why we need these systems. I did look at a lot of the talks that have been given during this seminar, and I think a lot of them have covered AI, but I'd like to at least set the stage for why this is important for databases. Then: why Postgres as a vector store? With Postgres, you think relational database: I'm just doing SELECT star, pulling up some integers or some random text that I stored. Why do I want to run vector queries in it? And then, once we understand that this can be a very good idea, we'll dive into pgvector, understand what it does, look at all sorts of strategies around using it, and discuss to a degree how the internals work. And then we'll look ahead at the roadmap, because I think there are a lot of exciting things going on.
So let's dive in with a very high-level overview. A lot of the excitement today is around something like this: let's say you have a product. When I designed this example, I was actually at my in-laws' house in Florida, sitting in the sunroom, which is very Florida-themed. And I figured, okay, let's say I have a store, I have a bunch of products in the store that are ceramic alligators and whatnot, and I want to create this immersive experience where someone comes in, asks questions, and gets guided to the correct product. That's great, and we've definitely seen this occur with a lot of the advances in generative AI over the past year. But how do we actually do that? How do we take that text information, all this data you have in your database, and turn it into an application where you can interact with it in real time? What sets the stage here are these things called foundational models. Just to level set: the way I describe a foundational model is as a very large machine learning AI system that has trained itself on vast amounts of data. It could be as big as the internet, which, last I checked, has a lot of data. It's able to look at all this data and build out models such that if you ask it a question in natural language, it can produce a natural language response. It has looked over a vast array of data, but that data is often publicly available. If you have data that's specific to your business or your organization, there's a chance the foundational model has not seen it. So in the case of my Florida-themed product catalog, it might not have access to the information I've kept in my Postgres database, which I've been storing completely disconnected from the internet. But that data can still be useful to query. And this is where a technique called retrieval augmented generation comes in. Retrieval augmented generation, or RAG, is a way to add additional context to a foundational model: you provide it with information that it might not necessarily have. In this case, my Florida repertoire. The idea is that if you take a standard foundational model and ask it a question like "how much does a blue elephant vase cost," it's probably going to answer, I don't know, because I don't sell blue elephant vases; I know everything on Wikipedia and can answer any question from Wikipedia, but not something of a transactional nature like that. So the idea with RAG is that you have a knowledge base, and in this case we're going to talk about Postgres: you have all your product information in Postgres, which contains your catalog, the inventory, the pricing data. So when a request comes in saying, hey, how much does a blue elephant vase cost, you augment the prompt to the foundational model with "a blue elephant vase costs about $20," and it's able to return that. And that's really cool, because there are a couple of things going on here. One, it extends what a foundational model can answer.
But two, you can use the information that's already in your database to get that. The next question is, great, how do I combine the two? How do I take the information in my database and use it to augment these foundational models? And the answer there lies with something called a vector embedding. Now, I'll pause first and say that from an academic perspective there are many ways to do this, but us being database folks, we typically want the most efficient way of doing it, and a vector is a very effective way of doing this. Vectors give me a flashback to taking real analysis back in college, which somehow I did a year of, and I loved it. For me it was about understanding how things work in an n-dimensional space, and that's exactly what a vector is: a mathematical representation of your data. In the case of foundational models and generative AI, the idea is that you take some kind of information, it could be a text chunk, an image, a video, and you put it into your foundational model, your embeddings generator. And what you get back is this mathematical representation, a vector. The way I describe it, that's the magic of the machine learning algorithm, and describing how that all works is likely a seminar in itself. But by doing this, we're able to create a common way of representing the information that we can plug either into other foundational models or use to query against other databases, to make this retrieval augmented generation system work. So, a brief overview of how this is actually used in action. One typical workflow: let's say I have a bunch of PDF documents, and in this case, in our Florida-themed store, they describe all the different products we have. We upload them, and first we need to chunk the documents. The way the vector embeddings generators work is that they take whatever text you have, you're able to give them a certain number of tokens, words and whatnot, and they turn that into that vectorized structure. So you put the chunks into an embeddings model, in this case I used the Amazon Titan embeddings model, and you store the result in a database, in this example an Aurora Postgres database. So that's part one of enabling RAG: you take your raw text data, turn it into vectors, and store them in the database next to the text chunks that you'll use to augment the model. The next step is that a user comes into your Florida-themed store looking to buy a blue elephant vase. First they ask a question: how much does a blue elephant vase cost? Well, you need to generate an embedding of that question, and then use that to query against your database. And that's where the vector comes in. You take the vector that comes out of that embedding model, perform a nearest neighbor query (which we'll talk about extensively), and get back an answer. And once you have that answer, that's the additional context you can give, along with the question, to your large language model.
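To make the storage and retrieval step concrete, here's a minimal SQL sketch using pgvector. This is my illustration, not the talk's actual demo: the table, the column names, and the tiny three-dimensional vectors are all assumptions (a real model like Titan produces far more dimensions).

    -- One-time setup: pgvector provides the vector type and its operators.
    CREATE EXTENSION IF NOT EXISTS vector;

    -- Store each text chunk next to its embedding.
    CREATE TABLE documents (
        id        bigserial PRIMARY KEY,
        chunk     text NOT NULL,
        embedding vector(3) NOT NULL
    );

    INSERT INTO documents (chunk, embedding) VALUES
        ('Blue elephant vase, $20',     '[0.11, 0.92, 0.31]'),
        ('Ceramic alligator lamp, $45', '[0.85, 0.10, 0.52]');

    -- RAG retrieval: embed the user's question outside the database, then
    -- pull the closest chunks by cosine distance (<=>) to hand to the LLM.
    SELECT chunk
    FROM documents
    ORDER BY embedding <=> '[0.10, 0.90, 0.30]'
    LIMIT 5;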
And your large language model is then able to generate an answer with that additional context and say your elephant vase costs $20. So it's a fairly straightforward workflow, and yet it's very powerful, because you're able to give additional knowledge to these large language models through this very basic data type, the vector. That's what's so fascinating: a lot of this workflow is enabled by doing a vector search, by using a vector, which is something you learn in an introductory computer science course. And yet I call vectors a nasty data type, because even though they're simple on so many levels, trying to work with them, particularly at scale, presents many challenges. The first thing to notice is that it takes time to generate vector embeddings. If you have a text chunk and need to generate a vector embedding, every single time it has to be processed through the machine learning algorithm, and that takes time. So if your product catalog has 10,000 products, you can't generate the embeddings for all 10,000 products on every single query; you need to store that data somewhere. That gets to the need for a vector database. Okay, let's say we're able to store all of that. Well, one of the problems is the size of these embeddings. When I was in college, a 20-dimensional vector for a machine learning system was a very large vector. It blows my mind looking at some of these LLMs: the standard size seems to be around 1,536 dimensions today, and there are some that are even larger. I still can't comprehend what a 20-dimensional vector is, and here we're tossing around 1,500-dimensional vectors like they're going out of style. But let's go a little bit further: 1,536 dimensions of 4-byte floats is 6 kilobytes (1,536 x 4 = 6,144 bytes). That's quite a bit of data to store per row. Imagine you're storing 10,000 of these; that's a lot. And a million of these, I calculated, is about 5.7 gigabytes (6,144 bytes x 1,000,000 is roughly 6.1 GB, or 5.7 GiB). And that's just the raw storage of this information. A million records is not even that much in a table, but with these 1,500-dimensional vectors, that's 5.7 gigabytes before you even think about any indexing. So there's a storage problem here. Now, when you start thinking storage problems, you start thinking compression: okay, I'll store this data in my database, but it's fine, I'll compress it down, or I can store it out of line in Postgres with TOAST tables. But this data doesn't compress well. Think about it: you have a series of essentially random floating point numbers within a vector. There's no rhyme or reason or pattern to them, so they don't really compress. In fact, when I was benchmarking one of the pgvector indexes and really digging into areas where we could get some micro-optimizations, I actually watched TOAST try to compress the vectors. In the end it kept coming up with larger values, because it couldn't really compress the data, and you're basically paying the overhead of the compression header. So, all right: it takes a while to generate the embeddings.
They're very large, and you can't compress them. It must get better, right? Well, maybe you can query them quickly, but forgive me, the million-records example is staying here. The key operation when you're comparing vectors is finding the distance between them, and we'll talk extensively about that. But there are no shortcuts to it: to calculate the distance between two vectors, you have to calculate it across every single dimension. You can see how long it took me to click through eight of these dimensions, so imagine having to go through 1,536 of them. Granted, CPUs and GPUs are way faster than I can ever click through a PowerPoint presentation, but if you have to do that against a million records within your database, it's going to take some time. It's an O(n squared) kind of problem. So, to recap: there are a lot of challenges just in working with vector data. One reason I find it so fascinating is that it's so simple, yet so challenging. And given the prevalence of this information out there and the need to retrieve it very efficiently, we need strategies to query it more quickly. The good news is that for the past 20 years, folks way smarter than me have been working on strategies to do this, and they developed this idea of approximate nearest neighbor. The typical vector query is exact nearest neighbor: say I want to find the 10 closest coffee shops to me, that's the 10 nearest neighbors. Now, in a geospatial application I probably want to find those exactly, because I don't want to find out that the closest coffee shop to me is in Pittsburgh rather than New York. Approximate nearest neighbor can work in applications such as retrieval augmented generation, where you need an answer that's good enough. It may not be exact, but it's going to give you a good enough answer; I'll be able to find a good enough blue elephant vase for my Florida collection. The reason it's approximate is that in order to do exact nearest neighbor, you have to search every single vector in your data set; with approximate nearest neighbor, you're trying to get the best answers without having to search everything. You're looking over a reduced data set, but more likely than not, you're still getting the vectors you want to see. The nice thing is that this should be faster than exact nearest neighbor: it's much faster to look at, say, 1,000 vectors than a million vectors, and that gives you more efficient results. So on paper this sounds really good, but the key trade-off is this thing called recall. Recall is a measurement of expected results. As soon as you go to approximate nearest neighbor, you're making the trade-off of faster searches, but you may not see all the results. The way I like to think about recall: take a 10 nearest neighbor query, like wanting the 10 closest coffee shops nearby. Depending on my algorithm, let's say I return only eight of the closest coffee shops, plus two close tea shops, for lack of a better comparison.
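Written out in my own notation (not from the slides), these are the two ideas in play: each exact distance computation touches every dimension, and recall measures how much of the true answer set an approximate search returns.

    % One Euclidean distance touches all d dimensions, so a single query
    % scanned against n stored vectors costs O(n * d) comparisons:
    d(u, v) = \sqrt{\sum_{i=1}^{d} (u_i - v_i)^2}, \qquad d = 1536

    % Recall at k: the fraction of the true k nearest neighbors G that the
    % returned set R contains:
    \mathrm{recall}@k = \frac{|R \cap G|}{k}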
In this case, that's going to be 80% recall: I expected to see 10 results matching my preference, and I only saw eight of them. And that might be good enough; I might find eight of the closest coffee shops to me and be perfectly happy with that answer. But there's a risk that I get an answer that's not favorable to what I'm looking for. So keep in mind the trade-off: I probably visited a lot less data in my database than if I had decided to do an exact nearest neighbor search, but I might not have gotten all the desired results. That's the point, because this is going to be the big trade-off when we look at all the different algorithms within pgvector, and in general with vector similarity search: I need to choose. Do I want my data quickly, or do I want my data, I don't want to say accurately, because machine learning folks give me death stares when I call it accuracy, but to the user it is accuracy: do I want the results that best suit my search queries? The last thing before we dive into pgvector is something I've been staring at with my app developer hat on. And Andy, maybe that's the fake background you've given me: once upon a time I was an app developer, which is true, though I've moved very far away from that over the past several years. We like to think about things at the high end, like how do I get things as quickly as possible, but as an app developer I need to consider what I ultimately want from using this data, in this case vectors, in my application. The first thing is storage. Do I need my results as quickly as possible, do I want to keep them in memory, or do I have so many results that I can't pay for enough memory in my system and need to keep them down at the storage layer? Once I understand that, I start to understand my performance: sure, I want the car that goes as quickly as possible, but I may not be willing to pay for the car that goes as quickly as possible. That gets to the cost consideration. And then, in this case, there's a new thing, and this is really weird for database people: this idea of relevancy. Because in the relational database world, when you write a query, of course you get the exact results back; why would you ever not expect that? And I can tell you personally, when I started playing with pgvector, approximate nearest neighbor was weird to me. With poor recall settings, I was getting results back that made zero sense, and I'm like, what's going on? It's this notion that you do need to consider relevancy in your results. All of these things are in tension, and you need to figure out what matters most to you. And again, you can take this box and build it out and say, everything matters to me:
I'll pay for storage and performance and maximum relevancy. But likely what's going to happen is that you'll have to pick and choose: relevancy may be more important to you, but it's going to come at a hit to performance, and it might end up costing you more based upon the system that you run. So those are the practical considerations around vector storage. Now I want to get into one of my favorite parts of this, which is talking about Postgres as a vector store. And again, I'll say that a vector ultimately is a data type, which means you can basically put a vector into anything that has a storage and processing system, and this is true of Postgres. So the first question might be, well, okay, why Postgres? For one, it's open source. Postgres has been around for over 35 years; actually, Postgres and I are about the same age, at the ripe old age of 37. And it's not controlled by a single company. This is one thing that's helped Postgres become very popular through the years: it's very community driven. I've been involved with a lot of different community projects and community work, where it's not a single person making decisions; it's folks coming together and arriving at a consensus on what the best design for something might be. And through the years, and I was fortunate to really observe this, first as a Postgres user and then as a Postgres contributor, albeit not on the coding side, there's been the growth of Postgres. It's the features in Postgres that helped it get adopted as widely as it is today, and I'd actually say it starts with the app developer: just the data type support, and the implementations of it, make it so much easier to build applications. One of my favorite data types is the range type. Once upon a time I was at a company where the principal thing we did was scheduling, and being able to store a range of times within a database and retrieve those very quickly was huge. It made it so much simpler to manipulate that kind of data. And there was the indexing support as well: in the case of the range type, I could do overlap queries and have them returned in sub-millisecond time, as opposed to having to concoct my own custom indexing system. These features through the years have made Postgres easier to run for both small and large workloads, which has certainly helped with adoption. The question becomes, well, why vectors, why now? And I think the piece that's missing on this slide is the extensibility of Postgres: if something's not there, you can add it. One way folks have added things to Postgres is by forking it and creating their own database system. But even if you don't fork it, Postgres itself was designed to be extensible, going back to the original Berkeley design. So if you don't like something, you can add the feature and package it as an extension, and then have it consumed by people on all sorts of different Postgres installations. So that's the first answer to why you'd use Postgres for vector search: it's there; there is an extension for it, and that's pgvector.
And if you look at it from a developer perspective, I can just add in pgvector and I don't really need to do any additional work: it works with my existing tooling, it works with my existing drivers. There are some extensions to the existing drivers that can make things more efficient, such as supporting the binary vector format for pgvector, but the idea is that I don't have to do much more work to support it in my application. From a practical standpoint, it might make sense to co-locate my data in the same database, so that I have my transactionally oriented data and my machine learning data side by side. And that lets me use one of my personal favorite features of Postgres, which is the join. It might also make sense just based upon how my applications interface with the database. And meanwhile, this isn't a one-and-done type of thing: Postgres can work with other systems that process data upstream and downstream, with Postgres either in the center or as part of that flow, because Postgres is the transactional store. You might decide, based upon your requirements, that you need a completely in-memory vector processing system, but you still want a place to store your vectors at the end of the day; maybe you don't need a vector index on top of it, but Postgres can be there so you can load all of your vector data back into your in-memory system. If there's one thing Postgres has been very good at through the years, it's storing data and being a reliable store for it.

You said something about certain client drivers that can operate on vectors more efficiently. Is that what you said?

Yeah, I'll give an example: Postgres has the JDBC driver, which is the way you connect Java apps directly to Postgres. If you go to the pgvector repo, there's actually an add-on for the JDBC driver that allows you to take a vector stored natively in Java and transmit it over the wire to Postgres in a binary format. So instead of having to go from binary to text to binary, you keep it directly in the binary format and save a transformation.

Got it. Okay, thanks.

Yep. And I think that's actually a good segue into what pgvector is, since we've been talking about it but haven't defined it yet. Quite simply, it's an open source extension that allows you to do vector storage and search. Some of this is, I'd say, product phrasing around what it does, but it really does break down to what it is at its core. What I really like about pgvector is that it is very simple, and that's in line with the ethos of the project: it's a vector data type, and that's what it's for, being able to store and process vector data. But the key is that you also need to be able to search over it, and that's where the bulk of our talk is going to be for the rest of the day: the indexing and searching. There are two index types supported, and you'll get to know them very well: IVFFlat and HNSW. IVFFlat is a cluster-based approach to indexing; HNSW is a graph-based approach. And they both have characteristics that are trade-offs.
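As a quick illustration of that simplicity, here's roughly what the data type and its distance operators look like in SQL. This is a sketch with made-up three-dimensional data, not the talk's demo; the items table is an assumed name.

    CREATE EXTENSION IF NOT EXISTS vector;

    CREATE TABLE items (
        id        bigserial PRIMARY KEY,
        embedding vector(3)
    );

    INSERT INTO items (embedding) VALUES ('[1,1,1]'), ('[2,4,6]');

    -- The three distance operators:
    --   <->  Euclidean (L2) distance
    --   <=>  cosine distance
    --   <#>  negative inner product
    SELECT id,
           embedding <-> '[1,2,3]' AS l2_distance,
           embedding <=> '[1,2,3]' AS cosine_distance,
           embedding <#> '[1,2,3]' AS neg_inner_product
    FROM items
    ORDER BY embedding <-> '[1,2,3]';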
But again, we'll talk all about that. For searching, the good news for the database folks is that pgvector supports exact nearest neighbor searches in Postgres, and it also supports approximate nearest neighbor searches. And again, it personally took me a while to wrap my head around how all of that worked. Let me just pause here: Postgres has actually supported vector searches for a long time, going back to the Berkeley days, in the form of exact nearest neighbor searches. Postgres has the array data type, and over the array type you can define a distance operation and effectively find all the arrays that are closest to you; it just doesn't have indexing support. The cube data type does have indexing support, and it supports up to 100 dimensions, which, when it was implemented, I think back in 2000 or 2001, made sense: who needs more than 100 dimensions for a vector? It uses the GiST index, and you can actually do an efficient k-nearest-neighbor search for everything that's around you. I actually stared at the cube data type quite a bit before embarking on the pgvector journey, and extending it beyond 100 dimensions would take some work. And, as we'll see, there are other reasons too: I personally played around with GiST, trying to see how far we could take it for these types of searches, and it is a very exhaustive process to enable k-NN with GiST. That's probably a much longer discussion for a separate talk. So with pgvector, you can create a table and co-locate information with your vector embeddings; it could be text chunks, it could be your entire product management system. And the nice thing is that there's a choice of distance operators. These are the secret Postgres codes for distance, the operators that look like various Star Wars fighters, but the two most popular ones are the one on the left and the one in the center, which are the Euclidean distance and the cosine distance. And what does that actually mean? So here's a chart. I'm actually really excited to share this with this group, because I've been trying to figure out how to explain how the different distance operations work, and typically what I do is grab some props from my desk that my two-year-old leaves around; in this case I have a cat and a red ball. Euclidean distance is line of sight: how far apart are we, looking straight at each other. Then there's angular distance, or cosine distance, which is measuring the angle between two objects; I guess I have a flashlight today. The one I've had the hardest time figuring out how to visualize is the inner product. I've scoured the web for the best way to visualize the inner product, and the best I've been able to come up with is that it's kind of an amalgamation of the properties of both a line-of-sight distance and an angular distance. But I've yet to find a good visualization for how it looks, and I'm a little bit concerned that what I have here today might end up becoming my visualization.
Because, again, with the inner product, back in my college days, once you get beyond elementary calculus, everything's an inner product in some way, shape, or form. But I keep stopping to question: how do I visualize this? This is probably more than I'll ever talk about the inner product, hopefully, in my lifetime, but given that I have this group here, I'm wondering: does anyone have a better way of visualizing it?

Probably not. We'd just ask ChatGPT.

Well, I've tried that. I get back just mathematical answers, or "see Euclidean or cosine distance."

We can't ask ChatGPT anything right now; they're trying to figure out who runs OpenAI. I'll run it through Stable Diffusion and see what it spits out.

I appreciate that. Anyway, the distance operations are the foundational portion of being able to do similarity search, because you're basically trying to figure out how far apart everything is from everything else. And once you have that, you can start indexing, because again, if we try to do an exact nearest neighbor search over a million vectors, let alone 10 million or a billion, it's going to take a very long time. I've tried it; it can take a very long time. So we need to index. And it's actually important to understand how pgvector indexes a vector, because it does a normalization of your vector. Normalization is setting the magnitude of your vector to one. For magnitude: when we look at a vector, we see the arrow pointing somewhere. The wrong thing to say is that the magnitude is your length, but in my head it's simply the size of your vector. The key properties of a vector are that it has a magnitude and a direction. If you're able to eliminate the magnitude as an attribute of the comparison, then you only have to worry about the direction, or ultimately the distance between two or more vectors in space. So that's what pgvector does when it's indexing a vector. First it checks that it's a valid vector. A valid vector has certain properties, but for the purposes of indexing, it needs to have the same number of dimensions as the rest, and it needs to have a magnitude greater than zero. Then it checks if it's normalized, meaning a magnitude of one. And ultimately, mathematically, we could pick anything, right? We could say a normalized vector has a magnitude of 10 and map everything to a magnitude of 10, but mathematically we just stick to one. We normalize because, when we're doing our index operations, we're going to be able to cheat: we can take out some of the calculations we'd normally need to do in some of the distance operations. That's most noticeable in the cosine distance, which has division operations based upon your magnitudes. If your magnitude is one, you eliminate those operations, and that's fewer CPU cycles you need to spend.
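Spelling out that cheat in my own notation (consistent with the description above, not a formula from the slides): for unit-magnitude vectors, the norm computations and divisions in the cosine distance drop away, and even the squared Euclidean distance collapses to a dot product.

    \text{cosine distance}(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
        \;=\; 1 - u \cdot v \quad \text{when } \lVert u \rVert = \lVert v \rVert = 1

    \lVert u - v \rVert^2 = \lVert u \rVert^2 + \lVert v \rVert^2 - 2\, u \cdot v
        \;=\; 2 - 2\, u \cdot v \quad \text{when } \lVert u \rVert = \lVert v \rVert = 1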
And that's important, because if you have to compare 1,536 dimensions every single time you compare two vectors, you want to minimize how much time you're spending on the CPU, or the GPU if you have that privilege. So this is an important concept, and one of the things you might take for granted when using something like pgvector: it's doing this work behind the scenes to make sure you get the most efficient index searches. Now, pgvector has two indexing methods. As I mentioned, there's IVFFlat, which stands for, I probably should have said this, an inverted file index with flat storage; I know there's something in the IVF that I'm blocking out right now. And there's HNSW, which is Hierarchical Navigable Small Worlds. Side note: I encourage everyone who's interested in this and wants to go deep to read the original HNSW paper. It is a well-written paper, it's very fascinating, and we'll make another mention of it later on. So let's compare the two methods. As I mentioned before, IVFFlat is k-means based. The idea is that within your vector space you find a certain number of centers, and you cluster the vectors around those centers. HNSW is graph based. The best way to describe HNSW is that you create this web of vectors all connected to each other, and based upon how you traverse that web, you're able to put yourself into a neighborhood that has the most relevant information. And that's the key to the next bullet point, which I sort of fast-forwarded to: this is how the two methods organize the data. With IVFFlat, it's centers and lists: you define how many lists you want, say 100 lists, and each vector gets put into one of those lists. With HNSW, you traverse through the graph until you find the vectors you're most similar to, and you create a bunch of links that connect you to that neighborhood, the idea being that you end up positioned in a spot in space that's most relevant to all the other vectors around you. What's interesting is that this actually affects how you build these indexes. Because you need to calculate centers in vector space for IVFFlat, you need to have data already in the index; you can't start from an empty index. Ideally, you have your table fully populated and you build your index around all those vectors. You can iterate and add more vectors to an IVFFlat index afterward, but you need to have that index already built, because that's how you define your centers. Whereas HNSW is completely iterative: you can start from an empty table, no vectors in it, and add them one by one, or, if you have a very large set, concurrently or in parallel, and you just iteratively build up that graph over time. So that's very interesting too, and it gives the two indexes different properties for how you build them out. And finally, if you look at insertion time for building the indexes, the insertion time for IVFFlat is bounded by the number of lists: if you have 100 lists, you have to check 100 lists to determine which list a vector goes in.
If you have a thousand lists, you have to check a thousand lists; 10,000 lists, and so on. So the insertion time on IVFFlat can be very quick, but it scales up as the number of lists grows. Whereas with HNSW, the insertion time increases as the graph increases, kind of similar to what you might see with a B-tree index, albeit you're probably doing more computations than you would within a B-tree. So there are different bounding properties: IVFFlat is fixed by the number of lists, and that's what affects your insertion time; HNSW just grows as your overall index grows. So which method should you choose? You'll probably answer this more as we dive deeper, but the first thing is: if you need your exact nearest neighbors, you don't use an index at all; you compute this with a sequential scan of your data. Remember, this is that tension between performance and relevancy: if you need 100% relevancy, if you can't miss any results, don't use an approximate nearest neighbor index. If your domain requires building indexes very quickly, you're going to want to use IVFFlat, because most of the work in the index is done up front: you build the index all at once, identify the centers, and then it's very easy to add data to the centers. IVFFlat, as we'll see, also has some very nice parallelization today in pgvector. If you want an index that's easy to manage, I like to call it set-and-forget, you have HNSW. The nice thing about HNSW is that the defaults in pgvector today are pretty good. They were well tested, and while you might need to tweak them in terms of building the index, it's generally a little more straightforward. With IVFFlat, there's more tuning you need to do on both the build portion and the query portion. Again, with HNSW, I don't want to say it's truly set-and-forget, because all of these indexes require tuning, but it's a little bit easier to tune. And if you're looking for high query performance, basically a very nice performance-to-recall ratio, then go with HNSW, because that's really where it excels right now. You're going to spend much more time building the index than with IVFFlat, but when you flip it around and look at query performance, particularly query performance versus recall, HNSW has been one of the leading algorithms in its class. So, ready or not, we're going to deep dive; I'm just trying to be cognizant of time. Are there any questions so far?

I think you're good.

All right. The way I like to do this deep dive is through best practices, because through the best practices we get to explore how these things work. The first thing to keep in mind is actually storage: how do you store these vectors within Postgres? An important aspect here is TOAST, the oversized-attribute storage technique. Everything in Postgres is bound by the 8-kilobyte page size, or I should say, everything Postgres stores is bounded by the 8-kilobyte page size, so long as you haven't forked it or recompiled it to modify the page size. So if you have a value that exceeds 8 kilobytes, it has to be stored out of line.
And that's what the TOAST system does: it's basically a table that's separate from your heap table, and it keeps the values that extend beyond that limit. In fact, by default, Postgres starts TOASTing values over 2 kilobytes. And if you have a 1,536-dimensional vector, Postgres is going to TOAST it. What's interesting is that this will ultimately affect performance in some way. What's good is that most things in Postgres are configurable, and you can select how you store your data in the column. Interestingly, you can't do it at CREATE TABLE time; you have to run an ALTER TABLE command to set the storage, which is something I might bring up on the mailing list, now that I say it out loud. The storage types to keep in mind here, there are actually three: plain, extended, and external. Currently the default in pgvector is extended, where you store the data out of line once it exceeds the TOAST threshold and you try to compress it. But as we know, we can't compress these vectors; I've tried. I believe pgvector is actually going to shift to using external in the 0.6.0 release, where the data is stored out of line but not compressed; extended is the default for now. The other option is plain, where the data is stored in line with the table: instead of storing your 6-kilobyte vector out of line in the TOAST table, you store it in the heap table. Now, why does this all matter? One of the thought processes behind TOAST is that when you're storing large data in Postgres, it's probably not on the hot path of your query. That was the original thinking, and I can say that with some confidence because I've talked to Jan Wieck about it: if you think about storing these large text blobs, you're probably not querying into them all that often. Or if you are, it's not going to be through an index, right? You'll have an index that reduces your data set down to something relatively small and then do some pattern matching in the query, or you're just going to search everything anyway, but it's a fairly infrequent query. So you store the value out of line, because it's not on the hot path. But your 1,500-dimensional vector is on the hot path: you are actually performing those distance operations on it, and now you have to make a jump to another table. The other thing is that the Postgres planner did not necessarily conceive of a world where you're querying over very large values that TOAST assumes are outside your hot path. So let's take, for example, a 128-dimensional vector. This is going to be stored in line in your heap table, and we do a sequential scan on it. We can see that Postgres is planning six parallel workers in this case for querying all this data. Cool. Now let's do the same thing on a 1,500-dimensional vector. I don't expect you to pay attention to all the costing numbers in there, because that's actually not the key point, but notice how, for the exact same query on the 1,500-dimensional vector, only four parallel workers are planned, for the exact same number of rows of data.
And that's because when Postgres is doing its estimate, it's not considering the TOAST pages in this query, even though the TOASTed values are the most important part of the query as I'm doing this scan. So this is on the mailing list; it's something I'm going to try to push a little more so we can be better about it. The idea is that an area where Postgres can improve is that if it knows there's data in the hot path sitting in TOAST that needs to be considered, it should be able to produce better costing estimates for it. And you can see, if you look very closely, that the cost estimate is higher for the smaller vectors than the larger vectors. So this is one of those gotchas you can run into, particularly when you're working with this data. There are some strategies. First, you can use plain storage, which again means running an ALTER TABLE statement to set it. The one drawback with plain storage is that it limits your vector size to about 2,000 dimensions, which is probably okay for most of the workflows I've been seeing, though I've heard of some legitimate use cases for vectors going beyond 2,000 dimensions; that's something we can certainly get better at, and we'll talk a little bit about it later in this talk. Second, there's a Postgres parameter, min_parallel_table_scan_size, that you can use to induce more parallel workers. And that's exactly what I did on this 1,500-dimensional vector query: I set min_parallel_table_scan_size to 1 and got 11 workers, which makes a lot more sense, because these vectors are basically taking up full pages, and there are way more pages to scan than with the 128-dimensional vector, where things can be squashed down and you don't need to scan as many pages. So that's one way; both knobs are sketched right after this exchange.

Can I ask a question? Hi, this is Jignesh Patel, Andy's colleague. A lot of the things you're talking about kind of get at this debate about whether vectors belong in relational databases. The RAG example that you gave is very much about getting a very low latency search on vectors going. I don't know if you're going to hit that in the talk; I know it makes sense to go and pack some of this together, but is it even practical? Because the latency to just get anything out of Postgres is pretty high. Asking for a friend who has tried, no, my startup has tried, and we ended up going a different way, because the latency was very hard to meet on anything even reasonably large.

Let me jump to the end. So here's an example, a fairly, maybe not fully optimized example, and these are QPS numbers, not latency numbers, but this was, I think, 10 million 1,536-dimensional vectors. In this case I was trying to compare running them on different hardware, but if you look closely at the transactions-per-second numbers, you can see the performance I'm getting pulling these kinds of vectors out of RAG systems from Postgres. And the key here is that this is where the algorithm is most important.
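Here's that sketch of the two storage knobs just mentioned. The items/embedding names are assumptions carried over from the earlier sketch; ALTER TABLE ... SET STORAGE and min_parallel_table_scan_size are standard Postgres.

    -- Keep vectors in line in the heap instead of TOAST. Must be done after
    -- CREATE TABLE, and caps vectors at roughly 2,000 dimensions, since the
    -- row still has to fit in an 8 KB page.
    ALTER TABLE items ALTER COLUMN embedding SET STORAGE PLAIN;

    -- Or: lower the per-worker scan size threshold (measured in 8 KB blocks)
    -- so the planner schedules more parallel workers for the scan.
    SET min_parallel_table_scan_size = 1;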
One of my friends and colleagues, Peter Geoghegan, says it best: it's all about the algorithm, and how you implement the algorithm is where you can get the most performance gain. And this goes back to the talk title Andy gave me: this is where HNSW is very powerful. Where you have to pay is on the indexing, which we'll get to, but the trade-off is that you're able to get these very low latency queries. As for what you mentioned, I don't want to necessarily say it's a misconception that Postgres has higher latency than other systems; I think it depends on what you're doing. But one thing Postgres does very well is indexing, particularly things that are tree-like. HNSW has similar properties to a B-tree in the sense that, well, it goes a bit beyond that, it's a graph, but it's a graph where you traverse a minimal set of information to get the maximum amount of data out of it.

Have you done benchmarking against something specialized, like a Qdrant or any of these other engines? I think that would be super interesting, and maybe that's part of future work, and that's okay.

Yeah, I'd say I'm not the best person to answer that question at this time. I've definitely run my own benchmarks, and what I've seen is that Postgres has been able to compare very favorably, particularly with the HNSW implementation. And just being conscious of the time, I see I have 15 minutes left, so maybe I'll skip ahead and not spend too much time on exact nearest neighbor. Let's get to the fast stuff. I do want to talk about IVFFlat really quickly. As mentioned, it's a clustering algorithm. The idea is that you have a bunch of vectors in space, and you say how many lists you want; those are going to be your centers. Say I want three lists. Then, going through the k-means process, you find your clusters, and this builds out your index. The nice thing about IVFFlat is that there are only two parameters you need to worry about: the number of lists, which defines how you build the index, and the number of probes, which is how many of those lists you visit during the query. The idea being that the fewer lists you have to visit, the faster your query will be. So let's say we probe just one list. That's going to be very fast: you find, okay, this is the list I'm closest to, now find me the three vectors I'm closest to. And, I notice I need to change the highlight on this slide because it might not be clear, but these might not actually be the three closest vectors, because if you eyeball this, there's a vector I'm closer to in one of the lists that's not highlighted. And this is again where recall, or relevancy, is important: I can get a very fast query here, but I might not get all the results that I want.
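In SQL, those two parameters are the only knobs. Again a sketch building on the earlier items table; the lists value here is arbitrary, and the rule of thumb for sizing it comes up next.

    -- Build: needs data already in the table, since k-means picks the centers
    -- from the existing vectors.
    CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops)
        WITH (lists = 100);

    -- Query: probes is how many lists get visited (default 1). Higher probes
    -- means better recall but slower queries.
    SET ivfflat.probes = 10;

    SELECT id FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 3;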
Just also, another misconception: I think a lot of people see IVFFlat as slow. IVFFlat can actually be super fast based upon how you define your lists, but you run a relevancy risk if you visit too few lists. And that's what we see here, because if I go to two probes, I actually see that I'm closest to these three vectors. That's why that probes value is so important. Just briefly on IVFFlat: the key thing is defining your lists. That's ultimately how you tailor your recall, at least in a way where you can minimize your number of probes. There's some guidance in the pgvector repo for how to choose the number of lists; the general rule of thumb is right there: under roughly a million vectors, use the number of vectors divided by 1,000, otherwise the square root of the number of vectors. What's nice about IVFFlat is that it's actually very easy to parallelize, and through that, and through leveraging some things in Postgres, you can build these indexes very quickly, particularly if you're getting the recall that you want. The other thing I'll mention about recall is that it can often be driven by the algorithm used to vectorize your data; it's not just your indexing algorithm, it's also whatever upstream system you're using. With IVFFlat, the data can also skew over time: you might need to rebuild and recalculate your centers based upon how the data gets added, particularly if it's skewing the results that you want. Here's a quick note on parallelism in IVFFlat. When we were looking at IVFFlat and seeing where we could improve the build time, we noticed that we were basically doing a sequential scan once we defined our lists: pulling every vector out one by one and then assigning it to its appropriate list, which, if you have 10 million vectors, will take some time. What's cool is that Postgres has the ability to do a parallel scan, so you can read the data out in parallel, assign each vector to its appropriate list, and then you're good to go. And we saw that this was a huge improvement: at most, we saw index build speeds increase by up to 4x. In this random example that I did, I only got 2x; it was a smaller dataset, only a million vectors, because I rushed this one. But this is pretty cool, and it didn't impact recall at all; it just made it much faster to build IVFFlat indexes. Parallel builds are in pgvector 0.5 and greater, and if you're using IVFFlat indexes, it definitely helps with speeding up the testing of your systems. Beyond that, the main lever you have in IVFFlat is the probes: raising them does increase your recall, but it will impact performance. There are some other things to keep in mind too, given what we discussed about TOAST: TOAST definitely had a big impact on IVFFlat queries, and it took a lot to get the costing correct for them. We don't see TOAST having as much of an impact on HNSW, which we're about to dive into. So a lot of these recommendations are more specific to IVFFlat. The one that's universal to both is shared_buffers.
The more of your data you can keep in memory, the faster your queries will be. I think that's true in general for most systems, but it's something to keep in mind as you're dealing with this data, and the particular difficulty with vector data is that it tends to be very large. So this gets us to HNSW. HNSW does take a little more work to build, both because it has one additional parameter compared to IVFFlat and because the indexing time takes longer. But the payoff is that what you pay in indexing time, you make back in query time, and perhaps then some. The two key parameters are m and ef_construction. In your graph, you're building links to the vectors around you, so a higher m means more links to the vectors around you; this is how you create your neighborhoods and your clusters. And ef_construction is essentially your search radius as you go through building the index: a higher ef_construction means you're looking at a greater set of vectors, so you're more likely to get better results, meaning better recall. Let's build an HNSW index real quick; in part, this is to showcase the work involved in building one. Let's say I have this orange vector, and I'm going to go in and start building out my index. HNSW works in layers, and the layers go from less dense to more dense. The original algorithm actually had a single layer: in navigable small worlds, there's no hierarchy to it. The authors discovered that if they broke it out into a hierarchy, they were able to get better recall, just from going from a sparse space to a denser space. So at the top layer, we might link ourselves to just the single vector that's closest to us. We then use that to descend to a lower layer, where we might start linking to more vectors around us, basically building up a denser graph. Finally we get to the bottom layer, and at that layer we create the neighborhood, the dense neighborhood of vectors around us, because ultimately, when we do the search, that is how we're going to search the graph. Now, in a real HNSW index you might have more layers, and you'd certainly have a lot more vectors than this, but the idea is to give you a sense of how you're traversing. It takes a bit of work, because you're looking for that local optimum among all the vectors around you, but you end up creating this tight-knit group that allows you to do these efficient k-NN searches. So this gets into querying. For querying, there's only one parameter, ef_search, which pgvector defaults to 40. The key is that the ef_search value has to be greater than or equal to your LIMIT, because ef_search is essentially how many vectors I'm keeping in my search list. If your LIMIT is greater than the ef_search value, you're going to miss out on some of the vectors. So how do you query it? Say my query vector is this blue vector, and I'm at the top layer.
So how do you query it? Say my query vector is this blue vector, and I'm at the top layer. I'm going to find the vector I'm closest to and use it to descend down to the next layer. Then I'm going to go around and try to find the vector I'm closest to in that graph. I find it, and I descend. Actually, that's not the final layer yet; I'm almost there. I descend to the next layer, again find the vector I'm closest to, and descend down to the final layer, which is that dense graph we talked about. There I find, again, the vector I'm closest to, and that's where I start building out my neighborhood.

As you can see through this very quick search, the idea is that I end up among the vectors I'm closest to, most similar to, but I don't have to do as much work as IVF flat, because I'm not list bound; I'm more, dare I say, tree bound. It's much more similar to traversing something like a B-tree that I'm used to, where I don't need to touch as many pages to get to the answer I want. And if I'm able to build these neighborhoods correctly, I'm also not going to be hopping around as much, because, again, I'm in this dense neighborhood of data that's most similar to me. So we do more upfront work to construct the index, but the payoff is that we don't have to search as many vectors as we go through it. And again, I can tune all of that: if I increase ef_search, I will be searching more vectors, but based upon how well I was able to build the index, I might be able to keep that value relatively low for the neighborhood I'm looking for.

The other thing is that there are different ways of implementing HNSW. FAISS, for example, implements version three of the algorithm; at least last I checked, pgvector implements version four. I might not state the difference between the versions exactly, but I believe it has to do with how you store the distance within the index itself. It's a different optimization. And one thing we noticed when we were originally testing HNSW against the ANN benchmarks was that we were getting higher recall for the same parameters versus some of the other implementations. Again, this gets into the performance-recall trade-off, but choosing the correct algorithm, or choosing how you implement the algorithm, can ultimately impact your recall.

One more note, as I said: one of the nice things about HNSW, at least pgvector's HNSW, is that it can be set and forget; the defaults seem to work pretty well. We tested these defaults against the known ANN benchmark datasets, and they seem to give the best bang for the buck. Currently, pgvector's HNSW implementation doesn't support parallel index builds, so we've been recommending starting with an empty table, building the index on it, and then populating it using concurrent inserts or concurrent copies to speed up the build.
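That pre-parallel-build recipe looks roughly like this; the table name, file paths, and number of sessions are placeholders, and the important part is that the COPY commands run from separate client sessions at the same time:

```sql
-- Create the HNSW index on the empty table first...
CREATE TABLE items_hnsw (id bigserial PRIMARY KEY, embedding vector(3));
CREATE INDEX ON items_hnsw USING hnsw (embedding vector_l2_ops);

-- ...then have each of N client sessions load its own slice concurrently.
-- Session 1 of N (the other sessions load vectors_slice_2.csv, etc.):
COPY items_hnsw (embedding) FROM '/data/vectors_slice_1.csv' WITH (FORMAT csv);
```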
So, for example, I took a million 128-dimensional vectors, which I know is a small vector these days, so we could see the impact of concurrent inserts. We were actually able to build the index pretty quickly by pushing a lot of clients at it, versus pre-loading all the data and doing a single serial build. And we did test that this method does not impact recall; we actually did extensive testing on that before recommending it. It's pretty cool. Now, fast-forwarding to the end: there actually is going to be parallel build support for pgvector in 0.6.0. It was committed, I believe, about 10 days ago. So some of this advice may or may not hold; we're actually going to do some comparisons in terms of when it makes sense to use each method.

Using m and ef_construction is a little bit of an art as well as a science, but one of the reasons we chose the defaults we did is that we saw diminishing returns once ef_construction went above 64. You could certainly still boost recall, but not by as much of a factor, and we saw a big impact on build time. I don't want to say it doubled; I don't think that's fair. But we certainly saw a measured increase. So, again, it depends ultimately on what you want. Having a higher ef_construction means you'll be able to get better recall with a lower ef_search, and a lower ef_search typically means faster queries, per that slide we jumped ahead to. But, again, you might have to test that based upon your dataset.

So, finally, what is this one? Oh, this is m. We found that increasing m significantly increased build time, and it definitely did help with recall, particularly for lower values of ef_search, but at a great cost: we saw a much greater jump in build times when increasing m. So the growing advice is: if you're trying to boost the recall of your queries, first start with ef_construction, because the indexing-time cost is smaller, and then, if you're still not seeing the results you'd like, certainly try increasing m. I call this the pragmatic testing advice rather than anything theoretical, but in part, this is why we experiment.

So, jumping ahead, I do want to touch on filtering real quick. The idea of filtering is: I have a WHERE clause; can I use the index? The short answer is yes, but there are some techniques you need to use. First, you can use a partial index. With a partial index, you define a WHERE clause on your index definition, and that's one way to do pre-filtering, because you're only indexing part of the data. I know Andy is shaking his head. Right, Andy? I got you. Yeah, I was thinking, oh, partial index, of course. Yes, it's Postgres; you already have partial indexes. Yeah, it's great. But there's more: you have partitions too. You can partition your data if you have a natural partition key, and then just build the index on the partition. And I've seen cases with users who only need to index a subset of their data: they can have a default partition where exact nearest neighbor searches are fine, and then put the indexable data in a partition.
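In SQL terms, the two pre-filtering strategies just described look roughly like this (the category values and table names are hypothetical):

```sql
-- Strategy 1: a partial index; only rows matching the WHERE clause get indexed
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WHERE (category_id = 42);

-- Strategy 2: partitioning; index only the partitions that need ANN search
CREATE TABLE docs (id bigint, category_id int, embedding vector(3))
  PARTITION BY LIST (category_id);
CREATE TABLE docs_hot PARTITION OF docs FOR VALUES IN (1);
CREATE INDEX ON docs_hot USING hnsw (embedding vector_l2_ops);
CREATE TABLE docs_default PARTITION OF docs DEFAULT;  -- exact k-NN scans are fine here

-- The query shape both strategies are serving:
SELECT id FROM items WHERE category_id = 42
ORDER BY embedding <-> '[1, 2, 3]' LIMIT 10;
```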
There's one more thing coming, though, and actually coming very soon; it's already up in the repo, and I can't wait for the slide to get to it, I'm excited about it. There's a paper called HQANN, and the idea is that it's effectively a multi-column index: you have your vector data and a certain set of attributes, and within an HNSW index you're basically building appropriate links between vectors that have similar attributes, such that you can traverse the index based upon those attributes. There's a patch for it up in pgvector today, in the HQANN branch, and I've had some known users testing it, and they're seeing really good results. So the idea is that you build, effectively, a multi-column vector index where one column is a vector and you have a couple of attributes available for your metadata; it could be something like a category ID. And then you can just write a query like this, and, boom, it works: it pre-filters, it uses the index, and you're getting very high recall. I've seen some results, but because they weren't my own results, I can't share them just yet. The reason I'm breaking all my talking protocols and discussing it ahead of the slides is that I think it's really cool and exciting, and I think it's going to be something that, at least for the time being, could be unique to pgvector. And again, that's the beauty of Postgres: you have multi-column indexes, you can define custom multi-column indexes, and if all else fails and you need something like partial indexes or partitioning, it's there.

So, real quick, I shared this slide: hardware selection matters. Briefly, I compared Graviton2s to Graviton3s in this experiment. And it wasn't just that the Graviton3s are faster; we expect them to be faster. But particularly as we stressed the workload and started doing things that ate up more CPU, particularly higher values of ef_search, where we have to compare more vectors to each other, we really saw a speedup, beyond just the stock speedup you'd get going from the Graviton2s to the 3s. So it matters: you might be able to squeeze out some extra performance by upgrading your infrastructure.

So, real quick, looking ahead. I'd ignore that date, but it seems like it's trending that way. Parallel builds for HNSW were committed; it actually happened before I wrote these slides, so I thought, oh, that's pretty cool. And HQANN I mentioned: finding more ways to pre-filter your data on the WHERE clause. There are also more data types; there's actually a patch committed. This gets into questions like: can I index values with more than 2,000 dimensions, or store values in PLAIN format beyond 2,000 dimensions? Yes, you can, if each dimension is a smaller type. So that means being able to support float2s or uint8s, which we see come up. If your machine learning system, or whatever vector-generation method you're using, already spits out float2s, great, this is going to work just great for you.
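For what it's worth, half-precision vector storage did eventually land in pgvector as the halfvec type in 0.7.0; the patch being described here predates that, so treat this sketch as illustrative of where it was heading, with version-dependent names:

```sql
-- Store half-precision (float2) vectors directly (halfvec, pgvector 0.7.0+)
CREATE TABLE items_half (id bigserial PRIMARY KEY, embedding halfvec(3));

-- Or keep float4 storage on the table and index a half-precision cast
CREATE INDEX ON items USING hnsw ((embedding::halfvec(3)) halfvec_l2_ops);
```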
But if you're going from a float4 to a float2, you may lose information. So check with your data scientists before you start reducing your data types. There's also a popular technique, implemented in a bunch of different systems: quantization. There's product quantization and scalar quantization. Scalar quantization is similar to what I just described: I might take a float4 and map it to a uint8. The idea is that I'll retain most of the information, but I might lose a little bit, and I might lose a little bit of recall that way. It may not be super dramatic, but it's there; there's a toy sketch of the scalar idea below. Then you have product quantization, which I think is a wild technique, but it works well. It's like: how do I take a 128-dimensional vector and map it, effectively, to an 8-dimensional vector that points to a bunch of different centers off in space? It's a very effective technique for reducing the size of a vector, but it does impact recall.

Originally, we actually had those higher up on the pgvector roadmap. And when I say roadmap: it's an open source project, so the roadmap is as-is, like anything in open source. But as we saw more people using pgvector, we leaned in more on the active problems, like getting HNSW into pgvector and working to support enhanced filtering. So I think quantization will happen; it's a matter of when. For some reason this slide says Q1 2024, and I don't know where that came from, but I think it's likely to be in the next bundle after we handle these things.

And then there's parallel query, which, interestingly, I think HNSW has really helped with, eliminating some of the need for it. As we see larger HNSW indexes, we might need it more. But I've seen HNSW on, let's say, one billion records. First off, I wouldn't have done it the way that test did, which was to put a billion vectors into a single Postgres table and then put an HNSW index on top of that table; normally you would partition that table first. But we were seeing really good queries per second on a billion records in a single Postgres table. I don't think I'm allowed to share what those numbers are right now, but once I run my own independent test, I can share something. It's super fast; I was shocked how fast it was.
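Jumping back for a second to make the scalar quantization idea concrete: per dimension, you're just bucketing a float into a small integer range. A toy illustration in SQL, assuming values normalized to [-1, 1]:

```sql
-- Map one float4 dimension onto 256 uint8-style buckets (0..255).
-- width_bucket returns 1..count for in-range inputs, hence the -1.
SELECT width_bucket(0.3337::float4, -1.0, 1.0, 256) - 1 AS quantized_dim;
```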
So, to conclude, as I know I'm a little bit over time. This is actually general guidance; it goes beyond just pgvector, but I think it's important. The first thing, and again, as a recovering app developer, the thing I didn't fully understand coming into this was recall, and the idea that I might be getting results I don't expect. Because that's what it is: unexpected results, not the exact results. And that's always going to be the design decision: I've got to choose between performance and recall. HNSW is a little bit magical in that it makes that decision a little less dire than some of the other algorithms, because you can get both, and you can get both pretty well. But there are going to be decisions that impact it, like how much time I want to spend building the indexes. And then you also decide: what do you want to spend on? Is it storage? Is it compute? Is it your indexing strategy? These become very practical considerations as you're building these things out.

And the last thing, and Andy, when I was looking through a lot of the other talks in this seminar, I saw this as well: everything in this space is rapidly evolving. Even though we have 20 years of research on how to store and process vectors, that's new by computer science standards; 20 years is nothing. Postgres has almost 40 years of research behind it. Relational databases have, I don't know, I saw your talk in New York, I should know this off the top of my head... oh, 60. There we go. So I see the death stare on that one. But what's interesting is that this is still a new field. It's rapidly evolving, and people are making decisions today about how to go into production at the same time that things three months from now may be very different. I mean, if you looked at pgvector back in February and saw some of the public performance numbers on it, you'd say: why would I ever adopt this? Yeah, it's in Postgres, but it needs work. But guess what: work has been done, as they say. pgvector is super quick now. It has a lot of modern mechanisms in it, and I would argue it's innovating, adding new things, not every day, but very rapidly. And I would say it's mature: I know people running it in production, and it's on Postgres, and Postgres is a very mature database system. So if you're looking to bring a vector storage system into what you're doing today, there's a bit of plan for today and plan for tomorrow. Choose something you're going to be comfortable running, and know that this space keeps changing and evolving. HNSW does appear to be a winner, but I know there's going to continue to be work on it; HQANN is a good example of that. So with that, I conclude, and this is the thank-you slide.

Awesome. So I will clap on behalf of everyone else. We're over time, but we have one question from Victor if you want to go for it. Yeah, thank you for the interesting talk. Two questions. Number one, is there any benefit to running pgvector on a GPU? So, this has not been tested yet, but I'll discuss the challenges of GPUs, particularly with databases. First, if you're interested in this topic, there is a Postgres extension called PgStrom, which is designed to run workloads on GPUs. The problem with GPUs and databases is getting the data to the GPU: you need to make sure you have the appropriate bus available to move that data from memory to the GPU, back into memory, back onto disk, et cetera. That path is optimized for getting to the CPU, particularly as we know it for databases.
For GPUs, you need to be on the appropriate hardware, and that hardware does exist. But beyond that, you then need pgvector, Postgres, et cetera, to work with the GPU as well. And again, there's an extension that does that today, PgStrom, and if that's something you're interested in, definitely dive into that.

I'd say, to date, the biggest challenges with pgvector have not been processing-related; they've been data-related, in the sense that, for a while, pgvector was memory bound. That's where you were seeing a lot of the performance results on IVF flat: basically, you're pulling a ton of these pages into memory, and if you're on a memory-constrained system, you're swapping them in and out. HNSW alleviates that quite a bit, based upon its different search path. But again, a lot of this data, particularly when you're doing these searches, is going to be more memory bound than CPU bound, particularly if you're only looking at a smaller set of your data. One thing we've been able to do with HNSW really well is push it to high levels of concurrency, and we've seen pgvector and PostgreSQL scale pretty close to linearly, up to a pretty high number of cores. Because in a lot of these searches, particularly for a lower ef_search, you're not making that many comparisons, so you're not using that much CPU. You just have the classic database problem of being able to pull information in and out of memory and in and out of disk. As you scale up and start getting more cores involved, or as you increase your ef_search, you do start stressing the CPU more, and that's where a GPU could ultimately kick in. But again, I haven't seen enough data; I don't think we've fully pushed the CPU far enough yet that you'd necessarily see the benefit of a GPU. Now, a year from now, I think it might be a different story, particularly as we get better at just the general processing of this data. But right now, I'd say there's still some headroom, both in CPU utilization and in how we actually traverse the information.

Oh, so I think you already sort of answered my second question, which was what difficulties there are in supporting running on a GPU. So basically, if I understand correctly, you're saying that because of the way the GPU is used, as a kind of accelerator alongside the CPU, getting data back and forth is the problem. Yeah. And maybe to close that thought out: I think the biggest benefit of a GPU today would be on the index building, not the searching, because the building is where most of the time is spent, particularly for HNSW. For IVF flat, most of the time is actually spent on the search. But for HNSW, I think there could be a benefit from a GPU, and if there were an area to invest in using GPUs, I would pick that one. I still think there's headroom for how effectively we can use the CPU, based upon what I've seen.