The Databases for Machine Learning and Machine Learning for Databases Seminar Series at Carnegie Mellon University is recorded in front of a live studio audience. Funding for this program is made possible by Google and from contributions from viewers like you. Thank you. Hi guys, let's get started. We're excited today to have Montana Low, who's the co-founder and CEO — or CTO, I forget. CEO. CEO. That sucks. It does. I'm in the same boat, yeah. He's the CEO and co-founder of PostgresML. It's a hosted Postgres service that's also deeply integrated with ML frameworks, and he's here to talk about the stuff they've been doing. And as always, if you have questions for Montana as he's giving his talk, please unmute yourself, say who you are, and ask your question. Feel free to do this at any time — that way he's not talking to himself on Zoom for an hour, which can get lonely. Montana, we appreciate you being here. The floor is yours. Thank you so much. Thanks for having me, Andy. I really appreciate it. Yeah, as Andy said, feel free to interrupt me; it'll probably improve the coherence of the words coming out of my mouth. But we'll dive right in. I like to start with a recap of the whole talk as a summary, so that you know what we're actually going to talk about. Zoolander is one of my favorite movies, but the reason you should care about PostgresML — and the way that we do things and the way that we use databases; I think Andy was mentioning this a little bit before the talk — is that we don't deeply change the internals of Postgres by, you know, using columnar storage or doing distributed compute and things like that. Most of our usage of Postgres is much more similar to application usage of Postgres, but we move a lot of the important machine learning pieces into the database. And the reason we do this is because it gives us more efficiency, reliability, and scalability in the database. And when you compare it to a lot of the point solutions that PostgresML is competitive with in the ML landscape, it's a lot more capable, because it's built on top of the entire Postgres ecosystem and foundation. Obviously Postgres is one of the most compatible pieces of software around; every language in the world has bindings for it and can call into it, so you don't have to worry about whether there's a client or whatever. And one of the big components of PostgresML is that it's part of the open source ecosystem — and not just in the sense that Postgres is an open source database. Our extension is open source, but all of the models that we serve — and there are lots of publicly accessible models and algorithm implementations — it's all open source, end to end. That means, as an end user, you have a lot more control over how the system is going to work. It's also a very opinionated system. Like any application, there are 101 ways to skin a cat, but in PostgresML we've made decisions that work well together to give you a complete platform, rather than individual touch points. Like, yes, sure, Pinecone is an excellent vector database, but I hope you have bindings in your language for it; otherwise it might not work very well — say, if you're trying to ETL data from your data warehouse, for example. And finally, you know, this project is really just fun for me and a bunch of the other people who work on it.
It's, you know — we're a fairly small project at this point, and we have fairly new technology, so we don't have all the technical debt of some of the machine learning systems that I've worked on in the past. And, you know, we'll enjoy it while it lasts. So I'll dive right into some of the motivations for why we started thinking it was a good idea to move some of these machine learning workloads deep into the database. I'm sure Andy has had too many times in his life where somebody added an obscene workload — one that should never be there — to one of the databases he's responsible for, and it just takes the whole thing down. This is a very common refrain, and so a lot of DBAs — and for all of you who are taking a databases course, you'll learn this — need to protect the database from unwarranted abuse. This happens all the time. But one very classical use case for a database is the common web application architecture. You've got your software application up top, and you've got your database, and of course it's connected to the internet. There are a couple of important things here: the apps are stateless, and the databases are responsible for maintaining all of the state in the system and persisting it long term. And because these things are connected to the internet, there's latency inherent in the system, so any latency introduced by separating the state and the statelessness is very small compared to the latency inherent in crossing a continent over a network connection. And this is actually great for your first prototype or MVP. Everybody should start with something very simple like this — a very simple architecture. I would discourage anybody from getting overly complicated with their database or their application technology choices; it's much more important to get something working quickly. But when you talk about scaling a system like this, you often think: okay, as soon as we get more users, what we'll really do is scale the app out, because scaling stateful processes is hard. But what inevitably happens, if you're successful, is that your app keeps scaling, and every time you add more application workload, that also increases your database workload. And eventually you reach a point where your database is starting to get a little bit hot, and so you start looking at ways to remove database workload. A quick, easy way to do that is: let's cache more stuff in the app. And so your app starts to become stateful over time as you pull database workloads out of the database, and it gets more and more complicated — and hopefully your business is very successful and you have this problem; it's a great problem to have. But the amount of state that you're managing in your app will continue to grow, and at the same time you're still adding more and more load to your database, until the point where you finally cave and start googling "how do I actually scale a database" — and you realize that, oh, it's actually pretty easy with Postgres, you just stand up a replica. And there are so many workloads in the world that can be handled by read-only, non-transactional queries executed outside the normal scope of things, and it's perfectly fine for those queries to go to a replica.
And those queries can be dealing with data that's anywhere from a second to days stale, and so a lot of systems can be scaled pretty far this way. Just to give you some examples: I think Instacart was doing hundreds of millions of dollars in revenue before we got to the point of needing a replica. I think Figma and Spotify are similar stories, where they scaled these massive, massive businesses on a single database, basically, and then they're like, oh, what do we do next? Oh, okay, a replica — that's easy. But eventually, you know, Instacart went through the pandemic and became a household name, and that meant more and more app servers were required. And the dreaded day came when we had to shard databases. And when you have to shard databases, it gets pretty messy. The application has to decide what goes where. If any of these databases fails — which happens more and more frequently as you add more of them — you have to have failover logic. You're now managing a whole fleet of databases, and it's like, how do you operationalize database management? It's not quite as simple as "oh, we'll just spin up Kubernetes and dockerize this stuff." There are more considerations you have to make when you're dealing with stateful systems. And so, in the end, you end up with a lot more database if you're successful. Now, there was a huge backlash against this architecture in the industry, maybe in the web 2.0 days, and everyone said: hey, let's move to microservices. And one way of thinking about or defining microservices is that you're trying to design a service that will never require that second database. You know, replication is old, but it's not that old in terms of functionality that's been easily available to people. And so there's this idea that we'll just have services, and we'll keep breaking up the application into smaller and smaller services, and anytime our database gets too big, we'll just break that database apart. But this actually gets pretty slow, because you get more and more network latency. In a clean service-oriented architecture, an entire web request can be serviced by a single service. One easy example: maybe you have a metrics service, and all of the user metrics you collect from client-side data come in as a single POST request that just records some JSON about some event that happened somewhere. That can go into one database table — your events table — and that can be a standalone database. It's very small, very self-contained. But eventually you get cross-cutting concerns. One example of these cross-cutting concerns is the search system at Instacart. When you think about product search: you type something into a form field, it goes off, and you get a bunch of products displayed back to you. But the logic that occurs there involves multiple machine learning models. It involves half a dozen microservices — closer to a dozen actually, at the worst point — and many of these have circular dependencies. And they have their own statefulness, and they become these big app-level things that start to have all the same problems we had with our monolithic architecture. And then you eventually have to figure out how you're going to shard or scale that final database.
One of the great things about microservice architectures — that's also a terrible thing — is that this frees up every team to choose their own database, suited exactly to the purpose they need. At Instacart, you know, we were running Postgres, but also Redis, Memcached, Cassandra, Druid, Redshift, Snowflake... I'm forgetting several of them, but I think if there was a major database that was popular in the last 10 years, we were probably running it behind some microservice. Many of these were for machine learning models, as feature stores or model stores. I mean, you can call S3 a database if you want. Even SQLite — of course, SQLite in S3, because why not? That's the outer circle of hell, right? I mean, you have to realize, yeah, your descent into hell is one level at a time, and you're just like, okay, I've got to get out of this circle — where do I go next? The only way, I guess, is down. At least you can say the big O word, right? Oh, yeah, thank God. But no, we had people who were like, why don't we use MySQL, because, you know, Postgres has vacuuming and I don't like that, and I'd rather have... anyway. Once you open the can of worms that is microservices, it's pretty hard to close it again. But eventually we did, to a large degree, at Instacart. The way I think about microservices is: you thought it was hard to manage a sharded database system, but at least that was a single kind of database. It is much, much more difficult to reason about what happens when, for example, your Memcached cluster goes down, and now the service that was depending on it is backfilling Memcached — and how is it doing that? Of course, it's hitting your primary application database, which then takes your primary application database down, which then brings down the whole site. You thought you had microservices with database isolation, but it's really, really difficult to actually achieve that level of isolation when things get complicated. To give you an idea of how complicated things can get, this is a chart created originally by Andreessen Horowitz, one of the big investors here in the Valley. For example, they invested $100 million in Pinecone recently, which is one of the hot new vector databases, so they are very in touch with what companies are doing, and I've included a link on this graph to their original write-up and blog post where they presented it. This is only a small expanded box in their much larger data infrastructure diagram, but it gives you a peek into what you need to actually build a machine learning model and a machine learning service. And when a request for search comes in, and it's using a dozen models created by a dozen different data scientists, you'll notice that in each one of these boxes there's always a handful of competing technologies that can be used for that function. Inevitably, your data scientists will all make different choices, just like they did for their databases. And if you don't have a really strong machine learning platform that's sort of solved all of these problems, every single request will go through a different microservice that's virtually an entirely new and different stack. Each one of those requests will take anywhere from 50 to 500 milliseconds.
You have a search system that first has to do named entity recognition, then synonym detection, then query expansion, then its initial query, and then it has to query for potential replacements for low-stock items. And all of these involve multiple models. At one point at Instacart, when we had all of these Python-based microservices, our P90 query times for search were up around eight seconds to get through. And sure, that's P90 — it only happens one in 10 times, right? Except people do more than 10 searches on every visit. So basically, every customer was hitting at least one of these. And when you make somebody wait eight, nine, 10 seconds during their shopping trip, some significant portion will churn out, or they'll just give up on whatever it was they were looking for that time, in search or something else. So if you don't completely lose the customer, you'll at least lose some fraction of sales. I realize this talk's not about Instacart — I should talk about our stuff. So, that eight-second P90 — was that a case where you flipped a switch on something new and it was just that slow? Oh, no, no. With machine learning, they always want to add a little bit more data, or a little bit more sophistication, or one more microservice. And so it's constant: each microservice they add, to add some new functionality, it's like, oh, this one's only 50 milliseconds, right? Yep. But then, when you've got dozens of them involved at the end, after years of iterative development on a search and recommendation system, all of a sudden the CEO of the company is like: hey guys, I was trying to search on my mobile and it was just timing out — what's going on with your team? Yep. Yeah, I would say the other thing we noticed — some of this is anecdotal — is that when the CFO or the CEO knows the name of the service or the name of the database, that's a problem; that's when people actually have the motivation to fix it. Yeah, no, it's definitely true. And so I got to be part of a task force that was tasked with: you're going to bring search response times back down and you're going to make this system better. And that's actually where — this talk is about PostgresML, but a lot of the thinking for PostgresML came from those explorations and those learnings. I mean, a lot of it was my fault too. When I got to Instacart, I helped build the machine learning platform and I helped set up a lot of the data engineering principles, and, you know, a lot of service-oriented architecture makes sense. I don't want to bag on microservices as a terrible idea. If you can fully isolate something, like I mentioned, then it's actually great, and you should try to do that. It's a great way to break load out of your primary database without complicating your life too much. But in the end, if your business is as successful as something like Instacart, you will have database problems. And so, in the end, we moved to an architecture like the one I've got on the screen now, where we did end up sharding Postgres.
And we ended up sharding multiple Postgres databases, and Instacart is now fronting all of their Postgres clusters with PgCat, which makes things much cleaner at the application layer: they go through a proxy pooler that is shard-aware, and it can handle failover and everything else. And if you know this is where you're going to end up, this is actually the desirable end state for your app — you're not having to cache so much at the application layer just to remove database load, and you can actually handle scaling your database horizontally, both with replicas and sharding, and it's actually not that horrendous and terrible of a process. Especially if it saves you from all of the other pain points that can be involved in microservices and having many different kinds of databases out there, I think this architecture is a pretty sweet spot to live in. But the really nice thing about this architecture is that you don't have to start with it. You can actually start with the original web app architecture: you've got one app, you've got one database. Pick your app server, whether that's Node or Python or Ruby or Java; pick your database — Postgres is a really good choice, because it's so general purpose and can handle so many workloads. Just know that, rather than having to make your app really complicated later, you can throw a pooler in between your app and your databases, and you can keep that same architecture as you scale out. And so I think companies that go this way in the future will have a much better time. Something I've seen at a lot of the startups I've worked at is that when we have all the engineering buttoned up really well, and our architecture is clean and crisp, and everything's properly refactored all the time — usually the business isn't doing very well. And I think that's why you have all the time to get all of your engineering right. If your business is growing really quickly, there are so many urgent priorities for engineers to work on that there's very little time to go back and clean everything up. And so if you don't have that pressure to move quickly, then maybe your engineering will be okay. But maybe your business won't. And after sort of coming to that conclusion, I decided, well, I'll just found an engineering-specific company — so engineering will be the whole business, and we can make the engineering really good, and I can have my cake and eat it too. Because, you know, as a software engineer, code cleanliness and best practices and strong principles are important to me, sort of aesthetically. But in the end, you have to be willing to do what needs to be done for the business, not necessarily for whatever idealistic engineering principles I happen to have. Sorry, do you accept questions along the way? I tried to wait, sorry. Oh, yes. Yes. Okay, so just a question on the sharding. I was wondering — not sure if you're aware of systems like CockroachDB or YugabyteDB — would you consider those instead of sharding, or would you prefer sharding over those systems that have the partitioning, so to speak, built into their schema? No, I think those can be great. One anecdote I'll share, though: we moved to Elasticsearch — that was one of our first projects. We had a Postgres catalog database holding all of our product data.
And that database was overloaded and falling over, because we were treating it like a data warehouse, not like an application database. And one of the first projects was to move all of that data into Elasticsearch and front it with a sharded Elasticsearch cluster, which was great for about five years. But we didn't have good enough control over the Elasticsearch sharding schema and algorithm and everything else to get where we needed to be, and eventually we had to redo our sharding on Postgres, where we could have more control over the end-to-end solution. We escalated all the way up to the CTO of Elastic, and because of some of our needs, the answer was: oh, well, it says right here in our documentation, you can't do cross-shard joins — those are always going to be slow — and we don't have any indexing types that will help get you out of this jam that your business requires you to be in. So, long story short: yes, there are lots of databases out there that will do the sharding for you — all you have to do is pick a key — and you can use those. At some point, though, you may find that your needs get more complicated, and you have these cross-cutting concerns. And I think particularly with machine learning, again, you end up with cross-cutting concerns where there's no single sharding key that will do it all for you, and you have to reconsider. One of the things I really like about Postgres is just how much control you can ultimately have over everything. And if it's not built into Postgres itself — well, it's one of the most extensible databases out there; you can always write your own extension to do the thing that you want. And that's actually a lot of where the motivation for PostgresML came from. It was a realization: we have this horizontally scalable database, and with these machine learning microservices, the hardest part was always scaling the feature store. The hardest and most complicated engineering was about getting data from wherever it lived into the feature store so that it would be there in time to make the real-time online prediction. And so, trying to figure out how to simplify those systems — at Instacart we ended up replacing our Elasticsearch cluster with a system very similar to what I've shown you, that big sharded Postgres thing, and then we started moving all of our feature store data from various databases into this big sharded cluster. And we did this in the middle of COVID. I don't think, had the business not been exploding — exploding in a very good way; we were doubling every other week or something — that we would have had the license to start making these huge engineering moves, with all hands on deck, pulling everybody's favorite database out of their hands and saying: this is the one scalable system; your system is quickly going down under this load. So the only option at that point was to build something like what we've shown. At the end of the day, it was shockingly successful. For example, if you wanted to add some new data to the Instacart catalog, and a product manager said, oh, we want this feature — the last one we did on Elasticsearch took literally three quarters of iteration, nine months.
Okay, the product manager says they want this feature; they go to the catalog team; the catalog team says, we're going to put that in Snowflake, and then we're going to figure out a way to ETL that through Druid to do some feature computation, and then we're going to go from Druid into Elasticsearch. And then — oh, by the way — we didn't get it in the right format that the search team needs, so we'll just start this whole cycle over again, across three VPs of engineering and half a dozen engineering teams. The coordination overhead there was terrible. It's like, oh wait, we don't support your data type for timestamps, so let's use strings for timestamps throughout the whole system, or whatever. Whereas when we said everybody was going to put everything inside of Postgres, and we're going to let anybody query any table they want — then, when we had issues, it's like, oh, we're just going to change the column type in Postgres; the two engineers agree in a meeting, and it takes an hour now, instead of multiple weeks of "oh, we've got to reconfigure our whatever service." But anyway, I feel like I've wandered a little bit away from the slide. If you don't buy all of our reasoning for simpler database architectures and how those will make your life better, especially in terms of machine learning complexity: there's this notion of data gravity — the more data you get into a system, the more data it will also attract, and the more applications will get built around it. And again, you'll have this snowballing problem of unconstrained growth in the data layer. But in machine learning you have a different option. You can run your model as if it were a stateless service, and every time your model needs to make a prediction, you go fetch data and pull the data up to the model. Or you can do what PostgresML does, which is push the model down into the database, into the data storage layer. And then you're not pulling data out of the database to the application layer — you're just passing a pointer from Postgres shared buffers through the model, and so there's no more data movement. That does mean you have to redeploy models, and so you're moving models instead of moving data. And in my mind, this is fundamentally better — and it's provably better, because any model, if it's a good model, is always smaller than the dataset it's trained on, and it's always going to be smaller than the dataset it will be used to predict on. And it will always change less frequently than the dataset it's being used to model; otherwise it's just not a good model. If you have to constantly update your model, then it hasn't generalized, and you've really failed at the machine learning aspect. If you build your model well, then there will be fewer electrons involved in a PostgresML kind of process than in a microservice architecture process. Now, there's a question of how much that actually matters. Aren't computers fast? Aren't networks fast? Isn't ML inherently slow and expensive anyway? So are you optimizing the right thing? And that's a really good question — you should always benchmark and optimize the right thing.
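To put a rough, hedged number on that data-movement argument — using the vector sizes quoted later in this talk (a thousand 4-byte floats is about 4 KB per vector, so roughly half a megabyte per 100-vector request) and a purely hypothetical request rate:

1,000 requests/s × 100 vectors/request × 4 KB/vector ≈ 400 MB/s of continuous data movement out of the database,

versus moving the model once per deploy — anywhere from a few bytes for a linear model, to a few megabytes for XGBoost, up to hundreds of gigabytes for a full-precision LLM, again using the model sizes quoted later in the talk.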
There's another question, though: a lot of the PostgresML thinking applies to classical machine learning. The systems that we were building this for — we did have some deep learning models involved in our search services at Instacart years ago, running TensorFlow 0.4 or whatever it was in production. But the new world that everybody's really excited about is vector databases and GPT-4 and OpenAI. And like, can't you just make a call to OpenAI? Why do I need to consider anything else? But even in the new world — say, the open-source equivalent of an OpenAI ChatGPT model, where you're hosting the thing yourself — it's still a massive thing. It's still 70 gigabytes. It will break all of your traditional software application continuous integration and deployment pipelines, because most people aren't deploying 70-gigabyte Kubernetes containers. So you're going to have to rethink your deployment system as it is, and how you actually manage these systems. But at the same time, these models are still incredibly data-hungry at inference time, because you need to pull back not just one vector, but potentially hundreds of vectors. Vector databases will do cosine similarity, but cosine similarity is actually a really bad predictor of relevance compared to having a model that is trained to predict relevance. And one of the things that is catching on now is: I'll fetch 10 documents from my vector database by nearest neighbor, and then I'll feed those to a pruning model that selects the top two or three most relevant documents before I actually pass them on to my text generation model. So even in this new world of LLMs and vector databases, being able to have the data — whether it's vector data or traditional end-user data, tabular data — in the same process as the LLM matters. Even though LLMs are slow to run — anywhere from 10 milliseconds to many seconds of runtime — that data movement is still a considerable expense. I've got some benchmarks we can show later in the talk, but this matters as much as ever. You really can load up your LLM in your database one time, and that's a one-time data movement cost that will then save you from moving data on every request. And keep in mind, vectors are, you know, a thousand 4-byte floats long — that's four kilobytes. If somebody wants to pull 100 vectors out of your Postgres database, they're talking about half a megabyte of data movement. And then you're going to pull that into a Python process, which is going to blow it up to something like 50 megabytes of Python data memory, and then you're going to run it through your model. I mean, actually, you'll spend as much time in pandas and data frames in the Python world as you will actually using your model and actually using your data. And one of the really cool things — you can see on this slide that this is a sequence of events that starts with the app, goes through the embedding model, then prompt creation, then text generation, and then the response comes back to the app. I should have had an entry arrow on this diagram starting at the app, so you could see the whole loop of what a request lifecycle looks like. But you can actually write this as a single Postgres query with multiple common table expressions. It can do a union between an embedding query and a normal SQL query as one CTE; then it can use Postgres string concatenation or other UDFs to generate your prompt as a second CTE; and that can then call a pruning model as a third CTE. And so you can chain these common table expressions together until what you really have is a full program of multiple steps — something like the sketch below.
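As a concrete illustration — this is a minimal sketch, not the exact query from the talk; the table, its `features` column, and the model/project names (the e5 embedding model, the 'relevance_pruner' project, the Llama chat model) are hypothetical, while `pgml.embed`, `pgml.predict`, and `pgml.transform` are the extension's documented UDFs:

```sql
WITH query_embedding AS (
    -- Step 1: embed the user's query in-process, no external API call.
    SELECT pgml.embed('intfloat/e5-small-v2', 'how do I cook lentils?')::vector AS q
),
candidates AS (
    -- Step 2: nearest-neighbor recall against a pgvector column.
    SELECT d.id, d.body, d.features
    FROM documents d, query_embedding
    ORDER BY d.embedding <=> query_embedding.q
    LIMIT 100
),
pruned AS (
    -- Step 3: prune/rerank the candidates with a trained model.
    SELECT body
    FROM candidates
    ORDER BY pgml.predict('relevance_pruner', features) DESC
    LIMIT 3
),
prompt AS (
    -- Step 4: build the prompt with plain string aggregation/concatenation.
    SELECT 'Answer using this context:' || E'\n' || string_agg(body, E'\n') AS text
    FROM pruned
)
-- Step 5: text generation, still inside the same Postgres process.
SELECT pgml.transform(
    task   => '{"task": "text-generation", "model": "meta-llama/Llama-2-7b-chat-hf"}'::jsonb,
    inputs => ARRAY[(SELECT text FROM prompt)]
);
```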
Except that program, instead of executing across a web of Python microservices, serializing and deserializing the data at every single step, all happens inside a single Postgres process. And so you cut out so much network latency. And when we actually look at the load on the database in these cases, from serializing all of this data in and out, Postgres load and query times actually drop — because instead of having to send back half a meg of vectors on every single query, it's sending back a 10-kilobyte text string or something. So the data movement in and out of the entire system can drop significantly. And then you've also dropped this huge web of microservices, which are now nonexistent, instead of running up massive GPU bills. So I hope I've convinced you that this is at least an interesting idea. Now I'll give you a little bit of an idea of what PostgresML actually is. For classical machine learning, it's just these three functions — basically UDFs that the extension provides for Postgres. Machine learning is actually a very well-defined process. We have supervised and unsupervised learning; we have classification and regression. These are all tasks that can be done with machine learning, if you're not familiar, and they can all be provided as just parameters to a training function. You can just say: I want a classification model. Literally, that's the task; that's what it's going to do. The problem formulation with machine learning is still hard, I think, and that's where most people get stumped. But once you can formulate your business problem as machine learning — either classification or regression, or now as a text generation problem for a ChatGPT-style model — then you're off to the races. PostgresML gives you the ability to train models; it gives you the ability to then strategically deploy those models, like you would expect; and finally, it gives you a predict call that you can use to leverage that model on new data that's been written to the database. Or — since Postgres accepts parameters in queries — you can just pass the features directly; you don't even have to store them. This was actually surprising to us: people were using PostgresML as just a model inference server. It's basically a stateless service at that point, but they liked Postgres better than having gRPC or some other HTTP REST endpoint, because they trust Postgres to serve responses reliably, and they know how to manage Postgres as an existing piece of infrastructure. There's a sketch of those classical-ML calls below. Then there's what I think of as more the new school of vector databases and transformers, which PostgresML also provides. It's worth noting — and I'll talk about the technology we use and how we build this stuff in a little while — that the transformers stuff, the Hugging Face Transformers stuff, is still in Python. Everything else is written in Rust, so we have good zero-copy abstractions, and in a lot of places we can move data without having to actually copy it. But in the Python case, we still do have to go through Python to access some of the latest LLMs. And those things are changing and coming out every week, so it's pretty hard to nail them down and standardize them, but there's progress being made, and we'll get to a lower-level implementation on this front in the future.
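For flavor, here's roughly what those three classical-ML calls look like — a hedged sketch against PostgresML's documented train/deploy/predict API; the project name, the "reviews" table, and its columns are made up for illustration:

```sql
-- Train: the task, source relation, and label column are just parameters.
SELECT * FROM pgml.train(
    project_name  => 'review_sentiment',
    task          => 'classification',
    relation_name => 'reviews',
    y_column_name => 'sentiment',
    algorithm     => 'xgboost'
);

-- Deploy strategically, e.g. promote whichever trained model scored best.
SELECT * FROM pgml.deploy('review_sentiment', strategy => 'best_score');

-- Predict against data already written to the database...
SELECT pgml.predict('review_sentiment', ARRAY[stars, word_count]::real[])
FROM reviews;

-- ...or pass features as plain query parameters, using Postgres as a
-- stateless inference server, as described above.
SELECT pgml.predict('review_sentiment', ARRAY[4.0, 27.0]);
```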
But given those six functions, you have a very comprehensive machine learning toolkit that you can solve a lot of problems with, everything in Postgres. This slide talks a little bit about how we actually manage memory inside of the Postgres process. If you know Postgres very well, then you know shared buffers is where Postgres pages data in and out from disk to RAM, and that's configurable. We store our models in Postgres tables, and we store our feature data in Postgres tables, so all of that is naturally stored in the shared buffers, and Postgres manages that global cache resource for us. But for each open Postgres connection, when you actually call one of these functions like predict or embed or transform that leverages a model, we pull all of the model's weights out of shared buffers and instantiate it with whatever model inference library it needs to use — whether that's XGBoost or scikit-learn or PyTorch; we support all of those under the covers. Your model gets cached in the connection. And because so many of these models run in Python and were originally conceived in Python, none of them are developed to be concurrent — they don't support concurrent access; they all have some kind of lock around their usage. And this actually works really well in the Postgres connection-per-process memory model, because every connection is an independent process: we can load as many copies into as many different connections as we need, and then it's Postgres connections that give you concurrent access to the model, as many times as you need. Now, PgCat is also really important in this picture, because PgCat allows us to keep that connection open even when a client goes away, which preserves our cache of the models that were actually being used. We can also use Postgres roles, so if you want to isolate certain connections or throttle certain models and enforce queuing, you can do all of that with PgCat. You can say that whatever user is using model XYZ, we're going to limit them to one or two or ten backend Postgres connections, which limits their total usage and throughput in the system, and queues the rest. And that leaves the rest of your Postgres database available for your application workloads or any other modeling workloads you might have. How big is a model, usually? It varies wildly, but a linear regression model is like eight bytes — it's two floats, basically. And you can actually do a lot with that. An XGBoost model can be anywhere from 10K to a couple of megabytes, and XGBoost really is state-of-the-art for tabular data inference. So keeping one of those on a connection — a state-of-the-art model for most search tasks — is totally fine, even on a tiny database. But when you get into LLMs, then you can be talking about — you know, Llama 2 70B is like 280 gigabytes if you're using the full precision. Yeah, that's what I was getting at. So every connection, every Postgres worker, has a copy of a model sitting in memory. And is it in shared buffers? So — the model is persisted in a table. If you want Llama 2 70B in PostgresML, it gets persisted in a table. It's going to be 280 gigabytes. Well, you know, there's a row data size limit in Postgres, so we transparently split it up into multiple rows, and then we stitch them back together when you load your model. That becomes 280 gigabytes in shared buffers.
And then that will actually get copied into your connection-specific model cache, so your connection will need 280 gigabytes of working memory. Yeah, that's what I was getting at. So it's almost to the point — I understand not everyone's running the full Llama thing, but at some point you need to dedupe that memory. Yeah, absolutely. And I think that will be part of the move away from Python transformers to libraries like Rustformers. This is all very new stuff for us that we've just been extending in the last three to six months, and we've got a lot of work to do on this. But I think you're absolutely right. And we want to get to a point where we can go beyond what people are doing in Python and actually share the read-only weights of these models. Then we'll only need to allocate memory for the intermediate computation steps — a lot of those buffers hold intermediate computations that have to be isolated and can't be concurrently accessed by multiple processes at the same time, because the processes write to them for intermediate use. But we can definitely do better there. Awesome, thanks. And the GPU cache is also interesting in that regard, because you're much more memory-limited on GPUs. So I think this is where a lot of the motivation for deduping memory is coming from for us. We have a serverless cloud offering right now, and being able to offer people a time slice of a GPU, with a shared memory model across multiple connections, is really important. Because a lot of people experimenting with LLMs and transformers right now are hobbyists or enthusiasts or prosumers working on hobby projects. They're not necessarily large corporations that have the budget to spend $5,000 a month on a GPU in the cloud. So being able to reuse that GPU across multiple connections from PgCat is something that we're pretty excited about — because then we can charge people 60 or 70 bucks for their little embedding model. They've got a vector database with an embedding model that they can query many times an hour for their chatbot, but they can't, and don't need to, fully utilize the GPU, either its RAM or its compute. So yeah, expect to see more there. And we love benchmarks at PostgresML. In some ways I feel like we're just cheating, because we don't have all of the network overhead and everybody else has network overhead. We can say things like: we're 10 times faster than OpenAI for embedding generation. But that's because you have to call OpenAI over the internet. You can't run OpenAI in your data center; the best you can do is try to guess where Microsoft is hosting their things and then put your app or whatever in the same data center — and you're still subject to all kinds of queuing and whatever. A lot of people think that OpenAI is the clear leader, and that it's worth paying for higher quality — "I want the best when it comes to these things." But it's not true. OpenAI has lost pretty much every domain except for text generation. They used to be the leading image producer with DALL-E 2 — and it's funny I say this, because they've just come out with DALL-E integrated into GPT-4, trying to get back to relevance against Stable Diffusion or against Midjourney, which really took it away from them. They used to be really relevant when it came to embeddings, but they now rank 12th or 13th, and they keep getting pushed down the leaderboard across the metrics that we see.
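To make that embedding comparison concrete, here's a minimal sketch of in-database embedding generation — the table and model name are assumptions; `pgml.embed` and the pgvector `<=>` cosine-distance operator are the documented pieces:

```sql
-- Assumes the pgvector extension; e5-small-v2 produces 384-dimension vectors.
CREATE TABLE docs (
    id        bigserial PRIMARY KEY,
    body      text,
    embedding vector(384)
);

-- Generate and store the embedding in one statement: no round trip to an
-- external embedding API over the internet.
INSERT INTO docs (body, embedding)
VALUES ('hello world', pgml.embed('intfloat/e5-small-v2', 'hello world')::vector);

-- Nearest neighbors by cosine distance, embedding the query text in-process too.
SELECT id, body
FROM docs
ORDER BY embedding <=> pgml.embed('intfloat/e5-small-v2', 'greetings')::vector
LIMIT 10;
```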
And this is a whole other interesting question: is open source going to win, or is closed source going to win? Will OpenAI keep their lead, and will you always need an OpenAI integration, or will we actually be able to run open-source models in our database, or in our local llama setup? My money is on open source. We've had multiple reports like the whole Google "we have no moat" memo, and the fact that GPT-4 appears to just be a mixture of experts over GPT-3.5. And now GPT-3.5 is losing to Falcon 180B in a lot of contexts, so I think we might even see them lose text generation in the next three to six months — unless, of course, they release GPT-5. We don't know; they don't tell us. But I think it's going to be very competitive and very interesting, and there are still a lot of other reasons to choose open source: other components that you can mix and match with text generation. Yeah, it's a similar story if you use Hugging Face for your text generation models and then use Pinecone as your vector database. I mean, pgvector added hierarchical navigable small worlds, HNSW, as an index type last month — Andrew Kane's been crushing it on that front. But even with IVFFlat — which is a lot faster at building the index, so it's still a relevant indexing type for vectors — even using IVFFlat at query time, which is slower than HNSW and doesn't scale to as large collections, we're significantly faster, because you've eliminated two internet round trips, which are a lot slower than the 10 milliseconds it takes to do an embedding generation and the sub-millisecond it takes to do a vector index lookup, even with IVFFlat. So you can go down the list and dig deeper here. In my mind, I think we've presented a pretty strong case that this terrifying concept of moving more workload into the database is not just effective — it's also safe, and it's also scalable. And if you follow those architectural principles, it'll be a lot better in the long run. Oh, yeah, it is important to note: I think SQL is awesome — I imagine everybody in this databases course also thinks SQL is awesome — but a lot of people only know Python or JavaScript or whatever language. So we actually have another Rust project where we generate Python bindings and JavaScript bindings that encapsulate a lot of these common machine learning application paradigms, and give you, like, three easy JavaScript functions you can call if you want to index documents and recall them from a vector index from your JavaScript app, without having to actually write any SQL or know about an IVFFlat index or anything like that. And again, this means that people who don't know anything about retrieval can still get the benefits of a much, much faster architecture. So — I think we're coming up on time. We've got 10 minutes left. Andy, is that about right? Yes. Yeah. So I'll go through this pretty quickly. You know, we use PGRX. It's a Rust extension development framework, and it is awesome. I never thought that I would love developing a Postgres extension, but it's a pretty nice life. It's all pretty well managed; it's like writing any other Rust app at this point. I love having a strongly typed Rust application with a strongly typed database schema, having lived in a world of Ruby and Python and Elasticsearch and Cassandra, where all of these things are schemaless and typeless.
They're runtime-typed, whatever. It's amazing: I just add a new enum variant somewhere, and then all of my Rust match statements are broken, and then I go fix that. And then the Rust compiler tells me I haven't taken care of a bunch of other things, and I'm like, oh, I've got to add this. And I don't really have to think it through anymore to add features — there's enough of a framework in the application that if I just break the first thing by adding something to an enum, it pretty much tells me everything I have to fill out. So I'm pretty excited about the full extension. These are some of the libraries that we use under the covers. There's a lot going on to move more machine learning into Rust in the Rust community, so this feels good to me. But in the meantime, we do call back into Python, and there are a couple of reasons. We want good, strong reference implementations. People have been using scikit-learn for 20 years; they come to PostgresML and they're like, I want to see the exact same convergence and statistics that I was getting. And so we need to be able to at least give them that equivalence test to pass before they'll move to a different implementation or a different platform from the one they're used to. So I'll stop there and open up to any questions people might have. Awesome, thanks so much. I will applaud on behalf of everyone here. If you have any questions — Kristoff, you want to go first? Yeah, I have a question on pgvector. If I look across different vector databases, they implement different similarity metrics and different indices, and not a single one has all of them. So I was wondering, do you have a sense of whether pgvector wants to become, in quotes, a superset of all these combinations, so that I maybe avoid having to run two or three databases depending on the vector similarity search or index support? So, I think if you're running less than, say, 10 million vectors in your corpus, it doesn't matter — whatever is there is going to be fast enough. And that's for indexing, whether you're using IVFFlat or HNSW, or — there are at least six index types out there right now, I think, that are fairly popular. In pgvector, IVFFlat is the one that's really fast to build the index, but it has slightly worse query performance; HNSW is really slow to build the index, but has much faster queries. So you can pick which of those two extremes you want to live on; it doesn't have all the ones in the middle. But from my perspective, that's good enough for, like, 98% of people. And honestly, anybody who has less than 10,000 vectors doesn't need an index at all. You can just run the query and brute-force it, and it'll come back in 10 milliseconds. It's fine. I mean, if you need sub-10-millisecond queries, then sure, put an index on it, and then you'll be bound by your inside-the-data-center query time, which is going to be a millisecond to get between boxes and connect to your Postgres instance. In terms of the operations — cosine similarity versus Manhattan distance — those don't really impact query or indexing speed as much as the index type or network latency does, so I wouldn't worry too much about that; a sketch of the two index types is below.
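For reference, the two pgvector index types being compared look like this — the table and column names are carried over from the earlier hypothetical sketch, and the tuning parameters are just illustrative values:

```sql
-- IVFFlat: fast to build, slightly worse query performance/recall.
CREATE INDEX ON docs USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

-- HNSW (added in pgvector 0.5.0): slow to build, faster queries at scale.
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
```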
What you should worry about is that cosine and all of those are very simple arithmetic distance functions, but they treat every single element of the vector as equally important to whatever is being measured, and that's rarely the case. And so what you actually want is to train a machine learning model, using user feedback data, to actually tell you how similar two vectors are, rather than relying on cosine distance or cosine similarity or the dot product or the Manhattan distance. If you really want to improve that, the only way you can is to take 1,000 of these vectors and run them through an XGBoost model in the same memory space — otherwise it's prohibitively expensive to pull 1,000 vectors out of your database and feed them to an XGBoost model. But this is what we do in any modern search and recommendation system. And then, of course, the CEO complains about how slow your search system is — you can't possibly pull 1,000 vectors out. And so there's some negotiation and haggling that takes place: what if we just pull 100 out, or what if we just pull 50 out — can I get some latency budget back for my next machine learning project? But then, when you're only looking at the top 50 instead of the top 1,000, there's usually something in the long tail that might have been the product the user was going to buy — something XGBoost would have been able to promote all the way to the top, but cosine distance won't find it. So you just lose some percentage. Thanks. I'll just say, we invited the pgvector guy to come give a talk, and he declined. But we have the Neon guys, who are building pg_embedding, and they're giving a talk later in the semester. Other questions? Go ahead, sir. I was just going to say it's exciting to see multiple implementations take off here. Yes. Any other questions from the audience? So I'll finish up by asking a question. You mentioned the memory dedupe issue, and you guys also did a major refactoring, it sounds like, from Python to Postgres-based extensions. What's another major systems task on your horizon for the next one to two years that you want to undertake? And if you want to go five years out, by all means do it — but what's a major challenge you think is still unsolved in the space you're working in? Well, off the top of my head, columnar storage is actually really important for time series calculations. There are good algorithms out there, and Timescale has already implemented them for Postgres, but unfortunately their license wouldn't allow us to offer something like that. So I think coming up with an actual open-source implementation of columnar storage that we can integrate with some of these time series predictions would be good. We also have a long road to go when it comes to adopting Rust implementations of the latest LLMs and making sure that we can get to fully deduplicated memory storage. I think that's what we'll consider 3.0 for PostgresML — that'll be a big milestone achievement for us. We've got probably 50 or 100 GitHub issues open right now, so there's an enormous breadth of coverage: when you look at what's possible in machine learning, there are so many different algorithms and bells and whistles that people want that we are effectively competing with the entire Python ecosystem. Like, as soon as Yandex releases CatBoost, people are like, oh, it's the latest in gradient-boosted trees — can you do CatBoost too? And it's like, oh, okay, yeah, we'll add that. But it's a never-ending fire hydrant. So I think having help on that front would be awesome too.