How do you say... what's your name? This is going to be my first talk in Chinese... no, unfortunately, definitely not. That's pretty much where my Chinese ends. I can order my bubble tea, so I'll survive. Let's put it like that.

Thanks so much for joining. I'm Thor, I help developers build things, and I work for a little company called Supabase. I know some folks have already said they've heard of it, which is great. The QR code on the left is a link to a page with the resources I'm using for this talk, and on the right you can ask questions throughout the talk, as you think of them, via Slido.

This is going to be interesting, because I don't have a microphone stand. Should I sit down? Should I stand? Not entirely sure; we'll figure it out as we go along.

Now, Supabase. Yes, open source — that's the theme of the night, since we called it an open source meetup. We're "the open source Firebase alternative", if you want to put it like that. It helps with the SEO: a lot of people search for Firebase, so that's the main thing there. Probably the same with Notion. But the idea really comes from loving the developer experience of Firebase. If you've worked with Firebase, getting started is super easy; eventually it might get a bit hairy. So the idea is to take that developer experience and apply it to a relational database: Postgres, the open source relational database that's been around for over 30 years, three decades. Fun fact: it actually predates GitHub, so the management of the Postgres project happens via email. There's an interesting blog post about how proposals get merged into Postgres; it's quite interesting because it's not on GitHub.

So the database is the foundation of everything. That's Postgres, and we're pretty much all in on Postgres, because Postgres is so powerful it's basically like using a cheat code. For authentication we use an open source service called GoTrue, originally built by Netlify back in the day for Jamstack sites; we forked off of that. The auth actually sits on top of the database, so all the user data lives directly in an auth schema within your database, and we use the JWT together with row-level security policies — a native concept in Postgres — to restrict access to data. That means we can put an auto-generated API layer on top using an open source project called PostgREST. Yes, the names are confusing, but you get the idea: Postgres plus REST. That lets us automatically generate REST APIs, and we also have GraphQL APIs built into Postgres with a Postgres extension. That's the other exciting part of the Postgres ecosystem.

All right, I probably need to be faster. Okay, let me see if I sit down and move the mic every once in a while. Let's see — did that work? Okay, sweet.

If you're not following us on Twitter, feel free to do that. Actually, while we're here: we just launched Supabase Vector on Product Hunt, so let's everyone go to Product Hunt and maybe upvote it while we're here.
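Before we get to vectors, a tiny illustration of the auto-generated API layer I just described. This is a hedged sketch using the supabase-py client — the `todos` table and the credentials are made up — showing the flow: GoTrue issues a JWT on sign-in, PostgREST exposes the table over REST, and row-level security in Postgres decides which rows come back.

```python
def fetch_my_todos(url, anon_key, email, password):
    # Local import so this sketch can be read without supabase-py installed.
    from supabase import create_client

    client = create_client(url, anon_key)
    # GoTrue issues a JWT on sign-in; the client attaches it to later requests.
    client.auth.sign_in_with_password({"email": email, "password": password})
    # PostgREST exposes the (hypothetical) "todos" table as a REST endpoint;
    # row-level security policies in Postgres restrict which rows this JWT may see.
    return client.table("todos").select("*").execute().data
```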
Maybe we can — does that work? Oh, yes. So go there and upvote while I tell you about Postgres and vectors.

Who's familiar with building AI applications — embeddings, vectors, all that fun stuff? Anyone? No one? Yes, back there — lovely.

Basically, the idea is that you take something like human language or vision and turn that context into a vector. Did everyone manage to scan the QR code? Then I can show you some visuals that help with the explanation. Okay, very good.

So if we go here — there's a blog post that explains working with embeddings and vectors in Postgres. It's this idea of translating something like human language, and the context of that language, into what is basically just numbers: a vector. If you look at this example — it's just two-dimensional — say we have something like "the cat chases the mouse" and "the kitten hunts rodents". Contextually those are fairly close to each other, right? A cat and a kitten, a mouse and a rodent. So if you translate them into the vector space, that context ends up close together. Now if you have something like "I like ham sandwiches", that is fairly far away from the previous two statements. And once your data is in this vector space, you can perform vector similarity search, or proximity search, where you're basically looking for vectors that are close to each other: when two vectors are close, there is some contextual relationship. So if you can translate language or images into this vector space, store the vectors, and perform search over them, you can build fairly powerful applications.

pgvector is a Postgres extension that lets you store vectors of different dimensions right within Postgres, create indexes on top of that data — on top of those vectors — and then perform the search operations within Postgres. So if you're working with something like OpenAI, which can turn context like human language or images into vectors, you can store those in the database using Postgres, or Supabase Vector, which is basically a managed pgvector-plus-Postgres bundle. Now obviously, to represent something like visual context or human language, you need a lot more dimensions than just these two; depending on the model, vectors have more than 500 dimensions. Those are very big numbers, so we'll just use two dimensions for visualizing this.

If you look at the docs — we have some documentation for Supabase Vector, the soft launch we've done now — there's the vector store that lets us store these embeddings using Postgres and pgvector, and there's a Python client library called vecs that takes care of the indexing, creating collections, and performing queries. If you've worked with something like Pinecone, which is a dedicated vector database specialized in vector data, the experience is fairly similar: you create collections, you query them, and you upsert things into your collections.
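To make "close in vector space" concrete, here is a minimal, self-contained sketch of cosine similarity over made-up two-dimensional vectors. Real embeddings have hundreds of dimensions, and the numbers below are invented purely for illustration.

```python
from math import sqrt

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: values near 1.0 mean
    # "pointing the same way" (contextually similar), lower values mean unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Invented 2-D "embeddings" for the sentences from the talk.
cat_mouse     = [0.90, 0.80]   # "the cat chases the mouse"
kitten_rodent = [0.85, 0.82]   # "the kitten hunts rodents"
ham_sandwich  = [0.10, -0.70]  # "I like ham sandwiches"

print(cosine_similarity(cat_mouse, kitten_rodent))  # close to 1
print(cosine_similarity(cat_mouse, ham_sandwich))   # much lower
```

pgvector gives you this kind of comparison as native distance operators inside Postgres, so the similarity math runs next to your data instead of in application code.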
We'll look at an example in a bit, but the exciting part with pgvector is that your data lives — where is it? Is it this? No, that's not it. Anyway, you can go through the docs yourselves. The idea is that you have your vector data alongside your normal relational data, and your files — we also have unstructured data, file storage, as part of the Supabase stack. Everything coexists within the same database, the same ecosystem, and all the parts are open source. You can use them separately, or bring all the different bits and pieces together. That's the philosophy of Supabase: open source, modular, so you can take each piece on its own and self-host it. Our business model, obviously, is that we host it for you — we manage the hosted database service — and that's where we make our money.

Okay, great, so let's actually dive into some examples. Personally I'm a JavaScript fan, but I hear that in the machine learning space people tend to use Python. What I find exciting about this Python example is that I managed to build it without being very proficient in Python, so I think that's the great thing here. It's this link: image search with OpenAI CLIP. CLIP is an open source model, and the interesting thing is that it turns images as well as text into the same vector space. This means you can perform image similarity search by comparing the vectors of two images, for example, but you can also do text-to-image or image-to-text, translating between these modalities. This is what it looks like: we put the image data as well as the text data into the same vector space, and you can see here we have an image of two dogs in the snow. That image and the text "two dogs in the snow" become two vectors that are very close to each other, because the context is the same.

So we have this example — it's on GitHub, and it's linked in the resources as well. We can run Supabase locally, which is pretty nice because it's all open source. It's already running — sorry, I'm typing with one hand here — I should say `supabase status`. Yes: we're running the entire Supabase stack locally. So we get our API URL, we get a GraphQL URL, and we have our database URL — if we're connecting directly to the database we can use the Postgres URL. We also have Supabase Studio, and you can run that locally too, which is quite exciting. If you're somewhere without internet — well, when you start you do need to pull down some Docker images, so I'd really do that bit before takeoff — but once you have the services running locally, you can develop offline, so to speak.

So we have our default project here. We can look at the table editor; we have our public schema and a bunch of other schemas. As I mentioned, we have this auth service that sits on top of the database.
And so we have all those tables there, in a separate schema. Now let's look at this example; let me just open it up here. I'm using Poetry to run this, which is sort of like npm, but for Python — let's put it like that. And I'm using vecs, our Python client for handling vectors and embeddings. I have my database connection here, so I'm basically just instantiating a vecs client with my database connection, locally on my machine.

I'm actually using something here that I think is open source as well: it's called OrbStack. I only found it recently. It's basically — oh, okay, there we go, it's cinema mode now — it's basically Docker, optimized for Apple silicon. So if you have a MacBook with an Apple chip, this will run Docker a lot more efficiently than, say, Docker Desktop. And I think it's open source, so that's pretty exciting. You can see here that after `supabase start` I have all these different services running.

Then I have my seed method. I create my client, I create a new collection that I call image_vectors, and we're using the OpenAI CLIP model here, which translates our images and our text into vectors with 512 dimensions. Then I have my model, and I just have a couple of images — remember those images, they'll be in the exam later. They're just from Unsplash. We're encoding them here as vectors, and then we take our images collection and upsert our vectors into it, and we can specify some metadata that we can also use later for filtering in the query. Then we say, okay, we inserted the images, and we create an index: after upserting the images, we build an index to make the vector proximity search more efficient.

So: `poetry run seed`. That does everything I just mentioned — we generate our vectors, we upsert them into the database, and we create our index. And now if we go back to our locally running dashboard — this is on localhost — we have a new schema called vecs. Can you see that in the back? Do I need to zoom in a bit more? Here is the collection that we created, our image_vectors. You can see we have our IDs, and these are our vectors: the column type here is vector, the column type from pgvector that lets us store vectors of, I think, up to 2,000 dimensions, something like that. And then we have some metadata here, which is just JSONB, so we can use that to filter later in our query. And that's it, right? That was pretty easy. So now we can perform some searches. Again we create our client; in this case, we already have our collection.
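The seed flow just described, together with the search we're about to run, can be sketched with vecs roughly like this. The collection name and CLIP model identifier follow the demo, but treat this as an approximate sketch rather than the exact demo code; the image paths are placeholders.

```python
def to_records(paths, vectors):
    # Pair each image path with its vector and simple metadata, in the
    # (id, vector, metadata) shape that vecs' upsert expects.
    return [(p, list(v), {"type": p.rsplit(".", 1)[-1]}) for p, v in zip(paths, vectors)]

def seed_and_search(db_url, image_paths, query, limit=1):
    # Imports are kept local so the sketch can be loaded without the
    # dependencies (vecs, Pillow, sentence-transformers) installed.
    import vecs
    from PIL import Image
    from sentence_transformers import SentenceTransformer

    vx = vecs.create_client(db_url)
    images = vx.get_or_create_collection(name="image_vectors", dimension=512)

    # CLIP encodes images and text into the same 512-dimensional space.
    model = SentenceTransformer("clip-ViT-B-32")
    vectors = [model.encode(Image.open(p)) for p in image_paths]
    images.upsert(records=to_records(image_paths, vectors))
    images.create_index()  # speeds up the proximity search

    # Encode the text query into that same space and fetch the closest images.
    return images.query(data=model.encode(query), limit=limit)
```

If I remember the vecs filter syntax right, a metadata filter such as `filters={"type": {"$eq": "jpg"}}` can be passed to `query` as well, which is the filtering-on-metadata step mentioned above.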
We get a reference to our image_vectors collection, and again we use the OpenAI CLIP model. Now, in this case we're getting a query string from our arguments, we encode that query string into the same vector space as our images, and then we run a query with our query vector — our text string — limiting it to the most relevant result. There are other input parameters you can pass in, and we can also filter on the metadata, where we can say, okay, we only want to look at vectors that represent a certain type of image, for example. Then we get our result and open it up.

This is where it comes in handy if you remembered what was in the pictures earlier, because we're going to run some searches. Okay, who wants to start? What should we search for? Sorry? "Open field" — okay. All right, and ideally now — this is the reason I named these one, two, three, four — there we go. I think that's where you applaud. I mean, it is quite impressive if you think about it: you use this on a daily basis with something like Google Images, where you can just search for "me on a bike" — not me, yourself on a bike — and it shows you all the pictures of you on a bike. But the tech behind it — if you think, okay, how do I actually implement this? Here you go: it's just a couple of lines of Python code.

Okay, obviously I could have been cheating here somehow, so should we try another one? Just throw them out. "Laptop"? Well, that's interesting, because we didn't have anything close to a laptop, so let's see what this model thinks is the closest to a laptop. To be fair, on your lap there, there could be a laptop, right? We don't know — maybe the model actually knows a lot more than we do. That would be pretty mind-blowing if that were the case. Maybe "fruit"? Let's see — we did have some grapes in there. Yes, so we get our grapes. Anyone, anything else? We did have a bike in front of a red brick wall; obviously if I put that in, it might be a bit too easy, but maybe if we just say "vehicle" — I think the bike was the closest, contextually, to what we had. Yes, very good; it's like I practiced this. What else do we have? Okay, these are both, I guess, flowers? Let's see what happens if we put in "flower". Anyone have an idea how this model was trained? I don't know. Do we get the — yes, okay, I think that's fair enough. "Happy"? Okay, that's true — any predictions for what's "happy"? That is a happy remote worker, working in a field, with a fantastic internet connection. They are happy. All right.

The point I'm trying to make here is that the math behind this is fairly complex, and even if you're using something like Pinecone — this is actually a copy of the Pinecone example for image similarity search — it is still quite complex to implement. But me, I don't have much knowledge of the AI world, I don't have much knowledge of Python, and yet somehow a fun little demo came out. And that's what we're trying to do in general: make Postgres more accessible to a broader audience of developers, and with Supabase Vector, make it easier for you to build AI-enabled applications.

Now, I think the subject of my talk was something like "building your own ChatGPT", so let's quickly end on that one: build your own ChatGPT, here with Deno. You know, let's go back to JavaScript. Actually, who here has worked with Deno before? Little show of hands — any Deno fans? Yeah, the Supabase folks, that's true. I love Deno; they probably have the best swag in the business. Somehow they found someone who can draw the cutest dinosaurs, and it just works, I think.

Okay, so we're starting up again: `supabase start`, starting our database here. I probably should have opened the code before I ran this — well, it wasn't too bad, considering how many Docker containers are behind it. Now, one difference here is that we're using OpenAI to generate embeddings, rather than running the model ourselves; we just go to the OpenAI API. And we have a pre-processing step, so let's quickly look at the repository and the different things happening there. We have a GitHub Action, and any time we make changes to our docs — we have a bunch of markdown files with context — we generate embeddings for that context. Here's why: if you use ChatGPT and ask it something, it doesn't have the context you're asking the question in. But if we know you're developing with Supabase, we can fetch the relevant context from the Supabase documentation, give it to OpenAI, and make sure the answer comes only from that context. That means you get much more precise answers. If I go to ChatGPT and ask "what are embeddings?", ChatGPT will reply with "oh, in the context of machine learning..." because it has to cover all that general ground. But if I actually provide the context of the query — okay, we're developing with Supabase — and then ask "what is pgvector?", it knows the context, we can provide the documentation we have, and ChatGPT can give a concise answer.

So we pre-process our documentation, generate embeddings, and store them in the database. Let's quickly look at the dashboard — where is it — here's our dashboard, now running our other project. Here I'm not using the Python client anymore; this is all JavaScript. In my code base I have one piece of documentation, a markdown file about OpenAI embeddings and storing embeddings in Postgres, and I'm splitting it into its different page sections. For each section I generate a vector, so that I can later perform the search to find the relevant context.

Lastly, at the edge, at runtime, when someone asks a query, I take the query string. First we need to sanitize it, because OpenAI has terms of service: you can't ask the model some crazy things — that's against the rules, because otherwise it gets very rude, since it gets trained on all of that. So you sanitize your query: you have a moderation response, where you're basically asking OpenAI, "hey, is this an okay thing to ask?", and OpenAI says, "yes, here's the sanitized way you can ask it". Then we create an embedding — I believe this is the GPT-3 text embedding model — and then we use a remote procedure call to perform our proximity search, using the JavaScript client here. That's something we'll have to make a little bit easier with vecs.
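The runtime flow just described — moderation, embedding the query, similarity search via an RPC, then the restricted prompt that comes next — can be sketched in Python, even though the actual demo is TypeScript on Deno. Everything here is a hedged approximation: the RPC name `match_page_sections`, its parameters, and the model names are assumptions, not the demo's exact code.

```python
def build_prompt(context_sections, question):
    # Restrict the model to the documentation context, as the talk describes.
    context = "\n---\n".join(context_sections)
    return (
        "You are a very enthusiastic Supabase representative who loves to help people! "
        "Given the following sections from the Supabase documentation, "
        "answer the question using only that information.\n\n"
        f"Context sections:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer as markdown (including related code snippets if available):"
    )

def answer(question, supabase_client, openai_client):
    # supabase_client: a supabase-py client; openai_client: an openai.OpenAI client.
    # 1. Moderation: ask OpenAI whether the query is okay to send at all.
    moderation = openai_client.moderations.create(input=question)
    if moderation.results[0].flagged:
        raise ValueError("query rejected by moderation")

    # 2. Embed the query into the same vector space as the doc sections.
    embedding = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=question
    ).data[0].embedding

    # 3. Proximity search via a Postgres function exposed as an RPC
    #    (hypothetical name and signature).
    sections = supabase_client.rpc(
        "match_page_sections", {"query_embedding": embedding, "match_count": 5}
    ).execute().data

    # 4. Build the restricted prompt and request a completion.
    prompt = build_prompt([s["content"] for s in sections], question)
    completion = openai_client.completions.create(
        model="text-davinci-003", prompt=prompt, max_tokens=512
    )
    return completion.choices[0].text
```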
You saw with vecs it was a bit easier to do that. But then we tokenize this, and then comes the important piece: we generate this prompt where we say, "You are a very enthusiastic Supabase representative who loves to help people" — that's important, so that the answer will actually be helpful — "Given the following sections from the Supabase documentation, answer the question using only that information". That's the crucial part, where we restrict the context to be only the context of our documentation. We provide, as context, all the pieces of the documentation that are relevant to the user's question, then we put the question in, and then we say "answer as markdown, including related code snippets". We send that off to the OpenAI completions endpoint, and then we just stream back the response.

Okay, I think I'm running way over time here, so: `deno task start`. The front-end piece of this is fairly small, but what we can do now is ask "what are embeddings". If I fire this off, we look up the relevant information in our database, send it off to OpenAI, and stream back the response. Now, if I put "what are embeddings" straight into ChatGPT, it will say, "you know, in the context of AI and machine learning", yada yada yada — a much longer response. Or I can say, "can I store embeddings with Supabase?" — let's see if the typo is fine. Yes: pgvector, lovely. And there we are.

That's how far I'll go. Thanks so much for joining us tonight, and hopefully you get a chance to play around with pgvector. And if you do, let me know what you're building. Thanks so much. Cheers.