Hi, welcome to our talk on designing your SaaS database for scale with Postgres. I think we're starting about five minutes late, so we'll try to move quickly through the slides. I'm Özgün, the CTO at Citus Data, and Lukas and I will be co-presenting this talk. We have about 40 slides, and it's a technical talk, so I was going to say feel free to interrupt with questions, but given the time, maybe we keep questions to the end. If anything is urgent, do feel free to interrupt.

Before we start, a quick question: how many of you have heard of Citus? Okay, a fair number of hands. We usually get this question, so I'll spend just two slides explaining what Citus is, and then we'll dive into the talk. Citus is a distributed database that uses the PostgreSQL extension APIs. Citus horizontally scales PostgreSQL across multiple machines using sharding and replication behind the covers, so it's all invisible to the user. Citus's query engine parallelizes SQL queries across the machines in the cluster. Most importantly, Citus is open source. If you'd like to try it out, or if you have any feedback for us, please reach out on GitHub, Slack, or our Google Group.

What are some good use cases for Citus? Citus serves many use cases, and three common ones are listed on this slide. First, if you have a multi-tenant database that needs to scale linearly, Citus enables that with minimal changes to your application. Second, when you have a large dataset and you want to get answers from that data in human real time, typically in less than a second, Citus is a good fit. Third, if you need to write large volumes of data into your database and you'd like to combine the power of structured and semi-structured data, Citus works well there too.

With that, let's dive into the talk. Here's a quick outline. We're going to start by defining what it means to scale, the different ways to scale, and, most importantly, when you need to scale. Then we're going to define what a multi-tenant database is. After talking about scale and multi-tenancy, we'll cover data modeling for multi-tenant databases, and we'll provide an example database schema to make that concrete. Next, I'll talk about three different approaches to scaling multi-tenant databases. From there, Lukas will take over the slides and talk about integrating your multi-tenant database into your application, and then he'll do a short demo. Finally, we'll conclude with Q&A.

Okay, with that, let's get started by defining what it means to scale. At a high level, scaling is the process of providing more resources to your application or your database with the intent of improving its performance. As a general rule of thumb, scaling computation is always easier than scaling data. For example, say you have a website that serves customer traffic, like Amazon.com. If you needed to scale your front end, you could provision thousands of servers on the fly, put them behind a load balancer, and scale your website. Doing the same for your database, however, tends to be harder, and in this talk we're going to focus on scaling databases. When you're scaling your database, you need to think about both hardware and software resources. Usually, though, if you can linearly scale your hardware, you're in a good place to scale your database.
We're therefore going to focus on the hardware dimension of scaling, and I'll keep it brief. You can scale your CPU, memory, or disk in different ways, but I'll just cover the basics for this talk. I'll start with the first and better-known way to scale, which is known as vertical scaling. In this approach, you simply go and buy a bigger machine. In the example here, we have a PostgreSQL database with 4 CPUs, 30 gigs of memory, and locally attached storage, and you need it to scale. So you go and buy a bigger machine with four times the resources, and then you migrate your database from one machine to the other. This is the simplest way to scale your hardware resources, and we recommend that you leverage it for as long as you can. That said, there is a potential drawback to this approach. What is it? Any guesses? It costs more, yes. What else? Yes, there is a hard limit. You can only scale up to a certain point, and then you hit a wall. At that point, you need to scale out your database across many machines. When you follow this approach, you can continue to scale your CPU, memory, and storage resources by adding new machines to your cluster. This approach is also known as horizontal scaling, and we're going to talk about it later in the talk.

Now, an important question to answer is: when is the right time to think about horizontal scaling? The answer primarily depends on where you are in your application and database's lifecycle. If you're at a stage where you can throw more hardware at the problem, we recommend that you always do that first. We also recommend spending time tuning your database's config settings and optimizing your queries; there's a link on the slide for that. In other words, if you don't need to scale out, don't scale out. With that said, if your SaaS application is growing, there will be a point where you start running into performance issues. So how do you get a sense of when it's time to think about scaling out? We've seen numerous customers at Citus who look to scale out their Postgres database, and I've compiled three heuristics on when to scale that I wanted to share. These aren't rules per se; it's best to think of them as general guidelines.

The first one: if your business is growing and you're on the second largest instance type available from your cloud provider, you probably want to start thinking about scaling out. Why the second largest instance type and not the largest one? Because your business will continue to grow, and being on the second largest instance type gives you breathing room while you start thinking about scaling and maybe sharding. A second heuristic, which usually applies to OLTP-type workloads, is autovacuum; we hear about this frequently. PostgreSQL uses autovacuum daemons to clean up bloat, and the default vacuum settings are too conservative to begin with. If you haven't tuned your autovacuum settings, we have some recommendations for that. If you have tuned them and you're still running into vacuum issues, that may be the point to start thinking about scaling out. A third heuristic, again primarily for OLTP workloads, is how much of your working set your database can serve from memory.
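(The two cache-hit-ratio queries shown on the slide aren't captured in this transcript. The following is a hedged sketch of what such queries typically look like, using PostgreSQL's standard pg_statio_user_tables and pg_statio_user_indexes statistics views.)

```sql
-- Approximate table cache hit ratio: fraction of heap block reads
-- served from shared buffers rather than disk.
SELECT sum(heap_blks_hit)::float8
       / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS table_hit_ratio
FROM pg_statio_user_tables;

-- Approximate index cache hit ratio.
SELECT sum(idx_blks_hit)::float8
       / nullif(sum(idx_blks_hit) + sum(idx_blks_read), 0) AS index_hit_ratio
FROM pg_statio_user_indexes;
```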
Most databases track how often you hit the cache and how often you need to go to disk. For OLTP applications, most of your working set should be served from the cache; ideally, you'd want to serve 99% of your lookup queries from the cache itself. In PostgreSQL, a good way to calculate your cache hit ratios is by running two queries like the ones sketched above. The first gives you the hit ratio for your tables, and the second gives it for your indexes. If the cache hit ratio for your database starts to drop below 99%, it may be a good time to start thinking about horizontal scale.

To recap, there are several heuristics you can keep an eye on, and by using them you can get a sense of the right time to scale. Then, when that time comes, what do you do? What are the different approaches to scaling out? We find that the answer depends on your requirements, and that the following two questions help identify those requirements. First, are you looking to scale out a database that serves a B2B, B2B2C, or B2C workload? This talk is about scaling B2B workloads. Second, do you have a transactional or an analytical workload? This question is orthogonal to the first one: you could have a multi-tenant database or a B2C application, and you could be interested in serving either transactional or analytical workloads from either of them.

A B2B workload lends itself to what's known as a multi-tenant database. If you're building a B2B application, most information relates to tenants, customers, or accounts, and your database tables capture this natural relationship. As an example, you could be building a marketing automation tool for other businesses. In this case, each business that you serve, along with its data, becomes a tenant in your database. The notion of a multi-tenant database isn't new; it's been around for at least two decades. What's new, primarily in the context of scaling multi-tenant databases, is the cloud. We now have cost-effective SaaS applications. These applications no longer serve only dozens of Fortune 500 companies; they also power thousands of other businesses. These SaaS applications rely more and more on open source, and they need ways to scale not only to dozens or hundreds of tenants, but to tens of thousands or hundreds of thousands of tenants. Further, SaaS applications can store even more information to help their customers. These reasons combined create a distinct motivation to scale multi-tenant databases.

Google's F1 paper is a good example of a multi-tenant database that scales this way. How many of you here have heard of F1 or read the paper? Okay, just a few. This paper talks about the technical challenges associated with scaling out the Google AdWords platform to over one million tenants. So if you're a Google AdWords customer, you're one of those tenants within that multi-tenant database. The paper also describes common RDBMS properties F1 leverages for powering the underlying AdWords platform. Those features include transactions, joins across tables, and database constraints to make sure that each tenant's data remains consistent. The F1 paper also highlights how best to model data to support many tenants or customers in a distributed database. The data model on the left-hand side, the traditional relational data model, follows the traditional approach and uses foreign key constraints to ensure data integrity in the database.
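(The slide's diagram isn't reproduced in the transcript. As a hedged sketch, an AdWords-style relational schema along these lines illustrates the point; the table and column names are illustrative, not taken from the F1 paper.)

```sql
-- Traditional relational model: each table keyed on its own ID,
-- with foreign keys enforcing integrity across tables.
CREATE TABLE customers (
    customer_id bigint PRIMARY KEY,
    name        text
);

CREATE TABLE campaigns (
    campaign_id bigint PRIMARY KEY,
    customer_id bigint REFERENCES customers (customer_id),
    name        text
);

CREATE TABLE ad_groups (
    ad_group_id bigint PRIMARY KEY,
    campaign_id bigint REFERENCES campaigns (campaign_id),
    name        text
);
-- If each table were sharded on its own primary key (customer_id,
-- campaign_id, ad_group_id respectively), related rows for a single
-- customer would land on different nodes.
```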
This straight relational model introduces certain drawbacks in a distributed environment, however. In particular, if you distribute each table on its own key, the customer table on customer ID, campaigns on campaign ID, ad groups on ad group ID, then transactions that touch a particular tenant, a customer, need to span the network. When you're doing joins, you need to shuffle data across the network. And if you have foreign key constraints between these tables, you need to enforce those constraints across the network as well. So if you shard each table on its own primary key in the relational model, most distributed transactions, joins, and constraints become expensive.

The diagram on the right-hand side shows the hierarchical database model; this is the model used by F1. I won't dive into all the details. In fact, the hierarchical database model was the predominant database model prior to the relational model. The simple idea is that the customer, in this case a Google AdWords customer, sits at the top of the hierarchy, and all the other tables underneath it fall into that hierarchy and are sharded, more or less, in its simplest form, on that customer ID field. Does this make sense, the model on this side of the picture?

Now, the key benefit of this hierarchical database model is that it enforces data collocation. In its simplest form, you add a customer ID or tenant ID column to your tables and shard them on that customer ID. This ensures that data from the same customer gets collocated together. Collocation dramatically reduces the cost associated with distributed transactions, distributed joins, and foreign key constraints, because you're primarily operating within the scope of a single customer. Collocation also reduces the cost of handling network and machine failures. In summary, the hierarchical database model brings performance as a key benefit.

Let's look at the concept of collocation a bit deeper. In this diagram, we have three tables: stores, products, and purchases, and all three tables are distributed on the store ID. So we have store ID one here, products is again distributed on store ID, store ID one, and purchases similarly, store ID one. This way, when you run transactions or joins that are scoped to a particular store, you can always push them down to that machine without having to pay the cost of coordinating these operations over the network.

Collocating tables together sounds great, but what happens if I have a table that doesn't fit into this data model? That's fairly typical. For example, this could happen if you have a web application that serves different accounts or organizations, and each organization normally has its own users. However, you may want to simplify the login process for users who log into not just one organization but several. You could handle this users table in one of two ways: you could keep it as a regular Postgres table, or you could shard the users table on the user ID column and make sure you don't join it with the other tables. A second common table type is small tables that are joined with all the tenant-related tables, for example a time zone table or other small dimension and reference tables. For these tables, one option is to denormalize them into the larger tables.
Another option is to create reference tables that are replicated across all the machines. Now that we've talked about scaling multi-tenant databases, let's talk about different ways to implement this approach. We covered what scaling is and when to scale; now you want to scale, so how do you go about it? At a high level, you have three options. You can create a separate database for each tenant. You can create a separate namespace, or schema, for each tenant. Or you can have all tenants share the same tables within the database. Each of these design options comes with different benefits, and I'll describe them in the next few slides.

In the first approach, you create a separate database for each tenant. From a hardware standpoint, these databases could all live on the same physical machine, so you could have three of those databases on one physical machine, or they could live on separate machines. In this diagram, for example, we have three databases, and those databases usually share the same tables, indexes, and PL/pgSQL functions within them. Each database only holds a particular tenant's data. For example, the database on the left-hand side holds tenant five's data, and when you send a query, the query gets routed to that particular database. This approach, a separate database for each tenant, optimizes for isolation of tenants. You may need this model in industries such as healthcare or finance that impose regulatory requirements. It also has the benefit that you could give each tenant SQL access to the underlying database and have them run their own queries. I believe Oracle's and Azure's multi-tenant database offerings follow this approach. The trade-off is that, if you're not using those products, DBAs now need to manage many different databases and make sure each database gets allocated a fair share of resources. In practice, customers who have isolation of tenants as their primary decision criterion usually lean towards this approach.

In the second design pattern, you create a separate namespace, or schema, for each tenant within the same database. From a hardware standpoint, these schemas could again all live on the same machine, or you could add logic to place them on different physical machines. In this diagram, for example, tenant five lives in its own schema, and all the tables, indexes, and related logic are scoped to that particular namespace. Before you send a query to the database, you usually first set the search_path config setting in PostgreSQL to tenant five's schema, and then you send the query. When you think of isolation and scaling as the trade-offs, this second design pattern sits between options one and three. In this model, you can continue to isolate a tenant's data and queries into a particular database schema. Depending on your industry, this may or may not meet regulatory requirements. Compared to the one-database-per-tenant model, this approach does a more efficient job of sharing resources: you now have one database that manages the resources allocated to it. At the same time, you may start running into other resource challenges. For example, if each of your tenants has a hundred tables and you have a thousand tenants, your database needs to maintain 100,000 tables.
ORM tools, for example, cache metadata related to all tables in the database when they start up, so now each ORM process would need to cache metadata for 100,000 tables. So there's a question mark around the software resources associated with this approach.

The third design pattern is one where all tenants share the same tables in a particular database, and again, this database could be distributed across a cluster. In this approach, you simply add a tenant ID column to all your tables and shard them across your cluster. In this diagram, for example, you have three tables: tenants, campaigns, and leads, and you have the tenant ID column on all of those tables. Each lead for a tenant becomes a separate row in the leads table; you'll see tenant 5's rows here, alongside rows for other tenants in the same table. You then ensure these tables stay consistent through database constraints. In this approach, you don't have the strict isolation guarantees of the one-database-per-tenant approach. For example, your application needs to add a tenant ID filter to queries going to the database. On the scaling side, this design pattern easily scales to tens or hundreds of thousands of tenants, since all tenant data lives as rows within regular database tables. This approach also simplifies the operational burden. For example, you can add columns to your table schema and the database takes care of the work for all your tenants; you don't need to go over each database and add the column there.

So how do you pick between these three design patterns? The truth is, each of these three design options can, with enough effort, address questions around scale and isolation. The decision depends on the primary dimension you're optimizing for. A simple rule of thumb: if you're building for scale, have all tenants share the same tables; if you're building for isolation, create a separate database for each tenant.

A natural follow-up question is why having all tenants share the same tables provides better scaling characteristics. The answer comes from our definition of scaling. Scaling is allocating more hardware and software resources to your database, and the more efficiently those resources are shared, the better your scaling characteristics get. As an example, if you create a separate database for each tenant, then you need to separately allocate disk, memory, and CPU to each database. If you're running 50 of these databases on a few physical machines, resource pooling and sharing becomes tricky even with today's virtualization tech. If instead you have one distributed database that manages all these tenants, then you're using the database for what it's designed to do. You could shard your tables on tenant ID and easily support tens of thousands of tenants. Does this make sense?

When all tenants share the same tables, a second related benefit involves operational simplicity. As your application grows, you'll iterate on your data model and make improvements. For example, you may need to change the schema for one of your tables or add an index to improve query performance. If you're following an architecture where each tenant lives in a separate database, then you need to build infrastructure that ensures each schema change either succeeds across all tenants or eventually gets rolled back.
For example, what happens when you've changed the schema for 5,000 of your 10,000 tenants and one of the machines in your cluster fails? How do you handle that? When you shard your tables for multi-tenancy, you're having the database do that work for you. The database will either ensure that an ALTER TABLE goes through across all shards, or it will roll it back.

Before I wrap up, I also wanted to touch on a question we hear frequently across all three design patterns: how does my largest tenant affect my scaling properties? We find that tenant data in multi-tenant databases usually follows a Zipf distribution. That is, you have a few popular tenants and then a long tail. You might have also heard of this distribution as a power law, a Pareto distribution, or the 80/20 rule. How many of you have heard of the 80/20 rule? A good chunk. They all describe the same underlying phenomenon in different ways. This graph displays it as a Zipf distribution. To construct this graph, you start with a representative sample of your database. You take all tenants in that sample and count the number of rows belonging to each tenant. You then sort the tenants by their row counts and plot them on the X-axis (a sketch of the kind of query you'd use for this appears at the end of this section). For example, the most frequently occurring tenant becomes number one. If you think of the Google AdWords infrastructure, this is Google's most popular tenant, and in this sample it has 100,000 rows associated with it, so it's right here. The least popular tenant, tenant 15,000, occurred only once in the sample, so it becomes the data point at the very end; that's the tail. It's worth pointing out that both the X and Y axes in this graph are on a log-log scale. What that means is that if you took a sample that was ten times bigger, the values on the X and Y axes would shift, but the shape of the graph would stay the same; the slope would remain the same.

The theory behind Zipf, or power law, distributions helps us in the following way. When you're looking to migrate your single-machine database, an important question to ask is: what percentage of the total data size does the largest tenant hold? That is, if your existing database serves 100 tenants and the largest tenant only holds 10 percent of the data, then this approach will help you scale by 10x. If you assume a standard Zipf distribution, you come up with the following heuristic for the largest tenant: when you have 10 tenants, your largest tenant roughly holds about 60 percent of the data; when you have 10,000 tenants, your most popular tenant holds about 2 percent of the data, and you'll be able to scale to about 50x where you are today. Of course, these are general guidelines, and the best way to tell is by looking at your own data. With that, I'll hand it over to Lukas to talk about the application side of the picture.

[In response to an audience question] The simplest way is to shard on the tenant ID. There are ways to shard on one dimension and partition on another, and you can even shard on two dimensions. But the simple and standard way is to shard on the single tenant ID. All right, we can take further questions at the end; we're almost out of time.
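(Going back to the Zipf graph: a hedged sketch of the per-tenant row-count query you'd use to construct it from your own data. The leads table and tenant_id column are the illustrative names from the earlier slide.)

```sql
-- Count rows per tenant and rank them; plotting rank vs. row_count
-- on log-log axes gives the Zipf-style curve described above.
SELECT tenant_id,
       count(*) AS row_count,
       row_number() OVER (ORDER BY count(*) DESC) AS tenant_rank
FROM leads
GROUP BY tenant_id
ORDER BY row_count DESC;
```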
So, I'll talk a little bit about how to migrate your application to, in this case, a shared multi-tenant database. This is specifically about something like Citus, but it applies to anything that matches that concept. Essentially it's a few steps, and then you can scale.

The first step you usually need to do is add a tenant ID to all your tables. That's often what's missing when people start out: they have complex data models, but not every one of the tables actually references a tenant. To give you an example, let's say we have an e-commerce store with line items. A line item has an ID, a line item might belong to an order, but a line item also belongs to a store. In this case we want to shard by store, but we don't have the store ID on the table yet. So we need to make sure we add it to every one of our tables; the store ID is the tenant ID that we want to shard by and co-locate data by. That's the important first step: add the tenant ID to every one of your tables that has a tenant reference, and then also add it to the primary key, so we're only enforcing the primary key on a per-tenant basis.

The second step is to include the tenant ID in all your queries. If you look at a typical query the application might do today, it might look up a product by a certain ID. If you do that in a model like Citus, you connect to something called the coordinator node, and the coordinator then has to go to every one of the nodes to get the data. Sometimes these queries make sense; say I want a count of all my orders, or how much money I made last month across all stores. In those cases it makes sense, and a database like Citus supports that. But when you're looking at a single tenant and you care about performance, it often makes sense to prune the query down to a single tenant or store by adding the store ID to the query. Then the coordinator essentially acts as a router and knows exactly which node it needs to go to.

Some quick notes here, and we'll share the slides later so you can dive into them more. If you have an ORM, the most important step is to include the tenant ID in all the queries: select, update, delete; insert is usually already covered. You want to include the create_distributed_table calls in your migrations; I'll briefly show that in the demo, but it's one step you need to do. And for cross-tenant queries there are some restrictions with Citus specifically, where you might have to change some SQL. But typically we find that people who have a multi-tenant app have no problem migrating to Citus within weeks, if not days. And then make sure to run your test suite against Citus; that's really where we see a lot of people trip up, so just make sure you test against the database you want to switch to.
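(A hedged sketch of those first two steps for the line-items example. The table and column names are illustrative, the default constraint name line_items_pkey is an assumption, and create_distributed_table is the Citus function mentioned in the notes above.)

```sql
-- Step 1: add the tenant column and make it part of the primary key,
-- so the key is only enforced on a per-tenant basis.
-- (Backfill store_id for existing rows before recreating the primary key.)
ALTER TABLE line_items ADD COLUMN store_id bigint;
ALTER TABLE line_items DROP CONSTRAINT line_items_pkey;
ALTER TABLE line_items ADD PRIMARY KEY (store_id, id);

-- Then shard the table on the tenant column.
SELECT create_distributed_table('line_items', 'store_id');

-- Step 2: include the tenant ID in queries so the coordinator can route
-- them to a single node instead of fanning out to every node.
SELECT *
FROM line_items
WHERE store_id = 42
  AND order_id = 1001;
```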
And then just a quick note on transactions: they work like they do on single-node Postgres, and that's really the benefit for an actual OLTP, transactional workload; you can run transactions and have them work just as they do on a single node. For DDL, you can also run it transactionally across nodes. And a quick note here: if any of you are using Ruby on Rails, I'm actually the author of a library for Rails that makes all of this easier. If you are, you should use it. It's called activerecord-multi-tenant, and it makes this easy if you're using Ruby on Rails and Active Record; you can pretty much drop it in and it works. For your developers, it looks like this: you add a multi-tenant annotation to every one of your models, in this case a customer, and you're saying, okay, all the queries and operations I do, I do in the context of this customer. The queries are then modified for you and the tenant ID is added.

All right, with that, and in the interest of time, I'll do a quick demo. I have what we call a formation on Citus Cloud. We didn't touch on this earlier: I'm part of the team that runs Citus Cloud. Citus Cloud is a hosted offering that runs distributed Postgres using Citus, so you don't have to run it yourself. You can have us take care of it; we take care of HA and of enabling you to scale out. I'll quickly walk you through an example of how to work with your data here. I already opened this, and let me do something real quick first. Sorry. Don't do this in production, obviously; I'm just cleaning up the data I added earlier, and I'm doing it in a transaction.

All right, so we have no data, and I'm going to make this a bit bigger; I hope you can see it. Essentially, we have an empty public schema. Again, we're connecting to the coordinator here. The coordinator behind it has two data nodes, but this could be 20 data nodes or 100 data nodes; it doesn't really matter. What matters to us is that we have a single connection string that we can connect to and run operations on. I'd be happy to share more on this later.

Now I'll create some tables, and let me talk about this for a moment. This works just like regular Postgres, but you mark things as either a reference table, which is a table that's the same on every node, or a distributed table, which is split up across nodes. So we have our e-commerce example here, and the distributed tables are sharded by the store ID. This is actually public on GitHub, so I can share the link if you come up after the session. I'm doing a little trick here because I don't want to copy data over the conference Wi-Fi: I have a separate schema which already has some data, so instead of a COPY, I'm doing an INSERT INTO ... SELECT, essentially copying the data from another schema. In this case we intentionally don't have that much data, so it's quick and easy for showing how it works.
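(The demo's SQL isn't captured in the transcript. Here's a hedged sketch of what the table-creation and loading steps likely looked like; the table names follow the e-commerce example, and the demo_source schema name is made up.)

```sql
-- A reference table: replicated in full to every node.
CREATE TABLE categories (
    category_id bigint PRIMARY KEY,
    name        text
);
SELECT create_reference_table('categories');

-- Distributed tables: sharded and co-located by store_id (the tenant ID).
CREATE TABLE stores (
    store_id bigint PRIMARY KEY,
    name     text
);
CREATE TABLE products (
    store_id   bigint NOT NULL,
    product_id bigint NOT NULL,
    name       text,
    PRIMARY KEY (store_id, product_id)
);
SELECT create_distributed_table('stores', 'store_id');
SELECT create_distributed_table('products', 'store_id');

-- Load data from another schema instead of COPYing over conference Wi-Fi.
INSERT INTO products SELECT store_id, product_id, name FROM demo_source.products;
```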
But obviously you could have millions of records, billions of records, terabytes of data. Typically this works best once you have at least a couple of hundred gigs of data; before that, you can use a single node just fine.

So, again, we have these tables, and the nice thing now is we can run queries. A really quick example is a count query, and if we EXPLAIN that count query, we can see that it actually goes to all the nodes and all the shards. In this case we have 32 shards for every table, and you can see one example of a shard here. Essentially it goes down into that node and then just runs regular Postgres on that node. So you get the full power of Postgres, but distributed and scalable. I know there's 10 minutes left, so I'll be quick.

Here's an example of a multi-tenant query, where we're getting the products we have for a certain store. Here we can see, again on the coordinator, that Citus identifies this as a multi-tenant query and says, okay, this goes to a single node, and we again have regular Postgres underneath on that single node.

Let me show you one more thing I think is interesting, which is transactions. If you look at a system like MongoDB or other document stores, a lot of people find they're not really useful for relational, transactional data. A big benefit of Citus is that you actually can use transactions. Let me demonstrate. I'm looking for certain line items for a certain order, and I want to modify something. Because I know my Postgres scales, I open a transaction; every time you modify something, just in case, open a transaction. And then we're going to delete this data by accident. At this point, within this transaction, the data is actually deleted, so if we run our count again, the count returns zero. You've got the full Postgres MVCC model here. And the nice thing is, say we want to roll back; we roll back, run the count again, and hey, our data is still there. This might seem simple, but for any system that stores data you care about, transactions are really important. Again, this works on a single-node, single-tenant basis; there are restrictions if you go across nodes, because it's complex to do deadlock detection across nodes. But if you can prune things down to a single tenant, you have full Postgres transaction support, which is a big, big benefit.

All right. There are more things I could show you here. What I'd suggest is, if you're interested in this topic, come up after the session and we'll either schedule a quick call or show you more here at the conference. But I think we should switch to questions. Do you want to come up again? Yep. Let me just put up a slide. Perfect. Well, thank you for listening. Any questions?
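(A hedged reconstruction of the transaction sequence from the demo above; the actual table names and IDs weren't captured, so these are illustrative.)

```sql
-- Scope the work to one tenant (store) so it routes to a single node.
BEGIN;

DELETE FROM line_items
WHERE store_id = 42
  AND order_id = 1001;

-- Within this transaction the rows are gone...
SELECT count(*) FROM line_items WHERE store_id = 42 AND order_id = 1001;
-- => 0

-- ...but rolling back brings them right back, thanks to Postgres MVCC.
ROLLBACK;

SELECT count(*) FROM line_items WHERE store_id = 42 AND order_id = 1001;
-- => original count
```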
[Audience question] So, in a situation where your application has multiple releases, and the underlying schema changes for some of your tenants but not all of them, what are your options? Do you have to have a totally different database?

Yeah, so, I had a few slides that actually answered that question; it's the second most frequently asked question about this model, and I cut them in the interest of time. In practice, we recommend that our customers use the JSONB or hstore field types to represent that data. So for that differing use case, they add a JSONB column to represent it, and we recommend that you do that too. There are other ways of doing it; I had an example from the Salesforce architecture, and I'm happy to go over those slides in detail with you. But the short answer is: look at hstore, JSON, or JSONB to represent the data that varies across tenants in this model. (A small sketch of this appears at the end of the transcript.)

[Audience follow-up] So, injecting something between the database and the application? So if JSONB...

Are you familiar with JSONB as a data type? Typically what people do is use JSONB. If most of your tenants need a field, you just add it with typical DDL, like ALTER TABLE ... ADD COLUMN. But if you have data that really varies across tenants, here's this kind of tenant, here's that kind of tenant, then JSONB makes a lot of sense in general, because you have fixed keys and indexing, but you can be flexible on a per-tenant basis. It wouldn't put anything between your application and the database; it's really just that when you're querying, you query for a field inside the JSONB column instead of a top-level column.

[Audience] So, JSONB for anything that varies?

Pretty much, yeah. And if you aren't using JSONB, definitely take a look, because, I mean, there are restrictions, but it's a good data type.

[In response to a question about replication and high availability] Sure. We didn't touch on that much. As I mentioned, I'm part of the cloud team. For Citus Cloud, we use standard streaming replication, but we orchestrate it in a way where failover becomes a matter of seconds, which relies on a lot of primitives from, in this case, AWS. For on-premises, we also have an offering; it's a bit different, though. One issue with HA is always that you need to look at what storage you're working with and what objects you use. So on the replication side, the answer to your question is yes, and the particular implementation depends on whether you're on cloud or enterprise.

[In response to a question about large tenants] For the largest tenants, we have additional functionality in Citus that enables you to isolate the larger tenants onto their own machines or their own hardware. That was the top requested feature, so we have a tenant isolation feature built into Citus. We actually released this a couple of months ago; if you look at the blog, we have a blog post on tenant isolation.

I think there was one question in the back. Is that not true for DDL? Yes, for DDL we use 2PC, and there are still restrictions on some of these things, I believe. The MVCC and transactional semantics still hold; the machinery with which we push things down to the machines is different. Yeah.
So we use 2PC to make sure that it either happens or gets rolled back, and for that we use the underlying primitives available in Postgres. We also allow transactions in that case. In other cases we essentially don't allow them, because there is a case where the application updates one row, then updates a different row, and another transaction does the same thing in reverse. If you do that across nodes, you don't have deadlock detection. On a single node, Postgres will sometimes tell you, hey, there are two transactions holding locks against each other, and I stopped one of them. That doesn't work across multiple nodes. There are details on what could be done to make that work, but the short answer is: for DDL, yes; for other things, maybe or no.

[Audience question] In some applications you have multiple tables with similar characteristics that can't all be sharded on the same column, because there's more than one data model. Have you considered supporting more than one distribution scheme for a single application?

So I think the answer to your question has two parts, actually. One is data-warehousing-type workloads. We have three core use cases, and pure data warehousing sits a bit outside of them. Still, for those workloads, say for a real-time application, you can shard on different dimensions. One way to answer the more complex queries is by reshuffling the data underneath; so if you distribute one table on entity ID and another one on time, it requires some reshuffling. Citus has support for that, and we're planning on expanding that support in upcoming releases. For this particular use case, we're looking at multi-tenant applications, and this goes back to the earlier question: hey, can I distribute on tenant ID but also on another identifier? In the immediate term, we already have customers doing this, sharding on tenant ID and partitioning by time, for example. That will give you more capabilities, and we're looking forward to Postgres 10 for that, where you'll get native partitioning within the machine. For multi-key sharding, we have a few customers who do that too, but there's manual effort involved, so it's not on the immediate roadmap.

[Audience follow-up] So rather than multiple... [inaudible] ...or broadcast, or whatever. Is that something you're tracking?

Part of it has support within Citus; it's not an immediate use case we're tackling right now. If you ran that in this demo, it would work, but there are ways of making it more efficient, and that's further out. We unfortunately are out of time, so thank you. Just come up to us if you want to talk more.
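(Returning to the earlier question about schemas that differ per tenant: a hedged sketch of the JSONB approach described in the answer. The table, column, and key names are illustrative.)

```sql
-- Fields that most tenants need stay as regular columns (added with normal DDL);
-- fields that vary per tenant go into a JSONB column.
ALTER TABLE leads ADD COLUMN custom_fields jsonb NOT NULL DEFAULT '{}';

-- A GIN index makes lookups into the per-tenant fields efficient.
CREATE INDEX leads_custom_fields_idx ON leads USING GIN (custom_fields);

-- Query a tenant-specific field instead of a top-level column.
SELECT id, custom_fields->>'lead_source' AS lead_source
FROM leads
WHERE tenant_id = 42
  AND custom_fields @> '{"priority": "high"}';
```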