So welcome to today's Postgres Conference webinar, "The NoSQL Store Everyone Ignored." We're joined by Zoheb Satya San, Senior Engineering Manager at DoorDash. We'll walk through the history of hstore and how we can now use JSONB support in Postgres, and discuss what makes it enticing and comparable to NoSQL stores like MongoDB or CouchDB. My name is Lindsay Hooper, I'm one of the Postgres Conference organizers, and I'll be your moderator for this webinar. A little about your speaker: Zoheb is a dreamer, hacker, philosopher, troublemaker (I want to know more about that one) and an evangelist of open source who is also a long-term Postgres believer. Today he's a technical lead at DoorDash, leading the platform services team and using Postgres pretty heavily. Welcome. So with that, I'm going to hand it off. Take it away.

Thank you for the introduction; that was better than I expected. So hello everyone, welcome to the session. This is a fun little talk that I enjoy giving, because I think this whole feature set of Postgres has been ignored over time while we've been looking at the shiny tools around it. It's definitely something I believe is worth looking into and might be a useful addition to your toolkit.

First, a little bit about me. I've already been given a big enough introduction, so I won't dwell on it too much. Here's my Twitter handle; you can follow me, I tweet tech stuff and the troublemaker stuff that was pointed out. Right now I'm working at DoorDash, and I have to say we use Postgres pretty heavily; it has been a great tool for us. I myself am a convert from MySQL to Postgres: for the longest time I was a LAMP-stack guy using PHP and MySQL and all that, but it really opened my eyes the moment I started looking into the feature set and the data types. I thought, this is amazing, why aren't more people using it? And you can see on Google Trends that the popularity of Postgres has gone up quite a lot.

Regarding this topic, let me give you a little history of how far back I go with it. In 2009, FriendFeed published the blog post about how FriendFeed uses MySQL to store schema-less data. Going back to the context, I was deep in MySQL at the time, which is why I had to read it. Two years down the line, when I started investigating various things, there was a project, if I remember correctly, where I had to store some schema-less data. I discovered Postgres, and along with it I discovered hstore. I obviously blogged about it; here's my post from December 2011, where I first covered "the key-value store everyone ignored," hence the title of this talk, and I'm kind of continuing to march on with this whole topic. In 2012 I did a reimagination of what FriendFeed was trying to do with MySQL, its limitations and all that, but in Postgres. I'll go into the details of that later, so I won't spend more time here. By that time I had obviously tried Postgres with hstore, deployed it in production, and it worked out pretty well for me. After that point I was disconnected from it for the longest time, because I didn't have the same need or the same problem to use it for again.
Continuing on that, further down the line in 2015, somebody gave a talk with the same title in Dublin. I don't want to butcher the last name, but it was by Stephen; you can go check it out, it has some good material in there.

So for our roadmap today: I'm going to give you a brief look at the FriendFeed use case, warming up with hstore and its history. Once I walk through the hstore part, you'll see that we can essentially solve the problem FriendFeed was trying to solve, without the amount of code and complexity they had to deal with. Then I'll take it a level higher with JSONB, which is the modern incarnation of storing semi-structured or schema-less data in Postgres, and give you some examples of complex queries and of how we can partition our documents, because Postgres now has built-in partitioning. Basically, I'm going to show you that this is not just a side feature; it sits right in the core, and you can marry it with other features of Postgres and do some pretty amazing stuff.

Postgres is always evolving. For the ten or twelve years that I've been looking at Postgres, the storage engine work, the foreign data wrappers, partitioning, the different data types, that was the first thing that brought me into Postgres itself. From robust schema-less types (arrays, hstore, XML, JSON and JSONB) to core support for partitioning, improved foreign data wrappers and storage engines, Postgres has just kept evolving over time, and that's going to be the story going forward. So if you're watching this talk two or three years from now, keep in mind things might have changed, because it's always evolving and always getting better.

Let's do a brief history of hstore, because I think it's important to understand the story and the timeline in this whole universe of improvements Postgres has brought to the table. May 2003: the first version of hstore was committed, and that, I think, was the moment it was conceived; at first it lived as a separate, unbundled module. In 2006, hstore shipped as part of Postgres 8.2. In 2007 came the generalized index support, GiST and GIN, around Postgres 8.3, with further improvements in Postgres 9. And in 2013, there was work on nested hstore with array support, essentially enabling document, schema-free document, support. Stay tuned for that, because I'll come back to it later: it almost became hstore 2 and, spoiler, it ended up merging into something else.

Benefits of hstore: it gives you a flexible model for storing semi-structured data in Postgres. It has a binary representation, so it's extremely fast, and selecting fields or properties is really quick. It supports GIN and GiST indexes. The only drawback I can point out is that it's just too flat for my taste: it's basically a key-value store, and it doesn't support nesting or tree structures the way JSON does. And JSON itself was only standardized in 2006, three years after hstore. So how does it work, what does the data type look like? Let's start building some context for that.

So, enough theory, let's build something serious. Let's take the same FriendFeed example and use SQL to build NoSQL. First I'm going to give you a walkthrough of what the problem was and how FriendFeed solved it, as a brief summary.
I won't go very deep, but I'll show you the code patterns and all that. I'd still recommend going to the original blog post and reading through it; it was an eye-opener for me, back in the day.

Why FriendFeed? Because it's a good example of understanding the available technology and the problem at hand. I want to emphasize this: people tend to cave in to buzzwords when they're selecting stores or something new for a new kind of problem, thinking it's a brilliant idea. Sometimes it is, sometimes it's not. Good engineering, if you look at any big company, doesn't start by creating new technology right away. They first focus on solving the problem with tools they really understand, because then they can focus on the problem itself, and later they can improve the tools; Facebook with PHP and the HipHop VM is one example. So don't cave in to buzzwords, that's my essential message. The same is true here: FriendFeed didn't cave in to the buzzwords and didn't start using less reliable new tools, the new kid on the block. FriendFeed was also operating at a large enough scale to show you how, if you need to scale, you can actually do NoSQL inside SQL. So again: use the tools you're comfortable with, and read the blog post. Let's move on quickly.

From that blog post, I've pasted this image from the Wayback Machine because the image is broken in the original, but essentially what they claimed was that by moving to this schema-less mechanism and their new architecture, their latencies improved. You can see on the top graph, compared to one week earlier, the P99 latencies came way down, well contained and pretty stable. This is one of the things developers are always looking for when building large-scale systems: we can't just focus on the means, at large scale we have to focus on the P99s, the edge cases, the worst cases.

With that in mind, let me quickly show you what a FriendFeed document looks like. Every time I show you stuff in JSON format, you can imagine it in whatever format you want, but I'd recommend sticking with JSON. A document is a single object, a single entity that you're dealing with. Here is a post object for FriendFeed. Let me quickly walk through it: you have fields like id, user_id, title, link, published and updated. published and updated are Unix timestamps, so numbers; the rest are strings. What FriendFeed did was create a table for these entities, with a similar schema except for a blob in the middle. When they store an entity, the whole JSON is dumped as-is into that body blob. Later, when you fetch by id, the body contains the whole JSON as-is, which can be shipped down the wire so the frontend can pick it up and render the feed.
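The entities table in that blog post looked roughly like this (MySQL at the time; the column details here are a from-memory sketch rather than a verbatim copy):

    CREATE TABLE entities (
        added_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- insertion order, keeps writes append-only
        id BINARY(16) NOT NULL,                            -- the entity's UUID
        updated TIMESTAMP NOT NULL,
        body MEDIUMBLOB,                                   -- the whole document, serialized as an opaque blob
        UNIQUE KEY (id),
        KEY (updated)
    ) ENGINE=InnoDB;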
So now the question becomes: how can we index? In that same document there's a user_id field, and I want to index it. How can you do that? FriendFeed created a whole framework around it to make sure they could index these kinds of fields.

Here's how they did the indexing. They said: for indexing, create a table for each field. For indexing user_id, that means creating an index table for user_id, which holds the user_id (a binary key) and the entity id as a pair, and having background workers populate the newly created index. You can also write into this table inline as you insert rows. Essentially, whenever you insert a document, there's additional work you have to do to insert into this index table and keep it consistent with the view in the original entities table. The example table from the blog post is index_user_id: user_id is the binary key, entity_id is the reference back to the entities table, and the primary key is the (user_id, entity_id) pair. Whenever you want to say "give me all the rows with user_id X," you look up this table, get a bunch of entity ids, and do whatever you want from that point on.

Obviously, to make that frictionless in code, they had to put in a lot of code to make sure that whenever you insert a row into the entities table you also insert a row into the index table, and that lookups do the appropriate reverse lookups. Here's a piece of code I lifted straight from the post describing how they did it. You can see the client side has knowledge of the shards; the configuration at the top is more complex because it's doing sharding as well, but let's ignore the sharding part. At the bottom you can see that whenever they do a lookup, the whole lookup mechanism is abstracted away into the language, Python in their case, into this framework they created. But essentially the whole responsibility, the load, is handed off to the creator of the table and the index to take care of it on their side.

So, "the key-value store everyone ignored": this is where I'm going to start. Show of hands, or a plus-one in the chat: how many people have actually used hstore in the past? Nice. I see some raised hands here, awesome. For those who haven't, go check it out. I'll touch a little on the syntax, but I won't go into much detail. It goes back 15 years; this is even before MongoDB, which I think was 2009, or 2006, I might be getting the dates wrong, but you can see how far back this goes. This is almost the age of XML and SOAP, and the transition out of that.

So let's solve the same problem. In fact, before solving the same problem, let's just create the same feed and look at what the syntax is like. Here's a simple CREATE TABLE feed with an id of type varchar and a document of type hstore. That's it. On some of the older versions of Postgres you'd have to execute the CREATE EXTENSION hstore command first, but once that's configured, that should be more than enough.
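To put the two approaches side by side, here's a rough sketch: the kind of per-field index table FriendFeed maintained (again from memory, not a verbatim copy of the post), versus the Postgres starting point used for the rest of this talk. The table names and the primary key on id are illustrative choices:

    -- FriendFeed's approach: one extra table per indexed property, kept in
    -- sync by application code and backfill workers.
    CREATE TABLE index_user_id (
        user_id BINARY(16) NOT NULL,
        entity_id BINARY(16) NOT NULL UNIQUE,
        PRIMARY KEY (user_id, entity_id)
    ) ENGINE=InnoDB;

    -- The Postgres starting point: one table, one schema-less document column.
    CREATE EXTENSION IF NOT EXISTS hstore;  -- needed once per database
    CREATE TABLE feed (
        id  varchar PRIMARY KEY,
        doc hstore
    );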
And that should be it: a table with an id and the schema-free document you're looking for.

So how do I insert into this? Pretty simple: INSERT INTO feed VALUES, just like anything else you'd do, with an id and an hstore literal along the lines of id=>..., post=>hello, cast to hstore. And that's it; Postgres will store it as an hstore value. Some of you might think: so what? I stored it, I can fetch it back, what's the point? Well, the point is that you can now do complex queries like this. What I'm doing here is saying: select, out of the doc column, the post field as post, and, just as an example, an undefined field as well, to show you it comes back as NULL, from feed where doc->'id' equals the id I just inserted, just to get your feet wet. Essentially, I've selected a few fields from inside the document store itself and applied a filter on it. And this should be the aha moment for anyone who hasn't used this before, because you're traversing inside the column, into the fields of this document.

However, if I run EXPLAIN on it, of course it's a sequential scan, because there are no indexes. And like any good citizen of Postgres, you can create an index on it. This is where, once you do this, the whole problem, the whole framework that FriendFeed built, has been solved: you have an index on the field, and you can store documents in there without creating any triggers or background workers to do the backfills. That whole thing is done for you out of the box. And of course you can use CREATE INDEX CONCURRENTLY to build it in the background. Think about it: if the FriendFeed folks on MySQL discovered a new field they needed to index, they'd have to create a whole new table and run background scripts to backfill everything. With this one statement you've removed all that engineering and maintenance overhead and you have a really clean index. Now you can run queries on it, run EXPLAIN, and it will tell you it's doing an index scan.

But as I said, it doesn't end here; the whole ecosystem is at your disposal. Here I'm creating a generalized search index over the whole document: with GiST, I just say create an index using gist on the doc column. And now you can do something really amazing. Well, you could actually do it even without the index, but then it would be a linear scan; I'm showing you the optimal way. You can do document matching, key-value field matching. Essentially what I'm saying here is: select the post and the same undefined field (which should be NULL) from feed where the document contains another document, where it matches this specification, which is itself a document. Here the specification says the id is the one I inserted, so every document containing that id is matched, and the result set comes back to you.
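Roughly, the statements being described look like this; the id value and index names are illustrative, not taken from the slides:

    -- Insert a document: hstore is just key=>value pairs.
    INSERT INTO feed VALUES ('e14fbc38', 'id=>e14fbc38, post=>hello'::hstore);

    -- Reach inside the document with ->; unknown keys simply come back as NULL.
    SELECT doc->'post' AS post,
           doc->'undefined_key' AS missing
    FROM   feed
    WHERE  doc->'id' = 'e14fbc38';

    -- Turn that filter into an index scan, built without blocking writes.
    CREATE INDEX CONCURRENTLY idx_feed_doc_id ON feed ((doc->'id'));

    -- Or index the whole document for containment matching.
    CREATE INDEX idx_feed_doc ON feed USING gist (doc);

    SELECT doc->'post' AS post
    FROM   feed
    WHERE  doc @> 'id=>e14fbc38'::hstore;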
So let's reimagine FriendFeed's whole design, everything they did. Same entities table; I've just moved over to Postgres data types, an updated timestamp and so on, and the body becomes an hstore. The index table they used to create to map each field becomes a simple CREATE INDEX CONCURRENTLY on that field, and voilà, you're done. You can look things up by, in this case, entity id, but you can do exactly the same thing with user id.

And there are more operators; what I've shown you is just the tip of the iceberg. Here's a snapshot of the operators. You can select with the arrow operator, and you can select multiple fields with a single arrow. You can concatenate two hstore documents, which is just like a JSON merge if you've done that: new fields get added and existing ones get replaced by the document on the right. That's extremely useful if you're doing updates: take an existing document, add something new to it, and update the document. Then there's the key-existence operator, the containment operator @> that we used with our GiST index (does the left operand contain the right one), and its inverse. You can remove fields, convert to an array, delete keys, all of that. Again, go to the link, you'll love it; it lets you manipulate pretty much all of the document with these basic operators.

But this is where we don't want to stop. The next big thing that came along, initially, was the JSON data type itself. But it had its own caveats: it wasn't parsed up front and it wasn't stored in an optimized way at the storage level, so every time you selected something out of a JSON document, it parsed the whole document. I've got links in the slides to the patches and discussions around supporting nested documents inside hstore; we almost had an hstore 2 at that point, because people really badly wanted nested documents, and hstore was just a flat key-value store. At that point people thought: JSON is great, it's well understood, it's a standard people know; why can't we combine the storage optimizations of hstore with JSON? That's where JSONB, the real deal, came in, because it's binary-serialized and optimized.

So, coming back to the same story: why JSON? It's well understood, the go-to standard for almost everything on the web, it's self-describing and hierarchical, and serialization libraries exist for every programming language, whereas hstore requires each language or driver to support its own serialization format. It lets you describe new shapes of objects, which can be necessary in some cases; the limit is your imagination. Think of analytics-style workloads where the incoming structure might have some fields and be missing others, so you don't need a sparse table where most columns are null. Or, if you're aggregating things from multiple places where JSON is coming in and it's totally different coming from Google than coming from Facebook, you don't have to bind your table to those schemas.
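To make the flat-versus-nested point concrete, here is roughly how the same little record reads in the two types (the values are made up):

    -- hstore: one flat level of text keys and text values.
    SELECT 'user_name=>bob, user_followers=>42, text=>hello'::hstore;

    -- jsonb: real nesting and real types, parsed once and stored in a binary form.
    SELECT '{"user": {"name": "bob", "followers": 42}, "text": "hello"}'::jsonb;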
Now, to demonstrate how useful and powerful nesting can be, I'm going to take the example of tweets; hopefully everybody here understands Twitter. If you look at a Twitter JSON document, it looks something like this: it has an id, it has some text, then it has a user object, and inside the user object is your user information. Then there's a nested object called entities, which has your hashtags and all that stuff; we'll do some operations on that in a moment. And then there are the timestamps, retweeted, truncated, and so on.

Taking the same structure for the feed, you can have a similar setup here. By the way, this is an extreme example; for the sake of simplicity I'm keeping it to two columns. You can have more columns if you want, and the fields you think will always be part of the structure can absolutely be real columns, just like I did with id here. So I created a tweets table with an id and a content JSONB column. Now I can insert in a similar fashion: the id goes here, and the whole JSON string gets dumped in as-is from whatever serialized it. I'm not going to put a full example here, but you can imagine a basic INSERT statement that inserts those documents.

So how do I select fields out of it? It looks extremely similar to what you saw with hstore: content, the same arrow operator, text as txt, and then favorite_count as cnt. In this example I'm just picking text and favorite_count from tweets where content->>'id_str' (the id in string format) equals the id I'm looking for. And yes, you can index this as well: you can create an index just like you did for hstore, and that should be good enough. You should be able to do almost whatever you want with the structure of the document.

Peeking into the structure, if you want to look at the values, there are multiple operators here, and I'm going to assume you can go to the docs and read about them. A lot of these operators look very similar, in fact exactly the same, as what you saw in hstore, so I'm building on that knowledge. Here I'm picking favorite_count: I'm selecting all tweets where content's favorite_count is greater than or equal to one. Again, I'm picking another field and saying: treat it as an integer, and if the count is at least one, select everything. If I EXPLAIN this without an index, you can imagine it's a sequential scan, which is bad; you should not do that in production. But again, I can create an index on this. It's a very simple example, casting favorite_count to an integer, but you can imagine almost anything going in there. Now when I run the select, you can see it's a bitmap scan; it's optimized, as you'd expect from Postgres. So now you can imagine the FriendFeed JSON being stored in here, and you're doing all your queries against this one basic table. All you need is the indexes, and now you're actually JSON-compliant as well.
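A rough sketch of that setup; the table definition, the sample tweet and the index name are illustrative stand-ins rather than the exact ones on the slides:

    CREATE TABLE tweets (
        id      bigint PRIMARY KEY,
        content jsonb
    );

    -- The whole serialized tweet goes into the jsonb column as-is.
    INSERT INTO tweets VALUES (1,
        '{"id_str": "1", "text": "hello #postgres", "favorite_count": 3,
          "user": {"id_str": "42", "screen_name": "bob"},
          "entities": {"hashtags": [{"text": "postgres", "indices": [6, 15]}]}}');

    -- ->> pulls a field out as text, -> keeps it as jsonb.
    SELECT content->>'text'           AS txt,
           content->'favorite_count'  AS cnt
    FROM   tweets
    WHERE  content->>'id_str' = '1';

    -- An expression index so numeric filters don't need a sequential scan.
    CREATE INDEX idx_tweets_fav ON tweets (((content->>'favorite_count')::int));

    SELECT content->>'text' AS txt
    FROM   tweets
    WHERE  (content->>'favorite_count')::int >= 1;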
But let's do some more JSON wizardry, I'd say. Remember I told you that inside entities there are hashtags; what I want to do is build something that can do a lookup on hashtags. hashtags is an array of objects with the tag text as a value. Each entry also carries some additional information, like indices indicating at which character offsets the tag starts and ends; for #python, Twitter automatically puts that information in. But I'm not interested in that, and that's the beauty of this thing.

What I'm writing now is a slightly more advanced query in terms of what I can select or pull out of this whole document. I'm using the #> operator, which lets me select by path; I can walk paths with this operator, and I'll show you what that means in the next example. Essentially I'm saying: from content, walk to text and select that as text, from tweets, where inside content you go to entities and then hashtags. If you were thinking of it in JS or JSON, that's document.entities.hashtags. So we're selecting, from that array, all the hashtag objects, and then matching with the same @> operator you saw before; that's the beauty of it, the same operator reused. Match all the array elements that have text python, and this returns me all the tweets that have a python tag.

As for JSON operators, there are a lot of them. We've already seen the arrow operator, and the double arrow ->>, which gets the element as text. The fancy operator I just showed you, the hash with the angle bracket, gets the JSON object at the specified path; the double version returns the object at that path as text, and I'd only use that when you're sure the value at that path really is text. There's the exact same @> operator for matching, whether the document on the right is contained in the one on the left, and the other way around; the ? operator for key existence; the concatenation and deletion operators; and so on. This is all documented on the JSON data types page, so you can tinker with it. A really interesting set of operators, and you can do quite a lot of stuff with just these.

So, we already did selecting text and matching text, but I didn't call it out explicitly before, so let's repeat the example one more time. With #>, I'm saying: walk inside content, go to entities.hashtags; that returns me an array of documents; and then I'm saying, if this array-of-one document with text python is contained in it, it's a match, and we select everything. But you can imagine that since I don't have any index on this, it's going to be slow, a sequential scan. So what if I want to create an index? The same goodness that surrounded hstore surrounds JSONB. Here I'm creating a GIN index, because I want an inverted index in this case, and with the jsonb operator class; that's the only change you have to put in there for the JSONB operators. Now, after creating the index, running the earlier query is no longer a linear scan but an index scan.
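In SQL, the path lookup and the index being described look roughly like this (index name illustrative; jsonb_path_ops is one reasonable operator-class choice when you only need containment queries):

    -- Walk a path with #> and match inside the hashtags array with @>.
    SELECT content#>>'{text}' AS txt
    FROM   tweets
    WHERE  content#>'{entities,hashtags}' @> '[{"text": "python"}]';

    -- A GIN index over just that array, so the containment check becomes an index scan.
    CREATE INDEX idx_tweets_hashtags
        ON tweets USING gin ((content#>'{entities,hashtags}') jsonb_path_ops);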
And I can do more. I can say: you know what, I just noticed that a tweet can be a retweet, and in that case the hashtags live on the retweeted tweet, not on the parent; so I want to look inside the retweet itself, the retweeted status, and surface the hashtags in there as well. Guess what: you don't have to do anything complex at all. In the example at the top, I create an index saying: inside content, look at retweeted_status, and inside that, entities and then hashtags, and index that as well. The query at the bottom then combines both: select the content text from tweets where the lookup on the document's entities.hashtags matches, OR the content's retweeted_status.entities.hashtags contains the same text. (I think there's a typo on the slide here, but you can imagine it.) So essentially I'm now selecting from a nested document that has the same shape as the parent document.

And that's not it, there's more to it; that's the beauty of this. Everything we've done so far is, you know, run-of-the-mill features plus these new operators. The really beautiful part is that you can now combine JSONB with the whole ecosystem of Postgres; think of all the features. That's what I'm going to do next. Think of this as showing off a bit, saying: hey, you can do this as well.

Here's an example: JSONB and full text search. Say I want to do a full text search on the text content itself; I'm edging toward building a search engine on top of these tweets. At the top you can see I created a GIN index on the text converted to tsvectors, in English. The first parameter is 'english'; I'm assuming the tweets are in English, which of course they're not, but for the sake of simplicity let's go with that. So pick up the content's text, convert it into a tsvector, and build an inverted index on top of it, and there you go, you have a full text search engine. Now I can select the content text where to_tsvector of the English text content matches to_tsquery for 'python': search all the tweets whose text contains Python. A single word like python is a basic example, but you can imagine free-form text queries on top of this. And remember, all of this data is sitting inside one column that has been serialized and stored for you in a structured format. All the documents that don't contain a text field simply won't match, because there's nothing there to match.
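A sketch of both pieces, the retweet-aware lookup and the full text search; the index names are illustrative and the 'english' configuration mirrors the simplification made above:

    -- Same trick one level deeper: index the hashtags of the retweeted tweet too,
    -- then match either location with a plain OR.
    CREATE INDEX idx_tweets_rt_hashtags
        ON tweets USING gin ((content#>'{retweeted_status,entities,hashtags}') jsonb_path_ops);

    SELECT content->>'text' AS txt
    FROM   tweets
    WHERE  content#>'{entities,hashtags}'                  @> '[{"text": "python"}]'
       OR  content#>'{retweeted_status,entities,hashtags}' @> '[{"text": "python"}]';

    -- Full text search over one field of the document: index the tsvector expression...
    CREATE INDEX idx_tweets_fts
        ON tweets USING gin (to_tsvector('english', content->>'text'));

    -- ...and query with the same expression. Rows without a "text" field simply don't match.
    SELECT content->>'text' AS txt
    FROM   tweets
    WHERE  to_tsvector('english', content->>'text') @@ to_tsquery('english', 'python');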
Is that it? I don't think so. Let's do one more nice thing here: I can use this with the partitioning feature of Postgres, some of the latest and greatest stuff. What I've done at the top is create a table, part_tweets, with just the id and the content, and told it: partition by hash of the MD5 of the user id inside content. I just picked the user id; remember, inside our tweet we have the user object, so what I'm trying to do is partition my documents by user id, so that for the same user, all of that user's documents land on the same partition. It could be anything else for you: you could partition by id or whatever else you can think of, even something from within the document, it doesn't matter; this is just an example. Then at the bottom you can see I have CREATE TABLE part_tweets_0, 1, 2, 3, and each one of them is a partition of part_tweets.

The important thing to remember is the hash and the MD5. Anybody who has done sharding knows how to partition with consistent hashing over tables. Essentially, by taking the MD5 of the user id I get a value that can be hashed, and the modulus decides the partition: I'm rotating between these partitions depending on whether the remainder is 0, 1, 2 or 3. If the hashed value comes out to, say, 10, then modulo 4 the remainder is 2, so the row is bound to land on the part_tweets_2 partition. And then rinse and repeat.

Now I create the same indexes on part_tweets, and here's the beautiful part: the tables are partitioned, but I'm creating the index on the top-level table, the same indexes as before, and I insert into the parent table; I'm never touching the partition tables themselves. With those indexes in place, I run a query on part_tweets, and when I ask for everything that has a postgres tag on it, you can see Postgres automatically distributes the work among the tables, doing a bitmap scan on each and then appending the results together. Even on more complex queries it still produces a pretty optimized plan, a BitmapOr and then bitmap index scans on each partition for each of the lookups, and the whole thing is solved for you. This is all without writing any client-side code; it's all done for you, because the ecosystem of JSONB, partitioning and indexing works hand in hand.

So the limit is your imagination. Don't underestimate the power of your tools, something you understand: Postgres and its schema-free storage. I know there's still a tendency to say, oh yeah, this new tool is the thing, but if you just look at the feature set that has shipped in Postgres, I think it's pretty awesome. For links and resources, I've put a bunch of links here, including my blog posts and some really important documentation, slides and presentations around the topic. Thank you for your time; I'd like to open up the room for questions.

Got to say, that was a pretty fun presentation. So far we only have one question, so go ahead and get those in now. The first question came in only a few moments ago: why the MD5 part for the partition key? Oh, okay. So it's totally up to you; it's not required as such. If your user id generation is not sequential, you probably don't need to worry about it. What MD5 gives you is an even spread: passing the key through a basic hash function like MD5 makes your documents spread more evenly across the partitions you have. You don't necessarily need to do it, because if the user id is already an integer, yes, the modulo would work. But you can imagine that when the ids are sequential, the partitioning can get skewed; MD5 makes it more even. It's totally up to you, and it depends entirely on your id-generation mechanism.
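For reference, a rough reconstruction of the partitioning setup being discussed, assuming the user id lives at user.id_str inside the document; the MD5 step is the speaker's choice for spreading keys, and all names are illustrative:

    CREATE TABLE part_tweets (
        id      bigint,
        content jsonb
    ) PARTITION BY HASH ((md5(content#>>'{user,id_str}')));

    CREATE TABLE part_tweets_0 PARTITION OF part_tweets FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    CREATE TABLE part_tweets_1 PARTITION OF part_tweets FOR VALUES WITH (MODULUS 4, REMAINDER 1);
    CREATE TABLE part_tweets_2 PARTITION OF part_tweets FOR VALUES WITH (MODULUS 4, REMAINDER 2);
    CREATE TABLE part_tweets_3 PARTITION OF part_tweets FOR VALUES WITH (MODULUS 4, REMAINDER 3);

    -- One index created on the parent cascades to every partition.
    CREATE INDEX idx_part_tweets_hashtags
        ON part_tweets USING gin ((content#>'{entities,hashtags}') jsonb_path_ops);

    -- Queries go to the parent; the planner fans out per-partition scans and appends the results.
    SELECT content->>'text' AS txt
    FROM   part_tweets
    WHERE  content#>'{entities,hashtags}' @> '[{"text": "postgres"}]';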
Wonderful. Sort of a follow-up to that: does PARTITION BY HASH expect a value from a hash function such as MD5? Because that would explain it. Yeah, so essentially, remember at the bottom I was doing a modulo with four, right? Okay, let me make it even simpler. Imagine your field is not an integer or a number; imagine it's a string, say a UUID. I think in the case of Twitter the user id is a number, but imagine it's a UUID you're generating. Then the modulo-style partitioning, where I'm specifying the criteria for what goes in which table, actually requires the key to be something it can hash. So it's totally dependent on the scenario; don't take that part too strictly.

Thank you. How do the benchmarks compare on the performance of Postgres JSON with Mongo, excluding the sharding options? Yeah, I've not seen official third-party benchmarks, but one of the slides that I've included in the links has some benchmarks, and hstore did beat it; it was faster, but that still needs to be verified by a third party. The problem, and I think this is what everybody forgets when comparing MongoDB and Postgres, is that Postgres is a solid ACID store, a fully transactional ACID store; it's not just dumping things somewhere. I know Mongo has made improvements, but I would recommend, if you ever pick Mongo, first go look at the Jepsen analyses, if that rings a bell. Even with that, I don't think the benchmarks are really comparable, to be honest, from an ACID and transactional perspective; the individual benchmarks show a higher score, but I wouldn't take them too seriously. If it's good enough and it scales well enough while staying transactional, I would take that any day.

Fantastic. You've gotten a bunch of thank-yous. What's your opinion on OLTP and OLAP; can we grow with the same Postgres to 25-plus terabytes? Yeah, okay. So when people talk about these kinds of workloads: I know there are flavors of Postgres that are optimized for OLTP and for analytics workloads and all that. I've personally seen Postgres work at, well, of course I cannot disclose the numbers, but a pretty large scale, more than what I thought Postgres could do. The thing you have to remember is that the foundational pieces remain exactly the same: when you're looking up these columns, it's as good as having an index on a regular column and doing a lookup on it. So if something worked with that kind of index, it should work here as well; it's no different from a regular index that you've seen.

Okay, so with that, we can actually call it. Thank you so, so, so much; this was a wonderful presentation. I have to say, I loved the font, we had a lot of fun with that. And to all of our attendees, thank you so much for joining us, and thanks for spending a little bit of your day with us. I hope to see you on future Postgres Conference webinars. Have a great rest of your day, whether it's morning or evening. Thank you. See ya.