Welcome to working with JSON data in PostgreSQL versus MongoDB. I'm joined by Darshan Range Gouda, founder at ScaleGrid.io, who will discuss both the functional and non-functional aspects, such as scale and performance, of handling your JSON data. My name is Lindsay Hooper. I'm one of the PostgreSQL conference organizers, and I'll be your moderator for this webinar. Let me tell you a little bit about your speaker. Darshan specializes in v1 products and services, as well as forming and growing development teams within startups. He has in-depth technical knowledge and expertise in the areas of enterprise software, virtualization, and cloud services. Prior to founding ScaleGrid, Darshan spent nine years at Microsoft. That's it for me. I'm going to hand it off to Darshan. Take it away.

Thanks, Lindsay, and thank you for the kind introduction. I'm Darshan, founder of ScaleGrid.io. What we do at ScaleGrid is provide fully managed database services, and we support various databases: Postgres, MySQL, Redis, and MongoDB. We have worked a lot with both Postgres and MongoDB for storing JSON data, and we have learned a few things along the way. This presentation is an attempt to share some of those learnings. Without further delay, let's jump in.

Let's start with the basics. First question: what is JSON? JSON stands for JavaScript Object Notation. It's an object serialization format that was popularized by JavaScript. It's an open standard, documented in RFC 7159, and I would say it's the most popular format today for serializing and storing objects. As you can see in the graph, its adoption is going up and to the right. The next question we usually get asked is: Postgres is a relational database, so why does Postgres need to care about JSON? In my opinion, there are a few good reasons for Postgres to handle JSON. The first one we usually encounter is the need for schema flexibility.
There are a lot of business cases with a lot of transient columns — columns that change on a frequent basis, getting added and removed. If you go with the traditional relational model and you have a large dataset, the DDL operations to keep changing the schema can get quite hairy. In these sorts of scenarios with a lot of transient columns — analytics scenarios, tagging scenarios, a lot of event tracking scenarios — the schema flexibility offered by JSON is very important. The second scenario where I see JSON being very useful is when you're dealing with a lot of nested objects. Sometimes it might be easier to store these nested objects as JSON and query them as JSON rather than trying to normalize them and store them across different tables. The third case we typically see is when you're dealing with external systems that share data in JSON. For example, we have a lot of customers who store Stripe data in their database, and the common format for Stripe data is JSON. That might be another reason why you need to deal with JSON. JSON support in Postgres has gotten progressively better over time. Let's get a high-level overview of the timeline, and then we can jump into each release and I'll go into the details. Initial support, as you can see, arrived in 9.2 with the JSON data type — not to be confused with the JSONB data type; we'll go into those details. The next release was 9.4, which added support for JSONB. The next significant release, I would say, is PG 12, which added support for SQL/JSON and JSON path. SQL/JSON is the SQL standard supporting JSON as a first-class feature, and I'll go into that as well. Comparing this with MongoDB: the initial releases of MongoDB came in its early years, and the next major change in its storage was the adoption of the WiredTiger storage engine, around 2016. So the first release for JSON in Postgres was 9.2.
Basically, Postgres added support for the JSON data type. The JSON data type is essentially a string field with some JSON validation added. It's very simple — it was an initial release — and most people working with JSON in Postgres today will mostly be working with JSONB, not JSON. For example, the JSON data type doesn't have any index support. The common data type you're going to use in Postgres is the JSONB data type, which was released in 9.4. JSONB stands for "JSON binary" or "JSON better," depending on whom you ask. It's a binary data structure used to store JSON data — I'll go into the details of the data structure — and it has great support for various indexes as well. In PG 12, as I pointed out earlier, Postgres added support for the SQL/JSON standard. One of the important parts of that release is support for JSON path, which is an expression syntax for querying JSON. It's a very powerful, intuitive way of querying JSON, and there have been many improvements to JSON path in PG 13 as well. There are a lot of interesting improvements planned for JSON support; I have a link to the JSON roadmap put together by the Postgres Pro folks who implemented much of the JSON support. As I pointed out earlier, if you are working with JSON and Postgres, you are mostly going to be using JSONB. However, there are a few cases where the JSON data type might also be useful. For example, the JSON data type will preserve the whitespace in the original JSON, it will preserve the ordering of the keys, it will preserve duplicate keys, and it's also a little faster to ingest than JSONB. So if you have scenarios where you're just ingesting JSON data and validating it, but not really processing it, then JSON might make sense for you. But in most cases, JSONB is what you should be using.
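As a quick illustration of the differences just described, here is a minimal sketch — any psql session should show this behavior:

```sql
-- json stores the input text verbatim: whitespace, key order,
-- and duplicate keys are all preserved
SELECT '{"b": 1, "a": 2, "a": 3}'::json;
-- → {"b": 1, "a": 2, "a": 3}

-- jsonb parses into a binary form: keys are reordered,
-- duplicate keys collapse to the last value, whitespace is normalized
SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
-- → {"a": 3, "b": 1}
```

The verbatim behavior of `json` is exactly why it can be a good fit for ingest-and-validate-only workloads.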
When I refer to JSON support in Postgres, I'm typically referring to JSONB. Before we dive into the details, there are some broad patterns and anti-patterns for using JSONB in Postgres, and I'm going to briefly touch on them before we get into all the details. One question we frequently get asked is: now that we have JSONB, do we even need columns anymore? Why not just put everything into a JSONB column — have an ID as the primary key, put all your data in JSONB, and not use columns at all? What I would say is that columns are still the most efficient way to store and query data in Postgres. There are some high-level limitations in JSONB, and I will go into the details of those limitations. The design pattern we typically suggest to our customers is: if a field is appearing in all of your JSON data, it's probably better off as a column. The more stable fields in your dataset should still be columns, and JSONB should be a catch-all for variable, temporary, or intermittent fields. I think that gives you the best of both worlds: you can handle the intermittent data with JSONB, and you can use the traditional column structure for all the fields that are stable. Let me jump a little bit into the high-level drawbacks I mentioned. There are a couple of drawbacks at a high level: there are some storage issues, and there are some statistics collection issues that you need to be aware of before you jump in, and there are workarounds for both of them.
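The hybrid pattern described above might look like this in practice — a sketch with made-up table and column names:

```sql
-- stable fields that appear in every row stay as regular columns;
-- jsonb is the catch-all for variable or intermittent fields
CREATE TABLE events (
    id         bigserial   PRIMARY KEY,
    user_id    bigint      NOT NULL,           -- stable: always present
    event_type text        NOT NULL,           -- stable: always present
    created_at timestamptz NOT NULL DEFAULT now(),
    properties jsonb                           -- varies per event type
);

-- the columns get full planner statistics and cheap access;
-- the jsonb column absorbs schema churn without any DDL
INSERT INTO events (user_id, event_type, properties)
VALUES (42, 'page_view', '{"path": "/pricing", "referrer": "google"}');
```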
To get into the statistics collection: Postgres typically collects stats about the data distribution for all of your columns, and this helps the query planner. It keeps track of the most common values in your dataset and what fraction of your values are null, which is really useful when you have ordered data like integers or strings. It also keeps a histogram of the distribution of your values, so the query planner can make some really intelligent decisions about how to work with the table. Today, Postgres doesn't collect any statistics about the contents of JSONB columns, so the query planner is essentially running blind, and it can make some wrong choices in some cases. I've included a link to a blog post from the heap.io folks, who ran into one such case and blogged about it in detail. The next topic I want to talk about is how JSONB stores the keys. The way JSONB is currently implemented can result in some storage bloat, and one of the reasons is that the keys are stored along with the data in every row. Again, this completely depends on your dataset. Ideally, in the long run, it would be good to extract the keys into a table-level hash table or similar data structure and refer to those keys, instead of storing the keys in the data itself. We have seen some customers reduce their data footprint by using smaller key names instead of very descriptive key names. Postgres relies on TOAST for out-of-line storage and compression, and I will jump into that as well. We did some very simple, basic tests around the compression of data — and again, these are very simple, very subjective tests.
So you should test this with your data to validate. With one of the sample JSON datasets I was using, about 10 to 11 GB of data, Postgres was able to compress it to about 8.7 GB. If you compare that with MongoDB, Snappy compressed it to about 8 GB, and zlib compressed it to about 5.3 GB. So it's decent compression, but it's not great, and that's something you want to keep in mind. Again, this is very subjective: it depends a lot on how many keys you have and what the length of your keys is, and it also depends on how your data gets compressed by TOAST. Speaking of TOAST: TOAST stands for The Oversized-Attribute Storage Technique — probably the coolest acronym I've come across. It is the way Postgres stores large data types. Simplistically, the way it works is: when your tuple size is greater than the threshold, which defaults to 2 KB, Postgres will attempt to compress the column data to fit within that threshold. If that doesn't work, the data is moved into out-of-line storage. This process is called TOASTing, and by default TOASTing also includes compression. So your data is compressed and stored out of line, and when the data is accessed, it is de-TOASTed — retrieved, decompressed, and then used by Postgres. As a user, you can control this: Postgres lets you set the storage strategy per column. The default is EXTENDED, which allows for out-of-line storage and compression. But in some cases you might notice some CPU overhead, or maybe you don't need the compression, so you can also turn off the compression.
You can set the storage strategy to EXTERNAL, which gives you out-of-line storage but no compression. So if you expect your JSON values to get large — especially above 2 KB — I would recommend running some tests to ensure that the TOASTing and de-TOASTing process doesn't add a lot of overhead to your data access. A little bit about the data structures: I've pulled in some high-level block diagrams of how JSONB data is stored. It's a fairly simple data structure, and as you would expect, it's a tree structure used to store JSON. At the top of the structure it stores the size, and then there is a hierarchy of nodes; each node has a header and a variable-length section that stores the keys and the values. If you contrast that with BSON, used by MongoDB, it's again very similar: there is a size, there's a hierarchy of nodes, and each node has a header and a value. It's strikingly similar in how it's implemented. Diving into the operators: Postgres, as part of its JSON support, provides a number of operators and functions. I'm not going to go through the entire list of operators at this point; I'll just talk about the high-level families. There are the member access operators, which you can use to access members of a JSON document. There is the containment operator — this is the @> ("contains") operator — which you can easily use to check whether one JSONB structure is present within another JSONB structure. The third family I want to talk about is the existence operators. This is the question mark (?), and there are the "and" (?&) and "or" (?|) variants, which you can use to check for the existence of keys in JSON.
The last family is the @@ operator, which is the JSON path match operator. I'll walk through some examples with these operators in the indexing section. As I alluded to earlier, Postgres also provides a wide variety of functions to work with JSON, broadly divided into creation functions and processing functions. I'm not going to go through all the functions here, but I've included a link to the documentation so you can take a look. Comparing this with MongoDB: the MongoDB query language is essentially based on JSON syntax, so in some ways it works a little more naturally with JSON. For example, say I have a sample books database. If you want to find all the books in the database — the equivalent of a select * — you can call db.books.find() and pass in an empty object, and it will find all the books in that collection. Similarly, you can specify filters: I can call db.books.find() and pass in a particular publisher. You can do array operations as well: for example, if you have an array called tags, I can find all books that have either of two tags by passing them in. You can throw in "and" and "or" operators too. MongoDB has its own special dollar syntax for these operators — $or, $and, $lt, $gt — so you can specify fairly complex conditions in the query language. And of course, you can query nested documents: in this case, there is a nested document called size, and you can query one of the fields inside that nested document as well. And let's say you have an array of objects, or a nested array of objects.
The MongoDB query language handles that fairly easily. For example, I can query all the objects stored in an array, and I can query for a particular property of those objects as well. This sort of thing is a little hard to do in Postgres without having to unwind the array — at least prior to PG 12. After PG 12, you have some options with JSON path. The last one I want to talk about is projection: returning only certain fields from a particular query. Again, this was slightly difficult to do in Postgres prior to the JSON path release, but in the Mongo query language it's very simple: as part of the query, you can specify which fields in the JSON you want projected. Next, I want to jump into indexes — this is my favorite topic in Postgres. We'll talk about the various indexes for JSONB and what the options are, and we can compare and contrast with MongoDB as well. At this point, I want to check with Lindsay if there are any questions we want to take, or do we want to proceed?

Yeah, so far I have one question, and it's: what kind of access speed penalty do we get from compression of JSON data?

I think that's a great question. Ultimately, the access speed penalty depends a lot on your dataset. First of all, you have to make sure the data is actually compressed and TOASTed: if your values are small, there might not be any TOASTing happening at all — the data might be stored inline. So it depends on the dataset; you first have to see whether it's getting TOASTed. Typically, we see CPU overhead when data gets TOASTed, and our hypothesis was that as your data gets larger, the TOAST overhead would grow.
But we ran some quick tests, and the tests have not backed up that hypothesis: the overhead stayed roughly the same as the data grew from 2 KB to 4 KB, 5 KB, 10 KB. These were basic tests, so we're going to dig into it deeper, and we'll probably come out with a blog post comparing the overhead as your JSON data gets larger. At a high level, TOAST provides compression using PGLZ. It's a generic compression across all data types — nothing specific to JSON. The Postgres Pro folks who worked on JSON support have a few extensions that can do a better job of compressing JSON. For example, they have an extension called ZSON that builds a dictionary of your keys and helps you compress your JSON even more. So there's a lot of work happening on this front. Let's jump back in. At a high level, we're going to dig into three types of indexes for JSONB: GIN indexes, B-tree indexes, and hash indexes. In MongoDB, when you see a customer setup, it's mostly going to be B-tree indexes. MongoDB does have text indexes, but they're not very widely used; they've gotten better in the newer releases, so we see more of them, but in most cases it is B-tree indexes in MongoDB — mostly what you would call expression indexes in Postgres. Let's dig into the Postgres side of things first. GIN stands for Generalized Inverted Index, so it is essentially a kind of inverted index; I won't go into the details of what an inverted index is and how it works. GIN supports two operator classes for JSONB: the default one, called jsonb_ops, and a variant called jsonb_path_ops. I'll explain both of them.
jsonb_ops is the more general one: it will index each key and each value in the JSON, so it supports the largest number of operators. jsonb_path_ops only supports a couple of operators, and it indexes only the values, not the keys. Before we jump into the examples, I put together some sample data. Think of it as a simple book database, with an ID, an author, an ISBN, and some extra JSON data. It's a very simplistic model, so if there are any librarians in the audience, I apologize in advance for its simplicity. If you look at the JSON, it's fairly simple, and quite random and made up. But there are multiple levels: there are tags, there are multi-level tags, there is an array of objects called friends, there are some Boolean properties, and there is an array property called keywords — just so we can exercise all the different functionality. Let's start with something really simple. Say you want to find all the books that are available in braille, or the books that are available in hardcover. Let's start with GIN indexes. Our first step here would be to add a GIN index — that's the first line you see here: we create an index on the data field using GIN. This makes it really easy to query. We say: select * from books where data ? 'braille'. This is the existence operator, and what you're asking Postgres is: are there any books where the data JSONB field contains the braille key? And as we expect, it finds the results for us.
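The index and query just described would be written roughly like this (using the talk's books table and data column):

```sql
-- a GIN index over the whole jsonb column, default jsonb_ops class
CREATE INDEX idx_books_data ON books USING GIN (data);

-- ? is the key-existence operator: matches rows whose jsonb document
-- has a top-level key named 'braille'
SELECT * FROM books WHERE data ? 'braille';
```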
If you do an explain analyze here, you can see that it is indeed using the GIN index and doing a bitmap index scan, so this is going according to plan. There are variants of the existence operator: there is an "or" variant (?|) where you can search for any of multiple keys, and an "and" variant (?&) where you can require that multiple keys all exist. So in this case, say we wanted to find books that are in braille or in hardcover: you can use the ?| existence operator on data and specify an array of the keys you want to find, and it will find the books that have either of those keys. As you can see, it's still using the GIN index. Now, the drawback of these existence operators is that they only work on top-level keys, and this is where we see a lot of people get stumped. If the key is not at the top level, the query doesn't use the index, and it results in a sequential scan. For example, instead of searching for a top-level key like hardcover or braille, let's look for a nested key: select * from books where data->'tags' contains some key. Here the GIN index doesn't help, because the existence operators only support top-level keys. This is a significant problem that we see with a lot of customers. The way to work around it is to use expression indexes: you have to create your own expression indexes for the fields you are interested in. In this case, since we queried data->'tags', the way to work around the issue is to add a GIN expression index on data->'tags'.
In general, when you're working with JSONB data, I would recommend working with expression indexes and indexing just the fields you want, instead of indexing the whole JSONB column. So in this case, let's go ahead and create an index on the data->'tags' field — it's still a GIN index. This time, when we do a select * from books where data->'tags' contains a particular key, you can see that the operator is able to use the index, and if you look at the explain analyze output, it's doing a bitmap index scan of that particular index. The moral of the story is: whenever you're working with JSONB in Postgres, make sure you're always running explain analyze to check that the right index is being used. The next operator I want to talk about is the containment operator, @>. I wish these operators had friendlier names, but unfortunately "@>" is the best we can do at this point. You can use this operator to check whether the JSON on the right is contained in the JSON on the left. If you take the previous example, where we are looking for books that have the braille tag, you can do a similar query with the @> operator instead of the existence operator. In this case, we say: select * from books where data contains braille equals true. And as you can see here, it is able to leverage the GIN index as well and return the results. This containment check can be multi-level, and you can nest it to an arbitrary depth, depending on your JSON data.
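A sketch of the containment queries just described (the 'genre' key is made up for illustration; braille is from the talk's sample data):

```sql
-- @> checks that the JSON on the right is contained in the column value
SELECT * FROM books WHERE data @> '{"braille": true}';

-- containment can reach into nested objects to arbitrary depth
SELECT * FROM books WHERE data @> '{"tags": {"genre": "fiction"}}';
```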
So in this case, I can look not just for a top-level tag: I can say, within tags, the publisher key with a particular value, and as you can see, it still uses the GIN index. And you can nest this to any arbitrary level — tags contains a key, which contains field one, which contains the value you want — and it is still able to leverage the GIN index. There's another way to do this: there is another JSONB operator, the #> path-extraction operator, where you specify a path to query. But GIN indexes don't support that operator, so you need to be careful with it. All the GIN indexes we've talked about so far use the default jsonb_ops operator class. JSONB also supports a variant called jsonb_path_ops, which is essentially an option to reduce the size of the GIN index. As I pointed out earlier, when you use the default jsonb_ops option, all the keys and the values are indexed, whereas with the path ops option, only the values in the data are indexed. Because of that, the index tends to be smaller. Again, your savings will depend on your dataset: on this simple dataset of about a million books, I created a path ops GIN index, and there were about 10 to 20 percent savings in the index size. But it's very subjective, so you have to try this on your dataset to see how much you would save. The path ops option supports a limited set of operators: it supports the containment operator @>, and it supports some of the JSON path operators as well.
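The smaller-index variant might be created like this (index name assumed, 'genre' made up for illustration):

```sql
-- jsonb_path_ops indexes only values (as hashes of their full paths),
-- so the index is smaller but supports mainly @> and jsonpath matches
CREATE INDEX idx_books_data_pathops
    ON books USING GIN (data jsonb_path_ops);

-- supported: containment with the full path down to a value
SELECT * FROM books WHERE data @> '{"tags": {"genre": "fiction"}}';

-- NOT supported by this index: key existence, e.g. data ? 'tags'
-- (without another appropriate index, that falls back to a seq scan)
```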
If we go back and run the previous query — looking for data that contains tags containing a particular key with a particular value — it works very well: you can see from the explain output that we were able to leverage the path ops index. But as I alluded to earlier, the path ops option only supports one or two operators: it gives you a more efficient index, but the scenarios are a little limited. If you go back to some of the earlier scenarios — say the existence operator, data ? 'tags' — that will result in a sequential scan if you don't have another appropriate index. Similarly, if you use the containment operator to query any of the intermediate JSON objects without specifying the full path down to the value, that will also result in a sequential scan. So there are a lot of gotchas here. It depends on your dataset, and more importantly, it depends on what you're querying in your dataset. You might have to create multiple indexes of different types depending on what you query. Any questions on the GIN indexes?

Yes, I actually just messaged you the question that came in. It asks: what do you mean by leveraging the GIN index while using the @> operator?

So, leveraging the GIN index while using @>, which is the containment operator: essentially, every index type has a list of operators it supports, so you want to make sure that the operator you're using is actually supported by the index type you're creating. For example, if you want to use the LIKE operator or the equality operator and you create a GIN index, it's not going to work, because a GIN index supports neither of those operators.
Essentially, you first have to understand your data; then you have to understand what queries you plan to run on that data and what operators you're planning to use; and based on that, you can create an index of the appropriate type. Let's jump back in. The next index type I want to talk about is the B-tree index. B-tree indexes are probably the most common index type in relational databases. If you index an entire JSONB column with a B-tree index, the only useful operators you have are the comparison operators — equals, less than, greater than, greater than or equal to, and so on — and essentially what you're doing is comparing whole JSON objects to each other. I guess that could be useful in some scenarios, but it's a very limited use case. So if you're using a B-tree index, it's very unlikely that you're indexing a whole JSONB column. Typically, with a B-tree index, you're going to be using an expression index, where you index a particular key in the JSON — or a sub-key, depending on your needs. These indexes support all the normal B-tree operators: equality, comparison, and so on. The example I have here is: say we want to extract all books that have a critic rating greater than four. You can't do this with a GIN index — GIN doesn't support the comparison operators — so you need a B-tree index for that. Let's see what happens if you just attempt this query: select * from books where data->'critic_rating' > 4. The first thing you'll see is that you get an error, because data->'critic_rating' returns a JSONB object.
You're trying to compare it to an integer, so it's not going to work; you have to do some additional work. If you're on PG 11 or later, there are additional casts available: in this case, I cast data->'critic_rating' to an integer and then compare, and the query goes through. On earlier versions it's a little uglier: you have to use the text-extraction operator, ->>, which returns the value as text, and then cast that text to an integer; then you can run the query. And since we are using an expression index, the index expression needs to match the query expression exactly. So once we've nailed down the expression we want to query, we can go ahead and create a B-tree index on it: create index using btree, passing in the expression over the JSON that you want indexed. Then if you run a query like select * from books where the rating equals three, you can see that it's actually using the B-tree index we created. And if the equality operator is all you need, hash indexes become really interesting as well. Hash indexes tend to be smaller than B-tree indexes. You can create the hash index — create index using hash on that particular expression — and the only operator it supports is equals. If you check the explain analyze output, you'll see it using the hash index.
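Putting the B-tree and hash examples together — a sketch using the ->> text-extraction form, which works on older versions too (index names assumed):

```sql
-- B-tree expression index; ->> returns text, so cast before comparing.
-- the indexed expression must match the query expression exactly
CREATE INDEX idx_books_rating
    ON books (((data ->> 'critic_rating')::int));

SELECT * FROM books WHERE (data ->> 'critic_rating')::int > 4;

-- if equality is all you need, a hash index is often smaller
CREATE INDEX idx_books_rating_hash
    ON books USING HASH (((data ->> 'critic_rating')::int));

SELECT * FROM books WHERE (data ->> 'critic_rating')::int = 3;
```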
So there are a couple of other index types that I want to talk about a little bit. One of the common scenarios we see with B-tree indexes is that people want to use the LIKE operator to do a pattern match. This will work with B-tree indexes if your expression is left-anchored, but you cannot use them for arbitrary string matching. One of the cool index types that can help you with string matching is the trigram index. A trigram is basically a sequence of three letters. When you create a trigram index, Postgres will break up your words into sequences of three letters and index all of them with a GIN index. It's part of an extension, and the extension is part of the contrib package. So you can go ahead and create the extension, pg_trgm, and when you're creating the GIN index, you have to specify the gin_trgm_ops operator class. This will enable a trigram index on the JSONB data. In this case, we are creating a trigram index on data->>'publisher'. After that, you can use any sort of pattern you want to match arbitrary strings on the indexed field. It doesn't have to be left-anchored; in this case, I'm using a percentage on both sides, like '%expression%'. It's a pretty cool feature, and if you're looking for arbitrary string matching, this can work really well for you. The next thing I want to talk about is indexing arrays. GIN works really great if you're indexing arrays, both arrays of strings and arrays of objects, and essentially, you can easily search these arrays for particular keywords. In this case, I have a keywords array in my sample dataset.
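A minimal sketch of the trigram setup described above (the index name and the '%graw%' pattern are just illustrative):

```sql
-- pg_trgm ships in contrib; enable it once per database.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN trigram index on the publisher field, extracted as text.
CREATE INDEX idx_books_publisher_trgm ON books
USING gin ((data->>'publisher') gin_trgm_ops);

-- Arbitrary, non-left-anchored matches can now use the index.
SELECT * FROM books
WHERE data->>'publisher' LIKE '%graw%';
```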
And I'm going to create a GIN index on just the keywords array. Once the index is created, you can use the path operators, such as the containment operator @>, with a variety of keywords, and essentially search for rows whose array contains those keywords. If you are storing an array of objects and indexing them, it gets a little tricky prior to PG 12: if you're looking for particular fields inside the objects and you want conditions on them, you might need to extract the array and write your own function to do a more in-depth comparison. But after PG 12, working with arrays of objects is much easier using JSON path, and I will dive into that as well. So the SQL standard now supports JSON; JSON is a first-class citizen of SQL. This is detailed in the SQL:2016 standard, and it includes a generic SQL/JSON data model, JSON path, and SQL/JSON functions. With the PG 12 and PG 13 releases, Postgres has one of the best implementations of SQL/JSON. I've included a link to the PG 12 release notes, where the compatibility of Postgres SQL/JSON is compared with other databases, and you can see Postgres does really well. I'm going to skip over the SQL/JSON data model; it's basically the definition of what SQL/JSON is, and if you are familiar with JSON, it's going to be very similar to that. The most impactful feature that's part of the SQL/JSON support in PG 12 is JSON path. What JSON path allows you to do is specify an expression to query or to project your data, and you do this in the object access notation of JavaScript. In some ways, this is very similar to the MongoDB query language.
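The array-indexing step above might be sketched like this (the keyword values are hypothetical; the keywords array itself is from the sample dataset):

```sql
-- GIN index on just the keywords array inside the jsonb column.
CREATE INDEX idx_books_keywords ON books
USING gin ((data->'keywords'));

-- Containment search: rows whose keywords include both values.
SELECT * FROM books
WHERE data->'keywords' @> '["fiction", "bestseller"]'::jsonb;
```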
You can refer to a member using .membername. You can refer to all properties using .*. You can refer to properties at any level using .**. And if you want to refer to all the objects in an array, you can just use the [*] operator. So this gives you a very powerful and intuitive way to query and project your JSON. In some ways, it's similar to XPath for XML: if any of you have used XPath, JSON path is the equivalent for JSON. As part of the SQL/JSON standard, Postgres also provides a variety of functions, and I've put in the three main ones: jsonb_path_exists, jsonb_path_match, and jsonb_path_query. The first two are essentially used to check your JSON against a particular JSON path expression, and jsonb_path_query is typically used for projection, where you're projecting a subset of the JSON from what you have stored. I'll get into examples for all these functions. To start with, we can go with the original query we used, where you want to find books by a particular publisher. If you're building a WHERE clause using a JSON path expression, you would use the @@ operator. In this case, it's very simple: instead of using a lot of other operators, you can just use .membername to refer to a property. So I will say SELECT * FROM books WHERE data @@ the path '$.publisher ==' the string we're looking for. And if you look at the EXPLAIN output for it, you can see that it also leverages the GIN index that we created on the data. And you can get more and more complicated with JSON path, and I'm going to dig into it a little bit.
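A quick sketch of those three functions and the @@ operator (the publisher value "Acme Press" is made up for illustration):

```sql
-- jsonb_path_exists: does the path yield any item?
SELECT jsonb_path_exists(data, '$.publisher') FROM books;

-- jsonb_path_match: the path expression must yield a single boolean.
SELECT * FROM books
WHERE jsonb_path_match(data, '$.publisher == "Acme Press"');

-- The @@ operator is the operator form of jsonb_path_match,
-- and it can leverage a GIN index on the data column.
SELECT * FROM books
WHERE data @@ '$.publisher == "Acme Press"';
```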
So if you want to filter, JSON path has an option to specify a filter that's similar to a SQL WHERE clause, and you use the question mark syntax for it. In this case, you can say '$.publisher ? (...)' and specify a filter expression, and @ here in the expression refers to the current object that's being selected. So you can say where the publisher equals this value by adding a filter to the path. And you can build fairly complicated expressions. For example, in this case I'm doing $.prints[*], which essentially selects all the objects that are stored in the prints array. Then, for each of these objects, you can build expressions around it. For example, with @ referring to the object that's being selected, I can say @.style is "hardcover", or test @.price. So essentially, JSON path is a language of its own: it has its own filters, its own operators, its own functions, and you can build fairly complicated and very powerful JSON queries using JSON path. It's very powerful, but right now in PG 12, JSON path has some limitations. The most important limitation is that index support for JSON path is very limited at this point; pretty much the only operator that is supported is the equals operator that we saw on the previous screen. So in the PG 12 release, it's going to be very hard to use JSON path as part of the WHERE clause. Still, I'm very optimistic about JSON path. I think it's the future for querying JSON data in Postgres, and I think the index support and the various operators will get added slowly over time. And there is another use case where JSON path is really useful, and this is for projecting JSON.
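The filter syntax described above can be sketched like this (the style and price values are hypothetical; prints, style, and price come from the sample dataset):

```sql
-- @? is the operator form of jsonb_path_exists: does any item match?
-- Inside the ? (...) filter, @ refers to the current array element.
SELECT * FROM books
WHERE data @? '$.prints[*] ? (@.style == "hardcover" && @.price < 100)';
```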
And when I say projecting, essentially what it means is that once you have selected your rows, instead of selecting the entire JSON, you might want to project a subset of it. For example, you might want to pick out a particular object in an array, a particular field, and so on. Prior to PG 12, all of this was manual: you could select some fields, but you couldn't really do a lot of complicated querying there. With JSON path, again, all of this becomes really simple. Here we're going to use the jsonb_path_query function, which makes it really simple to project whatever portions of the JSON you want. For example, in this example, the prints field contains an array of objects. Let's say you want to select only the last element of that array. You can say $.prints and index it by the last position, and you can select just the last element. And you can also start to add more expressions here. For example, let's say we wanted to select only the prints that were hardcover. You can say $.prints[*], and then add a filter expression using the question mark operator: ? (@.style == "hardcover"). What this does, essentially, is take all the array objects, filter out the ones that don't match the expression, and return just the rest. And you can also chain filters. For example, I can say $.prints[*], then filter on @.style being "hardcover", then filter on @.price being 100, and build fairly complex expressions for projecting out just the JSON you want from the selected rows. So this part is still very useful in PG 12.
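A sketch of those projections with jsonb_path_query (again, the style and price values are illustrative):

```sql
-- Project just the last element of the prints array.
SELECT jsonb_path_query(data, '$.prints[last]') FROM books;

-- Project only the hardcover prints.
SELECT jsonb_path_query(data, '$.prints[*] ? (@.style == "hardcover")')
FROM books;

-- Chain filters to narrow the projection further.
SELECT jsonb_path_query(
         data,
         '$.prints[*] ? (@.style == "hardcover") ? (@.price == 100)')
FROM books;
```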
And hopefully, as the index support gets added, we can start to use JSON path in the WHERE clause as well. In terms of the roadmap, there are further improvements to JSON path in PG 13: datetime support has been added, along with a number of minor improvements. For the future roadmap, I've included a link to the presentation from Oleg Bartunov of Postgres Pro on the future roadmap. There are thoughts of unifying the JSON and JSONB types into a unified type, a generic JSON, and the link I have included will give you a lot of details. And that's kind of it at a high level. I think we still have a couple more minutes; are there any questions I can answer? The only question I have currently, so if you have any questions, please get them in now, is a general database question: is there a mechanism to block queries that don't make use of indexes? Yeah, I think there are several mechanisms for that, and if you search around for it, you'll usually find one. Okay, another one. Does it make sense to use regular SQL column queries to limit the subset of JSON records that need to be searched? Eric, can you provide more details on that? You can use your regular SQL column queries; do you mean your regular WHERE clause to limit the queries, or are you talking about querying inside the JSONB field? Eric Shapiro asked this. He said, since the JSONB queries are slower, it seems like you should first limit the number of records you need to query by using regular SQL queries, and use the JSON query as a subquery. So at a high level, I don't think it's accurate to say JSONB queries are slower. If you're appropriately indexed and you're using the right fields, I think they will perform just as well.
And, as always, you want to limit your filter to target just the required set of rows. One quick question I would like to ask: JSONB and JSON path are pretty good for querying purposes with proper indexing, but we have seen in our experience that if you do a lot of writes to some of the fields inside the JSON column, the performance can be worse compared to document databases. Can you shed light on that? Yeah. In some ways, whenever you update anything in the JSONB value, the whole column has to be updated. So essentially, if you're storing a large amount of data in a single column and you update a small field, it's still an update to the whole column value. That's probably the reason why you're seeing that performance. Because it's stored like a TOASTed value on disk, even when just one field changes, the whole TOASTed object has to be rewritten. Yeah, there's just no way for it to update only that one particular field. Wonderful. That appears to be all the questions we have. So with that, I want to thank you for spending a little bit of your late morning with us. I want to thank all of our attendees for spending your morning or afternoon with us, and I hope to see you on upcoming Postgres conference webinars. Thank you all.