Before I start, I want to ask you: do you really understand what this picture means? The elephant is our totem animal, with its trunk, and this is an impossible torus — the JSON logo. When I asked our artist to make a picture for this presentation, she asked me what the story behind it was. I said that for many years the Postgres community tried to stay separate from the NoSQL world. We said: we are a relational database, and over there is a small world of hipster people doing their JSON stuff. But real life is complex, and our users started asking us about it. They love Postgres, but they also need to work with their applications, and modern applications speak JSON. Another group of people asking us about JSON in Postgres came from the NoSQL world: they started with MongoDB or other NoSQL databases and quickly realized they needed a more sophisticated, more flexible database — one that is also very robust about the data. So eventually Postgres got JSON support, and JSON is now a first-class citizen among the data types of PostgreSQL. Now a bit of history of the NoSQL features. In 2003 Teodor and I invented hstore, for one very specific, NoSQL-ish use case. We were working on a system that collected and aggregated metadata from different objects. The objects had similar but slightly different metadata structures, there were about 50 kinds of them, and we had to combine everything into one table. That table became very wide — about 500 columns — and most values were just NULL, so it was a very sparse table, and it was, I would say, painful to work with. The application was written in Perl, so we said: Perl has hashes, why not implement a hash storage type in Postgres? In the end we had, I remember, about six regular columns plus one hstore column that contained everything else. That was one of the first use cases for this kind of data: aggregating data with varying structure. Once we started using hstore, friends told us about another case — they used hstore for storing user profiles, for personalization. In 2006 in Toronto, at the first meeting where Postgres developers really saw each other (before that we only exchanged emails on the mailing lists), we presented hstore as a demonstration of the GIN index, and people said, wow, this is what we need, and started using it. In 2012 JSON finally appeared in 9.2, but that JSON was just a textual data type, useful only for storing and validating JSON, and people were a bit upset. I started thinking that we needed something more usable, and at PGCon we presented nested hstore: hstore with arrays and objects, so it looked very similar to JSON, but nested hstore had binary storage, index support, a bunch of operators and so on. At the next conference we generalized the storage — I call it binary storage for nested data — and then we dropped nested hstore, because people said: we don't need hstore or nested hstore, we need just JSON. So we abandoned nested hstore and simply renamed it to JSONB. The B means "better" to some people and "binary" to others — whichever you like. It was committed to 9.4, and since then, you can see it on this plot — it's a Google Trends chart.
I made it yesterday: JSONB has become very, very popular, but hstore is still in use. So that is a very short history. We expect that in 2018 we will submit some new features to Postgres 11 — or maybe 10-something, who knows. We need your help, we need your votes. After we got JSONB, people said this is "NoSQL on ACID", and I made this picture. NoSQL databases have also evolved: first they were just simple key-value stores, then ordered key-value stores, and so on, becoming more complicated. But here we have a very good relational database with very strong JSON support — binary storage, indexing — and I will show that NoSQL in Postgres can actually compete with MongoDB; I see no reason to use MongoDB. Okay. Now we have two JSON data types, json and jsonb, and the difference can be illustrated with a single SQL query. json is textual storage, as is, with all your formatting. jsonb has no whitespace — you see, no whitespace — no duplicate keys (the last key wins), and keys are sorted, by length and then by key; this is very important. It has binary storage, so you don't need to parse it again and again, and it has index support. Overall jsonb is about 10 times faster than json if you do a sequential scan, and with an index it can be something like a thousand times faster. jsonb is really, really helpful and very useful. I will not go into all the details of jsonb, because two years ago Christophe Pettus gave a very, very detailed talk about all these features, so I recommend looking at it. (Also — did you get this one? You know, the JSON picture. A very nice picture, a very nice presentation, very good job.) And here is the conclusion of his talk: use relational columns for the basic set of attributes and store all the other stuff in JSON; forget about json and use jsonb. I completely, fully agree with this conclusion, so I'll save the time for the more interesting stuff, the future. jsonb is great, but we have no good query language for it: jsonb is actually a black box for SQL. Sometimes we want to talk to the JSON and navigate inside its structure using SQL. For example, consider this very simple, artificial data set: it contains colors, at different levels, and you can think of it as a shop describing its goods and their colors. You want to find something red to buy — it happens, you know. With plain SQL you have to write this query, and not everyone can write it; it's very complicated. That's because SQL was invented for atomic data types — it works very well with things like integers and strings, but if a data type has internal structure, SQL can't look inside it. That was the problem. Fortunately Postgres has very good SQL support and you can write this complex query, but it's not our way. So at PGCon, Teodor and I presented VODKA and jsquery. I won't talk about VODKA now — it's a new access method and we are still working on it — but jsquery is a JSON query language. With jsquery you can write a query like this. You see? Very simple.
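Roughly like this — a minimal sketch assuming a table named goods with a jsonb column js (the table and key names are invented for illustration), using the operators described in the jsquery extension's README:

```sql
-- jsquery: github.com/postgrespro/jsquery, an extension providing a JSON query language.
CREATE EXTENSION jsquery;

CREATE TABLE goods (id serial PRIMARY KEY, js jsonb);

-- "find something red", wherever a color key appears in the document:
SELECT * FROM goods WHERE js @@ '*.color = "red"'::jsquery;

-- such queries can be accelerated with a GIN index using one of the jsquery opclasses:
CREATE INDEX goods_js_idx ON goods USING gin (js jsonb_path_value_ops);
```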
Very simple and very efficient, because it also has index support for these operators. If you don't know jsquery: there is a GitHub repository you can download; it is an extension for Postgres, it works fine, and many people use it. It provides more comparison operators, index support, and search in nested objects and arrays. So it's very good. Why didn't we commit jsquery to the Postgres core? People ask us. Because we know jsquery is not the answer — it is just a tool, a temporary tool. We need something at the SQL level: jsquery still treats jsonb as something like a black box — to Postgres it is just a data type. So we were looking for a better solution, some extensible SQL-level language. Fortunately, in December JSON appeared in the SQL standard, so now it is official. I know the Oracle syntax is very similar to the proposed standard, so I think it was Oracle who pushed JSON into the SQL standard. We started working on an implementation of this SQL/JSON for Postgres. We discussed it at the end of January in Brussels, and a month later we submitted the patch to the community. But the community said it is too big — about 16,000 lines of code plus 5,000 lines of regression tests — and nobody would be able to review all of that and commit it for Postgres 10. So, unfortunately, we will not have this in Postgres 10; we will work on it for Postgres 11. SQL/JSON is not a new data type — it is just a data model for SQL. The standard doesn't require you to have a data type: Oracle, MySQL, SQL Server don't have a JSON data type, they have a string, nvarchar or some other container, and apply the JSON model to that container. The problem is that this is very slow, because they have to parse it every time. Postgres is extensible, we have all the interfaces, so we can implement a new data type easily — that's why we have jsonb as a data type, already parsed, with binary storage, very fast, much faster than Oracle, MySQL or SQL Server, because we don't need to parse every time. So one question was: we have json and jsonb — should we implement some new SQL/JSON type as well? Three data types is too much. We decided to stay with jsonb, because it is already a very usable, very good data type, and its model is a subset of the standard's (keys are unique and ordered). We will not implement a new data type; we will implement the new features from the standard on top of jsonb. The SQL/JSON standard also contains a specification for JSON path: a path language and a jsonpath data type for SQL. It is actually very similar to jsquery — different syntax, but easy to understand; if you know jsquery, it's easy to work with JSON path. JSON path lets you navigate through the structure of the JSON, and you can construct very interesting queries — you can do everything with JSON path. The standard also specifies nine functions, implemented as SQL clauses — not functions, SQL clauses. So instead of the sixty or so functions we have for json and jsonb, the SQL standard specifies just nine, because it has JSON path. For example, how does it look? It uses dot notation: the dollar sign means the root element, and then $.a.b.c and so on — everybody understands this.
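For instance — a tiny sketch using the JSON_EXISTS and JSON_VALUE clauses from the proposed patch, with a made-up literal:

```sql
-- Dot notation in a JSON path: $ is the root element, then member names.
SELECT JSON_EXISTS('{"a": {"b": {"c": 42}}}'::jsonb, '$.a.b.c');                -- true
SELECT JSON_VALUE ('{"a": {"b": {"c": 42}}}'::jsonb, '$.a.b.c' RETURNING int);  -- 42
```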
You can use square brackets for arrays, and filters, which are very, very powerful: you can specify inside the path that the value of the current item — the @ sign — should be greater than 10. It's very handy. You can also specify methods in a JSON path, conversion methods: convert to datetime, to integer, take the absolute value and so on. So JSON path is very, very powerful. Here you can see the query from before, which uses the containment operator, and the same query using SQL/JSON syntax and JSON path. They look similar; I agree you sometimes have to type more, but the good thing is that this is standard. Here you see JSON_EXISTS. This example illustrates the question-mark filter and an expression; the expression uses a named variable, and the variable is passed in later. It's just an example — a very simple one, but you can imagine writing more interesting queries. We also have the SQL/JSON construction functions. I will not talk about them because they look very similar to what we already have — jsonb_build_array, jsonb_build_object and so on. And we have the retrieval functions. JSON_VALUE extracts an SQL value of a predefined type from a JSON value. JSON_QUERY extracts JSON text from JSON; it is very powerful and can return many values. JSON_TABLE — think of it as unnest for JSON — converts JSON to relational form; also a very powerful function. And there is a testing function, JSON_EXISTS, which is also useful. Next I will show, for example, how to use JSON_VALUE. In JSON_VALUE you can also specify RETURNING — say, RETURNING integer — and DEFAULT -1 ON EMPTY, so if there is no value we return -1 by default, and DEFAULT -2 ON ERROR. Here is an example. JSON_QUERY is another example, this one showing the wrapper: you can specify whether the result should be wrapped into an array — conditional, unconditional, or without wrapping, and so on. These are very simple examples (we have 5,000 lines of examples); this is just for easy understanding. It is very powerful. And constraints — finally we can define constraints, which is what people have been asking about. Now we can. Let's demonstrate the power of JSON_TABLE. We have the del.icio.us bookmarks data set, about 1.3 GB, and we convert this JSON data to relational data. The JSON schema is very simple — just one column of type jsonb — and it looks like this. How do we convert it to relational format? With JSON_TABLE it's easy — well, not very easy, but the query looks manageable. JSON_TABLE can only be used in the FROM clause. We specify the necessary paths — we can nest them — and we specify what to do with the result: inner join, cross join and so on. You create a new relational row with each JSON value converted to a scalar. If you have one array you have to join inside it; if you have several arrays you have to join across all of them, and it becomes quite difficult. So now we have relational data, and in this case it becomes about 2 times bigger. Here is the schema; it's just an example of JSON_TABLE. And in SQL/JSON, "find something red" looks like this — longer than the jsquery version, but still a more or less understandable query. You can already play with SQL/JSON: we have a GitHub repository, and if you want to understand the grammar, the BNF is available too.
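To see these clauses side by side, here is a condensed sketch against a hypothetical goods table with a jsonb column js (table, key and constraint names are invented; the syntax follows the standard and our proposed patch):

```sql
-- JSON_VALUE with RETURNING and ON EMPTY / ON ERROR defaults:
SELECT JSON_VALUE(js, '$.price'
                  RETURNING int
                  DEFAULT -1 ON EMPTY
                  DEFAULT -2 ON ERROR)
FROM goods;

-- JSON_QUERY with a wrapper, collecting many values into one array:
SELECT JSON_QUERY(js, '$.tags[*]' WITH WRAPPER) FROM goods;

-- A CHECK constraint over JSON, finally possible:
ALTER TABLE goods
    ADD CONSTRAINT goods_has_id CHECK (JSON_EXISTS(js, '$.id'));

-- JSON_TABLE as "unnest for JSON"; it may appear only in the FROM clause:
SELECT jt.*
FROM goods,
     JSON_TABLE(js, '$.items[*]'
                COLUMNS (name  text    PATH '$.name',
                         price numeric PATH '$.price')) AS jt;
```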
Somebody published the BNF, I don't know who. We need your feedback, bug reports and suggestions, because the SQL standard is written in a very specific dialect of English which we Russians sometimes find very difficult to understand — so we need your help if we've missed something. We especially have a problem with writing documentation: imagine documenting all these features. It will be very difficult for non-native English speakers. So, the roadmap: we will try to push SQL/JSON into Postgres 11. Our fork, Postgres Pro, will have it available maybe in May, so you will be able to download it and play with it. The other item is dictionary compression, which we also plan to push into Postgres 11; it will also be available in Postgres Pro 10. And when we have sharding, we will have a real distributed NoSQL database — sharding will come to Postgres in several years, I believe. So now let's talk about JSONB compression. The idea is to have transparent compression of jsonb with access to child elements without fully decompressing the JSON — that was the goal. The motivation was that I've seen JSON documents with very, very long keys where the value is just true or false, or just an integer. And I understand people: they want something normal and understandable, not just f1, f2, k1, k2 — but in JSONB that carries a big overhead inside. People kept asking us: could you please add some kind of compression to JSONB? The idea is very simple: dictionary compression. The key names are stored in some external place, and inside the JSON we use just key IDs. Then you can save a lot: if a key name is about 20 bytes and you replace it with a 4-byte integer, and you have 100 such keys per JSON and a million rows, you can save several gigabytes on dictionary compression alone. Of course, if you have long values the saving will not be so big. We also try to compress the entry overhead: object and array entries in jsonb cost about 4 bytes per array element — with 8 elements in an array you spend 32 bytes on overhead alone — and 8 bytes per object field. We can reduce that with variable-byte encoding, but I don't believe it will help much; dictionary compression is where we expect the biggest win. So this is what we use inside: keys replaced by IDs; delta coding, since the ID arrays are sorted; variable-length encoding of entries instead of fixed 4-byte entries; chunked encoding for entry arrays; and integer numerics stored as int32 with variable-length encoding. It looks very strange, but it works — believe me, I will show you some examples.
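Before the measured examples, just to put rough numbers on that key-to-ID substitution — a back-of-envelope calculation with the illustrative figures from above, not a measurement:

```sql
-- ~20-byte key names replaced by 4-byte IDs, 100 keys per document, 1 million rows:
SELECT pg_size_pretty((100 * (20 - 4) * 1000000)::bigint) AS approx_key_saving;
-- roughly 1.5 GB, before counting the reduced per-entry overhead
```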
We implemented CREATE COMPRESSION METHOD, so you can define your own compression method. The syntax now looks like CREATE TABLE with a column name, a data type, and a COMPRESSED clause naming the method and its sub-options; you can also change it with ALTER TABLE ... ALTER TYPE. Information about the compression method is stored in pg_attribute — we added two fields, the compression method and the compression options. One implementation on top of this infrastructure is jsonbc, a compression method that uses dictionary compression. So there are two pieces, you understand: the infrastructure, and one implementation. Actually we tried something like ten different compressors — LZ-family compressors, Snappy, many of them. They sometimes compress well, but they are too slow, because you have to decompress the whole JSON to get the value of a specific key. So we decided to stay with jsonbc, which is dictionary compression. Dictionaries can be identified by a sequence-generated ID — that gives you fully transparent dictionary updates — or, if you know the key names in advance, you can just use an enum; so there are two methods for the dictionaries. Here is an example: CREATE TABLE with a jsonb column compressed with jsonbc; you can also manually create the dictionary sequence, reference it when creating the table, and update the dictionary manually. Now the compression results. We have two data sets. One is the del.icio.us bookmarks: 1.2 million rows, mostly string values, relatively short keys, and two arrays — tags, and links made of three-field objects — so it should not compress very well. The other is the Citus data set of customer reviews from Amazon: 3 million rows, rather long keys and a lot of short integer numbers, so it is a very good candidate for compression. We also ran a test on a compressed file system — the page-level compression we implemented in Postgres Pro Enterprise — because it was interesting to see how jsonbc compares with simply using a compressed file system. Here are the results for the two data sets. For the del.icio.us bookmarks, the relational table — the JSON converted to relational data — is about 700 megabytes; on the compressed file system it drops to about 250. So CFS does well here, compressed json/jsonb also do well, and here is jsonbc: you can see jsonbc has very low overhead relative to the relational data, although on the del.icio.us bookmarks it is not as good. Ah, I forgot to say: here is WiredTiger, the same data loaded into Mongo. This is the Mongo size — the MongoDB developers did a very good job, they have very good compression — so our jsonbc is bigger than Mongo with WiredTiger. Still, the point of this plot is that jsonbc compresses very well, very close to the size of the relational table, and it is also interesting that the compressed file system works very well — so if you have CFS, maybe you don't need jsonbc. The same may hold for ZFS; we did not test ZFS because we simply have no experience with it — our engineer said it doesn't work well, but I have seen people say it works very well, so of course we will try it. Compression is one side; the other side is performance, because if you have very good compression but it is very slow, you don't need it. So here is the performance; it is just a containment query.
For the relational data the performance is quite slow, but jsonb and jsonbc have about the same performance, even better than Mongo — Mongo is slower here. So again jsonbc looks very good: good compression and very good performance. What does this mean? First we ran all our tests in non-parallel mode; then we turned on parallel mode and got these numbers, and you can see that jsonb and jsonbc are several times faster than Mongo in parallel mode. Another query: fetching several keys. Here jsonb is somewhat faster than jsonbc. We did not test MongoDB on this one because we don't know how to measure the time in Mongo — it uses a cursor, so it's impossible to check how long it takes to get several keys; maybe you know, and can save me. Also interesting is an aggregate — we used an average — and of course the relational table shines, it is very fast, but jsonbc still has comparable results and is also faster than Mongo. And again we can use parallel mode — the parallel mode that is in 9.6 now, yes. [What is this slide showing? I understood the previous one.] Oops — this is getting several keys: not just one field but three, so you see that performance. And of course we have problems. One problem is transactional dictionary updates: currently we do automatic dictionary updates with background workers, but autonomous transactions would be much better — maybe in Postgres 11 there will be autonomous transactions and we can use them. We also did not implement cascading deletion of dictionaries: if you drop a column, we need to track the dependency on any dictionary associated with that column. It is easy, we just did not implement it. And user-defined compression methods are not fully supported yet — I don't know whether we should do that. The summary is that jsonbc can reduce a jsonb column to roughly its relational-equivalent size, which is very good, it has very low CPU overhead and is sometimes even faster than jsonb, although its compression can be significantly weaker than page-level compression on CFS. Of course there is still work to do. As for availability: you can download it from GitHub, play with it and check it on your data. Now we need some benchmarks of how fast NoSQL in Postgres is, because there are many, many myths — and the first one I made myself in 2014. It was my first, non-scientific benchmark, where I first showed that jsonb can be faster than Mongo. Before that, people simply didn't believe it; they said Mongo is very good, very fast. But when I did this very simple, very non-scientific benchmark, I was surprised that we are faster. And you can see that at that time Mongo was very slow at loading: 10 minutes compared to 40 seconds. Then EnterpriseDB published the results of a NoSQL benchmark — a bit more complex than my non-scientific one — but people still didn't trust it, because all these benchmarks were written by Postgres people, of course, and people tend to believe an independent, scientific benchmark. A benchmark should be reproducible, should support many databases and many workloads, should be open source, and so on. So here is what we did: we ran the YCSB benchmark. Do you know it? The benchmark is freely available, there is a scientific paper about it, and it is the de facto standard for measuring NoSQL performance. We ran YCSB against Postgres master and MongoDB 3.2, the latest version, using two servers — one to generate the load, another for the database — connected with 2-gigabit ethernet.
The YCSB benchmark provides six core workloads. Workload A is update-heavy: 50% updates, 50% reads. Workload B is read-mostly: 95% reads, 5% updates. Workload C is read-only: 100% reads. Workload D is read-latest: new rows are inserted and then read back, so you get a hot working set. Workload E is short range scans, and workload F is read-modify-write: you read a record, modify it and write the changes back. The workloads use a Zipfian distribution for record selection. So this was not written by Postgres people; it was written by independent people, and there are published results for many, many databases. For Postgres we used 9.6 in some runs and master in others, with asynchronous commit on. Why? Because we wanted to match MongoDB running without journaled (durable) writes — with journaling on it becomes very, very slow. You know what asynchronous commit is. We used two data sets, of 1 and 5 million rows; with asynchronous commit off we could only run 100 thousand rows, because MongoDB becomes very slow — Postgres is optimized for that case, Mongo is not. We also used fastupdate for the GIN index; this is a known issue with GIN, and we tested a lot of variants: a functional btree index over jsonb; jsonbc; SQL/JSON; a functional btree index over jsonb on the compressed file system; a GIN index over jsonb; jsonb_build_object; and MongoDB with WiredTiger, the default. In one test we returned the whole JSON, in another just one field. We used two kinds of JSON, with 10 fields and with 200 fields. This matters because with 200 fields the average document size is about 7 kilobytes, so it gets TOASTed. You can see all the variants. And here is the result. First we identified a problem with HOT updates — you know what HOT is — on workload A, the update-heavy one with 50% updates: after about 40 clients, Postgres performance drops. The reason is that HOT doesn't play well with structured data types like json, hstore and arrays: we have a functional index on one key, the update changes a different key, and HOT doesn't understand that. We made a quick fix for HOT and it got better; all the remaining results are with this fix, but it was quick and dirty and we need a more sophisticated fix. Here is the result for the six workloads, A through F. Mongo is the yellow one, and the btree-over-jsonb variants are the others; on all workloads the functional btree over jsonb is better, and Mongo is always lower. I remind you this is for 10 fields, so rather short, non-TOASTed JSON. More results: 1 million rows, 10 fields. Postgres is better on all workloads — Mongo is the crimson/violet bar, everything else is Postgres. We also see that GIN is not good for updates; that is expected, GIN shouldn't be good as an update index, but it is fine for selects — on workload C, for example, GIN performs the same as btree. And GIN has a very interesting property: it indexes all keys, while the functional btree indexes just one key, like in Mongo. So GIN can sometimes be very useful, just not for updates. Then results for 5 million rows, 10 fields — again the same: Postgres is better. Postgres is blue and Mongo is green (sorry about the colors — it should be green but it looks yellow; maybe it's the projector, but believe me, it's green). So for 1 million and 5 million rows with 10 fields, Postgres is better than Mongo. Okay.
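For concreteness, the index variants being compared look roughly like this — a sketch using YCSB's usual usertable/field0 naming, with the jsonb column name invented, so treat the details as assumptions:

```sql
CREATE TABLE usertable (ycsb_key text PRIMARY KEY, data jsonb);

-- functional btree index on a single key, roughly what MongoDB indexes:
CREATE INDEX usertable_field0_idx ON usertable ((data ->> 'field0'));

-- GIN index covering all keys (with fastupdate, as mentioned above):
CREATE INDEX usertable_data_idx ON usertable USING gin (data) WITH (fastupdate = on);

-- relaxed durability to match MongoDB without journaled writes:
SET synchronous_commit = off;
```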
Then we ran 100 thousand rows — the slide says 1 million, that is just my mistake — with 10 fields and asynchronous commit off, and Postgres is again better; this one is Mongo. But the situation is not so good when we take 200 fields, so the JSON gets TOASTed: there MongoDB is better, or at least close, and we need to understand what we can do. Here the Postgres data is TOASTed, and we identified that 50% of the time is spent in pglz decompression. So we made another test with pglz compression disabled; it gets better, but MongoDB is still sometimes better. So for long jsonb documents MongoDB has some chance, in some cases. The conclusion is that Postgres still beats MongoDB on one server — one server because we don't have built-in sharding, no distributed mode, so we used just one server. I hope that in a few years we can repeat this benchmark and see how good we are in a distributed setup, but I believe 90% of projects can run on one server; you don't need many servers. So, the summary of my presentation: Postgres is already a good NoSQL database. SQL/JSON provides better flexibility and interoperability — queries from Oracle will run the same in Postgres, because it is part of the SQL standard, which is very good. And SQL/JSON is just the beginning: I expect this year or next year they will release something more, because SQL/JSON doesn't specify any updates, any modification operators — only construction and querying. JSONB compression is really useful, and we also expect it in Postgres 11. Postgres beats MongoDB in a one-node configuration. We need to figure out how to make HOT work for jsonb, hstore and arrays. Maybe it is also time to introduce type-specific compression: right now we use pglz for all data types, which is not good, because jsonb, for example, needs a different kind of compression — so this is another direction for future work: a data type should know which compression suits it, not just pglz. And maybe we can run the YCSB benchmark in distributed mode using Citus — Citus demonstrated with Citus MX how to run distributed, so perhaps we can use Citus. Postgres will help NoSQL users avoid a nightmare. Yes, we need sharding, and the community is working on it; there are two approaches, one using FDW and one like Citus, so we'll see — expect it in two to three years. The last slide gives credit to the people behind jsonb who helped us: Andrew, who also made the first version of the json type and helped us with jsonb; Roger Class, who worked with Andrew; and Dmitry Dolgov, who also worked on implementing some functions — he actually ported the hstore functions to jsonb, and now he has implemented full-text search for JSON and an array-like subscripting syntax for jsonb — I don't know whether Andrew will commit that to Postgres 10, but full-text search will go into Postgres 10. And three companies: Engine Yard was the first company interested — this is how our community works, they just sent me an email and said, would you like to experiment, here is money to work on something — then Heroku, and our company Postgres Professional still supports this development in Postgres. And the very last slide: today is the birthday of Vincent van Gogh, my favorite artist. This picture is fun, but it is not by van Gogh, of course — somebody imitated his style — but I like it, so we're celebrating van Gogh. Okay, thanks for your attention. Yes, I am here, so you can ask me and we can discuss everything.