 My name is Amanda Gilmour and I work on the Heroku Postgres team where I write a lot of SQL understandably and One of the nice things about running so many databases at scale is you got to see a lot of things You get to see a lot of different schema types And you got to see a lot of quirks like if you're migrating a database or things like that And I've had the interesting advantage of seeing quirks and learn how to deal with them And one of those quirks is that you can put rich data in Postgres I don't know if all y'all were here for the industry summit on Monday morning, but there was a panel about Bridging the divide between Postgres and NoSQL in industry So this is a thing a lot of people are talking about and using as evidence by like the foreign data wrapper features and things like that So But the thing is if you're in a Postgres mindset that can be a bit of a challenge, right? Like if you're used to thinking of data in a very relational way It can be tricky to know what document you could use the best for your application At one point I was at the office and one of my colleagues had a friend over and they were talking about How their partner was starting to use Mongo and how this person was like, ah, there's no joins I don't know how you do it like what's going on and I Happen to be there and I'm like hey actually I'm on I'm on the data team I can help with this and the the the short answer is if you're thinking about joining That's not quite how that works. You're thinking about it in a weird way So I am here to show you both how to think about that data what kinds of applications are going to be best suited to what rich data type and Show you some ways that you can query on those rich data types Because that's also important right like you don't want to pick a data type That's kind of a pain in the took us to work with you want something your developers are going to enjoy and you yourself are Going to enjoy working with so I'm going to go on over that And one of the things I love about postgres is that it does give you a lot of options, right? Like if you want to do something with it chances are it can help you do it And to that end it gives you four Though at least four that I'm going over today gives you four rich data types And I'm going to show you how to go over how to you know use these how to query on each of them I'm gonna therefore there's XML H store H store is a proprietary one that is postgres specific XML you may recall from you know RSS feeds. Yeah And also if you've ever worked with a soap API XML is going to look pretty familiar if you've never done any front-end dev It's very similar to working with the the DOM the doc You know that if you're doing front-end of I'm going to lump JSON and JSON be together though because a lot of the same Operators will work for both formats and the main difference is under the hood and I'll I'll go over that and how you can decide which of those two is going to be best for your application Which is tricky, right? Like there's so many options. How do you decide what to go with? So I'd like to take you all through some questions that you might want to ask yourself and your team as you're trying to Decide on a format So the first question is actually this quest the answer to this question makes it pretty straightforward, right? Like if you are wrapping a foreign database Just pick a data format that closely maps to that database, right? Like if you're using mongo, you'll want to use beast or a JSON because you know that that mark that language is already derived from JSON anyway, so straight mapping. It's really easy If you're working with Redis key value go with H store So I'm going to go into how to query those in a little bit Some other edge Casey things is like maybe you do have a normal object schema that is well represented By a relational database, right? And this is something actually that I encounter a lot in my current work where it's a nice Object model a lot of stuff like works with single table inheritance, but you got some weird attributes, right? Excuse me there's So an example of this that I like to offer is like if you're building a catalog for rental cars, right? Like you don't want to call them for each option because there are a lot of them You can have power windows power steering. Maybe it's a manual transmission You don't want to call them for each of those and you also don't want like a table that connects on an ID So you can have a rich document that represents those attributes and save yourself a lot of time and energy And this actually more closely represents what I do with my daily work I mean, I've used all of these formats, but this is the one I use the most now another really important thing to consider here is that You can dump your data as a blob into a row and it will live very happily there But if you're performing a lot of operations on it It will still lock your row So if all of your data is in one row and you're using it as a document You might want to consider possibly wrapping a foreign database, for example, or as I'll show you breaking it out into separate rows So rich data is very powerful, but it does have limitations when you're applying it to Postgres So the idea is to understand what tools are good for the job Because you can't if the lock contention will still be there if you're editing a JSON object, for example I'm gonna go in reverse chronological order Just in terms of like what you might encounter Which puts us at XML first You like XML I love XML Honestly, I like I think that the X path is like really easy like I find it to be super intuitive But I used to do a lot of like front-end automation development. So like that makes sense But it's also going to depend on like your team's culture, right? Like if that's intuitive to you roll with it So Here's how you'd make a table pretty straightforward. I'm sure you've all seen that before I have a nice handy Empty database right over here that I made for this demo It's not no tables in it yet. We will add those So I'm going to go on ahead and add a table right here Boom, we have a table now we can put data in that And query on it. So if we go on over here, there are a couple of ways that you can add data with XML So this one's a little bit more complicated This is if I'm bulk inserting an entire XML object into the database and as you can see I've got a version header here And what this is doing is it's saying XML parse so I'm passing postgres a function so that it knows it's getting the whole shebang Right here. It's getting the header. It's getting everything and it knows how to do that do that with this XML parse function Alternatively this takes a content argument right now I'm showing it with document because I think that's the powerful part about this function is it knows how to deal with everything I Have truncated this because it's kind of a behemoth of a document, but let me show you that so if we go on over here I have an entire XML document that I shamelessly yoinked from MSDN And actually at the end I'm going to have links and references to and the links and references to that will be in the Read me all of these code samples. You're seeing is on github So don't I got your back like have you ever been like in a talk and you're like frantically trying to get a picture with your camera? And you hate but yeah, yeah, I see yeah, I hate that I got your back You can go into github and get all this later. You don't need to worry about the camera anything like that So this is an entire document It's using that XML parse function. I'm gonna go on ahead and stick that in our database Boom now. We have data. It's great. Select all the data. Ah, thank you. I'm wearing contacts right now. I never wear contacts Kind of fuzzy at close range boom And I really like that it pretty prints this not all the formats do that but XML does and that makes me happy So you can see that But the thing is like the power of a document-oriented database is that you don't want to just both query everything Right, you want to drill down in there and you can do that with XML too So in order to demonstrate that a bit more effectively, I'm gonna add several records with a different node So as you can see this is like a book catalog, right? Just think I've worked on happily. I've worked in libraries. They're lots of fun I've had to solve technical solutions for libraries. They're nice folks librarians. So I'm gonna go on ahead Put this data in here and then we have something to play with and as you'll see As good programmers are I'm lazy. What did I call that database PG comf? The f flag do you ever use this f flag? This thing is awesome. I love this f flag It's really handy if I'm like messing around in the console to like really work out some complicated Sql and then I can stick it in the text file and then put it in later. It's great So now I have tons of data in here that I can query with and this is what it's gonna look like I'm gonna have one row with a book in there So Let me show you some of those x path functions that make this really powerful So like I mentioned if you're ever you know if you've ever done front-end dev You're used to like an XML or an x path type format You can use this function to query just like an x path and what this will do is it'll get as a String all of the items at this node in the x path Boom actually let me turn on There we go Yeah, now you can see what we got And you'll recall that because the first thing I added was an entire document that first row has all of the titles So if I did opt for an architecture that had bigger blobs, this function helps me out so I can still use that data Reading is easier. I mean, I don't need to worry about the lot contention obviously So like if you're doing a read heavy application this No, don't do that if I'm doing a read heavy application This could be a great way to go right here if I'm doing something. That's a bit more operation heavy. Maybe I want to do this It's up to you Other fun things that you can do Excuse me, so you'll note here that like that's an array right like you've got this little Bracket situation because of course in this case I have a lot of things that this query is returning so Postgres helpfully gives this back to me as an array So if I only want information for one record I can traverse that or I can use an index to pull out the data that I need just like here So if I'm looking by title for example, I can say I just want the first thing out of that array So here you go on down Sorry There you go. I've got all the data just for that one record and I have used that Array operator to pull it out of here So, yeah pretty powerful stuff Updating on the other hand is not a thing you can do natively with X path So Like I said, like I would recommend X or you know XML if you're doing something really read heavy another nice use case for XML like Back in the day. I used to work with a credit card processing API that it was a soap API So, let's say I wanted to process that information Asynchronously like dump all the data and then deal with it later That could be a good use case for XML like I can just dump it in the database and deal with it later Anything else that's read heavy. This is a really great data type for that Especially if you find that X path to be really intuitive. This could be a good one updates not so much I should caveat that by saying you can update it, but it takes a third-party library Which is kind of outside the scope of this talk, unfortunately I could go down a lot of rabbit holes if I went into a third-party libraries Excuse me sec Yeah age stores age stores are fun These are speaking of things that are native to Postgres age stores are specific in Postgres This is a postgres specific format that first came out with version 9.1 And it's kind of handy because it's a straight key value store So it is really intuitive, you know Excuse me on the other hand It's a string under the hood so if you're trying to do Let's say comparisons to other data types. You have to do an explicit cast Which I will confess kind of trip me up when I first started getting into this data type You know like I guess it shouldn't because I like being explicit But you know you do have to be very explicit with this data type Which is not necessarily the case with some of the others as I'll show you The other thing that can be a bit of a trick is that age stores native But it's native as what's called a contrib module which you may or may not have heard of but it's it's an extension But it comes out of the box, so you don't need to download anything to install it So if I were to go on over here Close that out And try and create that table without creating the extension postgres will be sad and it throws fits So if I create the extension explicitly beforehand Everything is copacetic There we go, so let's stick some data in here We have a schema And as mentioned this data is just a string So it's pretty easy to read. You've got the nice Comma thing going on to the little apostrophe thing to show that it is just a string This is this is shamelessly yoinked for my staging environment because good programmers are lazy and Yeah, this is just like if I had a formation of databases That's what this represents right here So that's from my staging and I'm gonna put that into the database and we'll do a little bit of testing with it. Oh Yeah Now again, that's not super helpful. Let's stick a bunch of records in here I've got several h-store samples here copy that file path Pdconff Boom so great look at all this efficiency Let's queries and stuff now So as mentioned if you want to make a comparison to this data to something else it does require an explicit cast So here I am casting this to an integer I'm checking to see if I have large ish relatively speaking large ish resources So let's go on ahead and try that out That's not larger and equal to so it gives me everything except The one resource that is 30 megs which is here. So I've got a bunch You can also if you wish I don't know about all y'all but like when I'm querying I frequently do the like the slash dt thing to like Check what I'm doing as I'm querying kind of like a frequent. All right. What am I working with? What am I working with? I Also do that if I'm working with h-stores because it's very helpful to see what am I have to work with? What do I have to compare to as I'm crafting my queries? This little function here will just give me everything in that table. It'll give me all of The records that have this particular key I'm getting ahead of myself That's okay. Let's look at this one first Cool. So let's say I'm looking to query on a key, right? I don't need anything else. I'm just interested in a particular key boom Actually interesting note so like As resource name since I'm doing this cast you'll note that it calls the column text So I have to explicitly name my column if I want to see something like make something useful out of that Excuse me now That's just that's not particularly useful information on its own. Oh, hey, I have a question. I Certainly can my sincere apologies Thank you for asking There we go. How's that? I am big and sublime text But I forgot that whenever you open a new terminal window. It doesn't it doesn't get bigger on its own So thank you How's that that better? Cool Sweet, um, so yeah, this is not terribly useful on its own, right? Let's ask for some more columns to make more sense of this information If we go on down here, I can also ask all right, which of these resources are active Which of my nodes are still kicking and they're still doing stuff So as you'll see, I'm not querying any additional columns. I'm querying the same column just my documents column Which is where the document data lives Cool. So this is a little more useful now I can get an overview of which Resources in my formation are doing their thing which ones are happy. Oh, I didn't let me rename that too as status Now in comparison to say XML One of the nice things about each store is that you can update things You can't quite do it in place literally like this is sort of How do I put it like one of the One of the things that's important to remember about this is that like many programming languages You can't add duplicate values for the same key. Well, it will not be happy The way Postgres handles this is it takes the second value So if I add two items that are active resource and true Postgres will discard the first one. It'll take this the second one Thinking it's the most recent one, I guess So in that case, that's how you update a thing with age stores. You use this concat operator Which seemed a little weird encounter intuitive, but once you understand how it's working under the hood. It makes a little more sense So here I'm updating the value of active resource Specifically where documents and I'm using a containment operator here because again remember this is just a string Or I'm sorry. This is just text. Let me not be imprecise. This is just text So if I can say hey, does this piece of text contain another sub piece of text? That's how I can ensure I'm updating the right thing I only want to update for example the resource name that has this name swimming swiftly Interesting aside so like If you want to ensure that you're not Doing the wrong thing you can add indexes on these so if I wanted to add a not null on these I'm pretty sure you can do that Which is nice because like it's a key value. It's a blob, but like Postgres still has your back Right, like it won't let you do any as long as you tell it what you want the rules to be You can't overwrite a thing that you don't want to So I'm going to go on ahead and show you that containment operator, which is really nice over here Also note I'm explicitly casting this to an age store because again, I will keep repeating this. This is just a string Or this is just text just text Boom and this updates two things in our database So if I go on ahead and rerun this query now, I can see that some of my resources have been marked as no longer active Now Let us go on to the fun ones Jason. Well, they're all fun. I think they're all fun This one is how shall we put this one's a bit more complicated There's certainly more Operators available to you I Like to think of it as combining the best of both worlds really We're in the side I was reading an interview with Voice actors who are talking about how like it really wears your voice out if you're talking all day and like now that I do talks It's like man, I totally understand where they're coming from like you need to drink water all the time So thank you for bearing with me Jason Jason beer very exciting This is more closely resembling what you would see in mongo for example I think if you're wrapping Cassandra, this would probably be a good fit as well Cassie's pretty cool Now the big difference between Jason and Jason be it's very important is that Jason is just text again like h-store It's just text the Jason be is a binary formatted data. So you can do like be tree indexing, which is pretty cool There's some other cool things and I'll show that as I go Let me first go over some of the operators because there are a ton of them for Jason And I also encourage you I'd encourage you to do this for all of the data types But I would definitely encourage you to go look at the Postgres documentation and get a sense of those operators Because there are a ton and they're a handy quick reference. So you may recall When we were looking at h-store that we had this little arrow operator here to connect our key and our value You can do the same thing with Jason except the beautiful thing about Jason is that you can have like deeply nested objects And they get pretty complicated So in order to use that you would just access that by chaining these together. So like Index item index item array value index item ad nauseam, and that's cool The big difference between these two is that this gets the JSON object by text So if I'm you know checking containment, this is the thing you're going to want This is handy also because this checks if the key even exists Which can also help be helpful, right? Like if there's no key there at all You're not going to want to bother querying on it or perhaps it'll break your query So you can use that to check Let's go ahead and add a table and Here I've added a column called text JSON or about I am about to add you will see me at it I have a column called text JSON and I have a column called binary JSON And I'm going to do that to help me more easily illustrate the comparison between those two data formats and how the Operators behave with both of them respectively. So let's go on ahead over and do that There we go we have a table Let's stick stuff in the table so this is the text JSON and You may ask what's the benefit of using text JSON if binary JSON You know it has these benefits right like you can add the B tree index. You can it's generally faster So why would you do that? Well, I Would say the short answer is don't unless you have like a legacy application that has some special constraints Because we all have snowflakes. It's it's fine. It'd be ashamed of if they can be a pain, but they're there So if you have a legacy application with some special constraints that needs it to be text then you might want to go with it Alternatively in much the same way is like the XML right if you're dealing with just string you have an application That's not terribly right heavy. You're just reading it and then manipulating that data at the application layer You can get away with JSON text, but generally I strongly advise JSON be it's better and the operators play more nicely with it But one of the nice things JSON the text JSON format does do is it ensures that your JSON object is well-formed And it will it will not be