Cool. Hi, my name is Corey Huinker. I am a programmer at Moat, and this talk is about some of the work I've been doing with partitioning, specifically with ranges. My interest there was that there are a lot of partitioning solutions out there that cover time series data, anything where there's a notion of a new partition and old partitions that can be aged off, and our particular need didn't really fit that. What we wanted to do was partition by client name. We have some text-keyed client data, and it's very lumpy, so we couldn't just go by something simple, you know, the A's go here, the B's go there. We had to get much more detailed than that, and that got me thinking: could we leverage ranges in native Postgres to do that?

So, show of hands, how many people have used ranges in Postgres? Okay, some of this is going to be a little bit basic for you. What I'm going to be talking about is an introduction to ranges. I'm going to be occasionally griping about things that I wish were there and aren't, and then, once I've established all that, I'll get around to a quick overview of what we did with range partitioning.

I like ranges. They allow you to indicate in the data that two different values are connected: a start time and an end time, the beginning of your hotel stay and the end of your hotel stay. If those things are a range, there is no doubt about the intention of the data. Likewise, when you're writing statements that use ranges, you're allowed to use operators that more accurately convey your meaning: do these two hotel stays overlap, things like that. You can also index on ranges with GiST; somebody last night said that BRIN was possible too, but I hadn't heard that before and didn't check it before this presentation. And ranges allow you to define exclusion constraints, which are useful for things like resource contention.
No two people can be staying in the same hotel room at the same time. And the thing that interests me most, and why I made this a talk here, is that no other relational database has them, or at least they didn't the last time I looked.

Ranges go by standard mathematical notation, and if you're familiar with that, you're going to find the way ranges are expressed very natural. A parenthesis on the left side is exclusive, meaning the range does not include the actual value indicated; in this case, values greater than x. A square bracket means the range does include it, so it is values greater than or equal to x. Likewise on the right side for the upper bound: exclusive or inclusive. If no value is given, that means there is no lower bound or no upper bound, so the range (,) means everything, every possible value for that data type. Likewise, the word "empty" means none of the values.

There are several ways you can construct ranges. The first is casting from text, and in that case you need to specify your bounds: the inclusivity or exclusivity marker, the actual low bound value, a comma, the high value, and the inclusivity or exclusivity of that end. If you omit one of the values, that side is unbounded. It does not matter whether you mark that side exclusive or inclusive; it will be considered infinite, and in this case all values to the right of the low bound will be part of the range. There's also a constructor function, in which case you pass in the low and high parameters, which can be null, and then you give the two bracket bounds as a pair at the end. If you omit that, the default is inclusive on the left side and exclusive on the right side. There is no polymorphic way to do this, which was a small gripe of mine: you can't just SELECT a range of an arbitrary type with arbitrary low and high bounds and an inclusivity specification.

What types exist? You've got your basic integers, numerics, timestamps, dates.
They left off Boolean, I guess because the range isn't that big. And the one that really got me was that there was no text range. Anybody care to guess why there would be no text range? Collation sequences. Okay, what letter is greater than another letter? It depends. You would essentially need one text range type per collation sequence on your system, and that can vary. And as we've learned, unfortunately, the collation sequences on the operating system can change out from underneath you, so that could be a very big problem. If they wanted to create all the possible text range types, they would have to spend one OID per collation, and it didn't strike them as worth it. So you must create them on your own.

In our case, the actual ordering didn't matter. We just wanted classification, getting our data into smaller buckets for easier querying, because we were going to make heavy use of GIN indexes for text search, and the smaller you can make those tables, the faster it's going to be. So in this case we just used a text range with the C collation, which is probably the most stable.

When you have range values, you can decompose them into their component attributes. In this example, I've created the various states of water. Please don't get all science class on me, because I'm sure the state changes involve ice and water both existing at the same temperature, but for this purpose you can see that we've defined ice as any temperature below 32, not including it. Water takes over at 32 and goes up to but not including 212. Steam takes over at 212 and goes up as high as it can. You'll see that I used a different way to construct the range in each instance: the first one is just a standard text coercion, the second one is the explicit constructor, and the third one is the constructor omitting the final parameter, which again defaults to inclusive on the left and exclusive on the right. Just for fun, I added in another state which has no temperatures at all.
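As a rough sketch of what that setup might look like (the type, table, and column names here are my own stand-ins, not necessarily what was on the slides):

```sql
-- A text range type over the stable "C" collation, the kind of type you
-- have to create yourself since Postgres ships no built-in text range:
CREATE TYPE textrange_c AS RANGE (subtype = text, collation = "C");

-- The states-of-water example, using a different construction style each row:
CREATE TABLE states (state text, temps numrange);
INSERT INTO states VALUES
    ('ice',        '(,32)'::numrange),        -- text coercion, unbounded below
    ('water',      numrange(32, 212, '[)')),  -- explicit constructor
    ('steam',      numrange(212, NULL)),      -- bounds argument omitted, defaults to '[)'
    ('heat death', 'empty');                  -- the state with no temperatures at all
```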
There's a series of functions that you can use to extract those attributes back out. They are lower, upper, lower_inc, lower_inf, upper_inc, upper_inf, and isempty. There are a few more, but they're not specifically tied to extracting scalar values back out of a range. I don't remember whose blog it was that had this example of setting your nulls to display as that little pillow character. It was only when I was making these slides that I learned that the travel industry calls it the pillow character, and that it's also a currency symbol in a couple of Nordic countries, so it maybe wasn't the best choice, but I find it really useful.

Here you can see that I have decomposed all of the scalar values back out of the ranges, and there really aren't any surprises. The most interesting part is where a range was unbounded: there will be a null value, as is the case for the low value of ice and the high value of steam, and in those cases the bound will also test as infinite. In the case of heat death there are of course no values at all, so you're going to have to contend with the fun of null arithmetic.

Ranges come in two forms: discrete and continuous. The way you tell them apart is whether the data type has a concept of a next value. If it does, you're dealing with a discrete type; that's your integers, that's your dates. With continuous types there is no next value: in the case of text you can always add an additional character to tie-break two otherwise identical strings, and in the case of numrange you can always find a smaller number, so you will never have two continuous ranges described differently that are actually equal. With discrete types, canonicalization kicks in: you will see that I created the value [1,10] as an int4range, but the canonical form shifts it to inclusive on the left and exclusive on the right, so it becomes [1,11).
They are the same value internally; that is just how Postgres chooses to store them.

Some operators just don't work the way you expect them to, in this case less than, less than or equal, greater than, and greater than or equal. The concepts don't really map well, so internally what Postgres does is compare on the lower bound first and then use the upper bound as a tie breaker. There are not a lot of real-world cases where you would actually need something like that. I thought about it for a while: what would it be good for, testing a median value, testing something like the cardinality of the range? I couldn't come up with anything, and it turns out they are needed internally for indexes anyway.

Some other operators are of real value. This one is "strictly to the left of," which is to say that every value in A is less than every value in B; again, you have to remember that things get shifted to the canonical form first. As you can see, one up to three exclusive is strictly to the left of three to five, because only one of them actually contains three. However, if you switch that exclusive bound to inclusive, it becomes false, because both ranges then contain three. "Strictly to the right of" is the exact opposite, of course. And in this case I decided to do a little more illustration with the fun you can have coercing text values into date types. Casting the string 'today' into a date works, which is kind of fun, as do 'tomorrow' and 'yesterday'. It's not quite as good as the Unix date command, so you can't say "two days ago," or at least I wasn't able to get it to do that. So here the question is whether the range from today up to tomorrow is strictly to the right of the range from yesterday up to today, and the answer is true. Again, if you change the bound from exclusive to inclusive, it becomes false.
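A minimal sketch of those tests, using ranges along the lines of the ones just described:

```sql
SELECT '[1,3)'::int4range << '[3,5)'::int4range;  -- true: only the right range contains 3
SELECT '[1,3]'::int4range << '[3,5)'::int4range;  -- false: now both contain 3

-- 'today', 'tomorrow', and 'yesterday' coerce to dates, so this asks whether
-- [today, tomorrow) is strictly to the right of [yesterday, today):
SELECT daterange('today', 'tomorrow') >> daterange('yesterday', 'today');  -- true
```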
Some operators, I'm really not sure what the need for them was, and this is an example: "does not extend to the right of." Basically, if any element of A is greater than the greatest element of B, then A does extend to the right of B, and the test is false. As you can see here, I've taken our usual values and tried to show the similar test, "does not extend to the left of." I have not found a good use for these, but they exist, and I mention them because there are some operators I thought really ought to exist and don't.

Adjacent. Now this one's actually useful: it's true when there is no overlap between the two ranges and no gap between them. The most interesting part is that it doesn't matter which one is on the left or the right, and you'll see that I actually put the lower range on the right in the first case. In the other ones, I showed an instance where there is a gap and then an instance where there is an overlap.

Containment. These are the same as the "is this point in a circle" operators; I don't have a good name for them. The value on the pointy side must fit entirely within the range on the "at" side, and the side being tested for containment can be a scalar or can itself be another range. You can see here I've shown that 1 is in the set [1,4]; that [20,30) is in the set [1,100]; that the date infinity is in the set of all possible dates; and that the everything range does contain the empty range. The everything range for integers, however, does not contain null. The answer there is not false but null; welcome to the Boolean black hole.

Overlaps is pretty straightforward, with one catch: it means "do they have a common value?" So as you can see, [20,30) overlapping with [1,100] is true. But the everything range overlapping with empty doesn't behave the way we just saw with containment. So I tried again: is the empty range contained in the everything range?
Yes, it is. Does empty overlap with the everything range? No, it doesn't, because there are no values in the empty range. They have no value in common; therefore they don't overlap.

You have the + operator for union, or whatever you want to call it. It takes two ranges and combines them, but it only combines them if there is not a gap; in the second example you can see that I attempted to create a range that would have had a gap, and you get an error. New in 9.5 there is a function, range_merge, that will cobble those together into one large superset range. And if you're on 9.4 or earlier, you can go over to PGXN and install range_type_functions, a nice little extension written by yours truly, and have that capability on earlier versions.

The intersection operator asks what values the two ranges have in common. In the first case, the two int4ranges do not overlap: they both go up to four, but neither one includes it, so you can see that the resulting range is empty. In the case of [1,4], inclusive on both ends, and [4,100), inclusive on the lower end and exclusive on the upper (got to remember that default), they have the value four in common, which is, if you'll remember, rather strangely represented as [4,5).

The difference operator does what you would expect from the EXCEPT set operation. The catch here is that you cannot create disjoint ranges: if you try to subtract out a range that sits in the middle of the larger range, you get an error. And that's something I'll be addressing later.

What I wanted was the ability to do a range split. If you have partitions and you want to pull out a specific value and make a partition just for that one value, or for that very narrow range, you would like to end up with a partition on the left and a partition on the right.
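A sketch of those set operations, with values of my own modeled on the ones above:

```sql
SELECT '[1,4)'::int4range + '[3,9)'::int4range;       -- [1,9)
SELECT '[1,4)'::int4range + '[5,9)'::int4range;       -- error: result would not be contiguous
SELECT range_merge('[1,4)'::int4range, '[5,9)'::int4range);  -- [1,9), gap and all (9.5+)
SELECT '[1,4)'::int4range * '[4,100)'::int4range;     -- empty: both reach 4, neither includes it
SELECT '[1,4]'::int4range * '[4,100)'::int4range;     -- [4,5), the canonical form of "just 4"
SELECT '[1,100)'::int4range - '[50,100)'::int4range;  -- [1,50)
SELECT '[1,100)'::int4range - '[40,60)'::int4range;   -- error: would leave a disjoint result
```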
Unless, that is, you had picked a value that happened to be already on the edge of an existing range. So in this hypothetical case (every time you see the "hypothetical" prompt, that means this was not actually run; it's just something I dreamed up and thought would be nice), I would like to see a function where, if you took the range one to one hundred inclusive and split out of it two to four, you would get the three ranges [1,2), [2,5), and [5,100].

Another missing operator: there aren't a lot of characters left, so this is the best I could do, but I thought it would be nice to have something that tests whether two different ranges share a bound. The side of the equals sign determines which bound you're checking: on the left side for lower, on the right side for upper. These should be pretty simple for you to read over, so I don't think I have to walk through them quite as much. Moving right along.

I would also like it if the strictly-to-the-left-of and strictly-to-the-right-of operators allowed you to compare individual elements against a range. This is very important if you are, say, writing an insert trigger that is going to be used for range partitioning, and you need to look at a specific value and figure out which section of the ranges you need to be looking at. As it is, I end up doing containment on a middle range, then trying left, then trying right. It's not the most efficient, but generally speaking, in large bulk operations you're testing first for which partition you're going to end up in and then directly inserting into that. You can simulate this functionality by casting the scalar value to a singleton range, in this case one to one, inclusive on both sides. You can get the functionality; it's just that you can extend Postgres to make things elegant, and right now this isn't elegant. So that's why I'm complaining to y'all about it.
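The singleton workaround just described comes out something like this (the values are mine):

```sql
-- There is no scalar form of "strictly left of", but casting the scalar
-- to a one-value range gives you the same test:
SELECT int4range(1, 1, '[]') << '[5,10)'::int4range;  -- true: 1 lies entirely left of the range
SELECT int4range(7, 7, '[]') << '[5,10)'::int4range;  -- false: 7 is inside it
-- Containment, by contrast, accepts a bare scalar directly:
SELECT 7 <@ '[5,10)'::int4range;                      -- true
```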
Another one I would like is essentially a comparison function, comparing an element to a range: if the value falls within the range, it returns zero; if it falls strictly to the left, give me negative one; if it falls to the right, give me positive one, basically mimicking strcmp. In this case, you can see that the integer one falls within the range [1,10]; one falls to the left of a range from above one up to four inclusive; four falls within [1,4]; and four falls to the right of a range that goes up to four but doesn't include it. This I implemented as the element_range_comp function, also in range_type_functions, and there will be a URL for that at the end.

I would also like a test to see whether a range happens to contain exactly one value. That would be useful for knowing whether there was any point in trying to further subdivide a partition that was growing too large: if it only contains one particular value, you're not going to find any luck there. As you can see, four up to five not inclusive is a singleton, because only the value four is in there, but four to five inclusive is not, because it holds both four and five.

Another thing that will surprise you if you're writing a range partitioning extension: you can make check constraints on values being contained in a range, and that check constraint will work. It will not, however, work for constraint exclusion when you're running a query. After some digging, I found out that the check clauses considered when doing constraint exclusion on selects and updates and the like only look for equals, greater than, less than, and between (I think there may be a couple of others); regardless, containment is not there. So I had to manufacture, from the ranges, the WHERE clauses that would be used to create the check constraints.
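A sketch of that difference, using a hypothetical partition table and column (the names are mine):

```sql
-- A containment check constraint enforces correctly on INSERT and UPDATE...
ALTER TABLE part_low ADD CONSTRAINT part_low_bounds
    CHECK (id <@ '[1,100)'::int4range);

-- ...but the planner's constraint exclusion never considers it. For the planner
-- to skip this partition during a query, the bounds have to be spelled out as
-- plain comparisons:
ALTER TABLE part_low ADD CONSTRAINT part_low_bounds_ce
    CHECK (id >= 1 AND id < 100);
```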
And it occurred to me that this would also be useful because some of the data we're pulling in is actually derived from data on a machine that doesn't have this extension; in fact, it's not even Postgres, it's a Redshift cluster, and they're not going to understand ranges anytime soon. So I was going to have to be able to generate that SQL dynamically for making the remote call. That's where the functions for getting the lower bound condition expression, the upper bound condition expression, and the full bounds expression came in handy. They default to a variable name of x, but you can specify one, as you see in the third example, if you know what your column name is going to be on the other side; I doubt most people are partitioning by x.

Okay, so our use case is a type-ahead search, which is very much like what you see when you're doing Google searches: as you add more characters typed, it refines the search, and slowly you get down to a list of things that you know contains the thing you actually want, and you're able to arrow down and select that one. So what we needed was the ability to make very fast.