Hello, I'm Matthias Gehrings. I'm currently working for Goldman, where my team is doing interesting things with Erlang, and I've used a number of other languages before as well. I will be talking to you about what I call range maps. A range map is a data structure that maps from keys to values, but where the keys come in ranges.

Where this crops up most often in practice, for me so far, is handling dates, which we do in finance all the time. For example, you take the holiday calendar from the US and the holiday calendar from Singapore, and you want the union of both, things like that. Those are basically ranges that a day is either in or out of. The picture up here is the skyline problem, which you might have heard of: think of the skyscrapers in Singapore. You want to figure out the outline of the skyline, so at each point you take the maximum of all the buildings. Instead of the maximum you can do other things; for the calendars, for example, you might want the union or the intersection.

So, a bit of an agenda. First I'll show you how you can solve this problem with a very simple linear data structure, not much more complicated than a linked list. Then we'll talk about what to do if you want random access; basically, we use binary search trees for that. Next, we generalize it a bit to different kinds of intervals. After that, I'll talk a bit about testing.

Okay, here we go. I use Haskell syntax for most of this, just so I can fit it on a slide. Sometimes I have some Elixir syntax as well, so you can compare; the concepts are very similar. My very simple data structure says that a range map is one of two cases. It's either a constant, for example constant two, which is the same value everywhere.
Or it's a jump, which would look like that one over there, and with jumps we can build up shapes like this. A jump has, first, a value on the left. It has a jump point, which we call k here. And then it has the rest, which can be another constant or more jumps. So what you see here is just a constant. This one here starts at one. Then we jump at k, which is some arbitrary key. The next value is three, which is why we go up to three here. Then at l we have another jump point, and we go down to zero. I hope my ASCII art is up to the task. This is how you write those terms in Haskell. In Elixir, you could write them down like that, and then you can pattern match over them.

Does anybody understand this way of writing the data structure, roughly? Okay. The k and l are just symbolic: I didn't want to write down numbers there because I already have numbers on the slide for the values. They could be numbers, dates, times, whatever. The one requirement is that the keys have to be orderable, so that when you look something up you can ask whether it lies between two points. The values on the y-axis don't have to be orderable. They can be colors, companies, departments. They can be types of day: a public holiday, a weekend, or just a work day.

Here are some examples of how we could construct a range map. Say you want to construct a rectangle. We need two jump points: one for the left edge of the rectangle and one for the right edge. To the left of the rectangle we have the value outside the rectangle, then the content of the rectangle, and on the right we have the outside value again. This is how it could look in Elixir as well.
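To make the two cases concrete, here is a minimal Haskell sketch of the type as described: Const for a single value everywhere, and Jump for a value on the left, a jump point, and the rest. The names RangeMap, slideExample and rectangle are my own (hypothetical), and the concrete keys 10 and 20 merely stand in for the symbolic k and l on the slide.

```haskell
-- A range map is either one value everywhere (Const), or a value that holds
-- strictly to the left of a jump point, followed by the rest of the map from
-- that jump point onwards (Jump). Jump points are assumed strictly increasing.
data RangeMap k v
  = Const v
  | Jump v k (RangeMap k v)
  deriving (Show, Eq)

-- The shape from the slide: 1 left of k, 3 from k up to l, 0 from l onwards.
-- k = 10 and l = 20 are arbitrary stand-ins for the symbolic keys.
slideExample :: RangeMap Int Int
slideExample = Jump 1 10 (Jump 3 20 (Const 0))

-- A rectangle: `outside` left of `lo`, `inside` from `lo` up to (but not
-- including) `hi`, and `outside` again from `hi` onwards. Assumes lo < hi.
rectangle :: k -> k -> v -> v -> RangeMap k v
rectangle lo hi inside outside =
  Jump outside lo (Jump inside hi (Const outside))
```

Because a range map is just a chain of Jump cells ending in a Const, it has exactly the shape of a linked list.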
For the Haskell syntax, keep in mind that for function application, instead of writing f(2, 3), you just write f 2 3, but it should be relatively easy to read. What you see here is that the whole data structure is just like a linked list, very simple. At the top is the definition of the data structure. Const is like the end of a linked list, and Jump is one more step in the linked list.

Now, to give it a bit of meat, here's how you look up a value in one of those data structures. We have a query value, and we want to know the value for that query. There are two cases. If we have a constant, we don't care what the query value is; we just return the constant. If we have a jump point, we compare the jump point with our query. If the query is to the left, we return the value on the left. Otherwise, we recurse. The Haskell code does exactly the same thing; it's there so you can use the two versions as a Rosetta stone. So we just walk the structure; it's basically a linear search.

There's a bit of a restriction on what you can express here. Each interval includes its left end but excludes its right end. Say this jump point here is at three. If you look up the value at exactly three, you get the new value, not the old value. There's no way to make three belong to the old value, because the query only checks whether it is strictly smaller than the jump point; you cannot check for smaller or equal. A query is, for example: given this second in time, is it a public holiday or not?
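The lookup just described can be sketched in Haskell as follows. lookupRM is a hypothetical name, chosen to avoid clashing with the Prelude's lookup; the strict q < k test is what makes every interval include its left end and exclude its right end.

```haskell
data RangeMap k v
  = Const v
  | Jump v k (RangeMap k v)

-- Linear lookup: walk the map like a linked list.
lookupRM :: Ord k => RangeMap k v -> k -> v
lookupRM (Const v) _ = v                -- end of the list: one value everywhere
lookupRM (Jump v k rest) q
  | q < k     = v                       -- strictly left of the jump point
  | otherwise = lookupRM rest q         -- at or right of k: keep walking

-- 1 left of 10, 3 from 10 up to 20, 0 from 20 onwards (illustrative keys).
slideExample :: RangeMap Int Int
slideExample = Jump 1 10 (Jump 3 20 (Const 0))
```

For this map, looking up 9 gives 1, while looking up exactly 10 already gives 3: a query equal to a jump point moves to the right-hand value.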
Or: how high is the building at this point? But the buildings we can write down always include their leftmost point and never their rightmost point. We will fix that later, but for now that's all we can do. The reason is that when we query, we already jump if the query equals the jump point; only if it is strictly smaller do we stay on the left.

Here's an example; I'll leave the code up there. In this case I again use k, q, and l symbolically, but say k is smaller than q, which is smaller than l. We hit the jump case first and check whether our query q is smaller than k. It isn't, so we recurse into the rest. We hit the jump case again, and now q is smaller than l, so we return the value on the left, which is three. So looking up q gives us three. The Elixir code would go through exactly the same steps.

Okay, now this is quite a lot of code. This is how you can use the data structure to solve the skyline problem. Before I go through the code: what we are doing is breaking the skyline up into lots and lots of skylines, each with only a single building, and then merging them pairwise. So this one would be merged with this one, and so on. If we keep merging them pairwise, eventually we have merged them all into one big skyline. The algorithm is actually the same as merge sort, if any of you remember merge sort. So this part is the merging. We pass in some operation; think maximum for this one, it's up here. And this is the list of step functions we have.
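The pairwise strategy does not depend on range maps at all. Here is a minimal sketch using a generic helper, pairwise (a hypothetical name), demonstrated on plain sorted lists to show that it really is bottom-up merge sort. Following the description in the talk, an empty list gives the default, and a single remaining element is merged with the default once more.

```haskell
-- Merge two sorted lists: the "merge" step of merge sort.
mergeSorted :: Ord a => [a] -> [a] -> [a]
mergeSorted [] ys = ys
mergeSorted xs [] = xs
mergeSorted (x : xs) (y : ys)
  | x <= y    = x : mergeSorted xs (y : ys)
  | otherwise = y : mergeSorted (x : xs) ys

-- Combine neighbours pairwise, round after round, until one value is left.
-- An empty list yields the default; a single element is merged with it.
pairwise :: (a -> a -> a) -> a -> [a] -> a
pairwise _ def [] = def
pairwise f def [x] = f x def
pairwise f def xs = pairwise f def (pairUp xs)
  where
    pairUp (a : b : rest) = f a b : pairUp rest
    pairUp rest = rest

-- Bottom-up merge sort: start from singleton lists and merge pairwise.
mergeSort :: Ord a => [a] -> [a]
mergeSort = pairwise mergeSorted [] . map (: [])
```

Each round halves the number of lists, so there are about log n rounds of linear merging, which is where merge sort's O(n log n) comes from; the skyline merge inherits the same shape.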
I call them step functions, but they could also be called range maps; if you draw one, it looks like a step function, that's why. In the end, you get one step function out. I made a mistake here: the type isn't quite right, and I forgot to pass the default here as well. Anyway, when merging this list: if we have no elements left, we just return the default. If we have only a single element left, we merge it with the default, which in this case is the skyline that is zero everywhere. If we have more than one left, we pair them up and then recurse. For the pairing, we check whether we have at least two elements left in the list; this should call merge2, which we have over there. So basically we put pairs of adjacent elements together.

merge2 is the part that looks exactly like the merge in merge sort. If we have two constants left, we just apply the operation; think maximum. If we have a jump and a constant, we apply the operation as well, and then recurse on the rest of the jump. Same the other way round. The interesting bit is when we have two jump points left. Then we apply the operation to the two values on the left. We know the new jump point must be the smaller of the two jump points. If you look here, the first jump point in the result is this one, the smaller of these two. For the rest, we compare, and we advance either the left or the right step function. In this case it was this one, so we advance this one here, and its next jump point becomes the current one. If the two jump points are the same, we advance both of them.
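The two-map merge just described, as a Haskell sketch. combine is a hypothetical name for the slide's merge2 with the operation passed in; it assumes both maps have strictly increasing jump points and runs in time linear in the total number of jump points.

```haskell
data RangeMap k v
  = Const v
  | Jump v k (RangeMap k v)
  deriving (Show, Eq)

-- Combine two range maps point-wise with `op` in a single linear pass.
combine :: Ord k => (a -> b -> c) -> RangeMap k a -> RangeMap k b -> RangeMap k c
-- Two constants: apply the operation once.
combine op (Const a) (Const b) = Const (op a b)
-- A jump against a constant: keep the jump point, advance the jumping side.
combine op (Jump a k ra) mb@(Const b) = Jump (op a b) k (combine op ra mb)
combine op ma@(Const a) (Jump b k rb) = Jump (op a b) k (combine op ma rb)
-- Two jumps: the smaller jump point comes first; advance that side
-- (or both sides if the jump points are equal).
combine op ma@(Jump a ka ra) mb@(Jump b kb rb) =
  case compare ka kb of
    LT -> Jump (op a b) ka (combine op ra mb)
    EQ -> Jump (op a b) ka (combine op ra rb)
    GT -> Jump (op a b) kb (combine op ma rb)
```

Merging a building of height 5 on [2, 6) with one of height 3 on [4, 8) using max gives Jump 0 2 (Jump 5 4 (Jump 5 6 (Jump 3 8 (Const 0)))). The jump at 4 goes from 5 to 5, so it is redundant; that is harmless for lookup and can be cleaned up afterwards if needed.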
So, this is A, this is B, and this is the merged result. If A's jump point is less, we advance A. If they are the same, we advance both. If B's is less, we advance B and keep A as it was. And then we keep going until we hit one of the other cases. It's really straightforward recursion. You could write it as a loop, but Elixir would look very similar, because we don't have loops there either: exactly the same pattern matching.

To make this actually work, we need a bit of glue around it. Suppose we are given the skyline as a bunch of buildings, each a triple of left, height, right. We want to return left, height, right triples again, but with no overlap. First we map a lambda over the input to convert these rectangles into the step functions we saw before, then we merge them all together. This last function, which I call convert, goes back from our linear step-function representation to a list of tuples. Don't worry if you don't understand this part; it's just glue to actually solve the problem. If you wrote this one and that one in a job interview, you'd be done. And so you can solve the problem this way. Of course, this one uses max; you could also use union if you're interested in merging calendars and things like that, which we often do in finance.

So we have this nice thing: we can merge two range maps with arbitrary operations in linear time. The problem is access. Our lookup function also runs in linear time, so access is pretty slow. Instead of a linear list that we search through for an element, what you can do is use a binary search tree. Pick any; it doesn't matter too much, say red-black trees.
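The glue for the skyline problem might look like the sketch below. It assumes buildings are (left, height, right) triples with non-negative heights on a zero background. simplify and toSegments are hypothetical helpers standing in for the talk's convert: simplify drops jump points where the value does not change, which combine can leave behind, and toSegments turns the map back into non-overlapping triples.

```haskell
data RangeMap k v = Const v | Jump v k (RangeMap k v)

combine :: Ord k => (a -> b -> c) -> RangeMap k a -> RangeMap k b -> RangeMap k c
combine op (Const a) (Const b) = Const (op a b)
combine op (Jump a k ra) mb@(Const b) = Jump (op a b) k (combine op ra mb)
combine op ma@(Const a) (Jump b k rb) = Jump (op a b) k (combine op ma rb)
combine op ma@(Jump a ka ra) mb@(Jump b kb rb) =
  case compare ka kb of
    LT -> Jump (op a b) ka (combine op ra mb)
    EQ -> Jump (op a b) ka (combine op ra rb)
    GT -> Jump (op a b) kb (combine op ma rb)

-- Merge neighbours pairwise until one map is left (merge-sort shape).
pairwise :: (a -> a -> a) -> a -> [a] -> a
pairwise _ def [] = def
pairwise f def [x] = f x def
pairwise f def xs = pairwise f def (pairUp xs)
  where
    pairUp (a : b : rest) = f a b : pairUp rest
    pairUp rest = rest

firstValue :: RangeMap k v -> v
firstValue (Const v) = v
firstValue (Jump v _ _) = v

-- Remove jump points where the value on both sides is the same.
simplify :: Eq v => RangeMap k v -> RangeMap k v
simplify m@(Const _) = m
simplify (Jump v k rest)
  | firstValue rest' == v = rest'
  | otherwise = Jump v k rest'
  where
    rest' = simplify rest

-- Back to (left, height, right) triples between consecutive jump points,
-- skipping zero-height gaps. Assumes the map is 0 at both infinities.
toSegments :: RangeMap Int Int -> [(Int, Int, Int)]
toSegments (Const _) = []
toSegments (Jump _ start rest) = go start rest
  where
    go _ (Const _) = []
    go lo (Jump h hi r) = [(lo, h, hi) | h /= 0] ++ go hi r

skyline :: [(Int, Int, Int)] -> [(Int, Int, Int)]
skyline buildings =
  toSegments . simplify . pairwise (combine max) (Const 0) $
    [Jump 0 l (Jump h r (Const 0)) | (l, h, r) <- buildings]
```

For example, skyline [(2, 10, 9), (3, 15, 7), (5, 12, 12)] evaluates to [(2, 10, 3), (3, 15, 7), (7, 12, 12)]. Swapping max for another operation reuses everything except toSegments, which is specific to numeric heights.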
And then we look things up in the tree. What you need for the lookup: you're actually looking for the smallest jump point that is strictly bigger than your query. All jump points up to and including the query we have already passed. So the lookup function we need on our binary search tree is one that finds the smallest key that is bigger than a given one, or, the other way around, the biggest key that is smaller. Haskell, for example, has those: there's lookupLT in Data.Map, which looks for the largest key that is smaller than the given one, and there are a bunch of similar ones. The standard Erlang and Elixir data structures I found, even when they use search trees inside, don't offer you that. So you can either write your own, or you could probably contribute a patch upstream.

I had to write my own because of what we use this for in practice. We're doing cloud scheduling: we have a bunch of jobs, programs you want to run, with specific requirements. And we have a bunch of machines, hosts, with memory, disk, and CPU available. We look up in a range, basically: give me any of those machines that have enough capacity to run this job, so where all three capacities are at least as big as what the job needs. Our example here is only one-dimensional, but what we use in practice has five dimensions or so.

Here, for example, the tree only has the keys, the jump points; the values are implied. Say you want to look up the value at nine.
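In Haskell, Data.Map from the containers package provides lookupLT, lookupGT, lookupLE and lookupGE. For this lookup we need the smallest jump point strictly greater than the query, which is lookupGT. The sketch assumes a representation of my own, TreeRangeMap (hypothetical), that stores each jump point together with the value to its left, plus the value after the last jump point; a lookup then takes O(log n).

```haskell
import qualified Data.Map.Strict as Map

data RangeMap k v = Const v | Jump v k (RangeMap k v)

-- Jump point -> value just to the LEFT of that jump point,
-- plus the value that holds after the last jump point.
data TreeRangeMap k v = TreeRangeMap (Map.Map k v) v

-- Convert the linear representation into the tree-backed one.
fromRangeMap :: Ord k => RangeMap k v -> TreeRangeMap k v
fromRangeMap = go Map.empty
  where
    go acc (Const v) = TreeRangeMap acc v
    go acc (Jump v k rest) = go (Map.insert k v acc) rest

-- The smallest jump point strictly greater than q decides the value;
-- if there is none, q lies past the last jump point.
lookupTree :: Ord k => TreeRangeMap k v -> k -> v
lookupTree (TreeRangeMap jumps final) q =
  case Map.lookupGT q jumps of
    Just (_, v) -> v
    Nothing -> final
```

Because the values are keyed by the jump point to their right, the half-open semantics of the linear lookup are preserved: a query equal to a jump point skips past it.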
You would go here, go to the right, go to the right again, and find the smallest jump point that's bigger than nine. Twelve is a bit more interesting: go to the right, go to the right, then go to the left because 12 is smaller, go further left, and then 13 is the smallest jump point that's bigger. Yes, you still have to build the tree first. But once you have the tree, you can look things up. It's a typical search tree; you can use basically any search tree. You can also use other structures. You cannot really use a hash map, for example, because in a hash map you don't have access to the next or previous element. But in ordered structures like these, you can. So it's relatively simple. Actually, when I did this in Haskell a while ago, for calendars and such, I first wrote my own, and then I realized that the built-in one has this function. So I threw out everything I had just written and used the built-in lookup.

Okay, so far we only have half-open intervals: they always include the left end but never the right end. How do we get around that? There's a trick; mathematicians use a similar trick, they call it non-standard analysis, but you don't have to worry about that. Basically, we double everything: instead of a key k, we have either k plus zero or k plus some infinitesimal epsilon. We represent this as a tuple: (k, False) or (k, True), because False is less than True. (k, 0) and (k, 1) would also have worked. When you now query with (q, False), or (q, 0), you get the previous behavior.
If the jump point is (k, True), we get a different behavior: (q, False) is strictly smaller than (k, True) even when q and k are the same. So we still only use strict smaller-than in the lookup, but we add a bit more data, blowing up our key space a bit, so that we can emulate the other comparison operators as well. We blow everything up by a factor of two: every point is now doubled, once where it was originally and once an infinitesimal to the right. If you query for the original point, which is the one with False, you stay below a jump point at (k, True).

Here's another example; see my ASCII art here. We start at zero, go up to three for only a single point, then go back down, and then we go up to one on an interval that includes both its left and right ends, and go back down. We start at zero here. At k we go up, and at the same k we immediately jump down again. But the first jump point is (k, False) and the second is (k, True), so the three only lives in the tiny gap between k plus zero and k plus epsilon, which is exactly the point k. The next segment starts at (l, False), which puts l itself on the inside, and goes up to (m, True), which includes m on the right side; after that everything is zero.

So we add a bit of extra data so that our algorithm stays simple. We don't need to change any of the code, because it doesn't know that it has numbers; it only compares, and the same goes for Elixir. Comparison of tuples in both Elixir and Haskell does the right thing: it compares the first elements, and only if they are equal does it compare the second elements.
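A minimal sketch of the doubled keys, reusing the unchanged linear lookup. Key, at and closed are hypothetical names, and the numbers 5, 10 and 20 stand in for k, l and m from the slide. The derived Ord instances for tuples and Bool give (k, False) < (k, True), which is all the trick relies on.

```haskell
data RangeMap k v = Const v | Jump v k (RangeMap k v)

-- The same linear lookup as before, unchanged.
lookupRM :: Ord k => RangeMap k v -> k -> v
lookupRM (Const v) _ = v
lookupRM (Jump v k rest) q
  | q < k = v
  | otherwise = lookupRM rest q

-- (k, False) stands for k itself, (k, True) for k plus an infinitesimal.
type Key k = (k, Bool)

-- Queries always use the False copy of a point.
at :: k -> Key k
at q = (q, False)

-- `inside` on the closed interval [lo, hi], `outside` elsewhere.
closed :: k -> k -> v -> v -> RangeMap (Key k) v
closed lo hi inside outside =
  Jump outside (lo, False) (Jump inside (hi, True) (Const outside))

-- The slide's example: 3 at exactly k = 5, then 1 on [10, 20], else 0.
slideExample :: RangeMap (Key Int) Int
slideExample =
  Jump 0 (5, False) $
    Jump 3 (5, True) $
      Jump 0 (10, False) $
        Jump 1 (20, True) (Const 0)
```

Looking up at 5 gives 3 but at 6 gives 0, and both endpoints at 10 and at 20 give 1. A jump at (k, False) keeps the old half-open behaviour, so the two kinds of jump points can be mixed freely in one map.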
And then none of the algorithms need to change; it just works. When I first wrote this down, I put in extra cases for all the extra comparisons, and it was a mess. Later I discovered that you can put this into the data instead of the algorithm.

Okay, so how do you test this thing? In general you can write examples, but you can also test a property. Take a bunch of range maps. Either you first look them all up at a specific value and then combine the results however you want, or you do the combining directly on the range maps, with the merge function, and look up afterwards. Both should give the same value. With two maps: for the same query point and any operation, looking up first and then applying the operation should give the same value as merging first and then looking up. The same holds for a whole list of maps: looking up in each one and then combining the list of results, versus merging with the operation first and looking up once.

For example, think back to the skyline problem. For any particular point you choose, it's relatively easy to figure out the height at that point: take all the rectangles and check whether they cover that point. Cross out the ones that don't, and take the maximum of the remaining ones. That gives you the value at that point. Then you solve the problem properly and look up the same point in the result. This also works if you have a tree representation instead of a list representation.

Okay, that's it for the slides I've prepared. You probably have questions, because I was a bit quick on some things. You mentioned operations?
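The talk uses PropEr for this. As a dependency-free stand-in, here is a hand-rolled version of the same property that exhaustively checks a small fixed set of sample maps instead of randomly generated ones; fromSteps, samples and the prop_ names are hypothetical. It covers both the two-map version and the version for a list of maps.

```haskell
data RangeMap k v = Const v | Jump v k (RangeMap k v)

lookupRM :: Ord k => RangeMap k v -> k -> v
lookupRM (Const v) _ = v
lookupRM (Jump v k rest) q
  | q < k = v
  | otherwise = lookupRM rest q

combine :: Ord k => (a -> b -> c) -> RangeMap k a -> RangeMap k b -> RangeMap k c
combine op (Const a) (Const b) = Const (op a b)
combine op (Jump a k ra) mb@(Const b) = Jump (op a b) k (combine op ra mb)
combine op ma@(Const a) (Jump b k rb) = Jump (op a b) k (combine op ma rb)
combine op ma@(Jump a ka ra) mb@(Jump b kb rb) =
  case compare ka kb of
    LT -> Jump (op a b) ka (combine op ra mb)
    EQ -> Jump (op a b) ka (combine op ra rb)
    GT -> Jump (op a b) kb (combine op ma rb)

-- Start value, then (jump point, new value) pairs; jump points increasing.
fromSteps :: v -> [(k, v)] -> RangeMap k v
fromSteps v [] = Const v
fromSteps v ((k, v') : rest) = Jump v k (fromSteps v' rest)

samples :: [RangeMap Int Int]
samples =
  [ fromSteps 0 []
  , fromSteps 1 [(3, 4)]
  , fromSteps 0 [(2, 5), (6, 0)]
  , fromSteps 2 [(3, 7), (4, 1), (9, 3)]
  , fromSteps 5 [(6, 5), (10, 2)]
  ]

-- "Look up, then combine" must equal "combine, then look up".
prop_lookupCombine :: (Int -> Int -> Int) -> Bool
prop_lookupCombine op =
  and [ lookupRM (combine op a b) q == op (lookupRM a q) (lookupRM b q)
      | a <- samples, b <- samples, q <- [-1 .. 12] ]

-- The same property for a whole (non-empty) list of maps.
prop_lookupCombineMany :: (Int -> Int -> Int) -> Bool
prop_lookupCombineMany op =
  and [ lookupRM (foldr1 (combine op) ms) q == foldr1 op (map (`lookupRM` q) ms)
      | ms <- [samples, take 2 samples, reverse samples], q <- [-1 .. 12] ]
```

The query range [-1 .. 12] deliberately extends past every jump point in the samples, so both infinite ends and every jump point itself are exercised; a property-testing library would additionally generate and shrink the maps for you.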
Yes, you can do all kinds of other operations. It depends on what you're storing. Here we're doing max, but you can do union; you can put the values in sets, and so on. There's no restriction on the values you can store. For union, no, I would need to draw a slightly different picture. That picture makes more sense for a calendar, say, where for each day you record what you know about it. Suppose days are color-coded. You have red days: red goes up to here. You have blue days: those, those, and those. And green days: those, those, and those. If you put each color in a singleton set and take the union, you get: here only red, then a short streak with red and green, then a streak with red only, a streak with red and green, then red and blue, then only green, and maybe nothing here, so the empty set, and so on. You can do that.

Another thing: in the lookup, we walk the structure directly. If you have a skyline, you might be interested in computing the whole skyline and then integrating it, summing it up to figure out the total area. Then you need to walk the data structure, but you can also expose that as an API.

What else is interesting? If you come from a Haskell background, it's always interesting to ask what happens if you nest two of these data structures, so the values you store are range maps again. That's crazy stuff. And then you can, for example, flatten that into a single range map.
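The color-coded calendar can be sketched by storing sets as values and merging with Set.union from Data.Set. The colors and day ranges below are made up for illustration, and daysOf and calendar are hypothetical names; daysOf unions one rectangle per interval, so its intervals may overlap or touch.

```haskell
import qualified Data.Set as Set

data RangeMap k v = Const v | Jump v k (RangeMap k v)

lookupRM :: Ord k => RangeMap k v -> k -> v
lookupRM (Const v) _ = v
lookupRM (Jump v k rest) q
  | q < k = v
  | otherwise = lookupRM rest q

combine :: Ord k => (a -> b -> c) -> RangeMap k a -> RangeMap k b -> RangeMap k c
combine op (Const a) (Const b) = Const (op a b)
combine op (Jump a k ra) mb@(Const b) = Jump (op a b) k (combine op ra mb)
combine op ma@(Const a) (Jump b k rb) = Jump (op a b) k (combine op ma rb)
combine op ma@(Jump a ka ra) mb@(Jump b kb rb) =
  case compare ka kb of
    LT -> Jump (op a b) ka (combine op ra mb)
    EQ -> Jump (op a b) ka (combine op ra rb)
    GT -> Jump (op a b) kb (combine op ma rb)

data Color = Red | Green | Blue deriving (Show, Eq, Ord)

-- {c} on each half-open day range [lo, hi), the empty set elsewhere.
daysOf :: Color -> [(Int, Int)] -> RangeMap Int (Set.Set Color)
daysOf c intervals =
  foldr (combine Set.union) (Const Set.empty)
    [ Jump Set.empty lo (Jump (Set.singleton c) hi (Const Set.empty))
    | (lo, hi) <- intervals ]

-- The union of all three calendars.
calendar :: RangeMap Int (Set.Set Color)
calendar =
  foldr1 (combine Set.union)
    [ daysOf Red [(1, 10)]
    , daysOf Green [(3, 5), (7, 9)]
    , daysOf Blue [(9, 12)]
    ]
```

Here day 4 maps to the set {Red, Green}, day 9 to {Red, Blue}, and day 12 to the empty set. Replacing Set.union with Set.intersection would give the days covered by every calendar instead.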
And see how that works. I can tell you exactly what you want, and then you can think about how to implement it. You want this: if you look up a value in the outer map and then look up the same value again in the resulting inner range map, you get the same result as flattening first and then looking up a single time. That's the thing you canonically want out of this kind of structure; it's basically the monad instance, if you're into Haskell. That's sort of useful.

Another thing: if you have large structures, you can ask whether this can be done in parallel. In the linear case, with the linked list, you can't really parallelize the merging, for example. But if you have some kind of tree structure, you could look into merging the trees in parallel, with some concurrency.

Another thing I needed recently is taking this to more dimensions. With one dimension, you look up along a line. With two dimensions, you have specific values for rectangles in the plane, and you want to look up efficiently, and you also want to combine and move things. That's what I said earlier about availability: I need to look up whether I can find a host that has enough free memory in this direction and free disk in that direction. The hosts, the computers, sort of live in this space. This computer here can run any job in this particular area, and this computer here can run any job in that area. And then you want to find one that can run the job quickly, and maybe do other things like that.

So we are using Erlang, and for testing we're using PropEr. PropEr, yes, it's for property-based testing. I understand it's the best implementation of property-based testing.
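One way to implement the flattening, as a sketch; joinRM, splice and dropUpTo are hypothetical names. On each segment of the outer map, it keeps only the part of the inner map that falls inside that segment. The test checks exactly the property stated above: looking up twice equals flattening and looking up once.

```haskell
data RangeMap k v = Const v | Jump v k (RangeMap k v)

lookupRM :: Ord k => RangeMap k v -> k -> v
lookupRM (Const v) _ = v
lookupRM (Jump v k rest) q
  | q < k = v
  | otherwise = lookupRM rest q

-- Drop jump points <= k; the result agrees with m for every q >= k.
dropUpTo :: Ord k => k -> RangeMap k v -> RangeMap k v
dropUpTo _ m@(Const _) = m
dropUpTo k m@(Jump _ j rest)
  | j <= k = dropUpTo k rest
  | otherwise = m

-- Agrees with `left` for q < k and with `right` for q >= k.
splice :: Ord k => RangeMap k v -> k -> RangeMap k v -> RangeMap k v
splice (Const v) k right = Jump v k (dropUpTo k right)
splice (Jump v j rest) k right
  | j < k = Jump v j (splice rest k right)
  | otherwise = Jump v k (dropUpTo k right)

-- Flatten a range map of range maps (the monadic join).
joinRM :: Ord k => RangeMap k (RangeMap k v) -> RangeMap k v
joinRM (Const inner) = inner
joinRM (Jump inner k rest) = splice inner k (joinRM rest)

-- Outer map: one inner map below 3, another on [3, 8), a third from 8 on.
nested :: RangeMap Int (RangeMap Int Int)
nested =
  Jump (Jump 1 5 (Const 2)) 3 $
    Jump (Const 7) 8 $
      Const (Jump 4 6 (Const 9))
```

Flattening nested yields Jump 1 3 (Jump 7 8 (Const 9)): the inner jumps at 5 and 6 disappear because they fall outside the outer segments their maps are used on.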
I heard that QuickCheck is better, but I haven't actually used it yet. Because John Hughes is the person you were talking about? Yes. But if you want to do property-based testing, you should try Python's Hypothesis; that's actually one of the best I've seen. QuickCheck in Haskell is not as good as Python's Hypothesis. I'm not sure about QuickCheck in Erlang; I haven't seen it. PropEr has a few rough edges. It's usable, but it's not as nice as Hypothesis. But Haskell now also has Hedgehog. Yes, and some of Hypothesis came from a Haskell library called Jack, like jack-in-the-box. So people are going back and forth between these libraries, which is good. Okay, cool. Thanks. Thank you.