 My name is Leo Su. Regina is my spouse and partner. She is on the steering committee for postures and she will be doing nine of the ten items that we have for you today. Because we may have a lot to cover, we ask that you save your questions till the very end. So, oh, and this introduction slide has a link to buy our book. So we have written a very thick book on postures and if you go to the link, you can buy either an electronic copy or a paper copy. So we're going to cover problems that you'll run into even if you're not a GIS person or a spatial person. The first common problem that people want to answer now that we have all these GPS devices like your phone and your browser and your car is if you're in a location, what are the end closest things to you? So the query that you'd use to answer that looks something like this. And the key part of it is this operator called a distance operator, a KNN distance operator, and it will return the distance between any two geometries or any two geographies. And geographies is basically just a round earth model versus a flat surface model. So in this example, I want to find the five closest Indian restaurants to this location and the result will give you the names of the places which some of them are fast food and the distance. Since I'm using the geography distance operator, it's going to give me distance in meters and order them by the distance from here. The next common problem that people want to answer is if you're in a location, what are the things within a certain distance from here? Like for example, you want to find everything that's within a thousand meters from here and what you want to find could be restaurants, it could be a road, it could be anything. So in that case, you'd use this function called the STD within function. And STD within works for both the geography type and the geometry type. The key thing you have to remember with that is if you're using it with geography, your geography will be in longitude latitude and your units will be expressed in meters. If you're using it for geometry, then it's whatever the units are of your geometry. So in this case, we're asking what are the things that are within, what are the restaurants that are within 1,000 meters from here? And in that case, there's only one. And the third common problem that people have, and this is something that people do mostly for statistical analysis, is if you're doing political, you know, political canvassing, you want to find out, you have this database of people, which political regions are they in so that you could go and encourage them, get out the vote. And you might also want to aggregate people by where they're located. So in this example, for each borough, I took data from the New York City Open Data, which is the 2013 campaign contributions. And where the people were located, I did a join with this postage function called ST Covers to see for those people, like which neighborhood or, I mean, which borough are each of them in. And then I aggregate the total amount that they've expended and the total number of people. So I get this nice data set that says from Manhattan, they're 4,872 campaign contributors and they've contributed a lot of money. So that's the third common thing that people like to do with postage. The fourth, postage is used often for map tile generation, and that has become even more popular with OpenStreetMap data, which is a crowdsourced data source that anybody can download and use. So OpenStreetMap has more data loaders for postage than any other spatial database that you can think of. So you have your option of loading data with OSM to PGS to all in Possum, GDAL, and I think there are a whole bunch of others. And once you have the data loaded, you can style it using styling tools and then spit it out as raster tiles that you can then use as overlay. So here's a common example, which is generated with TileMill, which is an open source tool that people commonly use in conjunction. And you can define how you want to style the different features using a kind of CSS that's, I think it's called CardoCSS. And you can generate tiles that look as beautiful as you have the capability to make things beautiful. And then use these as overlay on your maps as a main layer. So once you have tiles for data that changes frequently, you generally don't want to make that a static tile. So what you'll do is you'll, someone will click on a point on the map and they'll want to see what are all the features in your database that, so if you've got like trees, for example, and they want to see the detail about the trees, they would click on a region on the map and you'd show them pop-up that shows the different features and you might highlight the different trees where they're located. And so for that, you'd use this function called STSGO JSON, which doesn't show up very highlighted, which is a post-dispunction. And it'll output the geometry part in JSON format. And the rest of it, you can use the built-in JSON functions in PostGrace to output the attribute data as JSON so that you get a single feature collection, what's called a JSON feature collection that can be consumed by tools like OpenLayers and Leaflet and just overlay it on your map. The sixth common thing, which is my favorite, is 3D visualization. In PostGIS 2.0, PostGIS came up with new 3D types, which are called polyhedral surfaces and triangular, irregular, irregular network. So TINs is basically a way of representing geometries as small little triangles, so it's really good for landscape analysis. And polyhedral surfaces is more designed for showing buildings and showing the different features that are not flat. And what came with that is a function called STSX3D, which allows you to use the W3C standard X3D to render in your browser and other X3D tools. So a sample query that you can overlay on an X3D scene would look something like this and it would convert your geometry to X3D format. And the output would look something like this if you texturized it with X3D textures. The seventh most common thing that people do is geocoding, and that's probably something that has become even more of a need now that Google charges more money for geocoding. So if everyone wants to have their database be able to geocode data. So with Postjust, since 2.0, there's been an extension called Postjust Tiger Geocoder, which uses the Tiger Census data to geocode. And part of the extension generates scripts in Linux shell format and also Windows format that downloads the data from the census for you and loads it into the database. And once you have that, then you can geocode your data. And we're working on a friend of ours, Steve Woodbridge. We're working on a newer geocoder and address standardizer that's going to be useful for loading other data sources besides Tiger and also be used for world data. So that's still in the works and that probably you won't see until 2.4, possibly 2.3, but probably not before 2.4. So the first step with geocoding is you have an address and it's usually like either a two-liner that has the street number and street or it's a two-liner. Which is street number, street, you know, and then city-state zip as the second line or just as a single. So the standardized address is another extension that's packaged with Postjust. And it can take data either as a two-liner or a one-liner. So in this example, I have an address which happens to be the high school that I went to in Brooklyn. And it will parse it out into its elements. For this, I'm using another extension which is called h-store and that basically allows you to flip the columns of a data source into rows. So normally I would get a single row that has city, name, state, suffix type, postal code, but instead I get it as rows. I'm just doing this for display purposes so you can see the different portions of the address. So this will break it up into the separate parts, the city, the name of the street, the state, the suffix. And what it uses to do that are three tables, which there's one for the U.S. This package is a separate extension which has these three tables in it. And you can modify it for your own needs if you don't like the way it standardizes the addresses. So the default one, for example, prefers to spell things out. So PL becomes fully spelled out place and the state becomes fully spelled out New York. The post just, Tiger Geocoder also has tables that work with the standardizer address, but it goes by, it uses abbreviations, which is the way the tiger data comes. The data abbreviates things like the suffix type is PL and state is abbreviated to New York. So the reason why the tiger data standardizes it that way is because you want whatever standardizer you're using to look like your reference data to use the same kind of abbreviation and format. So once you have your data standardized, then you can geocode it. So the post just Tiger Geocoder comes with a function called Geocode. Internally, it's going to use either the address standardizer or its own built-in older standardizer to standardize what you feed it and then use that to use the tiger data street index center lines to come up with the location of an address. So here, if I give it 29-fourth green place, it will return a record that is composed of several things, which we're just outputting. You'll have a geometry which will be in longitude latitude, which you can break up with these two functions. X will give you the longitude, Y will give you the latitude, and the rating will tell you how good of a match it is. A zero is a perfect match. Anything higher, if you get into the past 20 or so, then it's probably not a great match. Like past 40, I probably wouldn't use it. I'd be questioning you because then that means you probably only were able to geocode to the city or the state worst case. Or the address it guessed at, it's completely wrong. Like the address just doesn't exist. So another piece that's packaged with the post just tiger geocoder is a reverse geocoder. And this is especially useful if, let's say you're driving and for wherever you are, whatever your GPS says, you want a descriptive meaning of where that is. So if your GPS reads you that you're at negative 73, 97, 6, 8 longitude and 46, 8, 9 latitude, you want to know, okay, where am I? So this will give you an address representation of it. And it will also give you the cross streets of where that street is. So Fort Queen is crossed by D. Calvin Fulton. And the next common thing that people do is analyze data with raster. For a raster, you probably have to be fairly into GIS to appreciate this. So basically a raster you can think of as nothing but representation of a matrix. And so each matrix will have a number of columns and rows. So it's really an n-dimensional matrix because for each dimension, which they call a band, you have a separate matrix for it. So for example, if you're dealing with aerial data, you'd have a band for the red, blue, green, alpha. And then on top of that, raster data that's used in GIS tends to be what's called geo-referenced. And that means that for the upper corner, it's positioned to some place in space. Like, you know, it's positioned to a particular longitude latitude. And then each cell has a width, has a corresponding width with reality. So each cell is a pixel, but then that pixel takes on a meaning such as like it's one inch in width or it's one meter. And so raster data is often generated by machines, you know, cameras and chemical testing tools. And what's interesting about the raster implementation in POSTIS is that you can convert a raster to a vector very easily. And you do that, there's a geometry that's kind of built into the raster, which is the shell of it, which is the convex hull. So it's basically, it's usually a rectangle, but it could be turned if your raster was what's called rotated. It's not like flat with the x-coordinate. And you can also get the envelope, which is really the bounding box of the raster. And if you select pixels, like what you consider as turned on, then you get a polygon. And then there's, there's variants of functions you could use to vectorize if you only want to vectorize. Let's say you're doing chemical analysis and you only want to vectorize that portion that, you know, the chemical rating is greater than 100. You can choose to just turn those pixels into a vector. And it's basically just treating each pixel as a square area and then gluing them together to create a vector. So a common thing that people do with rasters is do things like digital elevation. So given a position, a position in space, what's the elevation that we're at? And you can also use this elevation data. If you intersect raster, a raster with, let's say, a road network, you can figure out the elevation in each part of the road and create a 3D, well, not quite 3D, but you can create a line string that has a Z element of it, which represents the altitude at each position in the line string. But Postjust also has other uses beyond GIS and beyond spatial analysis. And my favorite is doing things with pictures. So if you think about a picture, even if it's not geo-referenced, it's still a raster. It's still a set of pixels. It's got bands, you know, the RGB, the alpha channel bands. And you can use all the functions that Postjust provides for raster with those. But you usually have to enable them. So some people are very, I don't want to say dogmatic about it. They like to keep pictures and things outside of the database. They don't want pictures inside the database. So if you're one of those kind of people, you'd want to enable this setting which says enable OutDB rasters. And that allows you to keep rasters outside of the database, but still be able to reference them in your SQL queries. And there's another setting which is called enable drivers. And the reason for that is that some drivers we determined are not safe. Like if you have a driver that goes out to the web, you might not really want people to use that in your database. So since about like 2.06 of Postjust, we require that you explicitly specify the drivers that you're going to allow. And if you want to allow all drivers, there's an option that says enable all, which will enable all the drivers that you can possibly support. So if you wanted to load pictures in your database but not actually store them in the database, there are two ways you can do this. You can do it with a tool called raster2pgsql, which comes with Postjust. And the key element with that is this dash r switch, which means don't load the rasters in the database, just register them. And the key thing you have to remember when you're doing this is you can't then later move the pictures because then the data that you loaded has no meaning and you can't really do anything with it. You'll get errors that, oh, the picture's not there. The other option you could do is you could load the reference to the pictures with just SQL using this Postjust raster function called staddband. And so the basic idea behind addband is you start off with a null raster, and then you specify the location of a picture or another raster that you want to add to this null raster. And the reason why you would do this is sometimes you might have a lot of rasters that are separate files, and so you might want to make them, if they're aligned the same, you might want to have, for example, a raster that measures the digital elevation and a raster that measures some sort of concentration in the same record, but as separate bands. So this example just loads a whole bunch of pictures into the database. And once you have the pictures referenced in the database, you could do things like take the width of the picture, take the height. So these width and heights are all in pixel size. So this picture is 192 pixels in width. And you can also resize the pictures. So even though the pictures are outside of the database, you could still use this function called stresize, which will, for this particular case, I have a generate series so that I can get a number between one and four. So when I multiply by 0.25, it'll give me 0.25, 0.5, 0.75, and 1. So that will end up with, for each picture, I get four pictures. I get one that's 25% of the original size, one that's 50% of the original size, one that's 75% and one that's 100%. And so resize would, of course, look like this. So you can resize your whole table and just one SQL query. Okay, and raster, just like geometry, has the ability to stitch things together. And the function that's used is called stunion, which also has the same name when you're stitching geometries together. And this functions, I'm combining a whole bunch of functions here so that I can glue pictures together in very odd and crazy ways. So if I took all the pictures that I just loaded and I combine them together, I end up with this thing. And the reason why things are just kind of stacked on top of each other is because when you load in just a regular picture, since it has no geo-referencing, the upper left is always at zero. So everything starts at the same spot. And then so when you combine them together, you end up with the pixel that has the max value. So you end up with a ghostly-like picture. And of course, this is just a fun thing to do that you can apply. There are numerous functions and postures for dealing with digital elevation models. So you could take the slope of a model. You could take the aspect. You could take the hillside. And if you use it on a regular picture, you end up with something that looks like an engraving, like a stone engraving. And finally, you can also use postures for dealing with time. And I'm going to switch this over to Leo. So we found this innovative use for postures. As you know, back in 9.3 or 4 of Postgres, a range data type was introduced. So a range data type is basically you have a start and the end. And the underlying type could be integers, for example, starting from 1 going to 3. It can be, in our case, dates or timestamps. So something that starts at January 1, 2016. Going through 4th of July, 2016 would be considered a range. So it came out and we were excited. We were able to use the range type immediately. When you think about all databases, the data that you have stored in the tables are current data. It's a snapshot in time always. Invariably, this data will change. And you often want to record a history of the change. So you would have data where, for example, person A was measured at, say, a baby growing. Was at like 12 inches from the point of birth to a week later. So that's actually a time range from birth time zero to, say, a week from the birth date. And you could record at 12. And then they grow bigger, 13 inches by the next week and so on. So you really get a series of data that is periodic or it becomes a temporal database. So with the range type, we can at least represent the time intervals. However, a big shortcoming that we found was that there were no aggregate functions that came with date range. Let me give you another example. Let's say you have a bridge, say the George Washington Bridge, and you metered every car as they enter the bridge. Okay, so at, you know, 12.08 and five seconds after the hour, the car one entered. And then you also know the time when it left the bridge, let's say 12.15.06. And let's say you have all these measurements stacked up. And I want to ask the question, well, give me a profile in time of how many cars are on the bridge. There's no way to do that without an aggregation function of the date of the date ranges. Let me give you another example. Let's say you have a stock market index and from a given point in time to another given point in time, let's say you have 10 stocks inside this portfolio that you're tracking. And from one moment in time to this other moment in time, the price was such and such for stock A. And then for stock B, it was this. For stock C, it was this. And then these keep changing and changing and changing. And so how do you quickly figure out at any given time what your portfolio is worth? Because they're taking a lot of date ranges and trying to squish them together via aggregation. That is missing with the range implementation. But we have post just. So now post just deals with line strings already and can do all sorts of things with line strings, combine them, simplify them. So we created a helper function that will take two helper functions. Let me start with this one. So we'll take a date range and we'll convert it to a line stream. So we can think of time. We just start with some arbitrary point back in time. You can set your own standard. The idea is to take periods in time and convert them to a line string that is completely flat. So if you think of a time going just on a single string, okay, here is the stock price. From this segment to this segment, stock price five. And then here, six, seven, eight, so on. So we flatten them out into a line string. So now they are a geometry. So we have lots of tools to deal with them in post just. And of course you need to have a corresponding function to get a back to date range once you're all done. Because people don't know that you're using post just in the back. Okay, so here is a first example. So in here we have three date ranges that are overlapping. So they overlap three ways. So they all overlap. And I first used my little function to convert them to a line string. ST union will put all the line strings head to toe, start to end, and create a multi-line string. ST line merge will actually, this will just create a line string. ST line merge will give me the multi-line string. And then there's a simplification function that will allow me to simplify line strings. So if I have a line string that goes from one to two, two to three, three to four, four to five, I could collapse that down to a single line string of one, two, five. So that's what I do. And finally ST dumps back out my line string as a single geometry line string. And then I convert that back to my date range for presentation. So with these three overlapping date ranges, I end up with just one. Okay. And so this is the case of the overlap. This one has line strings. Let me see that. It has one period that's sitting outside from the others. After the process, you can check this out for yourself. This is what it will break down my date ranges to. So what I'm trying to say here is you can pretty much throw anything you want into this. And you can throw hundreds and hundreds of disparate regions or disparate periods into here. And it will clean it up for you. It will combine things that could be combined and leave things out that has to be left out. In this case, the first period is being left out because it's just an independent period that's in front of all the others. The finally one more example is where if you have everything back to back, this is used quite often because you often get things that are back to back and you don't want to have to track all the differences. So for example, in your stock market data, if the reading is taken every minute and the price did not change, you're going to have 60 records, one for each minute, and it's all going to be the same price. So you may wish to just report that as a single period where the stock price remained constant. And this is one way to do it when things are back to back. Okay, so that's it. And this is a trick and there are variants of this. And if anybody wishes to sort of do something with this, package it into something more elegant, please. When you do, let me know. Okay, now I'll turn things back to Regina for questions. If any. So any questions people have? Yep. What? Oh yeah, we'll post it at the end. Great. Yep. How customizable is the address standardizer? It's all controlled by the three tables. So whatever you want to put in it, you can use, so there are three sets of tables. There's the guzzatier and there's the lex and then there's the rules. So the rules you can control based on the order of the words, which part of speech, so to speak, it is. And it looks at the lex and the guzzatier. So the guzzatier is places, things that are places, and lex is common words like one, two, three. How do you want that formatted? So it's pretty customizable. Are there real world applications that you would either recommend or specifically recommend against for the address standardizer? Are there applications, like, just being afraid that it would get it wrong once and if you used that you would send it to the wrong address? Are there other applications that you would say, yes, use this for this, but don't use this for that? Well, it's not like postal accuracy type thing, so it really just standardizes the address. So it's not going to tell you that this address exists or anything like that. So yeah, you wouldn't use it for that. You'd use postal system data for that. So you'd really need, like, I think for what you're thinking, you'd actually need a listing of real places. If you were to use it to say, like, don't send to there, it doesn't exist, which it won't help you with. It's just going to... But the geocoding confidence thing. Yeah, so the geocoding confidence, that's more likely you could kind of use it. Like, it'll tell you, since it's based on street center line, it's not based on actual addresses. And so if you have a street that's, you know, Main Street 235 to 268, you can tell it, I want to send to 265. That doesn't necessarily mean 265 exists, but theoretically it can. And it will return to you the theoretical location if you consider the street to be, you know, each house to be evenly spaced on it. Yep. Oh, yeah, we do. We have, there's another function for that. It's called geocode intersection. And so that takes two streets and it will return the location of the intersection. And it'll usually return, like, four or five addresses if you don't. And the way it works is if you say Adams and, you know, Smith, it'll return an address on Adams. So we turn the two for, I forget, that are on Adams, the Crosswood Street. And if you say, I mean the Crosswood Smith, if you do Smith-Adams, it'll return an address on Smith. And also the location, the, you know, the longitude latitude for that, for each of them. No, so it takes them as two separate fields. So the, so each argument, I don't know if I have. What if you don't know the city? And you don't know what city that is because, you know, your customers don't want to know that. They just want to say, yeah, or I need to get to Harvard Adams. And, you know, the city, can you produce a story like that? Yeah. So basically the way the arguments are, it's the first street, the second street, the state, the city, the zip. So the more you know the faster it goes. But as long as you know at least the state and you know the Cross Street, it'll come up with something, it might come up with a couple of answers. And the result that comes back to you, locations that is closest to the, which is pretty much the intersection. No, it's kind of designed for the structure of the Tiger data. So that's why we're working on this next generation one, which will allow you to do what you're describing where you could, I mean, we're going to have sample loaders for open street map and Tiger for those, like whatever the free data sources are we're targeting. And then you can use it for your own custom data to load it into our structure. So that would be for that. Yeah, you can use it for anything. It's just, we use that because that's what people commonly use it for. But yeah, we've used it, like I use it for, you know, keeping track of things in buildings, for example, and like finding out if this wire intersects like where I'm going to put something. So, I mean, people usually load data from CAD. I mean, right now we're just building a web application for it that manages it. So we're not using anybody specific to us kind of like our own native thing. But in those cases, you don't necessarily have to have like a spatial reference system. It's not as important as long as you know all your data is positioned within a building or something. So if you have a power plant, you could use a state plane or you could just make up your own. Yeah. And I think like AutoCAD has, I haven't used the AutoCAD importer, exporter, but a lot of people pull data from AutoCAD. Any other questions? Well, it came about after the range data type was integrated into Postgres. Then on everything else, it's already pre-existing in Postgres. Yeah. Except for the two casting functions, which will, you can see it on our slides, all the codes on the slide. So that's, and the code is very simple when you can take your own conventions. But the other functions have existed in Postgres for a really long time, like since 1.5 times. And that's just because that's the time period he cares about, but you could always make it go, you know, like to the 1800s or the 1700s. Yeah.