Just before I start, a bit about my background: after my PhD, I worked in IT (information technology) for several years before moving into a more academic position. So today I'm going to talk from the analyst's point of view, as an analyst trying to answer data questions. I'm not sure if anybody here is associated with a hospital in New York State, but if you are, there's probably one thing you've been focused on for the last year: a project called DSRIP. This is a big project. Every state has a Medicaid program, which is insurance for low-income residents, and in New York State it's a pretty large program. I think right now a little over five million of the state's roughly 20 million people are on Medicaid, so roughly a quarter of the population. This population has traditionally been very difficult to manage, and New York State has, I think, the highest per capita Medicaid spending in the United States. When Andrew Cuomo became governor, there was a push to make Medicaid more efficient, and one of the things the state started negotiating with CMS, which oversees the Medicaid program at the federal level, was this waiver program, which would allow money to be brought back into New York State. By the last number I saw, Medicaid in New York State is something like a 50 billion dollar program: the state pays part, the counties pay part, and the federal government pays part. If the state can save money by improving health care outcomes, that money will be put back into the system to improve and reform health care.
Also, if you're in the health care field, you hear a lot of talk about the move from what's called the fee-for-service model to a value-based payment model. In the past, and still for a large proportion of how things get paid in health care, you see a doctor, the doctor provides a service, bills a specific code, and gets paid for that code. It's a very disjointed system. The idea of value-based payment is that you focus on the quality of the outcomes. In the past, if you got a hip replacement, for example, the surgeon made money and the hospital made money, and if you came back sick, or there was a problem with the implant, the hospital and the surgeon made more money. So there wasn't really an incentive to improve the quality of care. With value-based payment, you get a flat amount for a hip replacement, and if there are problems with it, the hospital system has to eat the cost. That gives an incentive to improve quality and to start using data to understand the process. The other important aspect of DSRIP is integrating care across multiple settings, from inpatient to outpatient. When you're in an acute care hospital, you're there for a period of time and the care is very intensive: you have nurses around you. Then they discharge you, and it's kind of like you're on your own. So the other part of DSRIP is that, for Medicaid members who are hospitalized and then discharged, we want to continue their care outside the hospital and give them a better care transition.
Another part of DSRIP is that earlier changes to Medicaid were made at the state level, and now, under DSRIP, the changes are pushed down to the local system level. Mount Sinai, for example, is one of the participating provider systems, and across the state I think there are about 20 of these networks that are responsible for, and will take risk on, their members. So it's a very complicated program with a lot of deadlines, but it's probably going to be the largest infusion of money into New York State health care and hospitals for some time. We're not talking millions, we're talking billions: if we meet the metrics, potentially about eight billion dollars can come back to New York State and its hospitals. I'm from Stony Brook University. Stony Brook is in Suffolk County, Long Island, and we've branded ourselves the Suffolk Care Collaborative. We're responsible for the care of Medicaid members in Suffolk County, New York, and eventually we'll be taking risk on them, so we want to be able to understand and work with the data. Stony Brook is an academic medical center, and we're also part of the state government; we're a state organization. I've been doing data analytics at Stony Brook since about 2006 or 2007. We had a contract with the New York State Department of Health for data analysis and have been doing that work for five or six years, and during that time I've learned a lot about what works and what doesn't. For example, at Stony Brook, through New York State or through SUNY, we have licenses for a lot of commercial software products.
We have them because some kind of contract has been negotiated: Oracle, Microsoft SQL Server, ArcGIS, and we have a campus license for SAS. So we can get a lot of commercial software at little or no cost. But this project was different, because we weren't just acting as an academic institution; we were going out and acting as a health care provider. Can we use software we hold an academic license for in a project where we're potentially taking risk on people? That gets complicated, and it worried me that our traditional approach of using proprietary software tools would lead to problems down the road. Also, as a state organization we have a purchase order system, which is a real pain. To order licenses you put in a purchase order, and it takes forever; the system goes down in the summer, and you can't order anything during that time. We had a really tight timeline, so that was one issue. I also have experience using some of this commercial software, and it has always been very complicated to install. You spend a lot of time on configuration, getting the correct license files, and worrying about activating licenses. A lot of the time we're on restricted networks, so license activation is a problem: the license server might be on one side of the campus while we're working on the medical side, with a firewall between them, so we can't reach the license server. It's a complicated process that makes everything more difficult, and keeping track of licenses can be a real pain.
On a big project you almost need one person, one full FTE, doing nothing but the administration: ordering and managing licenses. Negotiating with vendors just takes a long time. On this project, the state gave us a very strict deadline. CMS approved the proposal in April 2014, we had to be up and running by May, and our first deliverable was due in September, by which point we had most of our analysis done. So we could have gone with the traditional solutions, but since we weren't going to be working with that large a volume of data, I thought, let's try this approach, and if it doesn't work we can always move to a proprietary solution. Anybody here from health IT knows it's always a couple of steps behind. If you have insurance and you get an explanation of benefits (EOB) for a claim, your claim was most likely processed by a mainframe system running COBOL, and those systems are still running today. New York State's Medicaid payment system is called eMedNY. It's written in COBOL, it's maintained by CSC, it runs on a mainframe, and it's a gigantic, complicated system. SAS is another very popular analytic tool in health care, and it's expensive if you have to buy the licenses. Why else is health care IT a bit backwards? There are issues like legal liability: you're taking care of patients, and if something goes wrong, people worry about being liable. And there's the rule called HIPAA, which has good intentions.
One of the things HIPAA does is protect data privacy, but it's often used to shut down certain solutions on the grounds that they don't meet the HIPAA criteria. And if you have Blue Cross Blue Shield insurance, you probably just found out that Anthem had a breach affecting tens of millions of people. So even with these rules, breaches happen, and the liability costs are high. I think population health, because it sits outside the direct care of patients, can be a proving ground for open source solutions. Last year, because I had these concerns about proprietary software as our team was getting ready to do the analytics, I came to PGConf. What I found was a mature database platform. I also liked that there seemed to be a lot of active development; it wasn't a quiet project. It was easy to install on common Linux distributions, and you can even install it on a Windows server, which matters because many health IT shops still depend heavily on Microsoft Windows. One of the talks I went to last year was from a company, I think it was called Spatial IQ, showing what they were doing with Postgres, and I was pretty impressed with what I saw. I'll go into more detail on why the spatial aspect is important in health care analytics. Postgres also has good support for the more recent ANSI SQL standards, which gives you the flexibility to write fairly complicated queries.
Now I'm going to switch gears a little. I wasn't given much choice about the installation, so this is not a highly optimized database installation. They told us: this is the environment you have to work in, so install it there. At that point nobody in our IT shop had installed Linux themselves; vendors had installed it for certain products, but this was the first Linux installation the health IT department did for an internal project. We deployed it within the existing hospital IT infrastructure, in a virtualized environment. I know that's not ideal for a database, but we're on the analytics side; it's not a high-throughput database, and we could provision cores to it as needed. Nothing about the setup is groundbreaking. I'm going to go through some details, focusing mostly on the second and third topics. First I'll talk about why schemas are important, since they give you fine-grained access control over parts of a database. Then I'll talk about discharge data, then spatial data and how we use PostGIS, and finally I'll give an example of how we tie things together in a traditional business intelligence approach to get insight out of the data. I don't know if you can see the actual schema names in the database, but when we started the project as a small team, we put everything into the public schema, and we quickly found that it got messy. So I really liked how easy it is to add schemas.
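A minimal sketch of the schema-per-access-tier idea, written as a small Python helper that emits PostgreSQL DDL. The schema names (public_data, phi, licensed) and role names are hypothetical, not the project's actual configuration; the statements themselves (CREATE SCHEMA, GRANT USAGE ON SCHEMA, GRANT SELECT ON ALL TABLES IN SCHEMA, ALTER DEFAULT PRIVILEGES) are standard PostgreSQL.

```python
"""Emit PostgreSQL DDL that separates data sets into schemas by access tier.

Hypothetical names: the schemas and roles below are illustrative only.
"""

# Each schema maps to the group roles allowed to read it.
ACCESS_TIERS = {
    "public_data": ["analysts", "phi_analysts", "licensed_analysts"],
    "phi": ["phi_analysts"],
    "licensed": ["licensed_analysts"],
}


def schema_ddl(tiers: dict[str, list[str]]) -> list[str]:
    """Return CREATE SCHEMA / REVOKE / GRANT statements for each access tier."""
    statements = []
    for schema, roles in tiers.items():
        statements.append(f"CREATE SCHEMA IF NOT EXISTS {schema};")
        # Start from no implicit access for everyone else.
        statements.append(f"REVOKE ALL ON SCHEMA {schema} FROM PUBLIC;")
        for role in roles:
            statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {role};")
            statements.append(
                f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};"
            )
            # Tables created later in this schema (by the role running this
            # DDL) become readable by the same roles.
            statements.append(
                f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} "
                f"GRANT SELECT ON TABLES TO {role};"
            )
    return statements


if __name__ == "__main__":
    for stmt in schema_ddl(ACCESS_TIERS):
        print(stmt)
```

Keeping the tier definitions in one dictionary means adding a new restricted data set is a one-line change, and the generated script can be reviewed before it is run with psql.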
We could start organizing our data into separate categories and work with each category on its own. Also, some of the data we had access to, like SPARCS data, came with different restrictions. Some data sets were public, others contained what's called PHI, protected health information, and others were licensed only to certain people. So we needed control over different levels of access. I won't go through the details; I learned how to do it by reading the documentation on the web, and setting access permissions for different users wasn't complicated. Now I'll focus more on the analytic side and how we solved certain analytic problems. One characteristic of health care data is that it's temporal: a lot of it has time-based elements, and transitions of care are really important. Hospital systems are now graded on things like readmission risk, which I'll show an example of. Being in the hospital is in some ways like being in prison, because you can only be in one hospital at a time. Unlike prison, you can always leave, but while you're staying at a hospital, you're associated with that hospital and no other. Here's why we want to compute what's called readmission. While you're in the hospital getting treatment, you have nurses 24 hours a day, and doctors and residents looking after you at various levels. Then, say, Medicare or Medicaid will only pay for so many days, and it's time for you to go home.
They send you home with a stack of prescriptions, you go to the pharmacy and get them filled, and good luck. You're somebody else's problem now, not the hospital's. What they found is that a lot of people were coming back to the hospital, because the transition of care from inpatient to outpatient had problems. So the idea is that if you get past 30 days as an outpatient, then when you do come back to the hospital, it probably isn't related to your original discharge. We want to look at these time intervals and work out whether a given admission is a readmission or not. We hope our data comes in nice, clean intervals, and that's more or less what you get out of an administrative system, although even there it can get messy. But if you're working from an electronic health record, the data isn't going to be nicely normalized: you'll have intervals that overlap and nest. If you've ever written SQL to process this as an analyst, you know it gets messy very quickly, with lots of CASE statements and endpoint tests. This is one reason I thought PostgreSQL was a safe bet: a lot of functionality comes right out of the box. Many of the newer NoSQL database offerings don't have the same level of functionality, so you end up programming things that are already built into a system like Postgres. This is an example of something already built in: range types.
You can have timestamp ranges and date ranges, so we can represent your stay in the hospital as a range. Here's an example: you were in the hospital for four days and were discharged on the 18th. You can represent that in two ways, one of which is a closed interval. Maybe there are better ways, but I found it easier to work with a half-open interval, which includes its starting point and runs up to, but not including, its end point. That's how I represented a hospital stay as a range. Postgres has date range operators, but for the work I was doing I decided to convert the dates to integers, which are easier to work with and easier to understand. I converted them to Julian day numbers, a date representation that is also used in claims processing systems. I just searched Google for "Julian day PostgreSQL", the answer came up, and from there we worked out how to construct a range. The nice part is that once you put your start date and end date into a range, you get a whole set of operators, so you don't have to write those complicated CASE statements anymore; you can just think in terms of ranges. You probably can't see it from here, but the slide lists many different range operators; they're pretty powerful and save a lot of coding. The first thing we wanted to do was normalize the data, because the data we got was not clean. I first used the range operators to test: do we have intervals that overlap, or that are adjacent to each other?
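To make the half-open convention concrete, here is a small Python sketch that mirrors it. It assumes stays arrive as admit and discharge dates; julian_day is intended to match PostgreSQL's to_char(d, 'J'), and the overlaps/adjacent methods reproduce the && and -|- range operators for non-empty ranges. In SQL the equivalent range would be something like int4range(to_char(admit_date, 'J')::int, to_char(discharge_date, 'J')::int), where the column names are hypothetical; a daterange built directly from the two dates would also work.

```python
from dataclasses import dataclass
from datetime import date

# Python's date.toordinal() counts 0001-01-01 as day 1; adding this offset
# gives the Julian Day Number (2000-01-01 -> 2451545).
JDN_OFFSET = 1721425


def julian_day(d: date) -> int:
    """Integer Julian Day Number, intended to match PostgreSQL's to_char(d, 'J')."""
    return d.toordinal() + JDN_OFFSET


@dataclass(frozen=True)
class DayRange:
    """Half-open interval [lower, upper) of Julian day numbers, like int4range's '[)'."""

    lower: int
    upper: int

    def overlaps(self, other: "DayRange") -> bool:
        # Same result as PostgreSQL's && for non-empty ranges.
        return self.lower < other.upper and other.lower < self.upper

    def adjacent(self, other: "DayRange") -> bool:
        # Same result as -|- for non-empty ranges: one ends where the other starts.
        return self.upper == other.lower or other.upper == self.lower


def stay_range(admit: date, discharge: date) -> DayRange:
    """A stay as [admit, discharge): the discharge day itself is not included.

    Assumption: a same-day stay is widened to one day so the range is never empty.
    """
    lo = julian_day(admit)
    return DayRange(lo, max(julian_day(discharge), lo + 1))


if __name__ == "__main__":
    a = stay_range(date(2015, 3, 14), date(2015, 3, 18))
    b = stay_range(date(2015, 3, 18), date(2015, 3, 20))
    print(a.overlaps(b), a.adjacent(b))  # adjacent, not overlapping
```

With half-open ranges, a discharge on the 18th followed by an admission on the 18th is "adjacent" rather than "overlapping", which keeps the two cases cleanly separated without endpoint special cases.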
I got an answer back quickly, and the answer was yes, we do. So we had to normalize the data, constructing the inpatient stays from the underlying records, and I'm highlighting here where we used the range operators. (Question from the audience.) For visualization we connected the database to Tableau, and to get spreadsheet-style output we used Toad; Navicat is similar to Toad in that respect. The data actually displays correctly in Toad, so that wasn't a problem. There were some issues with geometry data: when Toad brings geometry in, it uses a lot of memory, so you have to put a LIMIT on the query, because otherwise it can crash. But that's a function of Toad, not of the database. Otherwise it looks nice; this is how it looks in Toad, and it represents the data well. You can read it, and you always want to be able to read your data. What I'm doing here has to be done iteratively. You join, check whether there's any overlap for a specific patient, take the union of the overlapping intervals, update, and repeat until no more updates can be made. Here on the slide, these are overlaps, intervals that aren't clean. By applying that SQL repeatedly, we're doing a join that asks whether two intervals intersect; if they do, we create the union and update, and we keep going until there are no more changes.
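The repeated "join, union, update until nothing changes" process can be sketched in plain Python as a fixpoint loop. This is an illustration of the technique, not the authors' SQL: stays are assumed to be non-empty half-open (lower, upper) pairs of Julian day numbers, and touches() stands in for PostgreSQL's a && b OR a -|- b.

```python
Stay = tuple[int, int]  # half-open [lower, upper) in Julian day numbers


def touches(a: Stay, b: Stay) -> bool:
    """True if a and b overlap or are adjacent (PostgreSQL: a && b OR a -|- b)."""
    return a[0] <= b[1] and b[0] <= a[1]


def merge_pass(stays: list[Stay]) -> tuple[list[Stay], bool]:
    """One 'join, union, update' pass; also reports whether anything changed."""
    result: list[Stay] = []
    changed = False
    for s in stays:
        for i, r in enumerate(result):
            if touches(s, r):
                # Range union, like PostgreSQL's + operator on ranges.
                result[i] = (min(s[0], r[0]), max(s[1], r[1]))
                changed = True
                break
        else:
            result.append(s)
    return result, changed


def normalize(stays: list[Stay]) -> list[Stay]:
    """Repeat merge passes until a pass makes no changes (a fixpoint)."""
    changed = True
    while changed:
        stays, changed = merge_pass(stays)
    return sorted(stays)


def normalize_by_patient(rows: list[tuple[str, Stay]]) -> dict[str, list[Stay]]:
    """Group raw (patient_id, stay) rows and normalize each patient's stays."""
    by_patient: dict[str, list[Stay]] = {}
    for patient_id, stay in rows:
        by_patient.setdefault(patient_id, []).append(stay)
    return {pid: normalize(s) for pid, s in by_patient.items()}
```

Each pass that changes something leaves fewer intervals than it started with, so the loop always terminates; a pass with no changes means no two remaining stays overlap or touch. A second pass is needed when a merge creates an interval that now reaches one kept earlier in the same pass.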
You could probably do this more programmatically, but I just executed it repeatedly until there were no more changes. It isn't something we run all the time; you process the data once. Once the data is normalized and you have clean intervals, it's very easy to find the intervals that pair with each other. For each stay, you take the pair with the smallest difference, which gives you the stay that follows it. Now you have two paired discharges, you compute the time difference between them, and you have your days to readmission. I think most people take the data out and do this in SAS, R, or Python, but we could do it within the database. This slide shows the chained inpatient stays. In about 70 lines of code, we normalized the date ranges and linked the inpatient stays, so now we can compute readmissions. I developed this on a test data set and then applied it to a data set of 100,000 discharges. It didn't take hours; it was pretty quick, on the order of minutes. Now we have these intervals of time, and I can pass them on to other analysts who want to build predictive risk models and do things like machine learning. But the data has to be organized and clean for the next people to build the best possible models. Now I'm going to switch to the spatial side. At the talk last year I saw the spatial functionality of PostGIS. I wasn't here yesterday, but I believe there was a talk on QGIS and also a talk on PostGIS.
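Going back to the readmission step for a moment: once each patient's stays are merged into disjoint intervals, pairing a discharge with the next admission reduces to sorting and looking one stay ahead. The Python sketch below assumes half-open (admit_day, discharge_day) integer pairs and a 30-day window; in SQL, a window function such as lead(lower(stay)) OVER (PARTITION BY patient_id ORDER BY lower(stay)) is one alternative to the self-join with a minimum difference described in the talk.

```python
READMIT_WINDOW_DAYS = 30  # the common 30-day convention; adjust to the metric's spec


def readmissions(
    stays: dict[str, list[tuple[int, int]]],
) -> list[tuple[str, int, int, int, bool]]:
    """Pair each discharge with the same patient's next admission.

    Assumes each patient's stays are already disjoint half-open
    (admit_day, discharge_day) intervals. Returns tuples of
    (patient_id, discharge_day, next_admit_day, days_between, is_readmission).
    """
    pairs = []
    for patient_id, patient_stays in stays.items():
        ordered = sorted(patient_stays)
        # For disjoint stays, the next stay in sorted order is the one with the
        # smallest positive gap after this discharge.
        for current, nxt in zip(ordered, ordered[1:]):
            gap = nxt[0] - current[1]
            pairs.append(
                (patient_id, current[1], nxt[0], gap, gap <= READMIT_WINDOW_DAYS)
            )
    return pairs
```

Patients with a single stay produce no pairs, which matches the idea that only an admission following a discharge can count as a readmission.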
I had done some work with ArcGIS before, so I had basic knowledge of how things like spatial projection work. PostGIS is a very mature, very sophisticated system. The data shown here is being served from a PostGIS database, and it's doing a spatial projection, which is not a trivial thing to do; you don't want to write your own projection algorithms when they've already been implemented correctly. QGIS is an open source GIS system. I've used ArcGIS a little, and I feel you can now get most of ArcGIS's functionality in QGIS. You can connect directly to the PostGIS database and everything works nicely: it's fast and it draws things correctly. So we can do our spatial data processing, move the results into Postgres, and have a central repository that all the analysts work from for spatial analysis. Why is spatial analysis so important in this health care setting? Partly because we now have regional health networks: we want to know where people live, where they are in the network, and which providers they're near. Many things become spatial questions, like finding providers or finding hotspots of Medicaid members, and we can answer many of those questions within the database system. As I said, Stony Brook is in Suffolk County, New York. This is Suffolk County, Long Island, and we're roughly here. Long Island stretches pretty far out, and Suffolk County has agricultural areas in the east that are more rural, while most of the western portion is suburban. If you ask somebody who does spatial analysis of health care data what level they work at, they'll say zip codes, postal codes.
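A question like "which provider is nearest to this member?" is typically answered in PostGIS by ordering on ST_Distance over geography values, or with the <-> distance operator. As a self-contained stand-in, the Python sketch below does the same comparison with the haversine formula on a spherical Earth, which is an approximation (PostGIS geography distances use a spheroid by default). Provider names and coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_provider(
    member: tuple[float, float], providers: dict[str, tuple[float, float]]
) -> tuple[str, float]:
    """Return (provider_name, distance_km) for the provider closest to member."""
    lat, lon = member
    return min(
        ((name, haversine_km(lat, lon, plat, plon)) for name, (plat, plon) in providers.items()),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Hypothetical providers and member location.
    providers = {"Clinic A": (40.90, -73.10), "Clinic B": (40.80, -73.40)}
    print(nearest_provider((40.91, -73.12), providers))
```

Scanning every provider is fine for a handful of sites; with many members and providers, the database's spatial index is what makes the PostGIS version practical.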
That's the level we usually work with, because postal codes are available and easy to work with. A lot of the time postal codes are good enough, but we want to move beyond them to answer more sophisticated questions, like the distance between a patient and a provider. We can take the midpoint of the postal code region, but that's not always accurate, and some of the postal codes out on the eastern end of Suffolk County are pretty large. So we want to move to a more spatial perspective. As an example of why, let's zoom in. Oracle and Microsoft SQL Server have spatial extensions, but they're expensive, and you get really mature functionality with PostGIS. I'm zooming into one area. I grew up in Centerport, which is up in this corner. When I looked at some of the original data, I saw that two communities, Dix Hills and Huntington Station, share the 11746 zip code, and it covers a pretty large area. Since we're interested in the Medicaid population, this matters: Dix Hills is very different socioeconomically from Huntington Station. If we combine them into one zip code, we lose the granularity we need for an accurate analysis of the population. Zip codes weren't developed for population health; they were developed for mail delivery, and many of them are historical.
When a community developed at some point, a zip code was drawn around it. So zip codes aren't really good units for population health analysis, but they're not all we have to work with. The Census Bureau recognizes units other than zip codes: it has census designated places, and it recognizes Huntington Station, South Huntington, and Dix Hills as separate communities. Let me show you how these communities differ. If we go to Dix Hills on Google Street View, I don't know exactly what the zoning is, maybe three-quarter-acre to one-acre lots, but it's a very wealthy area. It has the same zip code as Huntington Station, which has denser housing, and we'll look at some of the differences between these communities. To see those differences, we need to convert our patients' addresses into latitude and longitude, so we need a geocoder. With patient addresses, you can't just put them into Google: you'd effectively be telling Google about your patients, since they can see from your IP address that you're coming from a hospital. So we have to geocode internally. Google also limits the number of geocoding requests you can make and requires you to use the results within their environment. So we needed another geocoder. I've used the geocoder in ArcGIS and scripted something around it, but it's kind of ugly. I found that when you install the PostGIS extensions, you can also install a TIGER geocoder. I found instructions on the web for getting it working, and I installed the street data for New York State only, because that's all we're interested in.
This works well for residential addresses, because the Census Bureau cares about where people live, not about businesses. One thing I did find with this geocoder is that when it fails to match an address, it fails pretty badly. You find it has geocoded an address somewhere in upstate New York, outside the patient's zip code, and that's supposedly the best match. So you have to add a second level of checking, for example on the zip code. It's great that it's freely available, but you do have to add that check. And with this extension there are no privacy issues, because the addresses never leave your database. Within SQL the geocoder is just a function, so you can call geocode on an address. I took the two addresses I showed on Google Maps and geocoded them: one is in Huntington Station, the other in Dix Hills. You can see why these are different communities in Census Bureau data. The Census Bureau organizes data at different levels of granularity: census tracts are coarser, and block groups are finer grained. This map shows median household income. Around the Huntington Station address, the median household income is about $50,000, the lowest in the area. Over in Dix Hills, one of the areas has a median household income of $162,000. So there's a really big difference between these communities: one is very wealthy and the other is not.
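A minimal sketch of the second-level check, assuming the PostGIS TIGER geocoder's geocode(address, max_results) function, whose result includes a normalized address (addy, with a zip field), a point (geomout), and a rating where 0 is a perfect match and larger is worse. The query text is shown for reference and would be run through a driver such as psycopg2 (an assumption, not imported here); the rating cutoff is hypothetical and should be tuned against hand-checked addresses.

```python
from typing import NamedTuple, Optional

# Query against PostGIS with the tiger geocoder extension installed.
# %s is a driver-style placeholder for the full address string.
GEOCODE_SQL = """
SELECT g.rating,
       (g.addy).zip AS zip,
       ST_X(g.geomout) AS lon,
       ST_Y(g.geomout) AS lat
FROM geocode(%s, 1) AS g;
"""

MAX_RATING = 20  # hypothetical cutoff; lower ratings are better matches


class GeocodeResult(NamedTuple):
    rating: int
    zip: str
    lon: float
    lat: float


def accept(input_zip: str, result: Optional[GeocodeResult]) -> bool:
    """Second-level check: reject weak matches and matches outside the patient's zip."""
    if result is None:
        return False
    if result.rating > MAX_RATING:
        return False
    # A "best match" in a different zip code (e.g. upstate) counts as a failure.
    return result.zip == input_zip.strip()[:5]
```

Rejected addresses can then be reviewed by hand or fall back to a coarser location, rather than silently landing in the wrong part of the state.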
If you look at the map the state gave us, this zip code is colored very brightly: there are a lot of Medicaid members in this region. But those Medicaid members are not living in Dix Hills; they're living in Huntington Station. That's an important conclusion, and you lose it when you focus only at the zip code level. We can also look at other American Community Survey (ACS) data, such as primary language. In some communities within Huntington Station (it's hard to see here), up to 73% of households speak Spanish, rather than English, as their primary language. So if you're sending care managers to patients living in these areas, you want to make sure they're bilingual, because the language barrier is a big issue. These really are separate communities, and we don't see that at the coarser grain. Because all of this is done within a database system, I can do it for all of New York State. Here, for example, you can see which areas of New York City have a high share of households that mostly speak Spanish. So how do you get all this spatial data into your Postgres database with the PostGIS extension? You start with shapefiles, which usually come zipped and consist of a dBase file holding the attributes plus binary files holding the geometry. There are good tools available to convert that data into a format that can be loaded into your database; there's a GUI tool, and there's also command-line functionality. How do we load the American Community Survey data? You can go to a website called American FactFinder and download it there.
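The command-line route can be scripted. This Python sketch builds a shp2pgsql command (-s sets the SRID, -I creates a GiST spatial index, -D uses PostgreSQL dump format for faster loading) and streams its SQL into psql. Assumptions: SRID 4269 (NAD83, the datum of Census TIGER/Line files) as the default, one .shp file per zip bundle, a hypothetical database name "analytics", and both tools on the PATH.

```python
import subprocess
import zipfile
from pathlib import Path


def load_commands(
    shp_path: str, table: str, srid: int = 4269, dbname: str = "analytics"
) -> tuple[list[str], list[str]]:
    """Build the shp2pgsql and psql argument lists for one shapefile."""
    shp2pgsql = ["shp2pgsql", "-s", str(srid), "-I", "-D", shp_path, table]
    # ON_ERROR_STOP makes psql exit non-zero if any statement fails.
    psql = ["psql", "-d", dbname, "-q", "-v", "ON_ERROR_STOP=1"]
    return shp2pgsql, psql


def load_zipped_shapefile(zip_path: str, table: str, workdir: str = "shapes", **kw) -> None:
    """Unzip a shapefile bundle and pipe shp2pgsql's SQL into psql."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(workdir)
    shp = next(Path(workdir).glob("*.shp"))  # assumes one .shp per bundle
    shp_cmd, psql_cmd = load_commands(str(shp), table, **kw)
    producer = subprocess.Popen(shp_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(psql_cmd, stdin=producer.stdout)
    producer.stdout.close()  # let shp2pgsql see a broken pipe if psql exits early
    psql_rc = consumer.wait()
    shp_rc = producer.wait()
    if psql_rc != 0 or shp_rc != 0:
        raise RuntimeError(f"load failed (shp2pgsql={shp_rc}, psql={psql_rc})")
```

Separating command construction from execution keeps the part that is easy to get wrong (flags, SRID, table name) testable without a database.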
Because I was working with statewide data and looking at a lot of different variables, I wrote a tool to process the ACS data and bulk load it into a Postgres database. Once the ACS data and the shape data are in the database, we can use what's called the GEOID, the geographic identifier, to link the two. That's what lets us create these maps: we associate each ACS record with its shape through the GEOID. And once your data is in spatial geometry, you're basically free from postal codes. You can do spatial joins. If I want to ask a question about towns, school districts, water districts, or fire districts, I can answer it now. I don't have to find a mapping between zip codes and school districts, and a lot of times those mappings aren't perfect. With a spatial join, I can take those two addresses and see which census tract and which block group each one is in. I'm going to go through this quickly because my time is running out. You can also do spatial processing. When you load the shapefiles straight from the Census website, the map actually doesn't look like Long Island, because the shapes overlap areas of water. And when you present your data to the higher-level people, they say, oh, that doesn't look like Long Island. So polish in the presentation is important. With the functionality in PostGIS, you can intersect a base land-area layer with the shapes to trim them down to the land area, so you get a nicer-looking map.
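In PostGIS a spatial join like this is normally written in SQL with a predicate such as `ST_Contains` or `ST_Within`. To show what that join computes, here is a minimal pure-Python sketch using the ray-casting point-in-polygon test, assuming simple polygons given as lists of (x, y) vertices with no holes; the area IDs and coordinates are made up. It is not how PostGIS implements the join (PostGIS also relies on spatial indexes), only the underlying idea.

```python
from typing import Optional

Point = tuple[float, float]
Polygon = list[Point]


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting test: cast a ray to the right of pt and count edge crossings.

    An odd number of crossings means the point is inside. Points lying exactly
    on an edge may be classified either way.
    """
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points: dict[str, Point],
                 areas: dict[str, Polygon]) -> dict[str, Optional[str]]:
    """Assign each point to the first area that contains it (None if none do)."""
    return {
        pid: next((area_id for area_id, poly in areas.items()
                   if point_in_polygon(pt, poly)), None)
        for pid, pt in points.items()
    }


if __name__ == "__main__":
    # Two adjacent hypothetical "block groups" and three geocoded addresses.
    areas = {"A": [(0, 0), (2, 0), (2, 2), (0, 2)],
             "B": [(2, 0), (4, 0), (4, 2), (2, 2)]}
    points = {"p1": (1, 1), "p2": (3, 1), "p3": (5, 5)}
    print(spatial_join(points, areas))  # {'p1': 'A', 'p2': 'B', 'p3': None}
```

Swapping in school-district or fire-district polygons for `areas` answers the district questions the same way, with no zip-code crosswalk involved.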
If you're in the middle of Kansas, for example, you probably don't need this, but in an area like Long Island and New York, on the coast, the land areas are important for making the map actually look like Long Island. This is an example of how to do the intersections. PostGIS has a lot of functions for working with spatial data. I can't go into all that detail now, but you can go to the website, read the documentation, and work through some examples; it wasn't that complicated. And beyond the basics, you can do more complicated things: unions, where you combine shapes together; intersections, which I showed in the other example; and midpoints. If you can't geolocate somebody to their address but you have their zip code, you could use the midpoint of the zip code's shape as an approximate location. So there's a lot of spatial processing you can do with PostGIS. The last thing, which I'll go through quickly, is working with other, more traditional types of healthcare data. SPARCS is the system of hospital discharge data for New York State. Interestingly, they deliver these files in a very flat, non-relational format with repetitive columns. What most people do is write SAS programs that iterate through all the different columns, but you want to take advantage of the full relational database model to get more insight from the data. We didn't have time to write a lot of SQL by hand, so we wrote tools to do automatic normalization: they inspect the structure of the data and then build load scripts that normalize it.
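The zip-code midpoint fallback can be illustrated with the standard shoelace-based polygon centroid. In PostGIS you would call `ST_Centroid` (or `ST_PointOnSurface`, which is guaranteed to fall inside the shape); the Python below is a minimal sketch of the same arithmetic for a simple polygon without holes, with hypothetical coordinates.

```python
Point = tuple[float, float]


def polygon_centroid(poly: list[Point]) -> Point:
    """Centroid of a simple (non-self-intersecting) polygon via the shoelace formula.

    With c_i = x_i*y_{i+1} - x_{i+1}*y_i and A = sum(c_i) / 2:
        Cx = sum((x_i + x_{i+1}) * c_i) / (6A)
        Cy = sum((y_i + y_{i+1}) * c_i) / (6A)
    Either vertex order works, because the sign of A cancels out.
    """
    twice_area = 0.0
    cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon: zero area")
    # 6A == 3 * (2A)
    return cx / (3 * twice_area), cy / (3 * twice_area)


if __name__ == "__main__":
    # A made-up right-triangle "zip code"; its centroid is (1.0, 1.0).
    print(polygon_centroid([(0, 0), (3, 0), (0, 3)]))
```

For a concave or crescent-shaped zip code the centroid can land outside the shape (in the water, on Long Island), which is the usual reason to prefer a point-on-surface function for this fallback.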
Then we connect traditional business intelligence tools, such as Tableau, a relatively new player in business intelligence, as the visualization front end. But ultimately what we want to get to is insight. We're doing all of this to understand our population better: we're processing the data, we're normalizing it, we're looking at spatial relationships. Ultimately we want to give the managers, the people that run the hospitals, a report they can draw insight from. This is from the SPARCS data, where we've normalized the ICD-9 codes and looked at the secondary diagnoses of Medicaid patients within Suffolk County. What you can see, and they were actually very surprised, is that the rate of psychiatric disorders in this population is very high. They always look at the primary diagnosis and what are called DRGs, but they weren't yet looking at the psychiatric secondary diagnoses of these patients, and these Medicaid patients coming in have a lot of secondary diagnoses. So through this pipeline we were able to deliver this conclusion to the higher level, the people that wear the ties and the tucked-in shirts and the suits, and this is insight for them. That's the point we have to get to. So, parting thoughts. Healthcare data for population health is not, I'd say, that big. We can make it big, but we're mostly talking millions of rows, maybe 10 million; the biggest database we have has 100 million rows in it. The spatial aspect of data processing is a very powerful tool, and PostGIS really makes this a very powerful environment for healthcare data analytics. One more thing: as healthcare data analysts, we need to develop a sharing environment, and I think that's something we can learn a lot about from open source software.
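The normalization step and the secondary-diagnosis question can be sketched together. This assumes a hypothetical flat layout where each discharge row has a `record_id` plus repeated `dx1`, `dx2`, ... columns, with `dx1` as the primary diagnosis; the real SPARCS column names differ, and the sample codes are illustrative. Rows are unpivoted into (record_id, position, code) tuples, the shape a normalized diagnosis table would have, then filtered to secondary diagnoses in the ICD-9-CM Mental Disorders chapter (codes 290 to 319). Treat the filter as an illustration, not a clinical definition.

```python
from collections import Counter


def unpivot_diagnoses(rows: list[dict], prefix: str = "dx") -> list[tuple]:
    """Turn repeated dx1, dx2, ... columns into (record_id, position, code) tuples."""
    long_rows = []
    for row in rows:
        for column, code in row.items():
            suffix = column[len(prefix):]
            # keep only dx<N> columns with a non-empty value
            if column.startswith(prefix) and suffix.isdigit() and code:
                long_rows.append((row["record_id"], int(suffix), code))
    return long_rows


def is_icd9_mental_disorder(code: str) -> bool:
    """True for ICD-9-CM codes 290-319 (Mental Disorders); V and E codes are excluded."""
    category = code.replace(".", "")[:3]
    return category.isdigit() and 290 <= int(category) <= 319


def secondary_psych_codes(rows: list[dict]) -> Counter:
    """Count psychiatric codes that appear in a secondary (position > 1) slot."""
    return Counter(code for _, position, code in unpivot_diagnoses(rows)
                   if position > 1 and is_icd9_mental_disorder(code))


def discharges_with_secondary_psych(rows: list[dict]) -> set:
    """IDs of discharges with at least one psychiatric secondary diagnosis."""
    return {rid for rid, position, code in unpivot_diagnoses(rows)
            if position > 1 and is_icd9_mental_disorder(code)}


if __name__ == "__main__":
    flat = [
        {"record_id": 1, "dx1": "428.0", "dx2": "296.30", "dx3": ""},
        {"record_id": 2, "dx1": "296.30", "dx2": "401.9", "dx3": None},
    ]
    print(discharges_with_secondary_psych(flat))  # {1}
```

Record 2 is not counted because its psychiatric code is the primary diagnosis, which is exactly the distinction that looking only at primary diagnoses and DRGs misses.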
Share what we do, and generate synthetic data sets so that we can develop algorithms on them without having to worry about private data. The other aspect: when you look at the PostgreSQL 9.3 and 9.4 roadmaps, at what is being implemented for 9.5, and at some of the extensions, there's definitely good support coming for using the database for, let's say, data mining. I think we don't have that much time for questions, right? But, a question? It does give you a rating. What I found is that a rating of zero means it found the house, that exact location. But sometimes a rating of 20 was better than a rating of 10; I think that's because it's based on some kind of string comparison. So what I found is, if the match is outside of the zip code, don't trust it. That's why I had to build that extra level of checking on it. Question? Okay. Yeah, especially for the business addresses. I can put it on my Dropbox and then tweet it out. I guess we have the next talk coming.
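One way to share algorithms without exposing private data is to publish a seeded generator of synthetic records alongside them. This is a minimal sketch under loudly stated assumptions: the record layout is hypothetical, the zip codes are placeholders, and values are drawn uniformly at random, so the data has none of the statistical structure of a real population and is only good for exercising code paths, not for drawing conclusions.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SyntheticDischarge:
    """A fake discharge record (hypothetical layout)."""
    record_id: int
    age: int
    zip_code: str
    primary_dx: str
    secondary_dx: list[str] = field(default_factory=list)


# Illustrative pools only; the zip codes are placeholders, not real areas.
ZIP_POOL = ("00001", "00002", "00003")
DX_POOL = ("428.0", "296.30", "401.9", "250.00", "486", "311")


def make_synthetic_discharges(n: int, seed: int = 0) -> list[SyntheticDischarge]:
    """Generate n fake discharge records; the same seed always gives the same data."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    records = []
    for record_id in range(1, n + 1):
        primary = rng.choice(DX_POOL)
        others = [dx for dx in DX_POOL if dx != primary]
        secondary = rng.sample(others, k=rng.randint(0, 3))  # 0-3 distinct extra codes
        records.append(SyntheticDischarge(
            record_id=record_id,
            age=rng.randint(0, 99),
            zip_code=rng.choice(ZIP_POOL),
            primary_dx=primary,
            secondary_dx=secondary,
        ))
    return records


if __name__ == "__main__":
    for rec in make_synthetic_discharges(3, seed=42):
        print(rec)
```

Seeding a private `random.Random` instance means anyone running the shared code gets the identical data set, so results on it can be compared across groups.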