Go ahead and get started. I'll be talking about the big data and data science features and capabilities that Postgres has that maybe some of you haven't heard about. I'll talk about developer features, administration, and performance. There's a community aspect to this as well, I think. And then finally, I have a case study from one of our clients that shows how you can put all these different features together.

So, developers. There are a number of things in Postgres that you can use to make your job of developing an application or developing predictive analytics models easier. The most obvious of these is the JSON data type. This was added in Postgres 9.2, and there have been features implemented around JSON in every release since then. 9.5 gained the ability to go in and very easily modify parts of a JSON document without having to go through a bunch of extra work to split things out and bolt the document back together.

There are two different JSON data types that Postgres supports. The plain JSON data type basically takes your JSON and stores it as text, so it keeps it in the raw form. The only thing it really does is validate that it is valid JSON. That's nice if you expect to be ingesting JSON and then not doing very much with it. It's faster to pull data into that data type because it doesn't have to parse it; it just does a syntactic check. The second type is JSONB, or binary JSON. When you provide a JSON document, that data type actually parses it out and stores the JSON in a compact binary format. Your original formatting and things of that nature are gone, but the nice thing is that access to that data is then much faster. The other thing you can do with JSONB is index it intelligently. There are two different indexing methods provided: the default one knows about every single key and value in the document, and the other one operates at the path level, so it knows what paths are in your JSON.

As I said, there's very rich support for this. As of 9.5, there are 12 operators and 23 built-in functions for it. And importantly, there's also very strong interoperability between the JSON data types and other data types. What I mean by that is you can do things like take a Postgres row out of a table and say, hey, take this row of data and turn it into a JSON document. It's a single function call. It's very easy. Whatever your input data is, you get a valid JSON document out. And this is just one example of the kind of interoperability that you have with JSON.
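To make that concrete, here's a minimal sketch of what the JSONB indexing and the row-to-JSON conversion look like; the table and column names are made up for illustration.

```sql
-- Hypothetical table; the names here are for illustration only.
CREATE TABLE events (
    id   serial PRIMARY KEY,
    body jsonb NOT NULL
);

-- Default GIN operator class: indexes every key and value in the documents.
CREATE INDEX events_body_idx ON events USING gin (body);

-- Path-oriented operator class: smaller index, supports containment queries.
CREATE INDEX events_body_path_idx ON events USING gin (body jsonb_path_ops);

-- Containment queries like this one can use those indexes.
SELECT * FROM events WHERE body @> '{"status": "failed"}';

-- Turning an ordinary table row into a JSON document is a single function call.
SELECT row_to_json(e) FROM events e;
```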
The next thing that Postgres offers is the ability to connect to just about everything you could possibly want to connect to. Foreign data wrappers are the foundation for this. Those were added in 9.1, and there are now just short of 100 foreign data wrappers. In fact, there are so many foreign data wrappers that there is a foreign data wrapper that reads the list of foreign data wrappers for you. There are 15 foreign data wrappers specifically targeting NoSQL technologies, across 10 different NoSQL databases, and there are seven targeting big data systems like Hadoop and HDFS. Now, if you need to talk to something that doesn't have a foreign data wrapper provided for it, first of all, it is not that difficult to produce a new foreign data wrapper. There are two projects that help here. One is called Multicorn. The other is called — I read about it yesterday, and I can't remember the name of it. But basically, with Multicorn you can write a brand new foreign data wrapper in Python. The other one is the same idea: it makes it very easy to develop a foreign data wrapper in Ruby.

Should all of that fail, and you don't want to go and create your own foreign data wrapper, the next capability you have is support for multiple procedural languages. And from the standpoint of interoperating with other data systems, this really gives you the world, because you can write code that runs natively inside a PostgreSQL database in Python, in Ruby, in Java, in all kinds of different languages. So whatever data you need to get at, there's a procedural language that will let you get at it, assuming there's not a foreign data wrapper you can use already. And again, because PostgreSQL is so extremely extensible, if you have a language that's not supported — for example, I looked to see if the Go language had a PL, and I didn't see one — that might be an interesting project: adding support for Go in PostgreSQL.

Finally, on the developer side, PostgreSQL data types are themselves extensible. Perhaps the simplest version of that is a domain, which really is just an existing data type with some additional checking put on it. You can also create a composite data type. For example, PostgreSQL does not have a data type for complex numbers, like Python does. But if you need to store complex numbers in PostgreSQL, you can easily create a complex data type that has your real and your imaginary components as two fields of a single type. You operate on that as a single data type, so from the standpoint of the data it's one single number, but that single number has two components.

You can also create brand new data types. Just last week, I decided, hey, let's see what it would look like to take a NumPy ndarray, a Python data type, and port that into PostgreSQL. Most of the difficulty was figuring out some compilation issues between PostgreSQL and Python; the data type itself is something like 10 or 20 lines of C code and 10 lines of SQL, and here's a brand new data type that you can make use of. In addition to the type extensibility, there was a feature added in 9.5 called transforms. These allow you to control how specific Postgres data types are handed to procedural language functions. So for example, for the ndarray data type, I created a representation function — those of you that are familiar with Python will recognize the repr format. It's a function that says: hey, I return text, I'm a Python language function. The important thing is the TRANSFORM FOR TYPE ndarray clause. That tells Postgres there's something special I want you to do for this data type. So when you call this function in Postgres and hand it an ndarray value, it gets handed to Python as an actual ndarray. All the Python function has to do is return the representation, and that's what you get out.
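Roughly, the shape of that function is something like this — a sketch only, since it assumes the custom ndarray type and its transform from that experiment are already installed:

```sql
-- Sketch: assumes a custom "ndarray" type and a PL/Python transform for it exist.
CREATE FUNCTION ndarray_repr(a ndarray)
RETURNS text
LANGUAGE plpython3u
TRANSFORM FOR TYPE ndarray
AS $$
    # Thanks to the transform, "a" arrives here as a real NumPy ndarray,
    # so returning its repr is all there is to it.
    return repr(a)
$$;
```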
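And to make the domain and composite type ideas concrete, here's a minimal sketch with made-up names:

```sql
-- A domain is just an existing type with extra checking layered on top.
CREATE DOMAIN percentage AS numeric
    CHECK (VALUE >= 0 AND VALUE <= 100);

-- A composite type for complex numbers: one logical value, two components.
CREATE TYPE complex AS (
    r double precision,
    i double precision
);

CREATE TABLE measurement (
    id  serial PRIMARY KEY,
    val complex
);

-- ROW(...) builds a composite value; (column).field reaches the components.
INSERT INTO measurement (val) VALUES (ROW(1.5, -2.0));
SELECT (val).r, (val).i FROM measurement;
```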
So all these developer features — ultimately, what they mean is that if you're working on complex analytical tasks, the last thing you want is to have to go learn a whole new toolchain and figure out a bunch of extra stuff. You just want to be able to do your job, get it done, and move on. These developer features make that very easy to do. And any place where you see something missing and go, oh man, I wish Postgres did this, the extensibility makes it easy for you to add the thing you don't have. And by the way, if anybody has any questions during the talk, please just raise your hand. I don't want anybody to wait till the end.

So next up is administration. If you're going to maintain a big data environment, administration becomes a really key facet of it. And there are a number of things Postgres has that, honestly, I think are actually superior to a lot of the big data options that are more traditionally used. A big one of those is backup. I think this is something a lot of the big data tools just wave away: oh, the data set's so big — how can you back up 10 terabytes of data? Well, that data is your business. Most companies, if they lose their entire data set, can just lock the doors, because you're done. So dealing with backups needs to be a foreground thought, something you think about up front. And Postgres has done that.

There are two major ways to do backups in Postgres. Point-in-time recovery is an extension of the same crash recovery system that Postgres uses to deal with a server going down. It takes the write-ahead log and copies it to a separate place, and that, in conjunction with a plain old file-system-level copy of your database, constitutes a backup. The really strong capability that point-in-time recovery gives you is that when you need to do a recovery, you can say: OK, recover to this exact transaction. Because if you look at the traffic on the mailing lists when people ask backup questions, I'd say probably half the time it's not because they lost their entire database. It's not because a file system went kaboom and everything's gone. It's because somebody made a mistake. They said, oh, this table has test data — DROP TABLE test_data — oh wait, that wasn't test data, that was production data. And now you have this database that is all still there, and it's up and running, but you can't get to this very important data. Point-in-time recovery makes it much easier to deal with those kinds of situations, because you can recover the database to the exact transaction before the mistake was made.

The other backup option you have is pg_dump. From a big data standpoint, it's probably not what you're going to want to rely on, although it has gotten significantly faster — it is now possible to run both a dump and a restore in parallel. Really, the reason I put it in the talk is that, depending on the criticality of your data, the thing I really like about pg_dump is that it's basically impossible to screw it up. Point-in-time recovery has different ways and different tools you can use to configure it, and it's very much binary-level: you can't do things like take a point-in-time backup made on, say, an Intel architecture and move it to a different architecture — which isn't as big of an issue today. Whereas pg_dump is a SQL-level dump of the database. So when you're thinking about disaster recovery, I would encourage you to at least ask yourself the question: hey, should we be doing periodic pg_dump runs as a last-ditch recovery safety net?
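As a rough illustration of the parallel dump and restore — the database name, paths, and job counts here are arbitrary:

```sh
# Parallel dump requires the directory output format (-Fd); -j sets the number of jobs.
pg_dump -Fd -j 4 -f /backups/mydb.dump mydb

# pg_restore can restore that directory-format dump in parallel as well.
pg_restore -j 4 -d mydb /backups/mydb.dump
```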
Replication with big data is interesting, because a lot of the time you're dealing with these very large data sets where you don't necessarily think, oh, I want multiple copies of this. But then again, a lot of the time you're dealing with data sets and tools that run in a sharded environment, and as part of the sharding they keep more than one copy of the data. With Postgres you can do the same thing. There are multiple options you can use to achieve that, which gives you a lot of flexibility.

Probably the easiest thing to set up is simple streaming replication. This builds on top of the same code base and the same concepts that are used for point-in-time backups, but it extends them so that it's effectively a point-in-time backup happening continuously — and at the same time, you have a server that's restoring that backup and that you can run queries against. So you can stand up these replicas, and they're very easy to stand up. You also have the capability for these to be truly synchronous replicas. What that means is that if you have synchronous replication enabled and, on your master, you do a bunch of work in a transaction and go to COMMIT, that COMMIT is not going to return to you until the commit record has been recorded by the replica as well. So when that COMMIT comes back, you know the data is stored not only on the master but also on your replica. And in 9.5, you can actually have multiple synchronous replicas.

One of the other nice things about synchronous replication is that you can control it on a transaction-by-transaction basis. You can even have a transaction where you say: look, I just want this transaction to come back ASAP; I don't care whether the data has hit disk yet. So if you have tasks that need to ingest data at a very high rate, this is very powerful, because you can do the ingestion, get the data sent over to Postgres, get an immediate return from Postgres, and let your code keep going — gaining throughput without having to resort to parallel processing or something asynchronous.

The other interesting thing you can do with streaming replication is the exact opposite of a synchronous replica. You can have a replica where you say: I want you to intentionally stay four hours behind whatever's happening on the master. This is very useful from a disaster recovery standpoint, because if we go back to the scenario where somebody drops a really important table and now you have to recover, you don't have to worry about getting the backups, standing up a recovery server, and starting to process the backup. No — you have a server that's been doing all that continuously. Now all you have to do is tell that server: hey, something bad just happened, so I need you to recover to this point. As soon as it's done with that recovery, it comes up, and you can pull the data off of it.
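Two of the knobs just mentioned, sketched with a made-up table name; the parameter names are the stock Postgres ones:

```sql
-- Per-transaction control over synchronous commit: this transaction returns
-- as soon as the commit is queued, without waiting for the flush.
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO sensor_readings (recorded_at, value) VALUES (now(), 42.0);
COMMIT;

-- And on a standby, the deliberate lag is a single recovery setting
-- (in recovery.conf on the 9.x releases being discussed):
--   recovery_min_apply_delay = '4h'
```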
Postgres also has logical replication support through the pglogical extension. This was released not even a month ago, I think. It's similar in spirit to trigger-based replication, but the big difference is that pglogical uses the logical decoding framework that was first added in, I believe, 9.4, which is based on the transaction log. So this is, quote-unquote, trigger-style replication that doesn't actually use any triggers, which makes it extremely fast — comparable in performance, in fact, to streaming replication.

So why would you want to use this? One reason is that with streaming replication you have to replicate an entire Postgres instance: all the databases, all the data, all the tables, the whole nine yards. You have no choice in the matter. With logical replication, you can replicate only what you need. So if you have a subset of tables that are hit heavily and you want to do some predictive modeling on them, you can take just those parts of your data, break them out to a separate server, and not worry about the other parts of the database that aren't important to that workload. The other thing you can do with logical replication is use it as a mechanism for a Postgres version upgrade, where you can actually test out the new version of Postgres. You can run queries against it, and if it looks good and you're happy with it, you switch your production environment over to it and reverse the replication, so you still have your old version of Postgres up and running with all the current data. If the new version doesn't work out in production, you can switch back. The in-place version upgrade that is more typically used, pg_upgrade, doesn't have that capability. Even if you use a method of upgrading that copies all the data, once you've cut your production traffic over, your old version is out of date — it no longer has the same information.

From a multi-master standpoint, there is now BDR, bi-directional replication. This builds on top of the same logical replication framework. It's very similar, but instead of being truly ACID, as pglogical is, it uses eventual consistency. And it supports wide-area replication: there are a number of people running this with data centers spread across the planet, and BDR is very happy dealing with the kinds of latencies you get on those internet links. The one downside to BDR is that, right now, there are a few things it needs within Postgres itself that are not in Postgres yet. So it currently requires a slightly modified version of Postgres — it's not 100% true community Postgres.

Finally, there are other replication schemes out there for Postgres. Slony and Londiste are probably the two most popular, and there are a few others. At this point in time, I would really tend to recommend that you don't use these unless you have a very specific need. pglogical or streaming replication will most likely solve your needs better than either one of these would.
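Going back to pglogical for a moment: replicating just a subset of tables has roughly this shape — a sketch only, with made-up connection strings, node names, and table names:

```sql
-- On the provider: register the node and add just the tables you care about.
SELECT pglogical.create_node(
    node_name := 'provider1',
    dsn       := 'host=provider.example.com dbname=appdb'
);
SELECT pglogical.replication_set_add_table(
    set_name := 'default',
    relation := 'public.orders'
);

-- On the subscriber: register its node and subscribe to the provider.
SELECT pglogical.create_node(
    node_name := 'subscriber1',
    dsn       := 'host=analytics.example.com dbname=appdb'
);
SELECT pglogical.create_subscription(
    subscription_name := 'orders_sub',
    provider_dsn      := 'host=provider.example.com dbname=appdb'
);
```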
So clearly, when we're talking about big data and analytic workloads, performance becomes very, very critical. And this is something the Postgres community puts a lot of emphasis on: there is a tremendous amount of work that happens to constantly improve Postgres performance. One of the really important things to understand here is that with big data there's all this focus on, ooh, web scale, we have to scale horizontally. And for a large enough data set, that's true. But there are a lot of data sets that Postgres is very happy to run on a single, vertically scaled server. And then you don't need the extra overhead and the extra expense of colo space for 10, 15, 20 extra servers when a single large server will do the same thing. I've personally run databases that peaked at over 6,000 transactions a second, and that was a 4 terabyte OLTP database running on, I think, a $20,000 server — on I/O hardware that had problems, too. So despite some I/O issues, Postgres was able to scale very well to that. And that was version 9.2.

The version number matters because, as I said, the community puts a very heavy emphasis on continually improving the performance of Postgres. It wasn't that long ago that if you wanted to scale Postgres past four cores, it would ramp up to four, maybe six cores, and then performance headed downhill. Well, now it will very happily use 80 cores. There are significant improvements made in every major release, and the ballpark number I tend to go with is that every major version of Postgres improves performance by about 20%. Obviously, that's very heavily dependent on your workload. Some people won't see anything tremendous — they may see 5% or 10%. Other people are hitting a problem area, and they'll do a version upgrade and go: oh, holy cow, it's twice as fast.

So why does this matter for big data? Well, data keeps getting bigger. It wasn't very long ago that a terabyte database was a big deal. Now a petabyte is the big deal — that's gone up by a factor of 1,000. What allows you to manage that kind of growth is that your technology platform basically has to grow faster than your data does. And Postgres performance improvements play a significant part in that, because if you're getting 20% year over year from Postgres, that means you don't have to look at your own software stack and go: man, where are we going to get the next 20% from?

Yes, that's a very good comment, because a lot of the NoSQL solutions are very, very space-inefficient. And it's kind of easy to dismiss that with, oh, disk is cheap. And yes, disk is cheap — but what's not cheap is the I/O. Alvaro, the creator of ToroDB, gave a talk a couple of hours ago in the other room, and he's got numbers comparing Mongo to Toro. Sorry, let me take a step back: ToroDB is the MongoDB wire protocol implemented on top of a relational database, specifically Postgres, though theoretically it could run on any relational database. And in the testing they have done, storing the same data in Postgres versus storing it in Mongo comes out at anywhere from 68% of the size down to 38% of the size. When you get down to 38% of the size, that means roughly two thirds of the space Mongo was using was wasted, basically. And that makes a big difference from a performance standpoint. So no, it's a very good point. It's certainly relevant.

So finally, coming in 9.6 after I don't know how many years of us wanting it, Postgres is getting parallel query execution. This is a tremendous amount of work that's been done by a handful of people. Right now it supports parallel sequential scans, and I know there's parallel join execution. I don't know offhand if there are parallel index reads — I don't know if anybody here does. And it supports, or they're working on support for, parallel aggregate operations.
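A quick sketch of what turning that on looks like — the parameter name is the one 9.6 shipped with, and the table is made up:

```sql
-- Allow up to four parallel workers per Gather node (9.6 parameter name).
SET max_parallel_workers_per_gather = 4;

-- An aggregate over a large table is the classic beneficiary; when parallelism
-- kicks in, the plan shows a Gather node on top of a Parallel Seq Scan.
EXPLAIN (COSTS OFF)
SELECT count(*)
FROM page_views
WHERE viewed_at >= now() - interval '7 days';
```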
One of the things people have a tendency to focus on with big data is horizontal scaling. And admittedly, this is an area where Postgres is not as strong — but there actually are options here. First of all, there's Postgres-XL. This is a fork of Postgres that allows you to run distributed transactions across multiple instances in a fully ACID manner. From a user's standpoint, it basically looks like one Postgres instance that just happens to be spread across a number of different servers. There are some other variations of this — in fact, Postgres-XL grew out of something called Postgres-XC, so L comes after C: you want the newer one. There's also Postgres-R, and there's Greenplum, which was very recently open-sourced. Postgres-XL seems to have the most weight behind it right now, so that's why I mention it specifically.

The other option you've got for sharding is a project called pg_shard. I was going to put examples in, and I completely spaced out and didn't do it. But basically, if you want to take a table with pg_shard and spread it across multiple servers, you run one function to say: hey, this table, I want you to shard it on this column. And you then run a second command to say: OK, now that this is sharded, go actually spread the data across all the nodes that you have. Once you run those two commands, you write queries against that table and they just get spread across all the different data nodes. There's nothing else you really have to worry about. So it's very easy to use and very flexible. The only downside is that, with the data spread out, you don't necessarily have a truly ACID database anymore. It does require that when you're doing updates you include the partition column in the WHERE clause of the update, and really, updates are not going to be ACID — they're going to be spread across multiple nodes. But as long as you're OK with that, it's a really good option for getting horizontal scalability should you need it.

Postgres also has support for column stores. For certain workloads and certain data locality patterns, a column-oriented storage technique can potentially be a lot faster than the equivalent row-based storage. The cstore foreign data wrapper is, I would say, the most up-to-date version of this. It was created by Citus Data — it's basically one of the pieces of their commercial technology that they've open-sourced for the community to use. It is implemented as a foreign data wrapper, so that's the interface into it. There is also an in-memory column store. There was a presentation on it at PGCon last year, and when I looked it up there was current activity on GitHub for it — commits back in, I think, October. So there is activity on it, but frankly I haven't seen much publicity about it. I've included it for the sake of completeness: if cstore_fdw doesn't do what you need, you might look at the in-memory column store. And there's also a foreign data wrapper for MonetDB, which is itself a column-store database.
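To give a feel for cstore_fdw, creating a columnar table looks roughly like this — the table definition is hypothetical:

```sql
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- A columnar table for append-mostly analytics data (made-up schema).
CREATE FOREIGN TABLE clicks_columnar (
    customer_id bigint,
    clicked_at  timestamptz,
    url         text
)
SERVER cstore_server
OPTIONS (compression 'pglz');
```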
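And since I spaced out on the pg_shard example, here's roughly what those two calls look like — the table, column, shard count, and replication factor are all made up:

```sql
-- Step one: declare which column the table should be sharded on.
SELECT master_create_distributed_table('page_views', 'customer_id');

-- Step two: actually create the shards across the worker nodes
-- (here, 16 shards with 2 copies of each).
SELECT master_create_worker_shards('page_views', 16, 2);

-- After that, ordinary queries are routed to the shards transparently.
SELECT count(*) FROM page_views WHERE customer_id = 42;
```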
Finally, on the performance side of things, Postgres has support for GPU utilization. For certain workloads this can provide tremendous speed improvements, because GPUs are built around the idea of taking a single operation, or a handful of small operations, and applying them to an enormous set of data as fast as possible. That's how graphics processing works, and certain activities in a database lend themselves to it very well. If you're interested in this, I'd encourage you to go take a look at the wiki page. It's got some performance graphs on there — and admittedly, the graphs are from an ideal, best-case scenario — but the speed improvement is orders of magnitude, like a factor of 100 or more. So if you're pushing the envelope performance-wise, this can be a great thing. And the nice thing is that it works natively: once you load the extension, it just goes and does its thing. If it sees a query that it thinks would be a good candidate for running on the GPU, it's just going to do it. You don't have to sit there and hand-code anything.

Finally, another aspect of this that I think comes into play is that there is a very large, very extensive community that has built itself up around Postgres. Here are two examples that come to mind that I think are relevant to big data — this is certainly not comprehensive. PostgREST is a RESTful API for Postgres. So if you want to be able to treat Postgres kind of like it was a NoSQL database and put a nice, easy web interface on it, PostgREST will do that for you. Also, as I mentioned, ToroDB is the Mongo wire protocol implemented on top of Postgres.

So, to put all this stuff together — I've talked about a whole bunch of different features, trying to get you thinking about the different things that go into predictive analytics and data science and some of this big data stuff — let's look at what happens when you actually put it all together. In a traditional analytics environment, there are a number of different steps you go through to build a predictive model, vet it, and decide: OK, is this thing working or not? And really, this is pretty painful. The old-school, traditional way to do this is, first, you do a bulk export/import of your production data into an analytics tool like SAS. When you're dealing with a four terabyte database, that becomes painful — and having to keep it up to date, and having to do it every time you need to spin a new version of the model, kind of stinks. Once you've got the data in the tool, you can use the tool to start identifying correlations and patterns in your data, things to look at, and you use that information to start creating your models. And at this point you're basically creating your model by hand — it's almost as if you're writing the model as math formulas on a piece of paper.

Once you've got an idea — OK, here's what I think is going to work — you do some amount of validation of the models against the data you've loaded into your analytics tool. But ultimately, you're probably going to have to take the model and run a regression back against your raw data set, basically trying to simulate: if this model had been in production when this event happened, what would it have said? Would it have made a better decision than the old model did? There are different ways you can do that. The way I've seen it done is that you take the model and you rewrite it in pure SQL. And that can end up being messy and difficult, because even if the model itself isn't terribly sophisticated, if it looks at 50, 60, 70 different variables, you've got to hand-code how to pull each one of those variables out and how to present it — and then you have to run the calculation as well.
And you're also worried about doing this in a way that performs reasonably well, so that you can run this test and it doesn't take three weeks. So you go through some iterations of that, and you use the information you gain to tweak the model, change its parameters, things of that nature. Finally you say: OK, I think we have a model, I think we're good to go. Here, Mr. Production Team, Mr. Application Developer, here's the model — go code it. And now you wait. And you wait. And you wait. Six weeks go by, eight weeks go by, and finally your model makes it out into production. And now, three to nine months later, you can finally see whether all the work you put into this model is actually paying off — whether it's producing a significant improvement or not. I've talked to people who do a lot more consulting in the predictive analytics space, and this is not uncommon. It's a very prevalent thing, fundamentally because your data scientists and your analysts use a different set of tools, have a different set of needs, and have a different way of thinking than your application developers do. Your application is frequently written in something like Rails, while your modelers may be using Python, or R, or something else you wouldn't really want to write your web application in. One is the right tool for the analytics job, the other is the right tool for the application job, and there's a big gap in between. And that causes problems.

So when we started working with one of our clients, they needed to be able to do this stuff, and they'd seen this kind of thing before. They were already very much in tune with the idea of: look, this analytic workflow has got to be made easier, because we can't be spending three, six, nine months spinning models. If you take a step back and think about creating a predictive model — and especially executing the predictive model in a production environment — there are really two things that go into it. The first thing you need is data. That sounds obvious and silly, but I think it's important to segregate it out, because no matter what, the data portion of executing your model is a data problem. Whether your data is stored in a relational database, in Mongo, in Riak, wherever — the problem of how do I gather together the data I need to make this prediction is a data problem. So the key facet there is: solve your data problem using your data tool.

In this case, they're running on Postgres, and our methodology for pulling the data together is all SQL. We have a view that goes and pulls this stuff together, and then there's a simple function you call that says: here's the person's identifier, give me the data I need to run this model. The function returns a Postgres composite data type that contains every single piece of data the model needs. Some of that is very simple scalar data; some of it is arrays that are themselves arrays of composite types. So there's real nesting happening here — you have this multi-layer document, essentially, that gets handed to the calculation function.
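To sketch what that data-gathering half can look like — the type names, tables, and fields here are entirely made up for illustration:

```sql
-- Nested composite types: one logical "model input" document.
CREATE TYPE loan_history_item AS (
    opened_on date,
    amount    numeric,
    status    text
);

CREATE TYPE model_input AS (
    applicant_id   bigint,
    monthly_income numeric,
    loans          loan_history_item[]   -- an array of composite values
);

-- One function call assembles everything the model needs for one person.
-- (Assumes hypothetical "applicants" and "loans" tables.)
CREATE FUNCTION gather_model_input(p_applicant_id bigint)
RETURNS model_input
LANGUAGE sql STABLE
AS $$
    SELECT a.id,
           a.monthly_income,
           ARRAY(SELECT ROW(l.opened_on, l.amount, l.status)::loan_history_item
                 FROM loans l
                 WHERE l.applicant_id = a.id)
    FROM applicants a
    WHERE a.id = p_applicant_id;
$$;
```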
But in order to ensure robustness, to make sure there are no surprises and we know exactly what this data is and what format it's in, we're not using something as loosely structured as, say, JSON. It's done as an actual Postgres data type, so there's no question about what is being handed back and forth.

So the data gathering is done in SQL. The calculation function, on the other hand — and again, we're keeping these two things separated very deliberately — can be whatever you want. You want to run the calculation in R? Great, fine. You want to run it in Python? Not a problem. You want to run it in Perl? Well, what's wrong with you — but yes, we can do that. It doesn't matter; it'll run in whatever procedural language you want, as long as Postgres supports it.

And the nice thing is that these ideas have been wrapped up in a framework: a set of database functions and tables built around this notion of here's the data part, here's the calculation part. The framework does things like this: when you go to set up the next version of a model, you say, OK, here's version three of the fraud model; here's the function that gathers the data, and that function and its source code get recorded in the framework; here's the function that runs the calculation, and its source code gets recorded too. So now we have a record that says: here's the exact version. And every time we do a run of the model, we record which model we're running, we capture all the input data and store it as part of that individual execution, we hand the data to the calculation function, it does its thing, it returns another Postgres composite type, and we store that result as well. Now we have a record of absolutely everything that went into executing this model.

Now, out of curiosity — how many of you are doing big data? How many of you are doing data science? OK, good, about a quarter. A lot of you, if you're doing, say, weather modeling or something like that, might ask: why go to all this trouble? Well, Kyrgios Capital is a lending institution, and there are a lot of regulations wrapped around lending institutions. So when the auditor shows up at the front door one day, they want to be able to answer the auditor's questions. Part of that is that the auditor can go in, pull individual loans or declined loans, and ask: why did you decline this person a loan? Did you decline it because they're a woman, or did you decline it because their credit stinks? The first one is bad; the second one is OK. So this framework was developed specifically to support those kinds of requirements, so that they can explicitly say: here's everything that went into this model decision.
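The framework tables might look something like this — a purely illustrative sketch, reusing the model_input type from the sketch above; the real thing obviously has more to it:

```sql
-- A hypothetical result type for the calculation function.
CREATE TYPE model_output AS (
    score    numeric,
    decision text
);

-- One row per registered model version, including the source of both functions.
CREATE TABLE model_version (
    model_version_id serial PRIMARY KEY,
    model_name       text        NOT NULL,
    version          integer     NOT NULL,
    gather_fn_source text        NOT NULL,
    calc_fn_source   text        NOT NULL,
    registered_at    timestamptz NOT NULL DEFAULT now(),
    UNIQUE (model_name, version)
);

-- One row per execution: which model ran, exactly what went in, exactly what came out.
CREATE TABLE model_execution (
    execution_id     bigserial PRIMARY KEY,
    model_version_id integer      NOT NULL REFERENCES model_version,
    input_data       model_input  NOT NULL,
    result_data      model_output NOT NULL,
    executed_at      timestamptz  NOT NULL DEFAULT now()
);
```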
So those of you that are in a heavily regulated industry, those of you that want that traceability — that's something that might be very valuable. For those of you that don't care so much about the traceability, a lot of this I think is still valuable: the notion of separating the data gathering from the calculation, and even more importantly, using Postgres as the thing that binds it all together. That makes things easy, because now, since all this stuff is done in Postgres, we can do things like spin up a snapshot of production and go mess around with the models. I don't care if you screw the whole thing up — as long as you don't leak any sensitive data, you can go test on it. Oh, you think you've got something workable? You want to do a regression? OK, fine, let's run the regression on a replica, say. Or we could even potentially run the regression in production itself and just not use it for decisioning. And the nice thing is we'd be able to prove nothing is being decisioned on it, because there are no records that indicate it — I could easily put a trigger in place that says: if you try to create a record, deny it. So there's all this flexibility that comes about because, instead of the data analytics team over here and the application development team over there, I'm standing in the middle as the data architect saying: come on guys, let's pull this together.
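That deny-everything trigger is about as simple as triggers get — a sketch against the hypothetical model_execution table from the framework sketch above:

```sql
CREATE FUNCTION deny_decisioning() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    -- Refuse every insert, so we can prove nothing is being decisioned here.
    RAISE EXCEPTION 'decisioning is disabled on this instance';
END;
$$;

CREATE TRIGGER no_decisioning
    BEFORE INSERT ON model_execution
    FOR EACH ROW EXECUTE PROCEDURE deny_decisioning();
```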
So finally, one thing I do want to mention — obviously I'm a big Postgres fan, if you haven't picked up on that — is that there is no such thing as a magic bullet. There was a tremendous amount of hype around NoSQL when those tools started coming out, and a lot of people thought, oh, this is going to solve everything. And if you look over the past year or so, all of a sudden there are all these projects springing up around taking a NoSQL technology and putting some form of SQL back on top of it — because, as it turns out, SQL is a pretty good language for dealing with data. Maybe you can't do all of your big data analytics and data science in Postgres; that's fine. There are certainly workloads where Postgres is not the right tool for the job. What I do think Postgres has to offer is this incredible amount of connectivity and this incredible flexibility, through its extensibility and the capabilities it already has built in. So for the parts that don't make sense to put into Postgres, fine — put them in the technology that makes sense, but then use Postgres as your hub. Use it as the thing that can pull all of this together and make it easier to do your job, easier to work with. And of course, there may be parts of your system where you really do care about data integrity and durability, where you need to be certain that when I say this thing is committed, it's committed — whether that's for regulatory reasons or because you're dealing with something like a shopping cart. If somebody buys something from you, I tend to think it's pretty important that when they click Submit and you say COMMIT, that record really is stored before you show them the confirmation page that says: yes, your brand new Widget 5000 is on its way.

So — questions? Yes: so, because Postgres has this support for multiple procedural languages, you can run pretty much whatever you want to run in the database. Now, I will say that if you've got something that's going to burn through 80 cores like there's no tomorrow, maybe you don't want to run that on the database server itself. But because you've got this advanced language support, maybe what you do in the database is make a call into a procedural function that itself calls out to a more distributed environment, and then the data comes back in. On its surface that sounds kind of weird — why wouldn't you just have the application do it? The reason I would personally do it in the database is that you're doing your data gathering there, you run the calculation, you get the results back, and you probably need to store the results in the database too. So this is really all still a data problem. It's not until you've captured the data, run the calculation, and stored the results that you go back to the application and say: hey, here's the number you were looking for.

Postgres does not have that built in, and I don't think it should, because it's got Python, just as an example. I know there are some nice neural net capabilities in Python, and you can pull those classes in. All you have to do is install the class library on the database server and you can run it right there, natively, within Postgres. Again, this is where you do run into the trade-off: if it's something computationally intensive, you may not want to run it directly on the database server. The question then becomes: what does the data component of that look like? How much data are you dealing with for a model run? If you're doing something that's, say, finance-based, something that's not handling enormous amounts of data, I think it's reasonable to say: well, hey, we've got a Postgres database, it's got the data, we pull it together, and then we fire it off to an external process. And there are parallel capabilities within Postgres, and cluster capabilities. On the other hand, if you're trying to do something like weather modeling and you're pulling in every temperature sensor on the planet for the past week — OK, Postgres is maybe not the right answer. So it depends on what you're trying to do. Really, what I want to do with this talk is just get people thinking, because right now there's this mindset of: well, it's a relational database — big data and relational don't mix. Well, they can mix.

Yes — I do not know that off the top of my head. Right, I don't know, honestly. The thing is, when you talk about foreign data wrapper sophistication, at the base level all a foreign data wrapper is doing is passing data back and forth: it's doing protocol translation, basically turning something like a SELECT statement into whatever the other side needs. Where the sophistication comes in is when you're trying to do things like: OK, I'm going to join three tables locally, two tables over in this foreign data source, and three tables over in that one — so what's the efficient way to plan that and push the work out? That level of sophistication, I suspect, is not in the Hadoop foreign data wrapper, because that level of sophistication is still being worked on in the foreign data wrapper that lets Postgres talk to another Postgres database. So I don't think it's there. But I suspect that for a lot of these tools it's not really a germane question, because I don't think Hadoop even has the concept of a join. Maybe it does, but for a lot of the NoSQL tools: what's a join? You're pulling an individual document or an individual key.

I think there were some questions over here. No? Anyway, awesome. Thanks, guys.