Good morning, everyone. My name is Peter Geoghegan. I work for Heroku, and this is my talk, "Upsert Use Cases". Let's talk about a new feature in the just-released PostgreSQL 9.5. This particular feature has been, I would say, a long time coming, and I'm quite relieved that it made it into 9.5. So let's get started. What is upsert? "Upsert", incidentally, is a portmanteau of "update" and "insert"; it's the way we casually refer to this feature. The idea is that you insert a row, or, on the basis of an existing conflicting version of that row being present, you instead go and update that existing row. PostgreSQL 9.5 also added a similar insert-ignore type of feature that, rather than updating, has the insert statement (we'll see the syntax in a moment) not insert anything at all when there is an existing conflicting row. The big picture here is that you don't have to worry about concurrency. You quite simply write the upsert statement and leave it at that; you don't have to give much further consideration to race conditions or concurrency. As we'll see, this is important: it alleviates the burden of writing what can be fairly subtle code. What I specifically mean is that there are various errors that may occur with similar features in other systems, without you perhaps expecting them to occur, that we have avoided. I've sometimes used more formal terminology that I myself invented as jargon, the so-called fundamental upsert property, and these are the errors I specifically refer to. First, what I called unprincipled deadlocks: two sessions are upserting at the same time and they deadlock with each other, but in a rather unconventional way. Specifically, there is no user-visible mutual dependency. You as an end user cannot very well say, oh, I know, I'll just reorder the statements within my transaction and that will make it go away. It would have been unfortunate if you simply had to live with deadlocking of that nature even though you had no such conventional mutual dependency; you don't have two updates that you could reorder to make them consistent in every transaction, because there are two locks involved in this one statement. So that doesn't happen here, and it is known to happen with other systems, which we'll have a little more on shortly. Second, no spurious unique violations. The whole point of an upsert is that, on the basis of there being an existing row, by which I specifically mean a row that has the same value for a constraint that you specify, you ought to go update, or ignore, whatever it is. You certainly don't want a unique violation on that unique constraint or index; the whole point is that you're avoiding that. If that happens, the feature, in my view, is broken. So that also doesn't happen. This may seem almost obvious, and yet, as we'll see, there are examples from other systems where these things do regularly occur in the event of concurrency. The syntax is driven by an insert, as I'll show shortly. Like I said, this is a very widely requested feature, mostly for OLTP and web-app sorts of use cases. Now, the syntax. What you see here is that I've extended the existing insert statement; I've added a new clause that makes all this possible.
So, the syntax you see here. First of all, we have a fairly basic, conventional upsert example. There is an insert of two rows, one for 'Gizmo Transglobal' and another for 'Associated Computing'. This is a simple case where we will either insert zero, one, or two of those rows, or alternatively update them, on the basis of a would-be unique violation on what is probably the primary key. See the way it says ON CONFLICT (did)? That did column is the primary key; we're saying that's the constraint we take the alternative update path on. The second example shows what I'd call the ignore variant, where, rather than updating, we simply do nothing on the basis of an existing version of the row; we leave it as it is. And the third example is a slightly more worked-out upsert example, where we are inserting or updating one row, and we have an additional condition on the update: we're signifying that in order for the update to go ahead, the zip code must not be equal to 21201. If that condition doesn't pass, the update doesn't go ahead, of course. The update will still lock the row with this variant, with upsert, which is not true of conventional updates; they won't lock rows that were not affected, otherwise you'd lock almost every row in the table, whereas this will reliably lock the row even when it doesn't update it. Okay. I'm highlighting the excluded pseudo-table here, which is a way we can reference the rows originally proposed for insertion. It would be inconvenient, and possibly not even possible, to repeat the values from the insert, because the first example is a multi-row insert. So the excluded pseudo-table, which feels almost like a join against what you originally proposed for insertion, is just a convenient way of referencing what you had there in the first place rather than repeating it; and obviously, because this is a multi-row insert statement, it would be rather inconvenient to do it any other way. Excluded, incidentally, carries forward the effects of BEFORE INSERT row triggers, which may have modified the rows originally proposed for insertion before they took the alternative path. The UPDATE statement in Postgres in general supports UPDATE ... FROM, where you join the target table against another table and update one from the other. The excluded table sort of feels like that, except that it's implicitly available, and obviously it's a special sort of table, a magic, implicitly defined table. Okay. This highlighted syntax, the did column, is used to infer, as we now say, the unique index that you are interested in using to arbitrate whether or not you take the alternative update path. It figures out which of the available unique indexes you had in mind, dealing with various subtleties, and it makes your intent clear. Specifically, if you happen to get a duplicate violation on any other defined unique index that you did not infer here, then you will in fact get a conventional duplicate violation.
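The slide itself isn't reproduced in this transcript, so here is a sketch of what those three statements look like. It follows the distributors example from the PostgreSQL documentation; the table and column names (including zipcode) are illustrative rather than a verbatim copy of the slide:

    -- Basic upsert: insert two rows, or update the existing rows on a
    -- primary-key (did) conflict, using EXCLUDED to reference the values
    -- originally proposed for insertion.
    INSERT INTO distributors (did, dname)
    VALUES (5, 'Gizmo Transglobal'), (6, 'Associated Computing')
    ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname;

    -- The "ignore" variant: do nothing if a conflicting row already exists.
    INSERT INTO distributors (did, dname)
    VALUES (7, 'Redline GmbH')
    ON CONFLICT (did) DO NOTHING;

    -- Upsert with an extra condition on the UPDATE: only update rows whose
    -- zip code is not 21201 (the conflicting row is still locked either way).
    INSERT INTO distributors AS d (did, dname)
    VALUES (8, 'Anvil Distribution')
    ON CONFLICT (did) DO UPDATE
    SET dname = EXCLUDED.dname
    WHERE d.zipcode <> '21201';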
Getting a duplicate violation somewhere you didn't expect is still a problem with the MySQL implementation, where you'd better make sure that anywhere you can get a duplicate violation is somewhere you expected to get one. If, for whatever reason, you end up getting a duplicate violation on some other unique index, that implementation will spuriously update, possibly causing something you might describe as logical corruption. We avoid all that by offering this inference clause and requiring you to use it, requiring you to be explicit about exactly what you mean to update. Okay. This is also much better than naming an index or a constraint directly; it's more robust than that. The columns can be in any order; it can certainly be a composite index; you can be inconsistent in the ordering and it will just work fine. There are other, more advanced capabilities which we'll see in a little while. So it figures out which unique index you want to use, and implicitly which ones you don't. These inferred unique indexes will, incidentally, appear in EXPLAIN output: if you explain the insert statement, you'll see precisely which index was inferred, and other details like that. INSERT ... ON CONFLICT DO NOTHING also supports exclusion constraints. Exclusion constraints are sometimes called a generalization of unique constraints: if a unique constraint is a constraint that enforces equality, that is, that there can be no two equal values across rows at the same time, exclusion constraints generalize that to other operators, for example the overlaps operator. This could be useful in a domain like event management. We're in this room right now, occupying it. If we wanted to have not fixed time slots, but slots over continuous ranges, arbitrary slices of time during which things could happen in a room, an overlaps operator would determine whether one booking overlaps another. By representing that as an exclusion constraint, we can enforce, much like with a unique index, that no one else can use the room at the same time, and have that work in a robust way. This works with the DO NOTHING variant, but you may want to spell out the name of the exclusion constraint yourself by using a syntax I haven't shown, which is basically ON CONFLICT ON CONSTRAINT constraint_name DO NOTHING. Okay. One question I get a lot is: why did I not go about implementing MERGE? For those of you who don't know, MERGE is a feature in a number of other major database systems, described by the SQL standard. It's a DML statement which is often thought to be comparable to what I'm talking about, but I would suggest that isn't really the case at all. It's a way of combining a bunch of insert, update, or even delete statements: it's defined as joining two tables together and inserting into, updating, or deleting from one of them, a target, from a source. That's certainly useful in data warehousing, that kind of thing, where you want to reconcile two tables. But importantly, MERGE as implemented, and as described by the standard (there's no counterexample), does not provide guarantees around definitely getting an insert or an update. There's plenty of evidence that people expect this and are often disappointed when, in production, it doesn't turn out that way.
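To make the exclusion-constraint point concrete before moving on, here is a minimal sketch; the table, column, and constraint names are assumptions for illustration, not the slide's actual example:

    -- btree_gist lets the GiST index handle plain equality on room
    CREATE EXTENSION IF NOT EXISTS btree_gist;

    CREATE TABLE room_booking (
        room   int,
        booked tstzrange,
        CONSTRAINT no_overlapping_bookings
            EXCLUDE USING gist (room WITH =, booked WITH &&)
    );

    -- DO NOTHING can arbitrate on an exclusion constraint, but the constraint
    -- has to be named explicitly; it cannot be inferred like a unique index.
    INSERT INTO room_booking (room, booked)
    VALUES (101, '[2016-02-03 09:00, 2016-02-03 10:00)')
    ON CONFLICT ON CONSTRAINT no_overlapping_bookings DO NOTHING;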
To come back to MERGE: I was able to add some amount of MERGE-like flexibility to what I came up with, but it's not equivalent to MERGE. MERGE has all of these various problems, and on the whole I'm happy that I was able to offer something that I feel gives people what they really wanted. Okay. Can I get a show of hands: does anyone recognize what this is? Anyone? Visibility is not so good here, so maybe this will jog some memories when I describe it. It's showing how, in previous versions, you ought to go about correctly implementing more or less what I've described, so that you definitely get either an insert or an update. What you can see here is a PL/pgSQL function that involves a subtransaction. Initially you attempt an update. If that doesn't affect any row, you go and insert within a subtransaction, which may itself hit a duplicate violation, which you catch by the subtransaction aborting, and if that happens you retry from the start. This is pretty ugly, but it does work; it does offer that useful set of guarantees. So this is what you would have been required to do previously, and a lot of people weren't doing this anyway; they were doing other things that maybe seemed safe, but in fact were not. This would be the safe way of doing it, correct according to my own definition, prior to 9.5. It's error prone. You can have problems, for example, with getting a unique violation on the wrong unique constraint; I already talked about why I think that matters in general. You could even have, say, a trigger defined on the table that affects some other table, which may itself raise a duplicate violation, so perhaps the spurious violation isn't even on the same table. There are a number of subtleties like this that must not be missed when using this particular approach. I mentioned a moment ago that MERGE doesn't offer these guarantees. I'll now show you what it takes to make MERGE more or less offer them. This is an example from Stack Overflow, an Oracle MERGE statement. Again, it's rather difficult to read here, so I'll tell you what it says. Someone had a MERGE statement and essentially asked: why doesn't it do what I just said it ought to do? That is, why do I sometimes get a spurious unique violation; I'm not expecting that; isn't that the point? And someone proposes a solution that was well received on the Stack Overflow page, where they do what I showed on the last slide with Postgres: they wrap the MERGE in a subtransaction and they have to handle all the errors; they're catching them at the bottom in that exception block. So in fact, all they've accomplished over and above what Postgres could do in earlier versions is to combine an insert and an update into one statement, which is really, in the main, not what people are interested in. People are interested in simple guarantees. I guess this illustrates, to my mind, what the problems with MERGE are: it hasn't really saved this person any trouble at all. So, in summary, on MERGE versus upsert: upsert guarantees a certain outcome, an insert or an update, whereas MERGE is quite a different thing; it operates according to the same rules as individual insert, update, or delete statements, just combined together.
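For reference, the pre-9.5 PL/pgSQL pattern described above looks roughly like this; it follows the merge_db example from the PostgreSQL documentation (the table and column names come from that example, not necessarily the slide):

    CREATE OR REPLACE FUNCTION merge_db(k int, data text) RETURNS void AS
    $$
    BEGIN
        LOOP
            -- First try to update an existing row.
            UPDATE db SET b = data WHERE a = k;
            IF found THEN
                RETURN;
            END IF;
            -- Not there: try to insert it. A concurrent session may insert the
            -- same key first, giving a unique violation, which the BEGIN/EXCEPTION
            -- block (a subtransaction) catches so we can loop and retry the UPDATE.
            BEGIN
                INSERT INTO db (a, b) VALUES (k, data);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- do nothing, loop and try the UPDATE again
            END;
        END LOOP;
    END;
    $$
    LANGUAGE plpgsql;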
In order to make upsert offer these various guarantees, I essentially had to create new exceptions to the rules about how MVCC, the core mechanism for transaction isolation in Postgres, works; I had to create special cases there to suit the purposes of this statement. MERGE does no such thing, and that's what I'm getting at when I say there's a fundamental difference: MERGE doesn't make any guarantees at all. It's good for bulk loading, it's good for data warehousing, where you're going to drive everything with a big join, because maybe it's a conventional join; certainly nothing is required to go through a unique index. In fact, maybe there isn't even a unique index defined on the table; in data warehousing that could be very common: if you're doing a bulk load, you might only create the index afterwards. So quite a different thing. This is a MongoDB bug. I'm not in the business of bashing MongoDB; in any case, I don't know very much about it, so I wouldn't be very well qualified to do that. But I'm pointing at it, and this bug is still current today; I did check it again. It still exists and has been acknowledged by MongoDB as a valid bug. It concerns a regression in the latest version of MongoDB, presumably due to their WiredTiger storage engine having more granular locking than before. What we see here is people all of a sudden getting duplicate violations where they're supposed to not have that happen; that's the point, that you get an update, not a duplicate violation. So this is a regression that they have, which they're working on. MySQL, I think, similarly had problems of this nature in the past; I believe those are sorted out now. When I was working on the feature, I looked into the problems people were having with other systems, so as not to bring those over; it's rather subtle to get all of these details right at once. Yes? I'm being asked whether this syntax is in the SQL standard. No, it's not. MERGE is in the SQL standard, but it tends to be the case that when implementers implement something like this, they use their own flavor, their own syntax. MySQL has its own one; we have our own one now; Teradata has its own one; I think Firebird has one too. So there's a tendency, perhaps because of these sorts of subtleties, for each of them to do it their own way: yet another non-standard syntax. I don't see that the SQL standards committee would take this up. The SQL standard doesn't even acknowledge indexes; they don't mention them because they consider that an implementation detail. And if they don't mention indexes, then, not unreasonably, they don't concern themselves with this kind of thing. So I think it's something they would basically never do. Okay, now a more general summary of upsert. It offers what I would call a very flexible syntax, which I'm very happy with. We got all the edge cases; we fixed them. Some of the things I've talked about here may have sounded a bit complicated, but ultimately it's all about guarantees that you, as an app developer, can leverage: making it easy to use correctly and hard to use incorrectly. We may still do MERGE in the future; I'm not opposed to that. It does have its uses, as I've acknowledged.
And yeah, I think that's about all I have to say on MERGE, or on upsert in general. Okay. At this point I'll take any questions on upsert in general, if anyone has them. Does anyone have any questions about anything I've said so far? Okay. So I'll now discuss some use cases, some less obvious uses of the feature, in some cases using things that are unique to Postgres, which I think are notable for data integration. What is data integration? For our purposes, it's the requirement to integrate data from diverse sources. These are often autonomous or distributed sources, which is to say that we don't control how they're formatted; it might be that a vendor does, and it's not practical to get them to do it the way we would prefer. So maybe there are inconsistencies across these various sources, that kind of thing. This is very common; it's just the way of the world. How things are represented might be inconsistent too: for example, they could be using a different character encoding, or maybe they're using metric units because it's a Canadian or European supplier, that kind of thing. These are all inconsistencies we must account for. Maybe the input is something like a comma-separated values file, or something terrible like that. So these are practical, real-world scenarios, not especially glamorous, but I think common. Now, in this sort of complicated landscape, it's important, or sometimes important, that we have idempotence, which is to say that the upsert statement can be executed multiple times without changing the result beyond the first application. Typically you'll be able to write the upsert statement such that it doesn't especially matter if it's executed multiple times; that's something that can happen from time to time, and it'll be just fine if you write the statement correctly. If we need to reconcile the table in a live system, I mentioned before the special visibility rules, how I've changed, in very specific cases, how MVCC works; that could be particularly necessary here, because you could have multiple concurrent processes doing this sort of thing, and you can't really quiesce the tables and make sure nothing is happening from any other process at the same time; that's really not practical. And there's probably other various messiness, as there always is. So my perspective here is that when you're dealing with all this complexity, be careful, because a bad plan certainly isn't going to scale all that well. ETL is always complicated, and meshing all this complexity together can sometimes be very challenging, so by having all these guarantees, you can leverage them to the hilt. We've got a few good basic primitives that you can use, you can get a lot of mileage out of them, and you don't have to be an expert to get the benefit; because it's easy, you sort of get it automatically. I feel that's a very important goal for Postgres in general, and I think we've done that here. Now, getting a bit more into the example, this is something I'm calling an accidental distributed system. It's not something involving Apache Kafka or some kind of message bus, something fancy like that; it's a more day-to-day kind of example. This particular example involves retail systems with disparate suppliers and ad hoc text file formats.
Some of these suppliers may sell the same product without necessarily being independent manufacturers, so multiple suppliers could essentially be selling the same product. This is a standard, boring sort of business information system, and it requires periodic syncing with regional suppliers, which are basically autonomous systems. The end retailer needs to maintain details of each product, and with autonomous systems it's not obvious which one is authoritative in respect of which values, for example which product description wins. So there are various subtleties because of that. Here's the example a little more worked out: I'm creating a products table, a suppliers table, and a supplier prices table. The suppliers perhaps don't have a particular interest in making this easy for you. They would have you buy everything from them; for practical, economic reasons you're not inclined to. They don't care about distinctions that the end user, the retailer, naturally cares about, and they're not especially inclined to accommodate the requirements that person may have. So we need to work around that, okay? This is the example upsert statement. What you're seeing here is that we've downloaded a new comma-separated values file, we're inserting rows into the products data, and then potentially updating, but only on the basis of the existing row being older than the new timestamp you see there. So your update will only go ahead if the incoming data is considered authoritative, on the basis of it being the most recent according to that timestamp. To make it more technical, the update sort of has its own snapshot, which is the timestamp that you specify; maybe that's going over some heads. But again, the whole point here is the semantics it provides. They're not especially complicated, although the details of how you'd avoid problems yourself are more complicated. You get that facility and you use it, and you don't have to be an expert; it just works, and notice that written this way it's naturally idempotent: feed in the same file twice and the second run changes nothing. If you try to do it the other way, there's significant potential for problems to arise. So, as I said, if we make all this easy, if we make it just happen, then people automatically get all of these benefits, and they will accrue.
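The slide isn't reproduced here, but the statement being described looks roughly like this. The table and column names are assumptions for illustration, and it presumes a unique index on (supplier_id, product_id) and a staging table loaded from the CSV file:

    -- Sync one supplier's price file into supplier_prices. The WHERE clause
    -- means an existing row is only overwritten by newer data, so re-running
    -- the same file is harmless (idempotent).
    INSERT INTO supplier_prices (supplier_id, product_id, price, last_updated)
    SELECT supplier_id, product_id, price, last_updated
    FROM staging_price_feed
    ON CONFLICT (supplier_id, product_id) DO UPDATE
    SET price        = EXCLUDED.price,
        last_updated = EXCLUDED.last_updated
    WHERE supplier_prices.last_updated < EXCLUDED.last_updated;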
So, you've seen how upsert can selectively update a row. We'll look at more advanced capabilities now; this is where a lot of the detail work went. Your favorite existing feature should work seamlessly. There's a principle in PostgreSQL development that all features must play nice with each other, so you'll have a hard time finding any case where we failed to make the new upsert statement work with an existing feature. Okay. I'm going back to the example to make a further point. Consider a system where products, or other objects represented in the database, need to be logically deleted, but not actually deleted. Maybe you have historic sales records that need to continue to refer to the products; you have line items sold years in the past, and even though those products have logically been deleted, they still need to appear in reports and so on and so forth. And yet you probably don't want to display them in a generic product search. You can't actually delete them, because then they're simply not available for the report. But if you instead have a flag, a Boolean is_deleted column, there's a difficulty with making sure you still enforce constraints, say on the SKU, the barcode, the supplier code, whatever it is. If the old product record is still in the table, it can still happen that you need to create that product again, because you need a new representation of the same product for whatever reason; it does happen. So you cannot very well have both records in the same table if you naively have a unique constraint. However, Postgres has a good way of working around this: a partial unique index. You can create a unique index on the column WHERE is_active. When you set is_active to false, thereby logically deleting the product record, it won't matter when you go to create a new version of it, because uniqueness is only enforced among those rows that are currently active. So everything basically works. But the question then arises: where does this leave upsert? That too is supported. I don't know how visible this is, but you see here there's a WHERE is_active; that is distinct from a predicate on the update. Notice it comes before DO UPDATE, not after; there could also be a predicate afterwards, there just isn't one here. This is saying: when we go to infer a unique index, it's okay if it only covers those rows where is_active. So that partial unique index will work just fine, as you would expect, because we spelled that out. Again, this just works. I already mentioned the phrase I'm fond of: easy to use correctly, hard to use incorrectly. I think we've accomplished that. It's a little bit unusual that the presence or absence of a constraint affects the semantics of a query, but that just needs to be the way it works, so we sort of bend over backwards to make it safe. Having offered this inference capability, we then ought to make sure we don't add further ways to shoot yourself in the foot with it. Having made people use it, we'd better be very sure that we haven't given them more problems than we've prevented; if the cure were worse than the disease, that would be pretty bad. So there are further edge cases I'll talk a little bit about. I mentioned earlier that you can use EXPLAIN. In general, not everyone realizes you can explain an insert or an update, but indeed you can. This is an example of explaining an insert, with the novel aspects highlighted. You can see the arbiter index; in this case it's called upsert_pkey, the primary key. This, of course, is what determines whether or not we had a would-be duplicate violation, which is grounds for taking the update path, so it's specifically limited to that one index here.
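Going back to the logical-deletion case for a moment, here is a minimal sketch of the partial unique index and the matching inference clause; the names are illustrative assumptions:

    -- Uniqueness on sku is only enforced for rows that are still active,
    -- so a logically deleted product doesn't block re-creating it.
    CREATE UNIQUE INDEX products_sku_active_idx
        ON products (sku) WHERE is_active;

    -- The WHERE here is part of the conflict target (it comes before DO UPDATE):
    -- it tells inference that an index covering only is_active rows is acceptable.
    INSERT INTO products (sku, name, is_active)
    VALUES ('ABC-123', 'Widget', true)
    ON CONFLICT (sku) WHERE is_active DO UPDATE
    SET name = EXCLUDED.name;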
Back to arbiter indexes: it is possible sometimes to have multiple equivalent unique indexes. For example, you're doing a migration; maybe you're replacing an index because it's bloated, something like that. These things do happen, and we need to worry about the rare cases a lot, because when it does come up, even if it's very rare, a spurious unique violation is very surprising. You can create a unique index concurrently and end up, for a short time, with two exactly equivalent unique indexes, and there are a number of other possibilities. If that happens, inference will figure out both of them, and both will be used as arbiters. Usually that won't happen, but if it does, you certainly don't want errors in production all of a sudden. So, again, easy to use correctly. This is a partial index example, a similar example, showing two arbiter indexes. Now a bit more about the advanced features of inference. I haven't said so yet, but it also supports expression indexes. Postgres will allow you to create indexes, including unique indexes, on arbitrary expressions; for example, you could create a lower(column_name) expression index. Everything in the index is then lower case, and if it's a unique index, you're thereby enforcing uniqueness over the lower-cased values: you'd recognize as equivalent rows whose only differences are in the case of the text you're inserting. I mentioned it's forgiving of ordering and redundancy. The predicate also works with any unique index that satisfies it, as we say. So you could still have a non-partial unique index, and if it happened to satisfy the inference specification in every other way, it would work fine, because it covers everything, which is sufficient: it doesn't just cover the subset you spell out, it covers everything, so that's good too. Again, if you added a partial index while doing a migration in production and had both at the same time for a little while, it would just work. Inference should be used in preference to the ON CONFLICT ON CONSTRAINT constraint_name variant I mentioned earlier; that's really an escape hatch, and we don't recommend you use it. As I mentioned, exclusion constraints require it, because there's no way of inferring an exclusion constraint; you have to spell out exactly which one you mean, but in general it's best avoided. Okay. Now a little on other advanced features: postgres_fdw, the PostgreSQL foreign data wrapper. Who knows what postgres_fdw is? Okay, about half of you. For those who don't know, it's an extension that allows you to create foreign tables that more or less appear as tables in your local PostgreSQL instance, but are in fact maintained by another PostgreSQL instance that you connect to; it creates the illusion that those are local tables, with a high degree of accuracy. This will itself work with upsert, but it only supports the DO NOTHING variant, and that's simply because you're required to spell out which constraint you have in mind when you use DO UPDATE; foreign tables in general have no concept of a constraint, so for that reason alone DO UPDATE doesn't yet work through postgres_fdw. But the DO NOTHING variant, when you don't ask for a specific constraint, when it applies to everything, which is probably the common case with DO NOTHING, will work just fine with postgres_fdw.
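To go back to the expression-index point for a second, a minimal sketch with assumed table and column names:

    -- Enforce case-insensitive uniqueness of email addresses.
    CREATE UNIQUE INDEX users_email_lower_idx ON users (lower(email));

    -- The inference clause can name the same expression, so this unique
    -- expression index is recognized as the arbiter.
    INSERT INTO users (email, full_name)
    VALUES ('Alice@Example.com', 'Alice')
    ON CONFLICT (lower(email)) DO UPDATE
    SET full_name = EXCLUDED.full_name;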
Has anyone heard much about RLS, the row-level security feature? That's also new in 9.5. Can I get a show of hands for that? About the same proportion. So, the existing permission system in Postgres is basically table- and column-oriented: you have permissions on tables and columns; can I see the data, can I update the data, et cetera. RLS offers a similar, though somewhat different, capability at the row level, where you can also limit which rows can be seen or updated. This is a major item in the 9.5 release, and upsert plays nice with it more or less transparently. There is a little special consideration to how this has to work with RLS, but not too much, so it's nice that we were able to offer both at the same time; they work very well with each other. Updatable views are, as the name suggests, a way of defining a view and having an update on that view automatically be rewritten so that it updates the underlying referenced table, if the system can determine that it's equivalent to such an update; that can happen transparently, and it will also work just fine here. Someone is asking whether row-level security is a separate library or extension: no, if you just install PostgreSQL 9.5 it's part of the core release; it's not a library as such. The way it typically works is that you have, say, a column holding the name of a database user, though it doesn't have to work that way; you create a policy on the table, and that forces any query against the table to carry an extra predicate, essentially limiting what it can see or what it can update. It could be that an update simply isn't allowed to see certain rows, or it could be that attempting to touch them throws an error; there's a lot of complexity and variation there depending on the exact requirements. The possibilities are considerable, but basically, it's part of the core distribution. Logical decoding is a big feature from PostgreSQL 9.4; it's the way we can do logical replication. A logical decoding plugin can consume logical change set records that can be used for any number of purposes. If you do an upsert, it will be reported reliably as either an insert or an update of the row; it's not possible to reconstruct that it was an upsert. It just looks like inserts and updates if you are consuming these change set records from your plugin. You're not able to reconstruct that, but you probably don't need to, certainly not for logical replication. What you may not have noticed is that the ON CONFLICT DO UPDATE syntax is totally unrestricted in its structure. MERGE won't allow you to put sub-selects in any of the handler clauses that it has, whereas this has virtually no restrictions at all. You can, for example, have subqueries in the update; they can be correlated subqueries; they may reference the excluded pseudo-table, et cetera, et cetera. So it's really very flexible in terms of what you can throw in there; there's an awful lot of flexibility over and above what MERGE gives you. Yes, sir? Right, you're certainly not restricted to a VALUES list: you could, for example, write INSERT INTO whatever ... ON CONFLICT DO UPDATE and drive it from a SELECT. You can select an arbitrary thing; in general you can select an arbitrary number of rows to insert, and it follows that you can select an arbitrary number of rows to insert or update, because this is just an extra clause on insert, right? So that's one possibility.
There's also a separate new-to-9.5 feature: updates in general now support updating multiple columns from a sub-select. In other words, you can do UPDATE table SET (a, b, c) = (SELECT one, two, three FROM some_other_table); you've got one sub-select with multiple columns updating multiple columns all at once, as opposed to, for example, having multiple such sub-selects, which is messy. So that can all be combined into one nice little statement, and it works just as well with INSERT ... ON CONFLICT DO UPDATE. That's pretty good. Some of you will also know that, in general, insert, update, and delete statements in Postgres support a non-standard RETURNING syntax, although in fact Oracle has something similar. This lets you see rows after they were inserted or updated, or before they were deleted in the case of delete. So you can insert something and then see what you actually inserted. Why would you want to do that? Well, maybe you have a serial column, an auto-increment sequence, and you can't really determine ahead of time what the primary key value will be; with RETURNING it's right there and you get it back immediately. This works with an INSERT ... ON CONFLICT DO UPDATE statement: with inserted rows, of course, but it will show you the updated rows as well. So you see inserted or updated rows returned to the client, more or less as if you had selected them. Again, there's no reason why that ought not to just work, and it does. I guess we spent some time discussing whether or not we should show the updated rows as well, and the decision was that you ought to, and so you do. Okay. Because of that, does anyone know what a writable CTE is in Postgres, a WITH statement? Okay, so some of you will know that in other systems there has sometimes been the ability to have what is essentially a temporary table that exists for the duration of a statement: you write WITH something AS (...), where the definition can be another statement selecting something from something else, and then the main statement underneath can reference the CTE as if it were a table. It's a nice way of breaking up the logic of certain complicated things by sort of pipelining them. Anyway, RETURNING, which I mentioned just now, can work with this in Postgres, which is a unique Postgres thing not found in the other systems that do have WITH statements. So you can insert or update across CTEs as well. Again, immense flexibility, very useful for data integration; I'm really looking forward to seeing how people use it. Partitioning mostly works as before; that's not significantly different, so I won't specifically note anything there. That's all I have to say on the subject.
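As a concrete illustration of combining these pieces, here is a small sketch, with assumed table names (and a unique index on users.email), of an upsert inside a writable CTE that uses RETURNING to hand the affected rows to the rest of the statement:

    WITH upserted AS (
        INSERT INTO users (email, full_name)
        VALUES ('alice@example.com', 'Alice')
        ON CONFLICT (email) DO UPDATE
            SET full_name = EXCLUDED.full_name
        RETURNING id, email, full_name
    )
    -- The main query can treat the upserted rows like a table, e.g. to log them.
    INSERT INTO audit_log (user_id, note)
    SELECT id, 'upserted ' || email FROM upserted;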
Any further questions? Yes, sir? Yeah, I always get that question, and the answer is no, but let me explain why. If you could arbitrate on two different constraints at once, what would happen if both were violated at the same time? That's where the complexity comes from: if they were both violated at the same time, you can't really take both alternative actions at the same time, right? Now it is true, as I mentioned, that you can technically, although it's not really intended to be used in a fancy way, use two unique indexes as arbiters at the same time, but only if they're more or less equivalent, in the event of a migration, that kind of thing. That works by just looking at the first one, and because they're assumed to be equivalent, if the first one has a problem we don't even look at the second one. It's not clear how it would work if you had multiple violations across genuinely different constraints at the same time; there are a lot of funny interactions there. The only sane way to do this seems to be to look at one specific unique violation and use that as the absolute authority, because things are so ticklish in terms of objective truth about what's going on: the row you take the alternative path on may not even be visible to your statement. If you selected, those rows might not even be returned, and yet, in order to support these guarantees, the statement has to reach into the future, if you know what I mean, because the other transaction may not have committed yet. That's a bit funny, kind of weird. The only way to clamp that complexity down seems to be one particular unique index, and that's it. So I don't think it's possible in principle, unless you had some pretty hairy semantics about what happens when both are violated. Does that answer your question? Yeah, a practical solution would be to have two upsert statements; that works fine. It is something people seem to ask often, so maybe I should address it up front. Okay, that's all I've got. Oh, bugs? All the known bugs are fixed; there haven't been any new ones so far in 9.5.0, and how long has it been out for, a few weeks now? So that's my story. There really wasn't that much trouble with upsert on the open items list, actually; I don't think it was in the category of things that were any sort of problem there. We got through the bugs pretty quickly; they were garden-variety bugs, I would say, with fairly well-defined fixes, so it went fairly smoothly. I guess that's it, okay? Thank you all, thank you for coming. Testing, one, two. Okay, I won't talk right over your heads. Is everything showing on the screen correctly? Should we start then? All right, let's get started. I'm sorry for the initial delay; there's usually some fiddling with the presentation setup. I'm very happy to see you all here. What I'm going to be talking about today is an open-source solution that we have built called ToroDB, and basically it presents a way of running NoSQL workloads on top of a relational database. This is something you can use if, for instance, you're a DBA running a relational database and you would like to offer your users some NoSQL interfaces. It's more than just storing unstructured data, which is something that's already possible in many relational databases, especially Postgres; it goes further than that. So I'd like to talk about this. A little bit about myself first: I work for a company called 8Kdata. It's a research and development company in the database area. We try to come up with crazy stuff, and this is one of those crazy ideas which we came up with.
It happens to be our main line of development right now; this is our core product. I also work as a programmer, so I'm both a DBA and a programmer of sorts, and I'm based in Madrid, Spain, where we also founded the local Postgres user group, which has become a really large Postgres community; we're very proud of that, more than 500 people currently. So that's a little bit about myself. If you want to connect with me, find me on Twitter or LinkedIn or anywhere; if you have any question after this talk I'll be around, but in case you want to write to me, those are my coordinates. So let's talk a little bit about the problem you may face. The world has definitely changed a lot in the last 5 to 10 years. A lot of things have changed, and databases are no exception. In general terms technology has changed in so many areas, and I'm not going to talk about all of that, but in the database field it's been kind of a revolution in the last 5 to 10 years. There were some 40 years where relational theory and technology were developed, advancing and advancing, becoming better and better and more stable. The players were more or less the same, the databases were more or less the same, the market was more or less the same, but in the last few years everything has changed: many new databases have started to appear, new technologies have been introduced. I don't know if you can see this picture; me neither, and I'm quite close. I don't know if we can turn off some lights, or maybe those of you in the back may want to move to the front a little bit; feel free, you can get some extra exercise too. Well, this picture is just an event: in 2005 there were no pictures being taken with smartphones and tablets, and in 2013 everybody is there with their own phone and tablet. Things have changed lately, and the same goes for the database field. So, as I was saying, let's imagine you were a happy DBA: you were running your stuff, looking after your users, and you were even happier if you were using Postgres, right? You took your backups and helped your users when needed, and it was a happy life. It was easy. The only problem you had to deal with was basically those programmers who hadn't yet met Mr. Bobby Tables, right? So it was a happy life. Now what happened? At some point NoSQL came along and people started yelling at you: hey, we want NoSQL; hey, we want MongoDB. And the problems started; our peaceful life was kind of ruined, right? Now we have to start pushing back: you know, this is kind of crap; durability concerns, transactions, a lot of explaining. So rather than doing all that, why can't we just say to our users: you want MongoDB? That's okay. You want NoSQL? Sure. You want whatever the next thing is? Sure, you can have it. That's the problem we're trying to solve today. And the way to do it is not by installing MongoDB or Cassandra or any other NoSQL database, because that's a huge problem.
It means you need to install a new stack, which means you probably need new servers, new support contracts, probably people trained and certified, new backup procedures, new security measures, new firewall rules, new network rules. And of course, once you've done all that, you still have a lot of problems: how do you synchronize data from one database to the other? How do you keep it consistent? How do you make sure that if one user or application queries one data source, the other one is up to date too? How do you move data between them? So this is a huge problem; satisfying these requests to have NoSQL in your data center, as you've probably experienced, is a problem. So fear no more: here comes ToroDB, and ToroDB, you can think of it very simply as Postgres plus Mongo in the same place. No new servers, no new stacks, no new certifications, no new backup procedures; you have all of that in the same place, and it's open source. So, to put it very simply, what is ToroDB? ToroDB is a document database, a NoSQL database, a JSON database if you want to call it that, but it happens to run on top of a relational data store, which is Postgres. And it's open source. The most important thing is that it is compatible with MongoDB at the wire protocol level, which of course also means the API. As long as we speak the protocol that MongoDB speaks, there's no need for different drivers or different tools: the same tools, the same drivers, the same programs that run against MongoDB, with a small asterisk around full compatibility, basically all work the same against ToroDB; it just happens to have Postgres behind it. Now, since MongoDB is an unstructured, hierarchical data store, and Postgres is relational, how do you map that? How do we store the information in Postgres? This is probably one of the most important aspects of what we've done with ToroDB: how we store the information. We don't just take the information and store it as a blob in Postgres. We could have done that, but we saw that there was huge improvement, a lot of benefit, in not doing so, and instead transforming the data in some way from an unstructured form into a relational one. So, how do we do that? We take a JSON document and split it into pieces, where each piece contains at most one level of nesting. You know that JSON documents can have other documents nested within them; we split them into pieces in such a way that there are no nested documents inside a single piece. We call such a piece a sub-document; not a very original name, but anyway. Now, we analyze the types in a sub-document, because in MongoDB, or any NoSQL data store, documents in general don't have a schema, right? But they do have types. Indeed, MongoDB's internal format, BSON, is typed: it specifies, for each key, the type of the value associated with that key. So we analyze the set of attribute names and types within that sub-document, and that, if you think about it, pretty much looks like a table: we have an attribute name, an attribute type, and a value. And indeed, for us, that is a table.
So we take each sub-document, find the candidate table which has the same attribute names and types, and store it there; just the values. What happens if we don't have such a table? Well, we create the table dynamically. So you don't have to create a schema beforehand; you don't have to do basically anything. You just take an empty database, connect to ToroDB, start inserting documents with the MongoDB API, MongoDB tools, MongoDB programs, and ToroDB will take care of creating any needed tables, splitting the documents into these pieces, and storing the information there. Let's look at an example, because that's going to be easier. Can you read this in the back? All right. This is a sample JSON document with some levels of nested documents. There's a field called name, a field called data, and a field called nested; these three form the root level of the document, so that's going to be one of the pieces into which we split it. Then there's another piece, shown horizontally here, with keys a and b: a is 42, b is 'Hello World'. It has no nested elements, so it will be another piece. Then there's another one with j and deeper, which is inside nested; that's another level, so it becomes another sub-document for us. And finally, there's one more with a and b. So the first step is that we take this JSON document and split it into four pieces like this: the root level, which contains name plus data and nested, which are going to be placeholders standing in for the nested structures; then a and b; then j and deeper, which is another placeholder saying there's a nested structure there; and the other a and b. Now, we find candidate tables for storing this information, matching the column names and data types: column a is an integer, or a number, and column b is a text field. If the table doesn't exist, we create it automatically for you, no problem. The placeholders don't need to be stored, because they're basically pointers; we keep that information in another place. So this is how it looks once we store this information in ToroDB. We use some extra columns called did and index, which are not really important here; they're basically how we pull all the pieces of a document back together. And then here's the information. MongoDB has an internal _id field, which is a 12-byte array; it's hidden, but it's actually there, so it also gets stored. This is the root level where, if you remember, we have the name 'torodb'; as I mentioned, the placeholders need not be stored, so we just have the name, torodb, in the root table. This is its data table, called p_3 here; it's been created automatically by ToroDB, you don't need to do anything. Then we have the a and b table with 42 and 'Hello World', and notice that this table gets reused, because the other sub-document has the same column names and the same data types, so both sub-documents are stored in the same table, as we have here. And finally, the j table with 42. So these are what we call the data tables; they contain the data that the MongoDB document had.
Now notice a couple of things here. This is just one document. If we store many documents like this one, then in the JSON documents in MongoDB, all those a and b keys and the data types associated with them are repeated every single time; that's metadata attached to each document. Here we state it only once, in the table definition. So the first advantage of doing it this way is that we save a lot of metadata; we avoid a lot of metadata repetition. Now, these are the data tables, but how do we pull the document back together to form the original JSON document we had? There's an extra table called structures. Each entry is basically a tiny JSON document that resembles the structure of the original document: it tells us where the root level is, what the nested documents are, and some other extra information. So this structure says that the root information is in table number three; we go to table number three and see that this is indeed the root level. Then it says that the root level contains another field called data, which is a document stored in table number one; we go to table number one, and there it is, with a null index, meaning this is the data sub-document. Then there's a nested field, stored in table number two; we go to table number two, and that's what's inside nested. Finally, nested contains a deeper element, stored in table number one with index one; and if we go back to table number one, with index one, we find the other piece that was there. So with these data tables and this structures table, we're able to reconstruct the whole document, back to the unstructured JSON format it came from. The good thing is that, again, we're saving metadata that is repeated many times in a normal JSON collection. Now, this structure itself also happens to be repeated. In a large collection, most documents have more or less the same structure, the same shape, because they're alike; otherwise it would be really difficult to consume the information in a NoSQL database. NoSQL means you can change the schema, but you're not going to change it per document, all the time, because that would be crazy; how would you even query that? So structures repeat. And hey, in relational databases we have foreign keys, right? That means we can establish a relation. So we also establish a relation on this final table, the root table, which creates an association between documents and structures. If we have many documents with the same structure, we don't need extra entries in the structures table; the root table just carries a column called sid, the structure ID. If you think about it more visually, this is more or less how data is stored in a NoSQL database; basically any NoSQL database stores data like this. It looks a little bit like a mess, doesn't it? Well, it is, because documents are just stored one after the other; it doesn't matter what information they contain, how many fields or nested documents they have. Sure, there are indexes, but if for any reason you cannot use an index, then you're basically screwed, because you have to do a whole-database or whole-collection scan to find your information; that's how it's stored.
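To make the example concrete, the layout just described would end up as something roughly like the following. This is only an illustration of the idea; the table and column names are made up for this sketch and are not ToroDB's actual internal schema:

    -- Root table: one row per document, with the document id (did), the
    -- MongoDB _id, the structure id (sid) and the root-level scalar fields.
    CREATE TABLE root_demo (
        did   int,
        _id   bytea,     -- MongoDB's hidden 12-byte object id
        sid   int,       -- which structures entry describes this document
        name  text       -- root-level field; placeholders (data, nested) not stored
    );

    -- One data table per distinct set of attribute names and types.
    -- Both {a, b} sub-documents of the example share this table.
    CREATE TABLE subdoc_a_b (
        did     int,
        "index" int,     -- distinguishes the two {a, b} pieces of one document
        a       int,
        b       text
    );

    -- The {j} piece gets its own table.
    CREATE TABLE subdoc_j (
        did     int,
        "index" int,
        j       int
    );

    -- Structures table: a small JSON value describing how to reassemble a
    -- document of this shape from the data tables.
    CREATE TABLE structures (
        sid       int,
        structure jsonb
    );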
After the processing that ToroDB does, the information is stored more or less like this. Just intuitively, you can imagine that it should be better, because we have classified documents, or more precisely their sub-documents: we've analyzed exactly the data types and the columns that they have, and we've put them in separate bins. This is what we call partitioning by type, because we're taking the data and classifying it depending on its type, putting it into separate tables; that is partitioning, right? So queries over a given document type, or subset of types, will be more efficient, because we only look at the partitions that hold that data structure. Now, here's a very typical question. Postgres has JSONB; it was introduced in 9.4 and significantly improved in 9.5. So why don't we just use Postgres's JSONB data type, the data type for storing JSON? Well, there are many reasons not to. Don't get me wrong, it's absolutely cool; I love JSONB, it's one of the most important features that came to Postgres. But it's not enough for our purposes, because our goals are, first of all, to get the data normalized, to get it partitioned by type, as I've just shown you. JSONB does the same thing NoSQL does: it stores every document one after another, whatever types and structures they have. It also, of course, does not provide a NoSQL API, and the problem a lot of people in the relational world are facing is that they're being asked to provide a NoSQL API, so otherwise they basically have to install Mongo. JSONB will let you store unstructured data; ToroDB will let you run a MongoDB program on top of your Postgres database. Also, JSONB is not compatible with MongoDB, and it cannot replicate or shard the way MongoDB, or NoSQL in general, does. I think horizontal scaling in Postgres and relational databases is hard because the set of use cases they address is very wide, and building a general solution for such a wide set of use cases is very difficult. NoSQL addresses a smaller subset of those use cases, which happens to be easier to scale. It's definitely not a free lunch, it has a lot of problems, but for whatever reason people like it, use it, and expect it. I'll talk about this a little more later, but ToroDB implements the MongoDB replication protocol too, so you can replicate from a MongoDB; you can participate in a MongoDB cluster. You won't get that from JSONB. And finally, JSONB is tied to Postgres. So far ToroDB runs on Postgres, but it will run on other relational backends eventually; we want to make it run on different backends too. I already mentioned the metadata repetition: if you look at a given collection, many documents have the same types, the same shape, and this creates overhead. If you use JSONB, or NoSQL in general terms, you get a lot of this repetition, and it costs you disk space, I/O, memory, and so on. Which brings us to: what advantages does ToroDB have over MongoDB? The first one is related to this metadata repetition. Since we classify the data and keep the definition of those attributes and types at the table-definition level, which means in only one place rather than in every single document, we save a lot of disk space.
So this brings us to: what advantages does ToroDB have over MongoDB? The first one is related to this metadata repetition. Since we are classifying the data and stating the definition of those attributes and types at the table definition level, which means in only one place rather than in every single document, we are saving a lot of disk space. And, well, honestly, disk space is not very expensive today, but I/O is. And today you can trade disk space and price for I/O. So basically, if you're using less disk space, you can either pay less or pay the same and get a faster I/O disk. And then, of course, you're using less memory. So this is a really significant improvement. If you look at the numbers here, where we compare MongoDB with ToroDB, ToroDB requires only 30% to 68% of the disk space that MongoDB requires. And this is not with compression; this is before compression. It's not compression, it's just avoiding the extra metadata that is repeated all the time in any NoSQL data store. All right, so this is the first advantage: basically, I/O. The second advantage. It is surprising that NoSQL is trying to get back to SQL. That's why it's a badly chosen name, right? The problem is that, as much as they are trying to get back to SQL, they're doing a poor job of it, which is even more shameful. First of all, because what they call SQL is not SQL. It's SQL-ish. It looks like SQL. If you look at Couchbase and N1QL, it looks a lot like SQL, so developers go for it. But it's not SQL, so tools are not happy. Tools basically don't work: SQL tools don't work, BI tools don't work, GUI tools don't work, because it's not SQL. So it's good for developers, definitely not good for tools. And even if they were absolutely compatible with SQL, that SQL is just a tiny subset of SQL. It's the basic select, where, join, offset, or something like that. It doesn't go further than that. And we know that SQL is way more than that, right? Especially Postgres SQL, which is so advanced. As a friend of mine says, hopefully you're not stuck in the Windows 3.11 days. And Windows 3.11 is from when the SQL 92 standard came out. So these NoSQL databases that are offering this SQL-ish language are trying to make it compatible with a subset of what SQL was in the year 92, which means you're basically not getting a lot of power. But Postgres has one of the most advanced, most standard-compliant SQL implementations available among relational databases. And guess what? In our case it comes for free. I mean, of course, thanks to Postgres. As long as we're using Postgres as the relational backend, you get all this powerful SQL. You import data through the MongoDB API, and you can insert data with the MongoDB API, MongoDB programs, drivers, whatever. That's good. But then if you want to query in a more special way, if you want to use the power of SQL for querying data, sure, just go to the database. Don't go through our layer at all. Just go to the database and run SQL. The tables look a little bit weird? Might be. There are solutions for that which I'll present later. But anyway, it's SQL. Just go and do it. And it's not SQL-ish and not a subset; it's pure SQL.
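Just to illustrate the sort of thing that is in "pure SQL" but well beyond that SQL-92-ish subset, here is a small, hypothetical example (table and column names invented) using a CTE and a window function, both of which Postgres supports out of the box:

    -- Hypothetical example: features far beyond a basic select/where/join subset.
    WITH monthly AS (
      SELECT date_trunc('month', created_at) AS month, count(*) AS docs
      FROM reviews                       -- invented table name
      GROUP BY 1
    )
    SELECT month,
           docs,
           sum(docs) OVER (ORDER BY month) AS running_total   -- window function
    FROM monthly
    ORDER BY month;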
That's why, some time ago, we introduced what we call ToroDB views. These are machine-generated: ToroDB creates some views which pull back together all those tables into which documents were split by ToroDB in the process of insertion, so that they look like an entity that is easier to query. So, basically, if we insert something like this, which by means of its varying fields will create different tables in ToroDB, these two documents will require four tables in total. And if we look at the views created, it all looks like it's been put together again, so it's very convenient for doing queries. In order to create these views, you just need to issue a MongoDB command, called createViewPath, that we invented. You won't find it in MongoDB, but you will find it in ToroDB, and you can call it from your MongoDB program. And then you can use these views in your queries. We also created a new MongoDB command, which you can run from the console or from the Java driver or whatever, called SQLSelect. You can just put a SQL select in it, and it will go to the database, issue the SQL select for you, and return the data to MongoDB as a JSON document. So you have the full power of SQL at your hands with ToroDB. You can even use tools, because, again, it's SQL, not SQL-ish. Then we have what we call Toro query by structure. If you remember, we are partitioning the documents by the shape they have. Thanks to this partitioning, if you issue a query to ToroDB and that query cannot use an index, then where MongoDB would need to scan the whole collection, we will look at the structures of the documents that could fulfill your query and only scan those tables. So you can get a huge performance improvement. It can be as high as the inverse of how frequent your documents' structure is. It varies from query to query, but it can be really, really high. Even negative queries, which are queries that return zero results, can sometimes be resolved just by looking at the structures. And the structures in ToroDB are cached in memory, so we can sometimes resolve negative queries purely from memory, compared to a whole collection scan in MongoDB, which is, of course, almost infinitely faster. There's also the possibility of having relational and NoSQL data in the same database. Some people requested this, and basically there's nothing we need to do to support it: you can just do it. Just put ToroDB and your relational data in the same database, probably in different schemas, just to keep things clear. And don't write to ToroDB's tables on your own. You could, just take care you don't screw it up. But other than that, you can take the automatically generated data tables and then query both, with joins between your relational and NoSQL data; a sketch of that follows below. No problem. Then there are atomic operations. MongoDB doesn't have atomic batch operations. Well, a single document operation is said to be atomic, because it's protected by an exclusive lock, so per-document operations are atomic. But if you do a batch operation and try to insert 10 documents at once, that's not atomic. In MongoDB you basically have to iterate through all the documents to check whether they were inserted correctly. If one failed and you want the operation to be atomic, you'll need to delete all the documents that did get inserted. But then you'll need to check the result of that deletion, because it may have failed too, on some of them only. And that's a big problem, unless everything works well, which of course doesn't happen all the time. So, we know the value of atomic operations. In MongoDB there are no atomic batch operations. In ToroDB, it was hard for us not to support atomic operations: everything runs in transactions, and transactions are atomic units, so we get them for free.
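As a rough illustration of that mixing point, a join between your own relational table and a ToroDB-generated table or view is just ordinary SQL. All names here are invented stand-ins; the real generated names differ:

    -- Hypothetical sketch: joining relational data with ToroDB-managed document data.
    -- 'customers' is an ordinary relational table; 'torodb.orders_view' stands in for
    -- one of the machine-generated ToroDB views.
    SELECT c.name, count(*) AS orders
    FROM customers AS c
    JOIN torodb.orders_view AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY orders DESC;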
There's another point. There's a huge debate about whether MongoDB supports clean reads or not. Well, the reality is it doesn't support clean reads. A clean read means a read on a consistent view of the database, of the data. I'm not going to get into the details; if you want them, just let me know, raise your hand. But basically, MongoDB effectively runs at read uncommitted, which means things can pop up in the middle of your queries. You'll see new documents appearing in the middle of your query; you can see the same document twice in a given result. And, well, with Postgres it took us just two lines of code, as you can see at the bottom, to implement clean reads. We basically said: query transactions are going to run in repeatable read mode, in read-only mode. That's it. Clean reads for free. Well, not for free; thanks to Postgres. It's a great database.
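The clean-reads trick being described maps directly onto plain Postgres SQL. This is a minimal sketch of the idea, not ToroDB's actual two lines of code (the table name is invented):

    -- Minimal sketch of the "clean reads" idea: run query transactions
    -- at REPEATABLE READ, READ ONLY, so the whole query sees one snapshot.
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY;
    SELECT count(*) FROM some_collection_table;   -- invented table name
    COMMIT;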
This one is very fun. You know MongoDB 3.2, which was released recently, announced support for a connector to connect to BI tools. That's what I meant about them trying to come back to SQL: SQL-based BI tools are better than the non-existent BI tools in MongoDB, or NoSQL for that matter. So they announced this connector. Do you know how it works internally? This proprietary, by the way, connector; it's not available in the MongoDB open source version, only in the enterprise version. But do you know which is the critical piece that makes that connector work? Yeah. It's Postgres. Really? Yes. This connector uses Postgres foreign data wrappers to convert from MongoDB to relational tables. Guess what? That's what ToroDB does. And performance? Well, I can't speak about it, because I have signed a license agreement which basically prohibits me from speaking about that performance. But it is... dot, dot, dot. Really. So it works on Postgres foreign data wrappers, which means that when foreign data wrappers in Postgres get better, the MongoDB BI connector will get better. Anyway. So they need this connector; it's proprietary; it's slow. Oh, I said that. Well, whatever. And, you know, it requires Postgres. Why not just use ToroDB? There's nothing else to add. The BI connector is not needed; we work with Postgres already. There's nothing else to do. All right. So what about performance? Because, after all, we're doing a lot of stuff. We're receiving the document, speaking the Mongo protocol, which is not native for us. We're transforming it, we're looking up tables, we're checking those table accesses, we're creating tables, we're splitting the data out. So, I mean, there's no free lunch; there are no miracles in the world. But I think there are a lot of advantages that probably outweigh the cost. Now, I don't want to play the benchmarking game, because I hate it. So let me be quite upfront. When you look at MongoDB benchmarks, they don't benchmark what you're going to be using in production. MongoDB has many tunables, and the performance varies greatly from one setting to another. Most benchmarking that you see runs MongoDB in a completely unsafe way, in a way you would never want to run it. You'll lose data, for sure; the database may become inconsistent. So don't look at those numbers. Look at numbers from running MongoDB in what is called safe mode; imagine what would happen if you didn't use safe mode. That means with journaling enabled, because I don't see any scenario, except some very particular use cases, where you don't want journaling on your database, and with replication enabled. Because basically, if you don't use MongoDB with replication, if you use MongoDB on only one node, that's not a very interesting proposition; you're better off using Postgres, or, yeah, of course, ToroDB. So, when you enable all those things and then compare the performance with ToroDB, honestly, with a very patched version of ToroDB that was built, not published on GitHub yet, but, you know, built like six hours ago, the performance running IA Bench, which is a very standard benchmarking tool for MongoDB, looks like this. Higher is better. The line on top is Toro. And the reason, by the way, why this line ends here is because this test is just inserting 300,000 documents, so we simply required less time; the test finished earlier. In other words, we are faster. This is pure insertion. So, if you were wondering whether all this processing is going to be really expensive, it's not. And the most important reason why we are faster is because Postgres is fucking fast. It's not us. I mean, we're not that smart. It's Postgres; it's really, really fast. Now, if you disable journaling, or if you disable replication, MongoDB is going to be faster. But if you use MongoDB the same way you would use it in production, we are faster. That simple. This is MongoDB 3.2, the latest version, so it's not an old version, compared with ToroDB running on Postgres 9.5, which is itself significantly faster than 9.4. All right. Let's quickly move on; how are we doing with time? All right. Replication. I mentioned that before; I'll keep it simple, a quick pass over replication. One very interesting use case for this solution, rather than replacing MongoDB (I'd be happy if you did that), is just to replicate from MongoDB. Let's say some of your users, or you yourselves, have a MongoDB replica set already running with live data, probably used for OLTP. Now you can set up ToroDB to just use the replication protocol, connect to the MongoDB primary node, or any secondary node, which can also serve the replication data, and then ToroDB starts replicating all the data, live. It's asynchronous, but it's usually fast, and you'll get another copy. And then you can run SQL on that. You can, of course, run the MongoDB API too, but, you know, you can run SQL. So this is a very nice way of doing very cheap and fast ETL from MongoDB to SQL. Just use Toro. This, by the way, exists today. It's going to be released as ToroDB 0.4; we're currently on a 0.4 alpha one snapshot, but it's coming anyway. If you look at the development repository on GitHub, anything that's on GitHub is open source, AGPL. So just look at the development branch and all the code is going to be there, except for this latest benchmark, but that's going to be there on Monday. So what about sharding? This is where everything becomes really interesting, because we want to implement sharding on MongoDB, which means implementing the MongoDB sharding part of the protocol. This is not done yet. However, we want to do this in such a way that ToroDB will work exactly as MongoDB does in a sharded environment.
It will talk to the mongos, the coordinators in a sharded environment, and it will talk to the other MongoDB nodes in order to move data around, you know, just to participate in normal sharded environments. It's not a very difficult thing to do; it will come in the next version. Now, apart from doing that, and again, we are going to do that, what if we also try some of the ways of sharding information, sharding data, within the relational world, and shard below the ToroDB level? Well, we can do that. One option could be pg_shard, the sharding extension that Citus Data developed, which is also open source. We're also considering the databases that are already good at sharding at this level, like Greenplum, or Redshift, or Citus Data's products too. So, if we pair this concept of sharding at the database level with the concept of replicating from a MongoDB replica set, what we're effectively building is a new way for NoSQL users to perform data warehousing, data analytics. So, basically, this is a technology that will enable data warehousing for, as I say, those poor souls in the NoSQL world who are struggling to do this. If you try to do data warehousing in NoSQL, you're basically out of the game. It's so terribly slow you won't believe it. It's basically unacceptable. That's why most NoSQL users are using tools to ETL from MongoDB, or whatever, to Postgres or other databases just for doing the analysis. With ToroDB, you can just replicate the data, because Toro speaks the replication protocol, get the data into ToroDB, and then use a backend which is already sharded at the database level. So, we did some experiments, and we tried Greenplum. It's a great data warehouse; it was presented here yesterday. So we did some benchmarks with that. Greenplum was open sourced late October last year, and we have already hacked ToroDB to work on top of Greenplum. It's not released yet, but we did some benchmarks. The goal was exactly this: take ToroDB, connect it to MongoDB, replicate the information from the replica set to ToroDB. ToroDB talks to Greenplum; Greenplum shards the data across all the segments; and then we run SQL, distributed SQL we might call it, SQL that gets pushed down to all the shards, to all the segments in Greenplum terminology, and brings back the results. And then we compare how this SQL performs against the same queries run on MongoDB. All right, spoiler: MongoDB is very slow, so you can already guess the results. It's not going to be that surprising, but anyway. So, we took a data set, the full Amazon review data set, which is a data set obtained basically by web scraping all the reviews of all the products on Amazon. We ran on Amazon Cloud, using a c4.xlarge server, which is not a big machine. We wanted to show that, you know, this is what you may have at home. We set up four shards inside the same host so that network traffic would not slow us down. In MongoDB's case, that requires three config nodes. Config nodes basically do no work, just metadata, so they don't add to the load in any way, but they are required to run the cluster, so you have to have them anyway. And four shards. And four segments in Greenplum. Same size, same memory, same everything. The data set contains 83 million records and is a 65 GB plain JSON file.
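To give a flavour of what "sharding at the database level" looks like from the SQL side, here is a minimal, hypothetical Greenplum-style table definition, stored column-oriented and compressed as mentioned next in the talk. Names and options are invented for the example; the real ToroDB-generated tables differ:

    -- Hypothetical sketch of a Greenplum table spread across segments.
    CREATE TABLE reviews_data (
        did        bigint,        -- document id
        reviewer   text,
        product    text,
        rating     numeric
    )
    WITH (appendonly = true, orientation = column, compresstype = zlib)
    DISTRIBUTED BY (did);

Queries against such a table get pushed down to all segments and the partial results are gathered back, which is the "distributed SQL" being described.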
So, if we import this into MongoDB and into ToroDB running on Greenplum and compare the disk space required to store this data set, the results are quite surprising. This is comparing Mongo 3.0 with WiredTiger and snappy compression enabled, which, in terms of disk space, is the same thing as MongoDB 3.2; we just didn't repeat the test, but it's the same thing. And on Greenplum we are using columnar storage and compression too, to make this test fair. I don't know if you can read the bars over there, but basically the big bar is Mongo and the small bar is Greenplum. And here it's the same thing: the index sizes are more or less the same. The table size, though, is significantly, significantly smaller: something like 20 GB versus 70 GB, more or less; that's the difference. It's very significant. And why is this? Because we are storing the metadata separately from the data, all the time. And when you take that into account and you store the data in a columnar way and then compress it, it compresses very, very well. Because we have aligned the data, we have classified it, we have put it nicely into its own bins, and now we compress those bins. So it compresses very well. What about the benchmark itself? So, if we take some queries, what we are comparing here, if you remember, is a query through the MongoDB API versus a query in SQL, once those tables have been generated by ToroDB and imported via replication. If we want to obtain, say, the distinct reviewers of products, the query in SQL looks like this. Simple. The query in MongoDB looks like this. Weird. But it basically does the same thing. So this is query one. This is another example query. Please let me know which one you can read and understand more easily, the one on the left or the one on the right. Especially nice is this allowDiskUse: true. This is because MongoDB has a limit of 16 megabytes on the result of an aggregate query. So if you have an aggregation on MongoDB and it produces a result larger than 16 megabytes, like when you're doing a string concat or whatever, you're out of luck: you need to spill to disk if it's bigger than 16 megabytes. And it's not even automatic; you have to explicitly say, please use the disk. Well, so this is the benchmark. I don't know if the people in the back can see these tiny, tiny bars over there. Yeah, that's ToroDB on Greenplum. Basically, the first query took 969 seconds to complete on MongoDB and 35 seconds on Greenplum. That's a 28x improvement. The second query took 1,007 seconds on MongoDB and 13 seconds on Greenplum. That's a 75x improvement. The third query took 31 seconds on Greenplum to execute. We don't know yet how long it takes on MongoDB, because it crashes, consistently, very consistently. We don't know the error; we don't understand it. So, you know, this is the idea. This is a real enabler for data warehousing on NoSQL. So, please go to GitHub, download the source, try it, read about it. If you like it, please star it on GitHub. And, of course, go and check out our FAQ. There are many questions there; probably some of them are going to be asked now, but the rest are going to be there, so just go and check them out. And that's it. Questions? Thank you.
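For reference, the SQL side of that first comparison, the distinct reviewers of products, would be roughly of this shape. The table and column names here are invented stand-ins for the ToroDB-generated ones:

    -- Hypothetical sketch of the "distinct reviewers of products" query in SQL.
    SELECT product_id, count(DISTINCT reviewer_id) AS reviewers
    FROM reviews                    -- invented name for a ToroDB-generated table/view
    GROUP BY product_id
    ORDER BY reviewers DESC
    LIMIT 10;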
Hmm. That's a good point. So, the way we have laid out the source code, it's very modular and has some abstraction layers. One of those abstraction layers is the incoming protocol; that's one layer. Right now we have one implementation called MongoWP, which is the MongoDB wire protocol. Then it goes down to another layer called KVDocument, which is an abstraction of a document. We transform from the MongoDB representation to an abstract key-value document, and then we process it and do all the sub-document splitting and so on. So it's not hard to speak another protocol and transform it to KVDocument, and then the rest of the pipeline is exactly the same. So it's not hard. Now, this is prepared for documents; I don't think it will work well with pure key-value stores. It will work better with document stores. Next on our list is going to be Couchbase. More questions? I'm not sure in which order the questions came up. Okay, here. So, this is still in the development phase. We're going to release version 0.4 in mid-February, and the version after that is going to be 0.8. We're going in powers of two: the first version was 0.1 and then 0.2, so this one is going to be 0.4, then it will be 0.8, and very soon after, it's going to be 1.0. So it's not production-ready yet, but I hope it will be, and there are many people trying it out on non-production loads. More questions? Over there? Okay. So this is basically very straightforward, because, first of all, there's no type change as such. If you remember from what I was saying before, the metadata which specifies the type is associated with the document itself; it comes with the data. So it's just a different type. It's not that it changed; it's a different type. The way we process this, if you remember, is that we analyze the sub-document and find the matching data types and values, and a new combination will simply create a new table. Now, next question: isn't that a problem? Are you going to end up creating thousands of tables? Well, sometimes yes, that may happen. If there is a kind of combinatorial explosion of different keys and values and so on, which sometimes happens, we'll end up creating a lot of tables. But fortunately, this is not a big problem either. We did a previous R&D project, some three years ago, where we created a billion tables inside a Postgres database. And it worked quite well, and table creation was very fast: we were doing 12,000 table creations per second. So, you know, it's not a big deal. More questions, over there? I'm sorry, can you repeat that? Oh, then we're a little bit screwed; we just cut that key, there's nothing else we can do. Yeah, there are some restrictions with identifiers: we escape some values and we cut long keys. Yep. Not really. As I mentioned before, Riak is mostly key-value, so currently we're targeting document data stores rather than key-value stores, although a key-value store could certainly be used as a source; some key-value stores do store documents, where the document is the value, right? But the other way around, it's not on our roadmap. I can't say it can't be done, maybe it would be great too, but I don't know. More questions? Anyway, I'm going to be around; I'm also going to be at the Postgres booth for the next two hours or so. Feel free to pop in if you want to, and of course if you have any questions about Postgres. And just let me know if you try it. Let us know, give us feedback, and hopefully you'll like it. Thank you.

Test. Test. Seems to work. Okay, the mic seems to work. I guess I have to use the microphone just so that the recording works. You guys want to come closer? It's not school anymore, guys. It makes it easier for me to understand all your questions.
Speaking of which, if any question comes up, just raise your hand and we'll handle it while we are at that point in the presentation. I don't really like the idea of doing the whole presentation and then circling back to the first slide to answer a question. So, just in case somebody is in the wrong room: this is about the special setup you get in Debian when using Postgres, or the special needs of Debian in terms of Postgres usage. Okay. Okay. Mouse doesn't seem to work. Okay, let's do it manually. Got it. So, a short introduction. A brief look into the room tells me that roughly half the people know me already. This is from a website called OpenHub. It just shows some of the open source engagements you have; there's probably one for any of us. It does not show everything, I noticed, and some things I cannot really make out. I mean, Revelation: I didn't even know that I had committed a patch to Revelation at some point. I didn't even know I had write access to it, but it seems I did something there. Anyway, this is to say I have been doing open source and free software for quite some time, or might I say for too long already. I started as an open source developer in 92, just to get this straight. Yeah, this slide says 93. Whatever. Linux, as in kernel work, from 94, but I stopped a couple of years after, because that was just too much to combine with work. I've been a Debian maintainer since 95, and I started doing Postgres, which is what I actually trained for: I got my degree in database science, so it came quite naturally to go into Postgres. Very roughly, I'm going to skip the introduction slides about what I do for a living, because I'm not sure that's of interest; just a brief idea. The company I work for does a lot of open source stuff, so this talk is not just about the open source project but also about experience from the outside, from users, although for this topic there's not much of that, because it doesn't really apply. Debian is special. Sorry. For those of you who are interested in what exactly it is that's so special about Debian, I'm going to talk about Debian itself, about the distribution and all the specialties, tomorrow, I think at 11:30, I'm not sure. I just brought up a couple of points here. The thing that really intrigues people most of the time is that Debian is indeed a real community, and only a community. There's no company behind it, and it's completely geared towards free and open source software. Everything that's in Debian is free, just by definition already. Some call it a democracy, or a do-ocracy, and I don't even know if do-ocracy is a real word. I've heard people say it, but I guess it's made up, meaning: do the work and then you own the work. Some might use more negative words like anarchy, but it does work as a democracy. Again, more about that tomorrow. As I said, it's the largest distribution and maybe even the largest project; I'm not sure about that. There's a world map showing all the developers, at least all those that put their addresses in. I'm not sure about those that seem to be in the middle of the ocean, whether there are islands there or the coordinates are wrong. Just to show some faces, this is last year's developer conference, and no, I don't think we've had this many people at Postgres developer conferences so far. Keep in mind this is only the group that traveled to Heidelberg, Germany for the conference, not including the hundreds of others that didn't.
Back to Debian the project. What Debian is well known for is some special guidelines: the constitution and the social contract. Technically the free software guidelines are part of the social contract, but I still listed them separately, because the Debian Free Software Guidelines are the basis of the open source definition. The Debian project came up with those definitions first, and then, literally driven by Bruce Perens, who back then was the project leader, it became a much broader movement. The other point I just want to briefly point you to is the social contract. We as Debian kind of signed a contract with the open source world, or the free software world, that we are, and want to remain, 100% free software. We don't hide problems, we give back what we have, and, again, our priorities are our users and free software, both at the same level. Debian is the largest distribution of them all in terms of packages, but also in terms of hardware. I just put a couple of things on the slide: yes, upper left, the C64. That doesn't run Debian, it doesn't do Linux at all. Anyone here ever work with one? Oh, more than I expected; I thought I was the only one. But of course it's iconic. My first Linux system was a 486 PC with 8 megabytes of main memory, and it worked. And the old Atari pictured isn't quite the right one, but there are Ataris that do run Linux, and there is a Debian distribution for them. For all of these pieces of hardware there's a Debian distribution available, and of course that means, because we have free software in Debian, that for all of these hardware pieces there's a Postgres package available. The way Debian is organized is that we have three areas in the distribution. Main is the real Debian, free software only. But because it's convenient for users, we also have what we call non-free: software that is available for you to just use, but doesn't have a free license. And there's a third one, contrib: a package goes into contrib if it needs something in non-free to build or run. If it's completely sustainable by itself and it's free, then it goes into main; if not, it goes into contrib. As you can see from the numbers, there's not that much non-free and contrib software in the distribution anymore: 20,000 source packages, 21,000 even, building 250,000 binary packages. I guess that's a number a lot of distributions cannot compete with. But again, this is the advantage of being community driven: as a Debian developer, you just see something that you'd like to have, you package it, you upload it, and everyone else can use it. I found this picture to explain the releases. It's not completely up to date, actually it's pretty outdated already, but it's a great picture and somebody really put a lot of effort into it; I didn't want to recreate one just because of new names. If you look at the names on the left, there are a lot of them in there, and on the right as well. For those who don't know Debian that well, all the Debian releases are named after Toy Story characters. And just in case you wonder, there is no version 1.0: we have 0.93-something, and the next one was 1.1, because some vendor released a Debian 1.0 before Debian did, so we couldn't use that version number and had to start with 1.1. Anyway, the thing is, there are a couple of areas in there: you see experimental, unstable, testing, stable. The idea behind that is, of course, that stable, as the name suggests, is the stable release, the release you usually work with, unless you're me and you use unstable, which is completely in
development. Testing is like a rolling release towards stable: everything in unstable that works well enough, that doesn't have release-critical bugs, eventually migrates automatically to testing, and at the moment we freeze for the next stable release, we freeze testing and don't migrate anymore. Experimental is for huge changes, for stuff that you're not sure about, about how it will disrupt the rest of the system, so that one is more or less kept outside. By the way, unstable is also called sid, as in "still in development" or something like that, or after the Toy Story character. Anyway, the thing is, you see here you've got stable, and this picture goes up to Wheezy, and Squeeze is still listed there as stable. Nowadays the current release is Jessie, Wheezy is already oldstable, but since Debian went to long term support, even Squeeze is still supported, and as I heard earlier, we still have users of it in the room. So what we do is we actually maintain, not all, but the most important packages in Squeeze, and do the security patches and everything for those old packages. However, have a look at this. This shows you the list of the Debian releases and the list of Postgres releases. Now, of course, Squeeze could have been released with a newer version, just from the timing of the release, but keep in mind we have a freeze before the release, and Debian only releases when it's ready, meaning the freeze can last 6 months, 12 months, even more in the worst case, and we don't release with a beta version. So that meant when we froze Squeeze, we had to use Postgres 8.4. Now do the math: end of life for 8.4 was July 2014, and actually, as you asked earlier, Bruce, Squeeze support ends next month, as you can see. So we still had one year and seven months, almost two years, to cover with an old Postgres version. And the same problem comes again: Wheezy support ends in May 2018, but its 9.1 version ends in September 2016. And Jessie again, although for Jessie we have a shorter time frame. So what we do is we actually take changes from other versions and backport them. By the way, in terms of long term support, help is always welcome, and not just for Postgres, for everything in Debian; if you're interested, just give us a hand. So we went up to 8.4.22, and by that point we were on our own: Postgres had declared end of life, and that was the last upstream version. And this is where I'm personally involved, at least partly, in the long term support thing. We tracked 9.0 until it ended with 9.0.23, and for all the patches made there we manually checked whether they are applicable. It doesn't always make sense; it's usually bug fixing, and some bugs are more important than others. Then we backport them and make sure they apply to the 8.4 version. Now we've got the next problem: 9.0 is already end of life too, so we have to go one step further and do the same with 9.1, fortunately only for another six weeks, because, as everyone who has ever tried doing that knows, the further apart the major versions are, the more difficult it is to apply a patch; you have to adapt it, which sometimes is a lot of work and sometimes doesn't even work at all. We created a wiki for that, where we document which patch worked, which one needs which adaptation, and which one doesn't apply at all. Say it again? So the question is whether we are the only people who do that; I'm supposed to repeat the question. Yeah, you mean backporting patches from 9.0-something to 8.4? These patches are not something we, or I, do just for ourselves, just for Debian. Yeah, and that could matter if you're not running Debian but, for whatever reason, you're still on 8.4:
you have to do it yourself. Right, right, that's the point, and that's the point of putting it into Debian. The next release, as I said, will have the same thing, so as a user you're guaranteed to have, what is it, five years of usage of that version on Debian, and you get all the patches you need. A lot of the package development — everything I'm saying here also holds for Ubuntu, because for Ubuntu and Debian the Postgres packaging is done by one team for both distributions. Now, the LTS side is different, because their LTS cycle is different from ours, so some work we can do together and some we can't. At some point you will come to the point where you have to upgrade. You may want to do it earlier, but eventually you have to, and then we are facing the typical distribution problem: how do we do that? The upgrade in itself sounds easy, right? But just keep in mind the process of installing a new package is pretty complicated, and there are a lot of places where it could go wrong and you'd have to back out, reinstall the old version, keep the old version, and so on. And you have to upgrade the database. There are two options: one is you carry the data over from one version to the other; the other is you just dump it before you upgrade the package and then restore it. That will work, but I'm not sure that's the right way to do it. Keep in mind, if you do the dump/restore thing, you have to have enough room on your disk or filer to store the whole database again, you have to do the database upgrade at the same time as installing the packages, and you have to have a lot of time. There are databases out there that are not in the few-gigabytes range; we're talking about many dozens of terabytes, and it takes an awful while to dump that out and restore it. 8.3? And you're sure that's on purpose? You know, it happened to yours truly before: I just backported to all the versions I had checked out and pushed it, and then somebody approached me and said, hey, you do know that this one went end of life like last week? Oh no, I forgot about it. It happens. But the thing is, back to the upgrade problem: just think about this whole complex cycle. You did dump your database, and then something goes wrong and you have to restore the old version. It's really, really complicated. Looking at the Postgres docs, there's a good hint. By the way, before I forget: when I talk about a cluster here, it's not meant as a high-availability cluster; it's the original definition that is also part of the SQL standard, or at least used to be in the SQL standard, I'm not sure if it's still in there: the cluster defined as the group of databases handled by one postmaster. So what the docs say is, let's say you want to migrate from 9.4 to 9.5: you have to have both running at some point, at least, and I'm pretty sure, correct me if I'm wrong, that this also holds for pg_upgrade. So it's not sufficient to say, okay, I'll just dump it and drop the rest; something might go wrong and I'd have to restore it completely into the old version and reinstall the old package. That doesn't work, and for other reasons that we already discussed we don't want to do that. So we have to set up a system that runs 9.4 and 9.5, ideally even at the same time, so it can copy the data over or use pg_upgrade. The way this used to be handled was that the relevant binaries, the relevant programs, were just copied aside to some temporary storage and then run to get the data dictionary and the data out and put it over. You might imagine there are a lot of chances for this to go wrong. So what we came up with is a
solution to do it right. Instead of just putting something aside, why not design the whole system to allow multiple versions, and multiple clusters of different versions, at the same time? So the way the Debian packages work now is you can install whatever versions you want, and they can all have databases on the same system at the same time. Upgrading then only means telling them: okay, this is the old one, this is the new one, just push the data over. So how is it implemented? On the left-hand side you see the Postgres package as it used to look. There's data in there, /usr/lib/postgresql, /usr/share/postgresql, and some other files. Of course this was changed: instead of the postgresql package we now have a postgresql-9.5 package that has exactly the same data in it — well, mostly; some details on that later. The original postgresql package still exists, but instead of having files to install, it's just a meta package; it's just a dependency that makes sure you get the latest version, postgresql-9.5 or whatever is current by then. In the package we also changed the paths: instead of putting the clusters under /var/lib/postgresql, as they used to be, and the information about them under /etc/postgresql, we added a version number there as well, so there's a subdirectory for version 9.5, and then another one for 9.4, and so on; they are all separated. By the way, Debian does move some files to /etc that are not there in a standard Postgres install. The reason is the filesystem standard: configuration files have to go to /etc, so we move them over. And then, to handle this, we created a whole bunch of new programs. Instead of calling initdb directly and making sure we handle the version number correctly in between all the time, we created a program called pg_createcluster, which takes care of all the right configuration, and the same for pg_ctlcluster, which does the same for pg_ctl. And while we're at it, why not do it properly: we know createdb and dropdb from Postgres, and listing databases, so we just added the equivalent cluster-level commands as well. I've got examples for those so you can see how they work, and of course pg_upgradecluster — sorry, I almost forgot the main reason why we implemented it this way: you have to upgrade. Let's go in detail through an example. What I did here is just create a new cluster for version 9.5 on my system. You can see the output, it's standard, and you can see the last thing, port 5433. That's because my standard installation already ran on 5432, so during installation the system decided, okay, that port is taken, just take the next one. There's no need for me to manually interfere and say this is what you have to do; I can change the port manually if I want to. Then pg_lsclusters shows us what's available. I did remove some pieces because I wasn't able to fit it on the slide otherwise. The green line shows your running cluster, the red line shows the cluster that's not running, and guess what, the next command, pg_ctlcluster, obviously starts it. And we see, next slide, this shows us — oh, there's actually a copy-paste error: it should say online, not down, on the second green line. That comes from copying the lines instead of taking them from the terminal, sorry. And the last one: I don't really need the test database, so let's get rid of it. An interesting side effect is that pg_dropcluster also enables you to stop the postmaster in the same command, so you don't have to worry about all those details; if you want to get rid of one, the system knows how to stop it and deletes it afterwards. I hope this is big enough size-wise. Seems to
work. That's the upgrade command. I also installed a 9.4 version of the test cluster and then upgraded it to 9.5, and you can see it goes through all the steps. There are actually more lines that I removed that didn't give us much news, but you can see there's a lot of work involved in migrating one database to the other. But it does work, and it does the work when you want it done: when you have the availability, the users are shut out, you have your maintenance window, you can upgrade the database, you can do whatever maintenance you need to do, and you still keep the old one. So essentially what you do is upgrade your cluster, then you test it, and after you see that it works, you delete the old one. What does that mean in terms of accessing the databases? You can see there are the two again. This, by the way, is the corrected line; as you can see, it says online. Just starting psql gives you psql version 9.5; we can use the port the way we're used to in Postgres, and you can see in the next line it still starts psql 9.5, but the server is 9.4, because that's the one we just installed. Or, and this is the only change, we added an option, --cluster, where you can tell it the version number and the cluster name in one argument; it then accesses the one on port 5433, as we can guess from the notice it gives us. To implement that, we added a wrapper to all the standard Postgres tools. Looking at the standard path, /usr/bin/psql, you see it's just a link to pg_wrapper, and the same holds for all the other tools: they all go to pg_wrapper, and pg_wrapper then passes the cluster argument on, if there is one, takes the other arguments and just pushes them through, and decides which tool to start. psql, for example, will always be the latest version, because that's compatible with the older servers; some other tools are not, or are not guaranteed to be. I used dropuser here as an example. As you can see, by telling it to work on the 9.5 cluster, it gives you the 9.5 dropuser; by asking it to work with 9.4, you get the 9.4 one. Same thing there, just because it could be — we're not sure, but it could be — that there are subtle differences, and with some tools we even know there will be differences between versions. We could say that's it, we solved our problem, right? But why should we stop here? I mean, now we've solved the problem of having different versions at the same time on the same system. In a production environment you don't want to have a lot of different versions and different databases on the same server, but in development, who knows, and if you have to run tests against different versions of different databases, it might be nice to have them all on the same system instead of maintaining so many different systems. So while we're at it, why not add another system to allow all versions? It's already possible to install all versions, so why not add a system that gives you all versions as Debian packages? So there comes a solution that is handled by a couple of Debian guys and the Postgres people. As you can see from the URL on the left, it's apt, as in the Debian package management, dot postgresql dot org: apt.postgresql.org. It's a joint offering for Debian and Ubuntu; again, the same team does the packages, and they do it for both systems. From apt.postgresql.org you can get all the Postgres versions, at least those that are still alive, for all the Debian and Ubuntu versions, as binaries. The sources are the same as those used in the distributions, so one of the common questions, shall I use this or that, doesn't actually matter: the same source is built by the
same team, yet the build system is different, just because of the sheer amount of packages. Some of the development versions are available too, actually, but only for some hardware architectures and only for some distribution releases. But at the moment you can go to apt.postgresql.org and install Postgres 9.6 from it and try it, play around with it. Fine, you can install it, and it runs in the same framework as your production 9.5 system, on the same system. The names are different: they add a pgdg marker plus the distribution version to the package version number; okay, we had to change something so you can see the difference. But again: all active Postgres versions for all active distribution versions. Let's do the math: three to four Debian releases, three to four Ubuntu releases, which might even get bigger with all the long term support stuff, six Postgres releases, all the server packages, all extensions, all modules, but just two architectures, only amd64 and i386; the rest haven't been added and probably won't be. Still, that's 143 different source packages and more than 9,000 different binary packages, essentially handled by one or two people the whole time. Just to see how the size increased: in 2012, when we started, we were talking about something like 2,000 binary packages, and now we are at — well, this graph isn't quite up to date, it was created maybe a year ago and I didn't find a newer one, but still, a lot of new packages have come around. So how do you use those? I have to take a short detour here and explain a little bit about the Debian package management; those of you who already use Debian will probably see why. By the way, I see you taking pictures; the slides are available online if you want them. They are already uploaded, so far only to my server, or rather our company server, so I have to figure out how to get them onto the conference site. Worst case... okay, I will add them there. There is a small change I made this morning, and my hotel wifi didn't allow me to upload more than 2 megabytes; every time, at 2112 kilobytes it stopped. So I have to make that change and upload it again, but it will be on the wiki so you can get it. So the question, just to repeat it, is whether we have a procedure to deal with changes in libc localization, because that's a problem for Postgres and the index structures. I'm not sure how this fits into this area, because the packages are built against a stable release, so there won't be any major glibc update on a released Linux distribution. No, no, a distribution release is stable, and for a reason, so the only thing that gets changed in a stable release is security bugs; anything else, new versions go into the next release. Yes, it could be a bug fix, I'm with you on that one, but it's probably not serious enough to make it into a stable release. And the other thing is, of course, I have a brief slide about testing: 10,000 packages want to be tested as well, and you cannot do that manually, so those tests should find problems; we have a lot of regression tests in Postgres anyway. Just a short introduction. These lines show you how in Debian you define your repositories. The first line says it's for binaries, and it gives you the Debian mirror, actually a German mirror. The part at the beginning is the root directory, then debian; the release is jessie, I could have used stable instead; and right now I only want main. That's not because I don't want non-free software on the system, it's because with contrib and non-free it didn't fit on the slide; they would just be added after main. The next line gives you the same, just for sources, and then the third one is the security
system, where all the security updates for the stable release are. Then you add another line to this, the next bucket here: it's apt.postgresql.org, then the path, whatever it is, and you say jessie-pgdg, and you get all the Postgres packages that are available for jessie from apt.postgresql.org. Or you want a specific version; and if you're talking about the latest development version, 9.6, you have to do this explicitly, because that's not automatically activated: you have to use the last line here and, after main, add the version number. So it's completely up to you what you want to do; you are completely flexible in changing and deciding which version you want to use. And then, standard Debian: once you have it configured like this, apt-get update gets you the new package lists, apt-get upgrade will upgrade the packages you have to the newer ones, and that's it, just use it. Now, there's one problem. If you keep it like this, you might run into a situation where one version looks newer than the other to the system, despite you not wanting to change. The way we solve this problem in Debian is by giving priorities. apt-cache policy shows you the priority information, here for the postgresql-9.5 package. You see which version is installed, and you see there's a candidate version with a pgdg suffix, and due to the way version numbers are compared, the version 9.5.0-1.pgdg+2 is considered newer than the standard 9.5.0-1 version. But we might not want to make that change; maybe we only want to use apt.postgresql.org for a different, older version. So what you do in Debian is the so-called APT pinning, as in the red part here: you put a file into preferences.d that says everything that comes from apt.postgresql.org gets priority 200. Then we call apt-cache policy again, and you see the 500 up there went to 200 — 500, by the way, is the default — and now it's not a problem anymore; it doesn't think about upgrading the package anymore. There's also the priority 100, which is the currently installed package; I don't think that one is really needed, at least not for us right now. No — so the question is whether the pgdg version is supposed to not be newer, although it looked newer due to the plus 2. No, this is just because that was the current version; the plus 2 in this special case would indeed be newer. But you don't want to play ping pong: the next release might be a -2 version in the main archive, which is absolutely the same source as the +2 on the pgdg side, and you'd get a new install again. The main reason for this is that you make your decision about which package you want from where, and you want the system to keep to that rule. If your decision is to use pgdg, do it; if your decision is to use the standard debian.org package, please do that; but don't switch behind my back without asking me. That's what the APT pinning is for: you make the decision. You can also increase the priority instead; then it will never move away from pgdg, it will never go back to the standard Debian packages. But it should always be your decision. And in this special case, I agree, the plus 2 version might be newer, but I'd have to look into the changelog to find out. As soon as the -2 version comes out on debian.org, there will be a -2.pgdg+1 version again on the pgdg archive, or maybe a day later, depending on timing and so on.
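Roughly, the pieces being dictated here look like the following. The exact mirror path, file names, and priority value are only what was described above, so treat this as an illustrative sketch rather than copy-paste configuration:

    # /etc/apt/sources.list.d/pgdg.list (illustrative)
    deb http://apt.postgresql.org/pub/repos/apt/ jessie-pgdg main

    # /etc/apt/preferences.d/pgdg (illustrative pin, as described in the talk)
    Package: *
    Pin: origin apt.postgresql.org
    Pin-Priority: 200

After that, the usual apt-get update and apt-cache policy show the effect of the pin.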
So how are these packages built? As I said, nobody is going to test more than 9,000 packages by hand, so you have to find an automatic way to test the packages. There are a lot of tests available in the regression suite, although that is not enough; more on that later. In general we need test suites for almost everything, and Postgres is one of the open source projects that's good at creating test suites for what it needs, but then we need some more. So what we do is we use Jenkins for continuous integration. Essentially, Jenkins is a server to run scripts. We need two jobs: we need one to make sure it builds the packages, the binaries, from the sources, and we need another job that — no, wrong — the source job is to build the source package from the information we have, the tarball and the VCS information. It's all in git, so we have git, we have the Postgres tarball, and we have to make one source package from it first; and then we have the binary job, which takes the source package and creates the .deb files from it. Let's take a brief look at how it looks, the dashboard. You can see there's green there, so these things worked; it shows the duration, shows the last failure, and so on. There's a lot of stuff going on behind the scenes, and it's also a lot of work. And finally, I'm coming to the end here already. As I said, regression tests are not enough. They are good enough to show the problems that might be there, the bugs that might be in the source code because it doesn't run the way it should. And the way Debian works, we have a lot of different architectures, as I mentioned, and we have build servers for all those architectures, so all the packages are automatically built on all those build servers. Depending on what kind of hardware there is, the source code of the package might not cope that well with it, so there are quite a few bugs that were found just by running the Postgres regression suite through all those architecture builds. But again, those won't catch the packaging errors; the regression suites won't. So we also need tests on the installed packages, and what the guys did — I'm not involved in that, so I don't take credit for it either — is use a tool called autopkgtest. You see a small run of it here. It tests all the packages by taking a standard system, a new system from scratch, installing all those packages, doing all the stuff that you do after installation automatically, and de-installing them again to see if they de-install correctly, testing things like: are all files gone after I de-installed it, and so on. This is all done automatically behind the scenes, so we are pretty sure that the packages on apt.postgresql.org are really up to the standard of quality we expect from them. And that seems to be it; I'm a little bit early. Any more questions? I'm not sure what you're asking there, could you... So you're asking if pg_config has different versions as well, installable at the same time, and it was a problem because you wanted extensions for different versions installed at the same time. But wasn't that before we split it out into different versions? Oh, no? Then I have to admit I don't know; I'd have to look it up. The general approach is: use the latest version all the time if possible, but if you need the program in different versions for different Postgres versions, then always use the one that has the same version number. Right, you weren't talking about installing the extensions. Yeah, right, so you can have different versions — you can have the extension for 9.5 and 9.4 and 9.3 installed at the same time, but then, as I said, it should be using that wrapper, and you should always automatically get the right version for the server version you have. Now, if
that has been correctly implemented, I'd have to check; I cannot tell you off the top of my head. And if it wasn't, did you tell anyone? That would be interesting, because they should have a reason for it. You might need a different version of the extension — might be — but still, the tool should work with all the servers. I might have missed that; I'll have to check it. Thank you.

Hello, hello. Hello, hello. I think I'll go ahead and get started. It's nice and informal, since I know half of you in the room. My name is Mark Wong. I'm a consultant with 2ndQuadrant and I contribute to the Postgres project. I'll talk a little bit today about how the 9.6 branch is looking in terms of performance. I'll go over very briefly what the tests are, talk about some of the patches coming into 9.6, and talk a little bit about some other things: some mainframes, Postgres-XL, some column-oriented databases, some replication. Throughout most of these evaluations I'm using pgbench. If you're already familiar with Postgres, you're probably familiar with pgbench: a TPC-B derivative, a basic online-transaction-type workload. There's also an open source kit called DBT-2, which is a TPC-C derivative, another OLTP-type workload; then DBT-3, a TPC-H-style decision support workload; and then various small custom tests to illustrate specific parts of the changes coming in, depending on what the patch is. So first off, where is 9.6 now? I ran a pgbench read-only test on a 4-way, 8-core, 32-thread system just to see what the scalability was looking like. This won't be particularly interesting, but it should at least be encouraging to see that the performance, the scalability, hasn't degraded with 9.6 yet compared to the 9.5 release. Although in this particular chart, the read-only test with prepared statements is showing a little bit of regression when you get towards a higher number of clients. I don't have any answers for that yet, but I think Tomas Vondra is trying to figure that out and get more details on what's going on. That's what's in the tree now. The patches I'm highlighting for this presentation: there's a freeze avoidance patch, atomic pinning and unpinning of buffers in the buffer management code, a little bit about parallel sequential scans, unique joins, a fixed decimal data type, and shared aggregate states. A couple of these have been committed; some have only been submitted. So, who has really large tables that don't change very much? Do you find it painful when vacuuming, or after running a lot of deletes or updates on those large tables? This freeze avoidance patch is authored by Masahiko Sawada. It makes a table read-only, and the benefit of this is that certain DML-related work can then be skipped and, maybe more importantly depending on what you're doing, vacuuming can be skipped: the autovacuum process won't come along and try to figure out whether it needs to do any freezing in the table, because it already knows that doesn't need to be done now that the table is marked read-only. So with this patch there are a couple of extra DDL commands just to mark a table read-only or read-write. The atomic pinning and unpinning of buffers in the buffer manager is authored by Yuri — I don't know if there's anyone here who can help me pronounce the names — thank you, thank you, I need to practice my other languages. This is intended to improve read scalability by using atomic operations instead of the spin locks that are currently in place. The testing he had done with this was on a POWER8
The atomic pinning and unpinning of buffers in the buffer manager is authored by Yuriy; I don't know if there's anyone who can help me pronounce names here. Thank you, thank you; I need to practice my other languages. This is intended to improve read scalability by using atomic operations instead of the spinlocks that are currently in place. The testing he had done with this was on a POWER8 system: 8 nodes, 8 processors, 8 cores, 8 threads. The chart of the results here is against, I think they call them LPARs, 256 logical processors for this particular system. Looking at it from the bottom up, as I imagine that text may be hard to read: the very bottom line is actually results with 9.4; the next line up is the 9.5 release, where it's nice to see an improvement; the next line up from that is the head of the 9.6 branch, so again nice to see a little bit of improvement compared to 9.5 so far; and then the five lines above that are various aspects of implementing these atomic operations and pulling out some of the spinlocks. So compared to 9.5, or even what's in 9.6 now, the throughput looks like it's increasing 2 to 4 times at various client counts, various amounts of concurrency going on. It's pretty encouraging to see drastic performance improvements in these kinds of workloads.

Who's excited for parallelism? Parallel sequential scan has gone in from Amit Kapila, and it also involved some additional work on assessing parallel safety from Robert Haas. What this work did was improve sequential table scan performance when SELECT queries retrieve a relatively small number of rows compared to how many rows are in the table. It introduces a new configuration parameter to limit your degree of parallelism so you don't accidentally overwork your system. Some limitations so far are that it's only for SELECT statements and it doesn't handle joins yet. What I wanted to show was a few more data points beyond what Haribabu had shown on the mailing list. What I'm going to do is run some SELECT queries on a single table and show the behaviour as the degree of parallelism and the selectivity of the rows coming back change. Just to quickly illustrate: this table has about 16 columns, a variety of integer, floating point, and varchar columns, and the query is pretty simple, we're just selecting one of the columns. We control the selectivity by using a modulo on one of the key columns, so we can pretty much guarantee the percentage of rows that will come back. This was run on an 8-core, 16-thread Intel system, and this is what it looks like.

I think what is important to understand with this parallelism is that it is important to control the degree of parallelism that's going on. The red line up there on top is what happens when you're selecting all the rows from the table; that first point on the left is a single user, a single process, sorry, selecting all the rows from the table. If you're selecting all the rows and you increase your degree of parallelism and want two processes to go after it, you're going to more than double the amount of time it takes to scan that table. The green line in the middle is selecting only a quarter of the rows back; increasing the number of processes scanning through that table from one to two already improves your response time by about 25%, but increasing the degree of parallelism beyond that starts to not look so good anymore. Yes? So the question is whether all of this data is in memory. On this particular system there are 64 gigs of RAM, the data set of the entire database is 100 gigabytes, so this particular table is going to be somewhere between 70 and 80 gigabytes. Oh, the question was how big the shared buffers were set to. I actually don't recall, but if I were to guess, I probably set them to 8 gigabytes.
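Just to make the shape of that test concrete, it was roughly the following; the table and column names are invented here for illustration, and as I recall the limiting parameter during the 9.6 development cycle was called max_parallel_degree, though that may have changed since:

    -- Illustration only: table and column names are made up, and the GUC
    -- name is the one I recall from the 9.6 development cycle.
    SET max_parallel_degree = 2;   -- cap the number of parallel workers

    -- Select roughly a quarter of the rows by taking a modulo on the key:
    SELECT col3
    FROM wide_table                -- ~16 columns of int, float, varchar data
    WHERE id % 4 = 0;

If the planner decides parallelism is worthwhile, EXPLAIN shows the scan running underneath a Gather node with the requested number of workers.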
Yeah. And then that blue line on the bottom is just cutting the selectivity in half again, selecting 12% of the rows back. In this particular case, increasing the number of processes from two to four seems to improve the response time by almost half, down another 40% or so, but again, once you have too many processes scanning that table your response times are going to start to increase. So make sure that the degree of parallelism is limited appropriately. And I think I glossed over this part: the tuple communication cost setting is basically the parameter the planner uses to determine at what point it should not use a parallel sequential scan.

The next patch deals with unique joins, authored by David Rowley. The purpose of this is to improve join performance when the inner side of a join is known to be unique, by either a unique index, a GROUP BY clause, or a DISTINCT clause. Tomas Vondra did some experiments: he created a simple test where he used two tables to simulate a fact-dimension relationship, a star-schema sort of thing, and he wanted to illustrate how well this patch performed when the fact table had ten times the rows of the dimension table, at various scales. So it's a pretty simple setup, a dimension table with an integer key and a fact table with an integer key, and the query was pretty simple too, just a SELECT count(1) over all the rows joined between these two tables. This is what he found: with 100,000 rows in the dimension table and 1 million in the fact table, the before-and-after change was 7%; when he grew the dimension table to 1 million rows, and thus 10 million rows in the fact table, the join performance improved by 17%; and then somewhere beyond that, with 10 million dimension rows and 100 million fact rows, the improvement came back down to about 10%.

Another item is an extension that creates a fixed decimal data type. How many folks deal with numerics and find them slow? Fast enough? So, yeah, David Rowley also created an extension to add this fixed decimal data type. The purpose was to help improve the performance of queries that aggregate a large number of rows of numeric-type data. The way it was implemented was by using 64-bit integers to provide a predefined-precision decimal, which I think was two places; I'm starting to forget my math terms. So it's similar to numeric in trying to be a precise decimal type, but at the same time, because it's a fixed decimal type, you may have to be aware that precision is truncated as opposed to rounded. Just to illustrate with that same table I showed earlier: out of its 16 columns, four of them were numerics, so let's say we use the fixed decimal data type for those and see how our ad hoc decision-support queries run in our TPC-H test. We see four of the queries have a fairly significant improvement, ranging from 35% to 55% better response times. To give you an idea of what these types of queries look like, query one, which was on the far left over here, is basically aggregating those four columns in slightly different ways, summing them together, taking averages, and grouping them by a couple of statuses: whether the order was returned, and where in the ordering process these line items were. So this query is trying to determine the amount of business that was billed, shipped, and returned. And at a scale factor of 100 gigabytes, we're already seeing this much improvement on a table that's between 70 and 80 gigabytes.
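If I remember right, the extension is published as fixeddecimal, but treat the extension and type names below as my recollection rather than gospel. A minimal sketch of swapping it in for a numeric column would be something like this:

    -- Sketch based on my recollection of David Rowley's extension; the
    -- extension and type names may differ in the published version, and
    -- the table is made up for illustration.
    CREATE EXTENSION fixeddecimal;

    CREATE TABLE line_items (
        item_id   bigint,
        quantity  fixeddecimal,   -- 64-bit integer underneath, two decimal places
        price     fixeddecimal
    );

    -- Aggregations like this are where the speed-up over numeric shows up;
    -- just remember that extra precision is truncated, not rounded.
    SELECT sum(price) AS total_billed
    FROM line_items;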
So, how many people have considered column stores, column-oriented databases? Are you still trying to decide whether you want to stick with one or the other, or are you using both? For those who haven't really considered it, just to mention a few popular use cases: column-oriented databases can be good solutions for data warehousing, for ad hoc query systems, or just for computing aggregates over large volumes of data. So once again using the TPC-H workload, what we want to do is take a look at how Postgres compares to what some of these column stores can do, and see how much room there is to gain. I took a look at a couple of open source column-oriented databases, which I hope you won't mind that I anonymized to CS1 and CS2, and here's what some of the things look like.

There is a load test as part of TPC-H; we're looking at 100 gigabytes of data spread across 8 tables, and for Postgres the load test also takes into consideration how long it takes to build all the indexes you want to have, in this case 34 indexes. For the column stores we don't necessarily need to build indexes; some of them do support indexes, but for my initial look I didn't look into indexing the data. The load time is pretty impressive: to load 100 gigabytes of data and build the indexes so you can run some queries takes about 9 hours for Postgres, but these column store databases can slurp up that 100 gigabytes of data in one case in just over an hour, and in the other in three and a half hours or so. It's also interesting to point out that with the Postgres load test we are loading the tables in parallel and building indexes across the tables to get it down to 9 hours, while one of these particular open source column store databases actually could not load data in parallel, and it happens to be the one that loaded the data the fastest; you could only load one table at a time.

Now, the power test. The power test in this workload runs the 22 queries consecutively, and the purpose is to see how fast you can power through all of them. The way it's scored is a weighted geometric mean of all the response times, and a higher number is better. You can see here that Postgres almost gets 4000, whereas one of these column stores is over 8000 and the other is even over 7000. So one way to interpret these results is that the column stores are able to crunch through the decision-support, ad hoc queries twice as fast as Postgres. It seems like there's a lot of room to gain, a large gap that could be closed, by having some kind of column store capability in Postgres. This chart just illustrates the response times of each of the queries; the Postgres results are the blue ones on the right. It's kind of interesting to see on which queries the column stores actually do much better than Postgres; even the column stores are not too consistent between themselves, one of them happens to run certain types of queries better than the other, but still, across the board the column stores can crunch through twice as many queries as you can with Postgres. Just to give a couple of examples of the types of questions these column stores seem to answer better than Postgres: looking at your top 10 unshipped orders with the highest revenue, or revenue to be realized; listing the revenue volumes of your local suppliers; and identifying customers that have been having problems, where having problems is defined as which customers have been returning your product more than others.
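That first question is essentially the shape of one of the TPC-H queries. From memory it looks roughly like the following, so take the predicates and dates as illustrative rather than the official query text:

    -- Roughly the shape of the "highest-revenue unshipped orders" question,
    -- reconstructed from memory of TPC-H; predicates and dates are illustrative.
    SELECT o_orderkey,
           sum(l_extendedprice * (1 - l_discount)) AS revenue,
           o_orderdate
    FROM orders
    JOIN lineitem ON l_orderkey = o_orderkey
    WHERE o_orderdate < DATE '1995-03-15'   -- ordered before the cutoff...
      AND l_shipdate  > DATE '1995-03-15'   -- ...but not yet shipped by then
    GROUP BY o_orderkey, o_orderdate
    ORDER BY revenue DESC
    LIMIT 10;

A column store only has to read the handful of columns involved here rather than the whole wide row, which is a big part of why it wins on this kind of query.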
Anyone use a mainframe? No? Any mainframe users, anyone thought about it? No? Maybe this will be interesting then, or is anyone anti-mainframe? No? Okay, that's good. So we took a look at one of the latest mainframe offerings from IBM. We ran an OLTP workload to see how this thing does, and we wanted to see how it compared to a popular competitor, if you don't mind me anonymizing that as well. Although these tests were run against the 9.4 release at the time, I think the numbers are still quite interesting to look at, because they certainly wouldn't have gotten any worse. A couple of basic hardware details: the z13 processor has a 5 GHz clock speed, there are 8 cores per die, 16 threads, and 64 gigs of RAM in this particular LPAR. We tried to match it up against something similar: an 8-core processor that also had 16 threads, a 2.4 GHz clock speed, and again 64 gigabytes of RAM. And this is what we found. Running an in-memory OLTP workload, the lines here illustrate the performance of the processors with hyperthreading turned on or off. The very bottom one is our popular competitor with no hyperthreading, the next one up is the same system with hyperthreading turned on, and the top two lines are the mainframe results. What we see here is that, with or without hyperthreading, the throughput of an OLTP workload seems to run twice as fast on the mainframe.

But then we want to see some writes going on, and this particular test is using pgbench; oh, sorry, I think I said this was TPC-C earlier, but this is a pgbench workload, pgbench again with a write workload, a write-only workload. On the popular competitor, the results with hyperthreading turned on and off didn't seem to make much of a difference; the lines are pretty much on top of each other at the bottom. But on the mainframe there is a substantial difference from having the hyperthreading turned on, even for a write-only workload, and once again, with a heavy write workload on the mainframe we were able to push through twice the throughput. Now, this one is dbt2, the TPC-C derivative. What these lines show is that as you scale up the size of the database, the throughput of the tests stays pretty consistent; this particular workload on the mainframe seems to maintain about 1.7 times the throughput. I will not repeat that question, but yes, it's another type of system; hypothetically it's a different hardware vendor that may be going up against IBM. Yes, we're running Postgres the same way on two different types of hardware. Oh, yes, sorry, one fork of Postgres versus another fork of Postgres; sorry for being facetious, it's the same Postgres. Well, I won't quite say that, because of this. Alright. It didn't seem like anyone was interested in mainframes in the room today, but they are some interesting pieces of hardware. The question is what the sticker price was on the z13; I'm afraid I never got to see a price sheet, but I don't think any of those IBM sales folks will shy away from telling you what they cost.

So, next, Postgres-XL. How many folks have looked at Postgres-XL over the years? Anyone using it now? Hoping to be able to use it? Hopefully some of these test results will keep you interested, then. For those who aren't familiar, Postgres-XL is a fork of Postgres; a lot of the people involved were previously
involved with Postgres-XC and Stado, one or the other. It is an all-purpose, fully ACID, open source scale-out database solution. It's been rebased against 9.5 recently; previously it was against the Postgres 9.2 branch, to give you an idea that after a few releases there may be a lot more interesting things you can do with XL. So, just to quickly mention it, I did a little comparison on AWS, setting this up on some EC2 instances. These individual instances had 32, am I reading this right, so this one is just core Postgres, the 9.5 alpha 2 that was available at the time, and this single instance had 32 virtual CPUs, 244 gigabytes of RAM, and 8 SSD drives in it. Then for the Postgres-XL cluster we have something with similar resources: we have 8 nodes, and each of these nodes had, I've got to do this math in my head, 4 CPUs and, what is an eighth of 244, about 30 gigabytes, so 4 CPUs in each of these nodes, 30 gigabytes of memory, and I think these little ones had only 1 SSD attached. This is what we saw on the load test, again using dbt3, the TPC-H derivative: to load 100 gigabytes of data and build 34 indexes took about 2 hours with Postgres 9.5 on the single instance; on the 8-node cluster, splitting up those data files 8 ways, it only took 44 minutes to load all that data and build those 34 indexes, cutting that load time down by more than half. And while we haven't explored it, the Postgres-XL developers have reason to believe that if we spent more time determining how many coordinator nodes to run per data node, we may be able to improve this load time even further, and also by taking extra care in sorting your data files so that the data you're loading for a table all streams over to one node. I've got my slides out of order here, so this is where the individual node details are. Now, this slide illustrates how much the response times improved on each of the individual queries in the power test between 9.5 and scaling out on XL; almost across the board there's a pretty significant improvement in response time. Looking at the score values, it's about 2400 with 9.5 and almost 6300 with Postgres-XL, roughly a 160% improvement in processing power. So hopefully we'll see good things come from Postgres-XL, and it won't be a fork forever.

How many folks have replication in there, using replication? Streaming replication? Slony? Londiste? More than one? Sorry, yeah. So we're going to take a look at pglogical, a newly available extension that can be used with 9.4 and newer versions of Postgres. Tomas Vondra took a look at how well each of these replication solutions worked with pgbench on a normal workload. Just to give you an idea of the system sizes, it's a pair of AWS instances with four SSD drives attached and 122 GB of RAM, so most of these tests are probably going to end up in memory. And just to show a little bit of the tuning that went on, these weren't default parameters: the checkpoint timeout was raised to 15 minutes, effective IO concurrency set to 32, 1 GB of maintenance work memory, max WAL size set to 8 GB, and 16 GB of shared buffers configured. And this is what we get. In this two-node setup, the two lines overlapping for the most part on the bottom are the throughput numbers from Londiste and Slony. At the top end, which I guess in some ways would be considered the peak of what you could expect out of a replication solution, is streaming replication.
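Just to spell out that non-default tuning in terms of the actual parameters, it's roughly the following; these are all standard settings, with the values taken from the slide:

    -- The non-default tuning described above, expressed as ALTER SYSTEM
    -- commands (a postgresql.conf edit works just as well); some of these,
    -- like shared_buffers, require a server restart to take effect.
    ALTER SYSTEM SET checkpoint_timeout = '15min';
    ALTER SYSTEM SET effective_io_concurrency = 32;
    ALTER SYSTEM SET maintenance_work_mem = '1GB';
    ALTER SYSTEM SET max_wal_size = '8GB';
    ALTER SYSTEM SET shared_buffers = '16GB';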
That huge difference comes from those trigger-based replication solutions. Those of you using them, do you find the performance of Slony or Londiste to be a burden sometimes? Now, the line in the middle is the throughput you can get with pglogical, which is not using triggers the way Londiste and Slony do; it builds instead on the replication facilities inside Postgres. The question is whether these numbers are coming from the master or the slave system. I actually don't know; I would have assumed the master, and since you have to run pgbench against the master, these numbers are probably coming out of pgbench, so it must be the master. So it's looking like, if streaming replication is the right solution for you, you're getting 6 times the throughput compared to Londiste and Slony; with pglogical it's not quite that high, but you're still getting 3 to 4 times the throughput. And that's all I had for you today. Thank you for listening.