Welcome to YugabyteDB, a distributed PostgreSQL database. Today we're going to explain how YugabyteDB works as a cloud-native distributed PostgreSQL database, and demonstrate with code examples how important PostgreSQL concepts such as serializable transactions and partial indexes are preserved. My name is Lindsay Hooper, I'm one of the conference organizers, and I'll be your moderator for this webinar. I'm here with Bryn Llewellyn, a developer advocate at Yugabyte who specializes in SQL and stored procedures in the context of distributed SQL. Bryn has worked in the software field for more than 40 years and joined Yugabyte in 2019. He started off doing image analysis and pattern recognition at Oxford University, programming in Fortran, and then worked in Oslo, first at the Norwegian Computing Center and then in a startup. In Norway, Bryn programmed in Simula, recognized as the first object-oriented programming language and the inspiration for C++. He came to Yugabyte from Oracle. So I want to welcome you, Bryn, and with that I'm going to hand it off to you to take it away. Enjoy. So we're off. Who am I, and who do I think you are? Let's get this out of the way quickly. About me: you heard more than you ever want to hear from Lindsay already, but I can approach it a different way. When we join Yugabyte, we're encouraged to write a little piece about why we did it. I did one, and it's easy to find on our blog site. There it is; that's enough about me, then. I'll tell you in a moment, though it's obvious, where to find that blog site. What about you? That's far more important, in a sense. I'm assuming that you know Postgres very, very well, that you actually wield it for a fair proportion of whatever you do in your day job, and indeed that not a week goes by without you typing SQL, or various create commands for stored procedures, at the psql prompt.
And I'm going to allow myself to hope that you actually know the world of stored procedures too: PL/pgSQL, a name I always have to struggle to pronounce, and I hope I have it about right. And that you actually do use stored procedures. Most critically, I'm just going to assume that I don't have to tell you the reasons to use SQL and what's so good about it; I'll take that for granted. And also that you're unashamedly happy with the idea that the notions that underpin SQL are not brand new. They're not new and gimmicky; they date from as long ago as the 1960s. So with that said, moving on, I'm going to do a bit of stage setting with a potted history of the two things that we claim are special about distributed SQL as a generic notion: intrinsic scalability and intrinsic fault tolerance. With respect to the first one, scalability: in my lifetime as a programmer, at least, there was a long epoch of what we now call monolithic SQL databases. There didn't used to be a need for that name, and no negative criticism is implied here. PostgreSQL in its standard vanilla form is an example of a monolithic SQL database, as are most of the others you've heard of that were born back then: Oracle Database, SQL Server, and so on. In their first few years of popularity, they were notable because they were the survivors of the pre-SQL era, where all sorts of other things, network databases and hierarchical databases, had their day; but Codd and Date and their thinking won over, and there we had it. And the thing that characterized a monolithic database was exactly that: all the data was stored and managed by one single computer, typically in one room on some office premises.
Now, that was okay for an epoch, but as demands for throughput and storage volume increased, people, particularly in certain spaces of application use, found they couldn't accommodate everything in one of those single computers, no matter how much iron they threw at it: the fastest possible CPUs and so on. That led to some kind of decline in purity that lasted a fair time. The first manifestation of it was sharding: in other words, slicing up data that would ideally have been in a single database into no end of separate databases on separate machines, over which the application code, through some kind of network layer of its own, managed, where appropriate, to make the whole edifice look a bit like one single database. Clearly that brought all sorts of problems, not least the complexity of application development, but there were other problems too, as I'm sure you've heard. Then came the era of NoSQL, which was basically a low-level solution to managing tons and tons of data when you have lots and lots of computers at your disposal to share the load. But they had to sacrifice something, and they sacrificed SQL. In other words: in with shared-nothing, as it's usually called, and out with SQL. That had its distinct advantage, that people who needed arbitrarily large scale could get it, and its distinct disadvantages, which I don't have to point out: application development, and indeed correctness, and all those things we know and love in the world of SQL, went out the window. Then, fast forward, Google develops a solution that aims to meet both goals.
They have a kind of low-level storage scheme which is very much in the spirit of all the stuff I was just talking about: lots and lots of computers, shared nothing, and machinery to keep it all in step; and on top of that a handwritten, somewhat primitive, but nevertheless SQL, API layer. The whole thing was for internal use, by which I mean use by their engineers, to back famous things you'll have heard of, for example AdWords and Google Play. The next milestone was good old Google deciding, on the one hand, to offer this database, Spanner, as a platform-as-a-service offering, exactly and only in their own cloud; and they also, very kindly, published the algorithms. That led to a wave of startups and development building handcrafted new versions of the basic concept, and that brought in the era of open-source distributed SQL databases. And that's us at Yugabyte: we're in that class. Not unique, but it's a fairly small class. Of course, at all stages during that history, no end of hybrids were born, and they live on. For example, Oracle Database has a scheme which is the adoption, into, loosely speaking, the database layer, of a sharding arrangement where you still have many separate databases, but they do appear to be one, and it's all managed slightly behind the scenes; but in no way at all like the bottom-up way that it's done in genuine distributed SQL. So that was that part, the pursuit of performance. Now about the pursuit of fault tolerance, sometimes known as high availability; in other words, all these gazillion nines that we hear about all the time. Availability must not be interrupted: that's the goal.
So at one stage the goal was never even articulated. Here's the history. Companies each had their own computers on their own premises; they shut down at the weekend, everyone went home and had proper lives, and someone over the weekend did a full backup. That's all there was, and they drove the tapes off somewhere else for storage. Then, as time went on, shutdowns were allowed less and less frequently, and the era of incremental backup arrived. It all meant, so to speak, a good safety net; but if something went wrong, you had no alternative but to take an outage, procure new kit, restore everything from backup, and then come up again. And outages, on occasion caused by faults as dramatic as that, could last not only hours but even some days, according to the volumes of data involved. Then, as databases started to back internet-facing applications, even that was considered, well, "considered" is the wrong word, that obviously was inadequate, and that was the heralding of the era of the primary-and-standby model. That's very common today, as well as being what I might dare to call the old-fashioned solution to which distributed SQL has a modern answer. The basic concept you know inside out: you have a complete replica of your actual operational database, held as close as is practical in synchronization with it, but with a certain lag, of course. And it's based on mining and replaying the post-commit information, what in some worlds is called change data capture, and in other worlds is called redo log mining and redo apply, and so on. You know the game.
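The log-mine-and-replay idea behind the primary-and-standby model can be sketched in a few lines. This is a toy Python model under stated assumptions, not any real system's wire format: the primary appends post-commit change records to a log, and the standby replays that log to stay a near copy, always lagging slightly behind the newest commits. The class and attribute names here are invented for illustration; real systems ship WAL or redo segments, not Python lists.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.redo_log = []  # post-commit change records, in order

    def commit(self, key, value):
        self.data[key] = value
        self.redo_log.append((key, value))

class Standby:
    def __init__(self):
        self.data = {}
        self.applied = 0  # how far into the primary's log we've replayed

    def replay(self, redo_log):
        # Mine and apply only the records we haven't seen yet.
        for key, value in redo_log[self.applied:]:
            self.data[key] = value
        self.applied = len(redo_log)

primary, standby = Primary(), Standby()
primary.commit("x", 1)
primary.commit("y", 2)
standby.replay(primary.redo_log)   # standby catches up...
primary.commit("x", 3)             # ...but lags behind newer commits
print(standby.data)  # {'x': 1, 'y': 2}: correct only as of the last replay
```

The lag is visible in the final state: the standby is internally consistent, just slightly stale, which is exactly the "certain lag, of course" trade-off described above.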
Well, that's when NoSQL came in, and along with its shared-nothing design bringing, at the one basic level, scalability, they also decided to build into their designs, into the concepts as they were born, the notion of intrinsic, low-level, automatic replication, giving fault tolerance in an entirely different way. That meant that instead of replicating at the granularity of the whole database, the unit of practical replication was a shard, using the word slightly differently from how it's sometimes used, a part of a single table, spread among several machines and kept in step by some low-level magic under the covers. And that, in a way, meant even more "out with SQL": out with SQL, deluxe. Then eventually, thanks to Google and their pioneering work, we had where we are now: distributed SQL. In other words, you have your cake and eat it. In the general world, distributed SQL means some query or SQL processing layer: in Google's case handwritten, and in the case of other things you'll have heard of in this space, possibly CockroachDB and TiDB, also handwritten. But uniquely, as it concerns today's talk especially, YugabyteDB uses the Postgres code as is, and it sits it on a storage scheme inspired by this whole Google business, and it brings, I would say, the best of both worlds. So there we have the history, and now, moving on, let's just look at what it means to take the PostgreSQL processing code, not something written to emulate it, but that actual open-source code, throw the bottom half away, and refit it onto our storage. What does that mean?
Well, down the bottom here is some marketing material where we list out the SQL features, but there's no point in doing that for you guys who know PostgreSQL. I can rather say that our system, to anyone who sits at the psql prompt, or in our case what we call ysqlsh, which is basically the same thing with a different name, is indistinguishable: you can't see the difference between the two. And as for those demos Lindsay promised, because I rashly promised once to do demos: I really think that's a bit of a waste of time; that's what I mean by meta, meta, meta. So I'm not going to do any demos, but I might flash one past you. I'm not going to be very careful to tell you how to reach it; I'm just going to tell you the general way you can find your way to demonstrations that you can then do for yourself, and when you do, you'll really believe the truth of what I'm saying. So I'll flash over here as quickly as I can. Now, what is going on here? I hope you can see this, and I hope someone will type to Lindsay if you can't. On the left is an ordinary web browser, and on the right, because it's blue and not yellow, is actually PostgreSQL. If I run this shortcut for a bit of SQL, it tells me I'm in such-and-such a database, I'm such-and-such a user, and this is the Postgres version. I've chosen that version, though I know it's not the most recent one, because YugabyteDB is based on that distribution; we'll upgrade presently. Obviously, in this case there's no YugabyteDB involved: it's the vanilla thing on my MacBook. And here on this other screen I've got the same thing. Let me just throw away that user and create them again from scratch so I don't get confused, clear the screen, and see who I am. Now I'm the same user in the same database with the same default schema and the same version of Postgres, but now the bottom half is this almost-current version of YugabyteDB. All I have to do now, in the spirit of meta-ness, is go into our docs and find something of interest, like the array
data type functionality. I know my way around this because I've just finished writing it over the course of the last couple of, well, more than a couple of, weeks. If I go into one particular thing, this FOREACH loop, I'm sure you all know about that, there are copy-and-paste-ready examples in here. I'm just going to pick anything that takes my fancy: something here where I'm creating a type and running an anonymous block. I'll just copy it and stick it in, and there you go: it goes through and produces those results. It's showing, incidentally, if you care, that the functionality of this FOREACH loop, in one of its uses, is the same as the functionality of this unnest thing. Not very complicated, but you recognize these bits of special syntax that are unique, as far as I recall, to Postgres; they're certainly not the same in Oracle. Then I'm just going to go over to the other screen, where I've got myself a clean slate, so there are no objects I could collide with, and paste in the same thing. I think I could stop at that point; there's tons more, but I'll do just one more and then move on. I'm going to find a long source text of a procedure. And by the way, look here: this is not the long one, but just notice I've got anyelement and anyarray. That's a nice Postgres feature that, believe it or not, Oracle Database doesn't have, and there it is working ordinarily in YugabyteDB; if you were to do the same thing in Postgres, you couldn't see the difference. And only because it's so long, and therefore so wonderful and complicated, I'm just going to spray that one in, and it certainly works; no problems there. That was one way you can demonstrate for yourself that the two are the same. And I may as well get this out of the way now: if you go to our site, you'll be confronted with things inviting you to download and install, and you must have downloaded and installed Postgres many times. It's quick
and easy, isn't it? It's the same kind of thing with YugabyteDB. You can install the current version and be running at the prompt, if you've done it once or twice before, in just minutes. If you're doing it the first time and you want to check everything that happens, you'll maybe take a little more time, but it's minutes, not hours, and it's a simple, painless process. Now, about that other stuff: all you have to do is look on our blog site, the URL actually says blog.yugabyte.com, author Bryn, and there are about eight or so posts I've done. You can find something on what Lindsay mentioned, partial indexes, in there, and table functions, and all sorts of things. If you go into one of them, you'll find the same idea: someone's inviting you to find out more about our stuff, and it's the same as the docs. In other words, take any one of these things, run it in Postgres, then run it in what we call YSQL, that is, against YugabyteDB on your own MacBook, and you can't see the difference. So that's enough of the meta, meta, meta demos, and now back to the talk. Just in case you have difficulty remembering: author Bryn, that's enough to find my posts. Okay, now then, a bit more on the background reading, and then we'll start. If you want to know about the high-level what and why of all this lot, like what is distributed SQL, someone else's take on the story I gave, the distinction between SQL and NewSQL, why we did all this stuff that I've talked about, Postgres on top of this Spanner-inspired thing, and more detail on that: these first ones are very general five-minute reads, and these others are maybe 10-, 15-, 20-minute reads. And then there's the material I just mentioned that you can use to convince yourself that if it works in Postgres, it'll work in YugabyteDB. Okay, enough of that. Oh no, I pressed the wrong button yet again; forgive my fumbling,
and now we'll move on to the real thing. Just one last bit of waffle: what is YugabyteDB, and why might you be interested? To take the second thing first, I hope you'd be interested because, on the one hand, it'll look just the same as what you know and love, so the learning experience will be tiny when you come to sit at a prompt, type SQL, and create persistent objects and so on. On the other hand, you might be interested exactly because it's a database for the modern world, embodying this whole story about distributed SQL that I've already explained. These, and I'm not supposed to do a marketing talk, are in a sense just the properties of all of these distributed SQL databases, except that Google Spanner doesn't aim to be open source and doesn't aim to run anywhere; it's its own proprietary thing. But with respect to the other two I mentioned, CockroachDB and TiDB, I'm sure they'd all claim the same things. This intrinsic fault tolerance, the bottom-up approach based on a low level of granularity of replication, which I'm going to explain in some sketched detail, is part of the proposition. And this low-level replication, sharding as it's sometimes called, is also what leads to the arbitrary scalability. It's based on intrinsic auto-sharding; you can add nodes on demand; you have low-latency queries; and when you measure throughput, considering you're distributing the demand across a ton of independent computers, you can see it can go up arbitrarily high. People have measured IOPS in the millions, managing terabytes per node of data on lots of nodes, which multiplies up the total storage. And, leaving Google out of the picture now, no vendor lock-in, no cloud-vendor lock-in, I should say: all three of these databases, including ours, were designed from the get-go to run on-premises, on rented virtual machines, or in some kind of containerized scheme, which these days is bound to be Kubernetes. It runs in
such systems because it was designed to, and as far as ours is concerned, the one I know better than the others, it's ready for you to do that kind of deployment painlessly, because we have in our GitHub repo all these Helm charts you hear so much about, which mechanize the setup for you. All these things lead to this famous notion of being a distributed SQL database and, uniquely for us, fully Postgres compatible. Okay then, now for something completely different: in the remaining 20-25 minutes, as much real substantive detail as I can squeeze in. The substance will, let me be unashamed about this, be some kind of sketch, but let's hope it tempts you to read more in our docs and our blogs, or at least in our blogs. So here we go: the design goals. It's worth restating these before I really, really get started. We wanted to be, before we'd built it, PostgreSQL compatible: not in that kind of emergent sense, by having programmed it all so that it ends up looking like that, but exactly by virtue of reusing the Postgres code as it is. And we have demonstrated that we can adopt a newer version of Postgres, when we feel it's the appropriate time to do so, with a relatively small amount of work. We have proved, by doing that, that new changes in the Postgres scheme don't break any of the existing things, either as the end user sees it, by which I mean the developer user, which you know inside out already, or with respect to how it sits on our storage layer. That bleeds into this second point here, and this is of course a statement from a marketing guy: when they say things happened in a very, very short time, they may be glossing over things like how much time it took to test when it was nominally working. But that aside, a move from one major release of Postgres to another in weeks, not months, and certainly not years, is the takeaway from this. And we'll do the same again when
we believe it's the right moment for Postgres 12-something. Then this point, again, just to reiterate: you get this ability to run anywhere by stating it as a goal before you've done anything, and by testing for it all the time, deploying your builds on all the possible clouds in all the possible ways, and sure enough, we've achieved that. Okay then. The functional architecture is shown in this picture, and I've said it more times than I can recall in the space of this talk: we have this upper half consisting of exactly the Postgres code, a certain part of it, the part that's responsible for SQL processing, and at the bottom we have something we call DocDB. If you've heard of RocksDB: DocDB is based on that, but it isn't that any longer. The theory behind it is inspired by Spanner, but we are not using anything that actually is Google Spanner code; it's the ideas we're using. And that gives what I've just been talking about on the previous slide. So now I have to introduce some vocabulary, and it'll be a bit of a rush, but I hope it gives you a sense of what all this is about. We use the word table in two domains of discourse. We use it as you use it daily: a phenomenon you get when you do create table and alter table, and a phenomenon you use to advantage when you do insert, delete, update, and select. But that sense of the word table is just mapped one-to-one to a phenomenon in our DocDB world. Without going into detail now, the scheme down there is a kind of key-value pair at the lowest level of granularity, where the key is everything that encodes the uniqueness of a field in a SQL table: what table is it in, what column of that table is it in, who's the owner and what's the database of that table, and so on. That's the key, and the value is of course just the value, with some representation alongside of what the data type is, because down at the lowest level it's just bits and bytes. And then the model that operates
down at that lowest of all levels is a kind of write-only, write-the-latest model with a timestamp. And, as you can imagine, that brings an intrinsic way to do multi-version concurrency control: when you're reading all the key-value pairs you need to constitute the results of a SQL statement, you read not necessarily the latest version of a particular key-value pair, but rather the versions that correspond to the transaction snapshot selected for your query by the moment you issued it. Okay, enough of all that; that's what this one means. Now, there's this notion of tablets, which is our word for sharding. In your SQL world you could wonder: how many tablets has my table got, how is it cut up, conceptually? At the lower level you would discover that these slices we call tablets are mapped to things that are physical down in the DocDB world. But you come to understand that there's no visible, measurable manifestation of this tabletizing, this sharding, in the SQL world: there are no queries you would ordinarily and usefully do to find out about it, so you don't need to care. In other words, when you think about application development, you don't think about tablets. You only think about them if you're a deployment engineer wondering how many to have, where to put them, and how many you need to hit whatever performance metric you've decided you want. Then this is the next point: a tablet, as this conceptual idea, is actually a set of tablet peers, replicas, and they are kept in replicated synchronization with each other by a scheme I'll sketch a little of presently. The basic concept is that this synchronization is transactional: not as it is between an HA primary and an HA standby, not at all that, but properly transactional in every meaning of that word, where you use such words in the same sentence as the word ACID. So then, these tablet peers, in other words the
set of some number of slices, each identical to its other peers and each corresponding to some shard of data in a particular table, are managed down at the low level, and each of them, for a particular table, ends up on its own storage, managed by its own node. We use this term RF, I've got it there and I've got it here: it stands for replication factor. The minimum useful value of the replication factor you could choose is three, because you want fault tolerance and you want the safety net. If you've got three, then one can go away and you can survive, with no safety net, or a limited safety net, on two. When you've got three working normally, you're protected fully. When you go down to two in some crisis situation, you are still protected, but you don't want to remain so thinly protected: you're going to work behind the scenes to put a new node back into the system, or maybe to notice that the network problem which was the cause of the apparent outage has gone away, the way these things come and go in the wonderful world of cloud. You can even go down to one, but if you do, you can no longer accept writes, because if you did, you'd have no safety net. You can by all means still read, but that's pushing the envelope: risking it. Now, values bigger than three are certainly feasible, five, seven, and so on; for various reasons it must be an odd number, and they bring more fault tolerance. But I'll always assume, for the rest of this short presentation, that three is our number. So that's that: each table is sharded into some number of tablets, and when I say about 10, I mean 10-ish, not two and not 100, but not exactly 10. Then these tablets of a particular table are sprayed around the place. Moving on: each tablet peer for a given table is on a different node, and one of them, by a process of dynamic election among the software running on the set of
nodes that jointly host this set of tablet peers, is the nominal leader. The leader has real significance: basically, reads and writes can only go through the leader, and the other guys, called followers, are there exactly for the purposes of this intrinsic, automatic fault tolerance. Now, as I said, each tablet has its peers spread among nodes, and if the replication factor is three, then any tablet from any table will be on some set of three nodes. Don't forget you can have many, many more nodes than three when three is your replication factor: the node count can be anything you want, 12 or 16 or 17 or 18. Now, if a node vanishes for some reason, all the tablet peers that were being led there are then led somewhere else. At this point I have to stress that while, from the perspective of a given table with replication factor three, its tablet peers will be on three different nodes, if you focus on a particular node, it'll have tablet peers from no end of tables, and the other peers of one tablet on that node are not necessarily on the same nodes as the other peers of some other tablet on that node. That might sound like too much many-to-many complexity, but the upshot of all that is distribution and no single point of failure, and those are the benefits. Right then, the last point here, I hope it's the last point, is that you can add a node to the cluster. I'm the administrator now. And by the way, I didn't use the word until now, but a cluster is the word we use for the set of nodes that jointly accommodate all the storage and computing power that give your SQL experience the properties of distributed SQL that I mentioned, but always with the absolute guarantee that it feels like Postgres, with everything that implies, not least the notions of ACID. That's what a cluster is. And you can add a node to the cluster, or indeed decommission one, at any time. Obviously you're not going to decommission one when
you've only got three, but if you put extra ones there for some planned peak load, the famous Black Friday we keep hearing about, then once you've got through it, you can decommission them. You can do that at any time, and this is of course the key to demand-based scalability. When you do add a new node, it automatically takes over (oh, I spelled that wrong on the slide, sorry) tablet peers from other nodes, over a period of certainly minutes, not seconds, to balance things out, so that you have equal throughput being sustained from all of the different nodes. So then, now, especially now if you didn't feel it before, this is going to give you a sense of what's going on, so that you can appreciate, if you like, the physics, the reality of it all, to make it all seem a bit more plausible. This is a contrast between the high-level granular picture of the Postgres architecture here on the left and the corresponding picture of YugabyteDB here on the right. The critical, noticeable thing is this: here you have one occurrence of the upper-half stuff, and it's all communicating with a single system for storage. Maybe I'm being a little sketchy when I say that, because you all know better than me that every client has its own backend, its own background process or whatever you want to call it, and that's really what this green thing is; but they're all communicating with a single storage system, and they're all on one computer. Now here we are in YugabyteDB, where these three columns with green on the top and blue on the bottom are different nodes, and each of them has its own instantiation of the DocDB code and its own instantiation of the upper half of the Postgres code. That's the principal difference in the architectural approach: whatever happens when clients compile SQL, and the SQL gets cached in the backend process, is now happening separately and independently on each of the separate nodes.
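The node-addition and rebalancing step described above can be sketched in a few lines of Python. This is an illustrative simplification under assumed numbers, not YugabyteDB's actual placement algorithm: a greedy loop moves tablet peers from the fullest node to the emptiest until no pair of nodes differs by more than one peer, which is the "balance things out so that you have equal throughput" behavior.

```python
def rebalance(load: dict) -> dict:
    """Greedily move tablet peers from the fullest node to the
    emptiest until the cluster is as even as it can be."""
    load = dict(load)  # don't mutate the caller's view
    while max(load.values()) - min(load.values()) > 1:
        fullest = max(load, key=load.get)
        emptiest = min(load, key=load.get)
        load[fullest] -= 1   # one peer migrates off the busy node...
        load[emptiest] += 1  # ...and onto the newcomer
    return load

# Three nodes each hold 8 tablet peers; a fourth, empty node joins.
before = {"n1": 8, "n2": 8, "n3": 8, "n4": 0}
after = rebalance(before)
print(after)  # every node ends up with 6 peers
```

The total peer count is conserved; only the placement changes, which is why the process can run in the background over minutes without interrupting service.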
But you can imagine whatever you have to do for cache invalidation and the rest of it: we have it going on. So let's now sketch what happens when you create an end-user table and do some inserts. As I've said, tables map to this same phenomenon down at the bottom level, and there's this tabletization. Now, system tables, or what you might call catalog tables, are in one sense tables like any others, but we decided to handle them specially by putting them in special DocDB tables, and all of these use just a single tablet, with its peers. So the whole of the catalog is concentrated and managed by a slightly different, but basically similar, process to the one that manages the ordinary peers of end-user tables. The reason for this difference in approach is that there's almost nothing you can do that doesn't need to consult the catalog, and we wanted to do some special optimizations there; without any further detail, that's how it came out. This bit I already sketched: at the lowest of levels it's just key-value pairs, where the semantics are brought in as you go up through the stack. And as I've mentioned, this is a recap, we have to bear in mind these notions: the replication factor, an odd number, typically three, which will, by the way, be a fixed property of the cluster from when you created it; the number of nodes, which can be anything you want as long as it's not fewer than the replication factor, and which you can increase or decrease on demand; and the number of tablets per table, which is, as I said, in flux, and by that I just mean that more and more flexibility in how it's pinned down is coming with successive YugabyteDB releases, but roughly speaking, as mentioned, it's on the order of 10-ish, not two-ish and not a hundred-ish. So then, here we go. The first thing you do when you create a table is obviously the create table statement, whose initial effect is to
So here we go. The first thing you do when you create a table is obviously the CREATE TABLE statement, whose initial effect is to establish metadata in the catalog saying what this table is about: what it's called, what its columns are, what the data types are, what constraints there are, all that stuff. But not only that: some kind of storage has got to be set up for it too. Now, in the interests of not making these diagrams ridiculously complicated, there are only three nodes drawn, the column on the left, the column in the middle, and the column on the right, each showing its upper half and its lower half. And we're stressing the word stateless about Postgres here, meaning not that there is no in-memory state, but that it doesn't itself do anything involving persistent data on disk; rather, all the persistence is done by this guy communicating with one of these guys as appropriate, and that guy communicating with the other guys. There are only three nodes on the picture because otherwise it would get mad, but these little ellipses are meant to indicate that you can have many, many more than three; as far as the catalog is concerned, though, it will be distributed among some particular three when the replication factor is three. Now, time is running really short, so I'm going to rush this bit. Somehow or other, the CREATE TABLE statement finds its way down to the DocDB node which is the leader for the tablet peer set that implements the catalog; in comes the request, it gets to the right place by suitable redirecting along the way, and something is done there. But before the response is given, these guys go out sideways and keep their peers in step; and eventually, when all that is complete, the next step goes through, which is to create the storage for the tablet peers of the newly created table. Here we're squeezing it all into one diagram, but let's say that the newly created table has lots of tablet peers: here is the leader for the peer t3, here's the leader for the peer t2 (I hope you can see where I'm pointing, and I hope I said leader, not master), and here's the
leader for the peer t1; and in a more general picture these would all be on different nodes, and there'd be many more than three of them as well, there'd be eight or ten of them. So that's roughly the first stage of the bootstrap. Moving on, then, to doing some inserts. Because the thing down at the bottom is a key-value pair, the primary key is a thing of significance; it's a kind of index-organized storage scheme, and you will be best off if you declare all your tables, as you should anyway, with a properly defined primary key. That will be the clue that links the notion in the SQL world to the notion in the storage world. And because the storage is intrinsically index-organized, we've fallen into the habit of calling indexes on other columns secondary indexes: the rows in the table are primarily, intrinsically indexed by the index-organized structure, but if you want indexes on other columns for ordinary reasons, we call them secondary indexes, and by the time you get down to the lowest level of storage they are just other DocDB tables. So you can imagine that when an insert is done into a table upon which there exists a secondary index, there's some kind of recursive action to update the index too, in the normal way, and that is going to impact yet more nodes than if there weren't such a secondary index, or a set of them. This, absolutely as a sketch, is what happens at this stage: we come in with our insert, some guy coordinates it all, it goes there, it's spread out there, and if there were indexes it would be spread out to where the indexes are as well.
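In SQL terms, the point about primary keys and secondary indexes looks like this. The table and column names are invented for illustration; the DDL itself is ordinary PostgreSQL, which is exactly the point:

```sql
-- Declare the primary key explicitly: it becomes the key of the
-- index-organized key-value storage for the table's tablets.
create table accounts(
  id         bigint primary key,
  owner_name text not null,
  balance    numeric not null);

-- A "secondary" index on another column. Down at the storage level
-- this is just another DocDB table, so an insert into accounts also
-- fans out to the nodes holding this index's tablets.
create index accounts_owner_idx on accounts(owner_name);
```

At the DocDB level, `accounts` is stored as key-value pairs keyed by `id`, and `accounts_owner_idx` is, under the covers, another key-value structure keyed by `owner_name`.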
And that brings us to the story of distributed transactions. A distributed transaction is the phrase you know and love from, you know, database links and that kind of thing, two-phase commit, all those ideas: everything has to happen, to first order, to completion before the acknowledgement is given back that you really got completion. But having said that, with our fault-tolerant system we can accommodate hearing back from only the majority of the nodes that manage the tablet peers in question. So if the replication factor is three, the leader (I should say leader, not master) certainly gets it first, and it's the leader of the whole acknowledgement process; but it has to hear back not from each of the other two in its set of three, but from only one of them, because that makes the majority. Right, then: all I'm going to do now is step through this and show you all these arrows flying around the place, no more than that, to give you the general idea that there's no free lunch. If you want this intrinsic fault tolerance, and if your fault tolerance is brought by having these replicas on different nodes, where in the limit the different nodes are not different only within a data center but are spread across data centers within one region, and in the limit are even spread across the globe, then there's going to be some speed-of-light effect coming in, and anything you do is going to be in some way slower, obviously, by the laws of physics, than it would be in a monolithic system, in our case the direct comparison being with Postgres. You could do any test, compare some kind of response-time measure in vanilla Postgres and in YugabyteDB, and of course Postgres would be faster; but if you measured throughput and cranked it up, especially way beyond what a monolithic Postgres could sustain, well, that's where these things would kick in. Long story short, having got through all that lot, and skipping this lot, and being about to do the summary: we can say that in the bigger picture of everything that goes on in the application tier, and in the context that this is a typical OLTP kind of workload that doesn't involve masses of data going in and out on any particular SQL statement, it all works out so that the net penalty of the performance effects from the speed-of-light effect that I showed is well worth paying.
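To circle back to the majority acknowledgement rule for a moment, here is a toy model of it (my own sketch, not the actual Raft implementation): the leader counts itself, so with replication factor three it needs to hear back from only one of its two followers before reporting success.

```python
# Toy model of majority acknowledgement (illustration only, not Raft).

def write_committed(replication_factor: int, follower_acks: int) -> bool:
    """The leader already has the write, so it needs follower_acks such
    that 1 + follower_acks reaches a majority of replication_factor."""
    majority = replication_factor // 2 + 1
    return 1 + follower_acks >= majority

# RF = 3: one follower ack is enough; zero is not.
print(write_committed(3, 1))  # True
print(write_committed(3, 0))  # False
```

This is also why an odd replication factor matters: a clean majority always exists.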
The payoff is the benefit of the arbitrary, no-limit scalability and the intrinsic, automatic, self-healing fault tolerance. So here comes the summary. I have slightly fewer minutes left before the magic time of eleven o'clock, but I think that's good enough. On the left we have a little graphic with the elephant that reminds you of what's good, what you know already is so attractive, about Postgres as a proposition: it's open source, and among the open-source databases it is generally thought to be the one with the most sophisticated, functional SQL system. Then on the right, in the world of aspiration, we have, back in the day, Google Spanner as the only example of a SQL top half on the kind of distributed storage system that was so attractive for the reasons I explained. However, Google Spanner had, I'm going to say it unashamedly, two distinct disadvantages: one was that you could have it anywhere you wanted as long as you rented it from Google and it ran on their equipment; and the other was that, dare I say it, their SQL system is far from advanced, because after all they wrote it initially to meet their own requirements, and they didn't write it to be general, whereas Postgres was invented to be a proper, fully functional, general SQL system. So what you want is those two worlds, the advantages of them both, and here we see them uniting (forgive me if this sounds marketing-y) uniquely. Other ones in this space have that kind of storage system, based on that kind of thinking, but they don't have, by construction, compatibility with SQL from Postgres; rather, they just have a best-effort attempt to emulate it by programming it explicitly. So there we are. Just to remind you, if you're inspired to further reading: there's the high-level what-and-why, more detail about the upper-half and lower-half picture, and my posts about all sorts of things you can do in SQL, in both Postgres and YugabyteDB, indistinguishably. And finally, here's just a plug to download it and
give it a try; and if you're so inspired, to join our Slack channel; and if you're even more inspired, to give us a star on the YugabyteDB repo. So thank you for your attention. I'll blame the slight lateness of the end, only two minutes past the nominal end, on various things outside my control; but that aside, Lindsay did say to me, and I'll repeat it now though she didn't say it at the start, that I'll stay on for as long as people want to put questions and hear answers, but you can by all means feel free to go now. So let me see if I can find the chat; and for those of you who do feel it's time to go, I'm just going to thank you for your attention and leave it at that. First question: what is a major advantage over Oracle Database RAC? Oh well, now, that's a rather specific question. I don't know if everyone out there knows what Oracle's RAC is, but it's its own thing. I would put it in the space of a hybrid solution: it's not properly deserving of the word monolithic, but it's certainly not distributed either. It's basically one single set of storage where every datum is recorded just once; it's what Larry Ellison is famous for saying on stage all the time, that he hates shared-nothing, he won't have anything to do with it. So it's a shared-everything scheme, a single set of storage, each datum stored once, and then, as they used to say, lots of computers all communicating with it, and with each other, to manage it. But it seems to have settled down that that system is no longer thought of as useful for scalability, but useful only for fault tolerance. Now, I may be speaking out of turn when I say that, and the product managers of RAC would, you know, fight me on that point, but some neutral people in the outside world have come to think it. So basically RAC is a solution to a problem, but it's not the solution to the problem that distributed SQL solves. Distributed SQL solves arbitrary scale simply by adding more and more ordinary
commodity nodes when you need them; and it solves the problem of fault tolerance at the low level of within-table granularity that I spoke of, using this whole tablet scheme that I mentioned. Oracle Database, RAC or single-instance, has none of that. So that's the best way I can contrast the two. Another thing I could say is that if your need for scalability has a definite, predictable, known-for-all-time maximum ceiling, well, you're okay with such a monolithic scheme, and you can get various benefits with the classic schemes: RAC to give you some kind of local fault tolerance of equipment, and schemes, whatever the database you're using, for giving you high availability by means of a primary-standby failover system; or even, if you want to get more advanced, redo-log-based, or whatever you want to call it, change-data-capture-based asynchronous multi-master replication, which of course brings a nightmare of application coding discipline but can nevertheless be used to effect. Maybe that was too long an answer to the question, but I hope it was adequate. Next question; and you have a lot of people cheering you on in the comments here, saying great explanation. How about compatibility... by the way, my Twitter handle is not known for brevity, so you have been warned. The next question is: how about compatibility with extensions? Oh well, the short answer is yes; I should always lead with that. There's obviously a story, though. A lot of the Postgres extensions, or some of them, a notable number of them, have to do with things down at the storage level, so they have no value in our world, because we do things differently down there. But something like the extension (I can never remember the names) that brings you various kinds of random-number generators, unique-ID generators and that kind of thing, extensions that are entirely an upper-half presence, those we can manage perfectly well. Now, there is a little bit of a story there. As I'm sure you know better than me, when you want to take one of those
into use in Postgres, its code manifestation is there on the node, the one and only node that you're using, and wiring it up is therefore straightforward. What we have not yet done in the Yugabyte world is put the code for every single extension there might be into some kind of management system, so that whenever you dynamically add a node, as I explained, the extension code is magically present there too. We haven't yet got that in our world, though we see a way of doing it. And I should mention that some extensions might be entirely handcrafted by some shop that's deploying Postgres; in that situation, our scheme would have to be user-extensible, so that such a handcrafted extension could be put into our repository and automatically deployed to all the nodes as and when needed. Now, having said that, if you have a relatively stable park of nodes, and you don't get disasters more often than as a minor trouble to attend to, then you can deploy the code for these extensions manually, so that it all works out okay, and your only cost is the thinking about it and the doing of it as and when you need to. Now, the extension I mentioned (forgive me for not remembering its name, but I hope you know the one I mean) is managed rather more specifically in our world, so that that one at least is just ordinarily available, and all you have to do is the CREATE EXTENSION DDL and, bang, you have it; but for other ones you would have to plan your own scheme for making the code available on the nodes you happen to have. So I hope that was a sufficient answer. Yep, it absolutely was. What happens in Yugabyte when you add new nodes: will it redistribute data between nodes, or will it only add new tablets to the new nodes? Okay, the first thing I have to do, of course, is acknowledge that that's a question people are bound to want the answer to, and in a respectable talk on this topic I would have covered it preemptively and properly,
but I just have to tell you that there's no way you can squeeze all that in, for what I imagined to be the target audience, in the course of any ordinary talk like this; you'd need perhaps a one-day workshop on Yugabyte and all the things about it. So I kept on using the word sketch, and I have to use it again a little bit, but I'll do my best. I should first mention something as a background nugget of fact to help your understanding, and that is that all our data is actually stored in an intrinsically compressed form, so that the volume of stored data needed to represent what has to be represented is as small as it can be; and we have a scheme for shipping individual tablets, which is the level of granularity that needs to be shipped between nodes as and when it's needed, as part of our whole management system. So, with that in mind, there's not so much to ship, and it can be shipped as a lump; it doesn't need to be shipped in some SQL style. If a brand-new node is recruited into the system, it obviously starts off entirely unpopulated, but with the software installed on it; and this software is in a kind of heartbeat communication with all the other instantiations of the same software on all the other nodes, and all these ideas about leader and follower and so on are part of its whole world. The community of them can therefore recognize that the new node has arrived and that it is presently unpopulated, and arrange for particular tablet peers that were in one place to be redistributed, I should rather say re-sited, to the other place, all behind the scenes. And the same sort of thing happens, but slightly in reverse, if a node vanishes; because when it vanishes, for a moment you've got only two peers instead of the three you used to have, so there's some scheme where it's not just moving tablet peers but replicating them. So that's one way of answering the question.
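That redistribution idea can be sketched very roughly. This is an invented illustration of the balancing goal only, not the real tablet-movement logic, which also has to respect replication and placement:

```python
# Toy tablet rebalancer (illustration only).

def rebalance(tablets_per_node: dict) -> dict:
    """Move one tablet at a time from the fullest node to the emptiest
    until no node has two more tablets than any other."""
    counts = dict(tablets_per_node)
    while max(counts.values()) - min(counts.values()) > 1:
        donor = max(counts, key=counts.get)
        receiver = min(counts, key=counts.get)
        counts[donor] -= 1
        counts[receiver] += 1
    return counts

# Three nodes with four tablets each; a fourth, empty node joins.
print(rebalance({"n1": 4, "n2": 4, "n3": 4, "n4": 0}))  # three tablets each
```

After a few such moves the throughput per node evens out, because the tablets are the source of the throughput.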
An entirely different way is to say that on our website (I don't know if I can find it quickly enough, but since I'm on a scheme now where the clock is not running, I'm going to risk it), here I am on our property, and among other things you can get started; this shows you how you can download it and install it and so on, that kind of stuff. If you go a bit further in this world you can get to the documentation, and you can get to this thing called a quick start, which in the first case is just going to encourage you to install it and do it all on your own computer, where for these purposes you can have a single-node installation, where of course you're not going to see any proof of concept of all the stuff I've been talking about. But the next thing you can do is, let's see where this is if I can find it quickly enough, somewhere here: explore core features, linear scalability. By the way, we sometimes use the word universe where I've been using the word cluster; there's a subtle and important distinction, but I'm not going to go into it now. What you do is create for yourself, on your own MacBook or whatever PC you use, a simulated three-node cluster, which you can do by, if you like, some kind of IP-address loopback scheme, and it will give you an adequate demo. When you get this far, you are going to kill nodes yourself, and add them; and I won't go into it any further now, but we have various dashboards that show you the throughput being sustained by each node. When you add a brand-new one, of course, the throughput on it will be zero, and on the other ones it'll be something, where the something is roughly the same number on each of them. Then, if you go and, let's say, stand up, stretch your legs, walk around, and come back in five minutes, you'll see that the throughput has balanced itself out, exactly because the tablets, which are the source of the throughput, have themselves been
spread around, so that there's now roughly an equal number of tablets per node. I hope that answered the question at an adequate level of detail; the meta-answer is that you can read lots more about it in our docs, and you can do a demo without any fuss, without renting any cloud space or anything, in a perfectly splendid simulation on your own personal computer, where you can watch this happening. I hope that was adequate too. Next question: what is the main difference from CockroachDB? It seems like Yugabyte has a more Cassandra-like pattern. Oh well, now, obviously, as people say at the top of their Twitter profiles, what I'm now going to say is my own opinion; I've heard other people say these things, but I'm going to say them my way. There are lots of things in that question, but in an ordinary, non-technical sense: Cockroach established itself in the world as a startup about a year before us, and so they have had, all this time, that lead in general awareness and that kind of thing; but at a coarse level we are very much aiming to be comparable solutions to the same problem. In the architectural sense, I think the most conspicuous difference is that we decided, and I think it was a brilliant decision, to adopt the Postgres code as-is and to wire it up, and they decided rather to write their own simulation of the Postgres system, in a best-effort sense. That means that in ours we get these exotic features (whether you think partial indexes are exotic or not is a different discussion) without having to think about them; they just come, more or less. We get all the array functionality just like that; we get all the JSON functionality just like that; and, as I said earlier, I invite you to try it. So that's one difference.
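Since partial indexes came up, and they were promised at the top of the session, here is the kind of statement that runs unchanged in vanilla PostgreSQL and in YugabyteDB; the table and column names are invented for illustration:

```sql
-- A partial index: only the rows satisfying the WHERE clause are
-- indexed, which keeps the index small when most orders are done.
create table orders(
  id      bigint primary key,
  status  text not null,
  payload jsonb);

create index orders_open_idx
  on orders(id)
  where status <> 'done';

-- And because the genuine Postgres upper half is reused, the JSONB
-- machinery comes along "just like that" too:
select payload -> 'customer' ->> 'name'
from orders
where status <> 'done';
```

The point is not the particular schema but that the DDL is ordinary Postgres, not an emulation of it.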
And most notable, I would say, given my personal background, is that we naturally then have PL/pgSQL stored procedures; as it happens, the CockroachDB developers didn't yet think it was a priority to implement that, so they have literally no stored procedure support. So there are those kinds of functional differences. Now, another thing that follows in the train of that, and this is perhaps getting a little bit technical: you must have heard of the basic theory that it's best to do whatever processing you do as close to the data as you can. As far as it concerns the whole way Postgres works, there are various things that could be done entirely in the upper half but could ideally be pushed down into the lower half, and therefore operate in tandem, each on its own node, with some amount of synchronization and so on. Now, we've done a fair bit in that space, and we're constantly moving further along that road; among the various blogs you can find on our site there's a whole account of this push-down approach. I give that as the technical background; I'm sure that Cockroach would want to, and could, do such a thing too, but it's particularly straightforward for us, exactly because our whole lower-half system is written in C and, where appropriate, C++, which, as you know, are ordinarily interoperable, and the Postgres implementation, as you certainly know, is written in C. So the actual mechanics of doing the push-down implementation are relatively straightforward; it's no rocket science for us, and less straightforward would it be for Cockroach, exactly because (I didn't say it before) they decided, given their background as Google engineers, to write their Postgres emulation not in C at all, far too old-fashioned, but in the language you will have heard lots about, called Go. But ultimately, I suppose you might say, who cares how it's done; people only want to know how it all emerges: how do we compare in various forms of performance tests, particularly the ones that are in some way industry
standard? The best answer I can give to that is, again, a meta-answer, and it is that very recently we've published some blog posts exactly about this CockroachDB versus YugabyteDB comparison. You can guess how it happens in this space; it's rather funny in the world of open source. On the one hand it's a liberation that all our source code is up there in a repo that people can just read, and we don't keep our plans secret, all that kind of stuff, and loosely it's the same for Cockroach; and that means that when we compare, we can't bluff it. But nevertheless there's a possibility for us, on the one hand, when doing an experiment with Cockroach, to adopt some wrong practice, and equally for them, on the other hand, to adopt some wrong practice when using our stuff, or when they explain how our stuff works and say it's not as good as how their stuff works; it's quite possible for them to misunderstand, because nobody really can read that much source code. So, say they write something bad about us: we naturally come back and write something where we attempt to bring some science into the picture and point out exactly what they said that wasn't quite right. Well, you can read a good account of all that, and then, I hope, you'll be convinced as I am. So again, meta-answer aside, I hope that was an adequate answer to that question. Okay, then: at the YSQL level, is there a notion of compiled code or a shared pool, or does YSQL have to interpret the SQL and PL/pgSQL every time it gets submitted? The model that we have is the model we inherit from Postgres, and in that sense it's rather different, as I'm sure you all know out there, from what I at least was used to in Oracle. Postgres has its own system: it doesn't persist compiled stored-procedure code, it persists only the source code, and that means that when a procedure is first taken into use it is compiled, and then held in memory for use during the rest of the lifetime of that memory. But the other thing to say is that that memory is
the local, if I can put it that way, the specific memory of what I've loosely called the backend process: the process that started up when you started your client and that is sustaining your use of it. That means that if some other brand-new session, in the ordinary psql or, in our case, ysqlsh sense, starts up, it has its own dedicated backend that is, at that moment, cold with respect to that kind of thing. Now, that's how it works in Postgres, and that's currently how it works for us; and we have the possibility, because after all this is how open source works, of taking what we got and improving it in various ways, as long as we don't lose anything to do with the ultimate exposure of compatibility to the SQL programmer, but we haven't done that yet. So, as it was, if you like, in the ordinary single-node world for Postgres users, so it is in the multi-node world for us too, in that every single node will have this going on for each of the clients it's supporting, for each of the connections in its connection pool. But don't forget that in a modern overall application architecture, the backend processes you have sustaining the ultimate traffic are persistent; they don't come and go, they soon get warmed up, however many of them you have, and once they're there, they're warm and they stay there, and any one of them is ready to take application traffic. That means that all the normal benefits you'd get from the Oracle SGA obtain here too. So again, that was as much as I could do on the fly without any slides, but I hope it was adequate. Next question. Okay, and this is the last question we have: what is the preferred upgrade path for Yugabyte? I.e., can you have the upgrade transition through nodes, or do I need to set up a replica of the database in sync, or do I need to do it completely in place, with the database not available during the
upgrade? I assume that this is the upgrade from a particular version of Yugabyte, let's say version 2.1.4 or 2.2 or whatever they might be called (I've suddenly forgotten our terminology, but you know what I mean), to the next version of the Yugabyte software. Under those circumstances it's very much comparable with (I hope this doesn't confound the non-Oracle people out there) a rolling RAC upgrade. When we bring out a new Yugabyte version, the change is software only, and that means we can indeed take down individual nodes while all the others are still running, put the new software in place of the old software, and start them up again; and because of a commitment, a tested commitment, to compatibility as you go up through the versions, everything will just continue to work. And knowing what you know about the ability to recruit new nodes when you want to, you could do that dance by adding a new node in as the first step, therefore being over-provisioned for a bit, then taking down the remainders one at a time until they're all upgraded, letting them recover, and then, when you've got through that exercise, then and only then decommissioning the extra capacity, the luxury node that you had, so that you wouldn't lose any overall throughput or availability while this process was going on. So that's the short answer. There's an implied longer discussion for something that happens in our case relatively infrequently, and that is what will happen when we go to, let's say, the Postgres version 12 that we might choose to go to, if that version happens to make changes not only in software but also, in various ways, to persisted data, so that things have to be rejigged; under those circumstances there would be a more complicated discussion. That's perhaps as far as I can go with that; I hope, once again, given the whole framework we're in, that that was adequate.
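That rolling dance can be written out as a toy sequence (my own illustration of the ordering described above, not a Yugabyte tool):

```python
# Toy walk-through of a rolling upgrade (illustration only).

def rolling_upgrade(nodes, new_version):
    """Add a spare node on the new version first, then swap the old
    nodes one at a time, so capacity never drops below the original
    node count; finally retire the spare."""
    cluster = {n: "old" for n in nodes}
    cluster["spare"] = new_version        # over-provision first
    for n in nodes:
        # take one node down, put the new software in place, rejoin
        cluster[n] = new_version
    del cluster["spare"]                  # decommission the luxury node
    return [n for n, v in cluster.items() if v == new_version]

print(rolling_upgrade(["n1", "n2", "n3"], "2.2"))  # ['n1', 'n2', 'n3']
```

The version string "2.2" here is just a placeholder for whatever release you are moving to.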
Don't forget that if this becomes really important for you to know, and I hope it will, you can always join our Slack discussions; we do our very best, because our survival depends on it, to be prompt and useful with our answers, and if you frame that kind of question coherently, you should get a good answer there. So, is that it, do you think, for today? I believe so. So thank you so much for joining us, Bryn; that was really educational. Oh no, it is I who should thank you, for hosting all this and for spiriting up an audience. Of course, of course. So with that, I hope everyone has a wonderful rest of their day; Bryn, you have a wonderful rest of yours, and hopefully we'll see you all again soon.