Databases, a database seminar series at Carnegie Mellon University, is recorded in front of a live studio audience. Funding for this program is made possible by OtterTune and Google. Okay, thank you for coming to another Database Seminar Talk from Carnegie Mellon. We're excited today to have Ken Gates. He's a legend in the database community, someone who's worked on many, many important projects over several years. He worked on the rewrite of SQL Server at Microsoft in the late 90s and 2000s. He then joined Amazon and worked on building out Aurora and building out Redshift. He's done a lot of amazing things. And he's here to talk about a side project he did while still at Amazon, a database platform called Gaia, which, he's informed me, has just been open sourced and is available on GitHub. So we'd like to hear his thoughts and ideas about something he built for robots that he thought couldn't be done with MySQL and Postgres. As always, if you have any questions for Ken Gates as he's giving the talk, please unmute yourself and fire away at any time while he's talking, so it's a conversation for him and not him just talking to a screen for an hour. So thank you so much for being here. The floor is yours.

That was too good. Thank you very much. I don't know if I deserve it, but thank you. Oh, come on, please. All right. So my name is Ken Gates. I'm originally from the Soviet Union, and my original training was in physics. I was studying theoretical physics, then I switched to something else, but again, it was hardcore physics. And I ended up building databases. It's been a long way for me, because how do you go from theoretical physics to databases? It's simple: if your country collapses, you need to find something else. If whatever you thought would be your career does not feed your family, you need to do something else. And I was lucky enough, while building the instruments for my research, excuse me, let me switch to my slides, I was lucky enough to have worked on software engineering as well. So this is one of those big research vessels that the Soviet Union used to have. A vessel like this would be used to do research in the deep ocean. What we did is we researched the phenomenon of sound wave propagation and everything related to it, which underwater is pretty complex, because conditions in the ocean are very different from the atmosphere, because it's essentially layered. I'm not going to bother you with those details. But one of the things that I did is build instruments for our research, because many of those things, most of those things, are actually unique. You need to run an experiment and there is no equipment for it; you need to build it. So that's how I got exposed to hardware engineering first and then software engineering. I also did a lot of DSP in order to do it, because otherwise you wouldn't be able to produce maps like this, for instance, in real time. And again, it was the 80s, and we had really small computers. They were not very powerful, and we could not request more powerful hardware. It was pretty much, this is what you get, go ahead and build it. And I had to write the code using machine instructions in many cases. It was quite a good learning experience for me. Also, these experiments produced a lot of data, so we had to store and manipulate this data somehow. That's how I got exposed to a very rudimentary notion of a database. It was all built from scratch back then by us.
But you can't compare it with a modern database, anyway. So I got exposed to databases. Then I somehow learned a little bit of SQL, right? And I was amazed by the fact that the language interface is quite different from your procedural programming. That's when I got a taste of declarative programming, and I really liked it. So the two opposites, the really low-level programming that you need to do if you want to build something based on high-performance DSP, and on the other hand, highly abstract things that allow you to optimize your computational algorithm on the fly, these two things always really fascinated me. And in the end, that's how I ended up on SQL Server, because I did one big project using the SQL Server engine back then. Again, it was the 1990s. I learned how to use SQL Server, and I learned a lot about its internals, and that's when I got an offer from Microsoft. And I built many critical parts of SQL Server. And then I ended up on Aurora, and it was another great project. I've been blessed to be on Aurora from the very beginning, and I architected big pieces of it. Again, I enjoyed it very much. And then the opportunity came, by pure chance: a former colleague of mine, with whom I worked both at Amazon and at Microsoft, was a founder of a little company, Gaia Platform. And what they wanted to do was somewhat unusual. One thing that was different about Gaia is that it had both of those things: the requirement for really low-level programming, and of course databases, because we want databases everywhere now, because databases are an enormously successful and important platform. They wanted to build a software platform for autonomous machines, various kinds of autonomous machines: take, I don't know, a self-flying taxi, or a robot, a surgical robot, or any other piece of equipment that needs to work in an autonomous fashion. So not only very little input from a human, if any, but it should also be able to work in an environment where there is no online connectivity, right? And the idea was that we provide a platform that on one hand allows you to do high-level things like artificial intelligence and computer vision in order to build the picture of the environment where your device operates, and on the other hand a software platform that allows you to express the essence of what your device is doing using software constructs that are specifically built to facilitate this kind of development. I'm sorry, I realize I didn't say it well. There is an essence of what your program does, and there is an actual implementation, right? And if we have a software tool that is not purpose-built for what you're doing, then you will get lost in details. C and C++ are excellent languages; however, they are way too detailed for many things. And of course, the area of computer language design is a huge area, and it has always been so. One might say that it was way too ambitious to actually try to do something meaningful here, but that's what the founders wanted.
And they also told me that the foundation of this whole thing was going to be a database, a very interesting database. Unlike normal databases, what they wanted to build is a platform that has certain properties, like, for instance, no copy and no data cracking. Normally, when you store a record in a database, it's stored in a format that is highly optimized for storage. You want it to be as compact as possible, and you want the format to be very flexible so that you can accommodate any kind of table shape and so forth. And it usually means that the algorithms to crack a record are quite complicated, because you need to take into account which column is nullable, which is variable length, which can be stored out of row if it's a large object, and so on. In the end, you get a quite complicated format for the data record. In SQL Server, I would say it was one of the complicated things, and we always wanted to change that part of the engine in order to speed things up, because this process of cracking the record is very repetitive, it's driven by schema, of course, and you do it at runtime. We experimented with code generation for record cracking, but it went nowhere, because there were other things that were more important. So this project looked like, okay, let's try and solve this problem. And the solution was that the layout of the record is something that your program can understand directly. You don't need an API to crack a record. I mean, you do, but very little of it. We chose Google's FlatBuffers as the format, and the API to crack a FlatBuffer is open. It's pretty simple, there is nothing very complicated, and it's very efficient as well. So this is one thing: you never need to run anything complicated, or anything that you cannot understand, to crack a record. Another thing is database clients. There is a necessary step where your data gets copied: you don't have direct access to the data in your database, so you issue a query, the engine finds the data, packages it, and sends it back to you. Even if your client and the server are on the same instance, you usually go through multiple layers of functions in order to retrieve the record, and you get a copy of the record, not the record itself, right? There is also the very annoying part that you need to somehow solve the problem of the impedance mismatch between your favorite programming language and the database API, all those boring things like column bindings and null indicators, which column is there and which isn't, and so forth. So there is a lot of stuff happening on the way for the data from the database to the application and vice versa. And the idea was to eliminate all of it. You expose the data in your database directly in memory to your applications. Applications can crack the records directly. The database supports a pointer data type, so you can have complicated data structures: graphs, trees, whatever. And all of it with transactional consistency. When you access your data, it's all transactionally consistent; you don't see the effects of partial transactions.
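To make the "no data cracking" point concrete, here is a minimal sketch using Google's FlatBuffers, which the talk names as the record format. The Room table and its fields are hypothetical; in practice the accessor code below would be generated by compiling a schema with flatc.

```cpp
// room.fbs (hypothetical schema), compiled with: flatc --cpp room.fbs
//   table Room { name: string; temperature: float; }
//   root_type Room;

#include "room_generated.h"          // flatc-generated accessors
#include "flatbuffers/flatbuffers.h"

float read_temperature(const void* record_payload) {
    // No deserialization, no copy: GetRoot just reinterprets the raw
    // bytes, and each accessor reads directly from the buffer.
    const Room* room = flatbuffers::GetRoot<Room>(record_payload);
    return room->temperature();
}
```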
You can roll back your transactions; if a transaction is not committed, no one else will see any result of it. Your data is persisted, like in a normal database. You get all of it, and you essentially pay almost nothing for it. The only thing we ask you to do is call a few magic methods: you connect to the platform (let's not call it a database), you start a transaction, you call an API that retrieves, let's say, the root of your graph, and then you crack the graph directly, start navigating, and run your algorithm. And the view of the data, as it's supposed to be in databases, is frozen for you, frozen in the sense that you don't see the effects of other transactions. And this, to me, looked like something that I had really wanted for a very long time. There has, of course, been research in transactional memory: software transactional memory, hardware transactional memory, hybrid transactional memory. But I don't know of anything practical that I could just go ahead and use. I don't want to use a database with its traditional SQL interface, or even if it's not a SQL interface, if it's an object-relational mapping, there are still some things I'm supposed to do before I can actually get to the data, right? I mean, this sounds like the object-oriented databases from the late 80s, early 90s, like Versant, right? Yes, it does. But we know that, unfortunately, they were not very successful; they are not around anymore. One of the reasons is that many of them were based on properties of the hardware. They were based on some notion of a page fault: when you wanted to retrieve an object, it would trigger a page fault, and then you would need to make sure that you provide the payload at the memory location that was requested. And that was a lot of page faults and a lot of data movement, and it was somewhat slower than people wanted. There were other reasons, I'm sure, but this is one that I can see. So now the question is: can we do it better than the good old object databases of the 80s and 90s, right? And what we ended up building is something that tries to minimize the overhead associated with anything that causes page faults, and anything that would need to move a lot of data around in order to create the illusion that your data is actually where your pointer points to, right? And, I'm sorry, I forgot to advance the slide. Going back to the motivation, this is one of the potential targets for the platform. This is a Komatsu autonomous excavator. It's a huge, expensive machine, and Komatsu experimented with our platform. This is another example of the Gaia platform in real life, so to speak: this was the first autonomous car race, a year ago or so, and this is the race car that was controlled by Gaia platform software. Anyway, let's leave it here for a second. I would argue that what we built tries to minimize the overhead of the memory management, because page faults are expensive, and if you need to pretend that your objects are where your pointer points to, there is a lot of movement of data, right?
So, before I talk about that in detail, just a couple of thoughts about databases. Two things, in my opinion, make databases so great. The first is that they solve the optimization problem of accessing and manipulating your data, in a highly declarative way, and they are one of the most successful platforms that ever did anything like this. The second part of the database magic is, of course, transactions, because transactions make it seem like parallel programming is easy. The whole point of transactions is to give you the illusion that you can write serial programs in a parallel environment, right? This is one of the things that makes databases great. And arguably, and I don't know if I have any counterexample, databases are the most successful parallel programming platform. There are other parallel programming platforms, of course, but there is nothing anywhere near the simplicity of databases, right? But when you write code in C and C++... I mean, I've used databases and I've built databases, and obviously we write the database code, the actual systems, using lower-level languages. And there was always this feeling: on one hand, we create this wonderful illusion that allows you to simplify parallel programming, but we can't use it inside the database. We have to do all those things by hand, so to speak, all these highly efficient synchronization and concurrent access patterns. We can't use much, because we want to achieve maximum performance, maximum efficiency, maximum scalability, and so forth; we cannot afford any higher-level tools. But I always thought that if I had an environment that would give me just transactional memory, everything else I'll take care of, it would be a great step forward and a really good simplification for me. Because if I know how to think in terms of transactions, and the platform gives me transactions, it's much easier for me to write programs, because there's at least one thing I don't need to worry about: the parallelism, right? And so one thing the Gaia platform does is, not only do you access memory with all those good properties that I described, but since it's transactional, it also gives you a very transparent notion of parallelism. You, the application developer, even using a lower-level language like C or C++, can enjoy transactional access to your data. And that simplifies a lot of things. For me, it was this combination of the two things. It's a relatively low-level platform, low-level in the sense that, so far, it does not have a SQL interface. At least it does not have a direct SQL interface; it is possible to access the Gaia database using Postgres, but the Gaia platform itself does not have any query language yet. We have plans to add a query language later, but as of now, if you look at the code, there is no query language. But we still wanted these two things: transactions and declarative access. And I'm sure everyone knows what Microsoft's LINQ is. It's an attempt to bring something akin to SQL's power to a normal programming language, right?
So this is just an example of the object notation in LINQ, and this is the so-called query comprehension syntax. If you are used to SQL, this looks very natural to you. I would also say that the programming tool that used to be called Embedded SQL, I don't think it's as hot now as it used to be, but this is an attempt to do something like Embedded SQL used to do, in a way that is just embedded in the language, right? Anyway, this was one of the inspirations for the Gaia platform. So we have Microsoft's .NET LINQ; there is also the Java Streams framework, which addresses a similar set of challenges. One thing I must say, though: to me, LINQ was somewhat of a disappointment, because originally it promised to do more than it's capable of doing now. For instance, take either of these expressions. The promise was that, without making any changes in the source, if you replaced the so-called data provider, then an advanced data provider could take this expression and, before running it, optimize it using the optimization techniques that are typical for databases: schema-driven and statistics-driven, of course. But unfortunately, I'm not aware of any LINQ provider that actually does this. Yes, there is the LINQ to SQL provider, and there the optimization does happen, in the database, but the original promise was that it doesn't matter whether you actually have a database behind your LINQ expression or not. Even for an in-memory LINQ provider, the promise was that the architecture and the framework allow a provider that can do the optimization, but I'm not aware of any. For the Gaia platform, one of the goals was to be able to do something like this. So we introduced our own dialect of C++; we call it declarative C++. There isn't much in it yet, and I don't know if it's going to evolve dramatically or not, but it has some good elements of LINQ, and it tries to bring together the best of databases with the ease of programming in a general-purpose high-level language, right? It's schema-driven: you just go ahead and create a schema for your database. This is just an example of a script that creates a database and a couple of tables for you. It's pretty much no different from SQL, except for this thing: this is how the links between objects are created. In this example, I have two tables, building and room, and each room belongs to some building. This statement creates a relationship between building and room. Each building can have multiple rooms; it's a one-to-many relationship. And each room has a link to a building, so it's a two-way relationship: room-to-building is many-to-one, and building-to-rooms is one-to-many. We debated a lot about exactly how to do this syntax. Originally, it looked pretty much like declarative referential integrity in databases, but many engineers with no database background didn't like it, so I think we borrowed it from somewhere; I don't remember which system uses a similar syntax. There are other little syntactical things that I'm not going to talk about; I'm just trying to give you the essence and the spirit of it. So this is one element of the program: the schema.
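Since the slide itself isn't reproduced in the transcript, the following is a hypothetical sketch of the kind of DDL being described, with the building/room relationship expressed as a two-way link; the keyword spellings are guesses, not the exact Gaia syntax.

```sql
-- Hypothetical DDL sketch; keywords are approximate.
create table building (
    name string
);

create table room (
    name string,
    -- Each room links to exactly one building (many-to-one); the
    -- reverse direction, building-to-rooms, is the one-to-many side.
    building references building
);
```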
Another element is the so-called rules. A rule is an action that gets triggered by a data change. It's very similar to a database trigger, or to event programming. The difference from database triggers is that databases provide triggers that run in the context of the transaction that actually made the change. In the Gaia platform, it's not like that. We have plans to add database triggers that work in the context of the transaction that made the change, but the triggers that are implemented are post-commit: only if you commit, only then does the rule, the code, the logic, get triggered. And again, it's a dialect of C++. For instance, this thing here means it's an active field: when the front end detects the at sign, it knows that, although we didn't explicitly say that this rule gets triggered by a change of "vehicle entering", which is a field in some table, the at sign means that we implicitly, automatically generate code that triggers this rule every time this field changes. We never said it explicitly anywhere in the rule; there is other syntax that says on change, on update, on insert, on delete, and so forth. So this is one thing. There are also elements of the syntax that automatically allow you to navigate between entities that have relationships. For instance, there is syntax that allows you to go from a building to a room, and if the room has a relationship with something else, then the path from the building to whatever the room references is also automatic. You don't actually need to specify the exact path between the related objects: the system will figure it out if there is an unambiguous single route between them, even when the relationship is not direct, and it generates the code that automatically allows all of this.
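Since the rule slide isn't reproduced here, this is a hypothetical sketch of what such a rule might look like in the declarative C++ dialect; the ruleset structure, field name, and helper function are all illustrative, not exact Gaia syntax, and this dialect compiles through the platform's own translator rather than a plain C++ compiler.

```cpp
// Hypothetical rule sketch; names and keywords are approximate.
ruleset gate_rules
{
    // There is no explicit on_update/on_insert clause: the '@' prefix
    // marks vehicle_entering as an active field, so the translator
    // generates code that fires this rule after any transaction that
    // changes the field commits (rules are post-commit, see below).
    {
        if (@vehicle_entering)
        {
            open_gate();  // illustrative reaction to the data change
        }
    }
}
```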
A quick question from the audience, from Alex. Do you want to unmute yourself? Sure, yeah, I just didn't want to interrupt. You mentioned, if I heard it right, that the rules are handled post-commit, and I was just curious what the motivation was, versus the more traditional databases you were comparing them to, where triggers are done within the committing transaction, a logical transaction. Yeah, so for traditional database triggers, there are two large groups. One is triggers that help you maintain constraints, triggers that validate something. The other kind is for when you need to do a cascading action: you insert something into one table and you need to make sure that some other action happens in another table, all transactionally consistent. This is what you normally do in databases. The post-commit trigger is more like a feature of the so-called rules engines or decision engines, because the logic there is different. A transaction is something that has happened. If there is a sensor, and you get a reading from the sensor and you register it, it's something that has already happened, and it cannot be taken back. I don't know if I'm explaining it well, but these are just different use cases. One use case is when you need to maintain some constraint. And when I said cascading actions: a cascading action is also part of a higher-level constraint. For instance, I insert a transaction record in my general ledger, and it automatically updates the balances on the accounts involved in the transaction. That's a typical transactional trigger, because you don't want a situation where you have a committed record of a transaction and the balances are not updated. This is different: this is post-commit. The event already happened, and I need to react to it. Sometimes you don't want to be dependent on the result of the prior data event or data transaction. I think that's the use case I'm hearing. Okay, thanks.

Again, so one element is the DDL: you inform the database of the shape of your objects. We call them all tables, but a table is a collection of objects of the shape that you specify. Then there is the rule definition file, which has a syntax that is not your standard C++ syntax. And if you wanted to go really low level, this is how it would look. For instance, in one of the earlier implementations of the system, if you wanted to just directly access your database: we provide all the basic classes, of course, and I didn't want to bother you with a lot of code, but this code, if you look at the implementation, is actually very simple and very straightforward. It directly manipulates the memory that is set up for your transaction. When you call this method, a special range of pages is created in virtual memory in order to maintain the illusion that the memory you're looking at as part of a transaction is not affected by changes made by other transactions. So when you say begin transaction, the memory view you have from that moment on is akin to snapshot isolation. This is the lowest level of it: you just begin a transaction, you do some actions, and then, in this case, you roll back the transaction. Similarly, you could do it in Python as well; this code is nothing spectacular. So originally we had two interfaces, one for C++ and another one for Python. The Python one was quite handy, because Python has this wonderful interactive nature. You don't need to write a program, then compile it, then execute it. You can just experiment and play with your database live using Python, like you would, of course, with various SQL clients for your database. Every database provides some sort of interactive client: you don't need to write an application, you don't even need to write a script, you just go ahead and type statements one after another and see what happens. And Python was a wonderful option, because it allowed you to do exactly this with the Gaia database, which gave an even more database-like feeling, although you were using really low-level, direct memory access constructs and so forth. So these are the elements of the platform. Most of the time that I spent with Gaia, I was doing exactly this memory model thing. We can talk about it if we have time, but I don't know if we have time. We have 20 minutes. Keep going. 20 minutes? Okay, great.
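For the low-level access pattern just described (begin a transaction, touch objects directly in memory, then roll back), a rough sketch might look like the following. Everything here is a hypothetical pseudo-API: the namespace, function names, and node type approximate what the speaker describes, not the actual Gaia signatures.

```cpp
#include <cstdint>

// Hypothetical pseudo-API approximating the platform's magic methods.
namespace gaia_sketch {
    void begin_transaction();     // freezes a snapshot view of the database
    void rollback_transaction();  // discards this transaction's write buffer
    struct node { std::int64_t value; node* next; };  // pointer data type
    node* get_graph_root();       // returns a pointer into shared memory
}

void walk_and_abort() {
    using namespace gaia_sketch;
    begin_transaction();
    // Navigate the graph directly through pointers: no query, no copy,
    // and the view stays transactionally consistent for the duration.
    for (node* n = get_graph_root(); n != nullptr; n = n->next) {
        // ... read or modify *n in place ...
    }
    rollback_transaction();  // nothing we did is visible to anyone else
}
```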
Okay, so let's talk about the memory model, because I'm not big on the language front ends and so forth. Of course, I know a lot about those things, but I'm not an expert; my expertise is in storage engines, transactional consistency, memory models, and all of those wonderful and magical things. This is how the platform works. This is roughly the layout of the memory. There are two important sections in memory: one is called record locators, and the other is the record payload. The record locator section is just an array of pointers. If I want to access an object, I first find the record locator for the object, and I follow that pointer. If I want to make a change, for instance, if I want to add an object, then my transaction also has a so-called write buffer, and this write buffer is in the record payload area. When I create an object, I first create the actual object, then I create a record locator for it and set it up so that it points to the actual record payload. So in this case, two records are created by the green transaction. This is just the high-level view of things; this is how the applications see the memory. What happens underneath is this: the memory that holds the record locators is actually managed by the system in such a way that if your transaction tries to change a pointer here, it triggers the memory manager and you get your own copy. So in the top part of the slide, this is what the transaction that makes the change sees. Originally, this last pointer was pointing to this object, the light blue one. Now I've changed it: I created a new version of the object, and now my record locator points to the new version, and the old one becomes garbage. However, all other transactions still see the old pointer, because whatever my transaction did, it did in memory that is managed by the system in such a way that if I start another transaction before this transaction is committed, it still sees the old pointer. This allows you to have transactions of arbitrary length, and these changes are not visible to anyone; I can have the complete illusion of snapshot isolation. So this is one element of the memory management. And by the way, it's an expensive thing to cause a page fault: when you do copy-on-write, what happens underneath is that it actually causes a page fault, and the operating system creates a private copy of the page that contains the bytes you touched, and only you see your private copy. However, the memory that we need to manage like this is much smaller: we only need to manage this part of the memory, which is just the record locators, a 64-bit pointer for each object. It doesn't matter how large your object is; we only touch 64 bits for each object update. We never need to do anything special with the rest of the memory, because the record payload is in shared memory that is accessed directly, and there is no special virtual memory treatment for that region. Well, there is a little bit, but nothing expensive like handling page faults, right?
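The operating-system mechanism alluded to here can be illustrated with POSIX mmap. This is a minimal sketch of the idea, under the assumption that the locator array lives in a shared-memory object, and is not the platform's actual code: mapping the locator region MAP_PRIVATE gives a transaction a copy-on-write snapshot of just the pointer array, while the much larger payload region would stay MAP_SHARED and be read in place.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdint>

using locator = std::uint64_t;  // one 64-bit locator per object

// fd is a file descriptor for the shared-memory object holding the
// locator array (e.g. from shm_open); count is the number of locators.
locator* snapshot_locators(int fd, std::size_t count) {
    // MAP_PRIVATE: the first write to a page triggers a fault, and the
    // kernel gives this process its own copy of that single page; every
    // other process keeps seeing the shared version. Untouched pages
    // cost nothing, so snapshotting a huge locator array stays cheap.
    void* view = mmap(nullptr, count * sizeof(locator),
                      PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    return (view == MAP_FAILED) ? nullptr : static_cast<locator*>(view);
}
```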
So the one thing we require applications to do, when they manipulate objects, is to always go through this record locator array, right? If you are writing your application in C++, the only thing that is different, compared to a plain C++ program, is that the object pointers are actually pointers to pointers. This is the only change; we bother you with just this one thing. Whenever you want to access an object, you are supposed to go through this dereferencing array. We manage the dereferencing array in a way that creates the illusion of a snapshot for you, and since we ask you to access the payload only through this indirection, it creates the illusion of a snapshot in the payload as well, right? Going back to this picture: every time I change an object, in this case these two objects, I create new versions and I repoint the record locators. We also capture the change log. There is a special range in the shared memory that is called the write buffer. This memory, by the way, is read-only by default. Whenever you open a transaction, the system first creates the snapshot of this memory, and of course it's an incremental snapshot: you don't need to snapshot all of it, we just use the operating system primitives that create a private view of this memory. If you change anything here, no one else will see it; if someone else wants to change it, that's done in a controlled fashion, which I will talk about a bit later. When you want to change an object, the system, when you open a transaction, allocates a write buffer that your process or your thread is allowed to write to. You create the payloads for your objects there directly, and you update these pointers. The change is not visible to anyone. It is physically there in the shared memory, but these blocks are not reachable, because not a single pointer in the locator memory points to these objects. So these objects are not reachable, again, if you follow the rules, but the rules are very simple. So we capture the change log, and then, when your transaction needs to commit, whatever we captured goes to the central transaction certifier, which makes sure there are no conflicts with other transactions. And by the way, it's snapshot isolation, not serializable: we only catch update conflicts. The snapshot isolation data anomalies, of course, exist here. We designed the system so that you could add support for serializable, but it would be much slower, because it requires a different way of tracking: you have to track the pages that got read as well as the pages that got updated, and that is quite expensive to do with the stock operating system, but it's possible. If you really wanted serializable, you would be able to do it. It's not in production, so to speak, not in the code, but the system was designed to allow it as well. Anyway, let's go ahead. So in this case, these two objects got changed, and these are the new versions of the objects. This is the committed view of memory, both the record locators and the payload, and we have a list of garbage here.
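As an illustration of the update-conflict check that a snapshot-isolation certifier like the one just mentioned performs, here is a simplified sketch; the data structures and the single-threaded design are assumptions made for illustration, not the actual Gaia certifier.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

using object_id = std::uint64_t;
using timestamp = std::uint64_t;

// Last commit timestamp per object. A central certifier can process
// commit requests one at a time, so no locking is shown here.
static std::unordered_map<object_id, timestamp> g_last_write;
static timestamp g_clock = 0;

// First-committer-wins: the transaction commits only if no object in
// its write set was committed by someone else after its snapshot.
bool certify(timestamp snapshot_ts,
             const std::unordered_set<object_id>& write_set) {
    for (object_id id : write_set) {
        auto it = g_last_write.find(id);
        if (it != g_last_write.end() && it->second > snapshot_ts) {
            return false;  // update conflict: abort this transaction
        }
    }
    timestamp commit_ts = ++g_clock;
    for (object_id id : write_set) {
        g_last_write[id] = commit_ts;  // record the installed versions
    }
    return true;
}
```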
So the two old versions are now obsolete. They might still be needed by some transactions that still have record locator memory pointing to these objects, so the garbage collection is timed so that the memory gets reclaimed only when these objects are no longer reachable. Are you just doing simple epochs with this, or how do you track what's reclaimed? It's a combination of epochs and tracking of individual transactions as well, but in essence, yes, it's epochs. Okay. So all this is good, but we also need persistence. And one thing about persistence: at least it used to be like this in the best database systems, you can't really design the persistence layer without designing it in a way that also gives you the best concurrency in memory. Let me qualify this statement. This statement was true 10, 20, 30 years ago: in the best systems, when you were designing the in-memory concurrency protocol for accessing your pages, you had to do it in a way that would not be in the way of your persistence algorithms. Whatever you did for your redo logging, whatever you did to track the undos and so forth, you had to do in a way that did not contradict your concurrency model in memory, and vice versa. There are multiple reasons for that, but the central fact is that the layout of your data on disk and the layout of your data in memory were exactly the same. And it's a great simplification, because suddenly your page replacement algorithms are very, very simple: just bring the page to memory and you're done. You don't need to unpack anything, you don't need to do any transformations and so forth, right? But this system, the Gaia platform, is different. It's an in-memory system, in-memory meaning that your working set fits in memory; you don't need any page replacement. That was a requirement of the system, which means you are free to design your recovery and persistence subsystem completely the way you like. This was the first time I tried it, and it just works. Whatever happens in memory, the memory layout here, and whatever ends up being on disk are completely unrelated. You can think of it like this. I tried to formulate it in a way that you immediately get it, and the formulation I settled on is the following. We all know what CDC is: change data capture. You make changes in memory, you capture the log, you send it to the CDC target, and the CDC target materializes it. So if you think of your persistence as your CDC target, then suddenly you are free to design the in-memory algorithms the way you want. I don't need page replacement; the only time I need to read data from disk is at database startup. So I can afford to do something more complicated than just reading a page and slapping it into memory: it's just startup, and it doesn't happen often. So that's one thing. Another thing: I didn't want to invent anything here.
LSM is a great technology, and RocksDB is one of the best implementations of it. We just used RocksDB in a somewhat unusual way. One thing that RocksDB does well is, first of all, it's partitioned: you can create multiple partitions in your files. By increasing the number of partitions, you can control how expensive the merges are. Yes, you would need to do more merges because you have multiple partitions, but you can control how long each merge is. So this is one thing RocksDB does well. Another thing RocksDB does well, but we didn't need it: RocksDB manages memory. RocksDB does not assume that you never, ever read from disk while the database is running, so it has a quite sophisticated memory manager that serves pages from memory. But we didn't need it, and what we ended up doing is configuring RocksDB with very little memory. We only gave it enough memory so that it can do reasonably efficient merges, and for reasonably efficient merges you don't really need much memory, because the way LSM does merges is essentially sequential reads and writes of the levels, and for sequential reads and writes, even a small read buffer still gives you a pretty efficient algorithm. Since the Gaia database never asks RocksDB to deliver data while the database is running, this whole thing about separating the in-memory layout from the on-disk layout, plus the fact that it's an in-memory database, allowed us to use this dirty trick. And it just worked really well. Most importantly, the whole thing was very simple. Yours truly, I was the first engineer at Gaia; I actually built this system, and it took me two months without persistence. Adding persistence took another two or three weeks. The end result was a very easy-to-use database, and it was reasonably fast. It's not the fastest database, of course, because you need to manage this memory: every time you start a transaction, there is processing that sets up this memory to give you the transactional semantics, and it's not cheap. It takes a few microseconds in the best case, and if you have a really large footprint in memory, 100 gigabytes or so, it might take maybe 50 microseconds; I don't remember the exact numbers now. So the single-threaded transaction rate could be somewhere between 10,000 and 20,000 transactions per second, depending on the size of your database, or of your working set, right? However, it scales really well, because you can have multiple threads running simultaneously, and until you hit the first bottleneck, with multiple threads servicing multiple transactions, it scales pretty much linearly. On a simple system, eight cores or 16 cores, I don't remember, we were able to drive the throughput to hundreds of thousands of transactions per second, which is very good. It's not the best, of course, but it's not bad at all. So, yeah, I think we're out of time. There is not much more left here; there are a few more details about how exactly the transaction commit happens and what happens at transaction begin, but those are just details. They won't change anything essential that we've talked about so far.
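Before wrapping up, as a concrete illustration of "RocksDB with very little memory," a configuration along these lines would be plausible; the option values are illustrative guesses, not Gaia's actual settings.

```cpp
#include <string>
#include "rocksdb/db.h"
#include "rocksdb/table.h"
#include "rocksdb/cache.h"

// Open RocksDB purely as a persistence log: writes flow in while the
// database runs, but reads happen only during startup recovery, so
// the memory budget can be tiny.
rocksdb::DB* open_persistence_log(const std::string& path) {
    rocksdb::Options options;
    options.create_if_missing = true;

    // Small memtable: just enough to batch writes for the level merges.
    options.write_buffer_size = 4 * 1024 * 1024;  // 4 MB

    // Minimal block cache, since no read traffic is served from here;
    // compactions do mostly sequential I/O and need little buffering.
    rocksdb::BlockBasedTableOptions table_options;
    table_options.block_cache = rocksdb::NewLRUCache(8 * 1024 * 1024);
    options.table_factory.reset(
        rocksdb::NewBlockBasedTableFactory(table_options));

    rocksdb::DB* db = nullptr;
    rocksdb::Status status = rocksdb::DB::Open(options, path, &db);
    return status.ok() ? db : nullptr;
}
```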
So I guess this is it. Okay, awesome. I'll clap on behalf of everyone. We have one or two questions from the audience. Krita, do you want to go first? Is she still here? Sorry, I think she bounced. Her question was: is the core intuition, I guess the motivation, of Gaia that you just want to provide better APIs for embedded devices over shared memory? That's sort of my main question too. It sounds like the people that want to build these embedded IoT devices don't want to write SQL down to SQLite, or pick whatever engine you want; they want to access their embedded data as objects, in a programming language like this, without an ORM. And that's the core motivation behind Gaia. Yes, that's the core motivation behind Gaia, correct. Gaia also attempts to bring the power of declarative programming to the masses, so to speak. Two things: declarative programming, and transactions done in a way that is friendly to people who don't want to learn databases, don't want to learn SQL and so forth. This is one of the main motivations, right? Because, in my experience, I built my first database application 30 years ago, and I can tell you that I had to learn a lot. When I joined Microsoft and interacted with application developers, what I noticed is that the skill level of database application developers gradually declined, not because people are not as smart as they used to be, but because there are a lot more people doing databases now, and they have a lot less time to actually learn things, because everyone is under pressure to build applications faster. And yes, sure, if you know your tool, for instance your database, really, really well, you can build more efficient applications. But it's not always the most optimal thing to do, to spend five years learning so that in the end you're an expert and can quickly write applications. In those five years, you could have written tons of applications, perhaps not the best applications, but applications that work. And we have this community of embedded engineers who are really conservative in this sense; they tend to stick to low-level languages. If you are writing DSP code, if you're writing device drivers, C and C++ are your typical tools, and you're a real expert, and you probably don't want, or don't have time, to spend on learning something new. This was an attempt to make sure that whatever you need to learn, there is very little of it. Got it. One thing with a similar vision, I think, was eXtremeDB. They're a bit older, but it's the same thing: they want to run on really small devices, and they expose a C/C++ API.