So, thank you for being here. First things first, I'm coming from Spain, I'm Spanish, so we typically, well, we wish at least, that we could have a siesta after lunch. So even though this is very early for us, we don't have lunch until 3 p.m., I would understand if you fall asleep, no worries. Well, I hope that doesn't happen. I have a ton of material to cover here, and I hope it's going to be interesting. Unfortunately, since this is being recorded, I cannot move around a lot. I tend to do so, so it's not that I'm strictly staying here, it's just that I've been asked to do that. So let's get started.

Just a little bit of background about myself. Alam already helped me with this introduction, but I work for a company called 8Kdata. Bonus point if you can guess where this name comes from; yes, it's a little bit Postgres-related. We are a company focused on doing crazy things with databases. It's basically R&D on databases, which basically means we do whatever we want, we have fun, and some of those things turn out to be maybe useful for someone. We also offer professional services in Postgres, and you can find me on Twitter and LinkedIn; I'm easy to find on the internet.

So what I want to present here is one problem that we see in the market: some people are using MongoDB, or started using MongoDB, and they might or might not be happy with it, but at some point they want to move somehow to relational, for whatever reasons. And that is a hard problem to solve. I'm going to try to present all the solutions available for this problem and, of course, very biased from my point of view, the one which I believe could be the best one. And this opens a new opportunity for Postgres shops to help customers migrating from MongoDB to Postgres.

Before that, let's explain why anyone would want to migrate data from MongoDB to Postgres, because there are a lot of different data limitations. There are, of course, many obvious reasons that you might find on the internet or know yourselves, but let's briefly look at them. The first one is that a lot of people want ACID, and yeah, it's a little bit bitter not to have ACID. Basically, in MongoDB there's only atomicity if you're operating within the same document. So if you have documents in different collections, or you have split data into separate entities even within the same collection, you cannot perform atomic operations. That might be a problem for some applications and not for others, but it definitely helps to have atomicity beyond the document level. If you have some data referring to other data, any change you make across both is not going to be atomic in MongoDB, and this is something you have to handle on the application side. We did some small research, and people said that developing an application that requires transactions, without transactions, is about 10 times more costly. So this turns into more development cost and a higher likelihood of bugs in your software. So if you need transactions, this might be a problem.

The second problem, and it's not obvious to many, is that there are no consistent reads in MongoDB. In MongoDB, a long-running query, a cursor if you want to call it that way, is not stateful. Cursors are stateless in MongoDB.
Basically, the query could be restarted when your process, your cursor, yields the CPU and the locks to other concurrent processes, and the reads are then resumed. But in the middle, some write operations could have happened in the database. What this means, to put it very simply, is that if you have, for example, an accounting application where you have balances that always have to sum to zero, they might not sum to zero when you read them. So again, this may or may not be important for you, but if you want to fix it, you have to work it out in your application, and I can tell you that working without consistent reads is a little bit hard. This is part of a blog post that appeared last year: someone detected that when doing a heavy update operation on MongoDB, where you only perform updates, so the number of documents in the collection remains the same, a count on that collection yielded many, many, many different results, when that operation should have always returned a consistent number of documents. Again, this is a problem that may or may not affect your application, but if it does, it's hard to solve.

And finally, and probably most importantly, there are a lot of problems in terms of how BI, aggregate, data warehouse queries perform. This is not specific to Mongo; it's inherent to most, if not all, NoSQL systems. So let's think about this a little bit. What does NoSQL mean? Or what does this term that I hate, but I'm still going to use, "schemaless", mean? Schemaless means that each document in reality comes attached with its own schema, and you're repeating this schema all the time, but there's no central place to query that schema. So when you perform an aggregate query, let's say "I want to compute the average salary in this company", you don't know whether every document has this salary field or not. So you have to scan document by document, as represented here: take each document, start scanning it, and find this field called salary, which might be embedded within nested documents inside the JSON, depending on the data model that you also had to design in the first place, right? And the field might or might not be there. Even if it is there, you then need to check what the data type is. It might be a string, it might be a timestamp, it might be anything.

This is a picture representing how this works internally in MongoDB. A key-value pair is a three-tuple or four-tuple, where you have the key, the length if it's a variable-length value, the type of the value, and the value itself. So for any BI query, for any aggregate query, you have to iterate over all documents key by key, decoding the type, decoding the key name, decoding the length if it's a variable-length value, just to see whether it matches the predicate of your query. This basically means, and I'm going to prove this later, that any kind of aggregate query will need to scan the whole database. And we all know that this can be slow. So let's imagine a not-that-unusual case, because I'm going to take this a little bit to the extreme.
So let's imagine that you have developed, and this is a real case by the way, that we've been helping a customer with, an online game, and you're storing all the results and all the intermediate stuff in a MongoDB database, right? And the game is difficult to play, so the people who pass a given level are a small percentage of the total games. You store this in the same collection, but in a separate sub-document. And then you want to query some statistics like, you know, what is the average level of the people who pass a given level; that was, again, the business case. If you look at the design of this schema in MongoDB, it might look something like this: you have a set of documents in the collection, and within the documents there is a nested document called stages_complete, which contains the information about level completion. Now, think about how an aggregate query needs to process this. It will go document by document, and it has to scan all the keys until it gets to the stages_complete sub-document, which might or might not exist; and if it does, it might or might not contain the information we're looking for, and it might or might not be the key type we're looking for. And you're scanning a lot of information, all the outer enclosing document, that you really don't need for the query. That's a lot of wasted IO. If we did this in a relational way, it would look something like this: there would be a game table and a stages_complete table. And notice that the stages_complete table in this case would be very, very small, because only a few people complete the level. So if you want to perform that aggregate query, you will not need the game table, maybe just join on the id, if at all. You're just going to go to the stages_complete table, and this table is going to be small, so the performance of the query is going to be significantly better.

So let's demonstrate this. We took some data, which is public, from the GitHub Archive data set; a small data set, 1.4 gigabytes. And we performed a very simple aggregate query, which is: find the top ten actors. We did this in MongoDB and with the equivalent query in PostgreSQL; you can see the queries used on the right side of the screen. Now, we used iotop, which is a tool that measures the IO used by a process, and we measured exactly the IO required by both MongoDB and Postgres to answer this query. And guess what? MongoDB read exactly 100% of the database size to answer the query, proving what I was saying before, that it does a whole-database scan. You can see that the IO performed by MongoDB is 536 megabytes, and that is the size of the collection, compressed. The original data size is 1.4 gigabytes, but compressed; you can see at the top right corner that the compressed size reported by MongoDB is 536.37 megabytes, so it matches the IO. MongoDB basically read the whole database to answer this query. However, in Postgres, to answer this very same query, only 88 megabytes were required. Why? Because there's a relational design behind this, and we don't need to scan the whole database to perform this aggregate query. Things get even more radical: if we just throw columnar storage on top of Postgres, with the cstore_fdw columnar store, only 6.5 megabytes of disk are required to answer this query.
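Just for reference, the relational side of that experiment has roughly this shape; this is a hedged sketch with illustrative table and column names, the exact queries and the generated layout are the ones documented on the blog post:

    -- Hedged sketch: "top ten actors" over the GitHub Archive events,
    -- relational style. Table and column names are illustrative only.
    SELECT actor_login, count(*) AS events
      FROM github.events
     GROUP BY actor_login
     ORDER BY events DESC
     LIMIT 10;
    -- Per-process IO was measured with iotop (accumulated, per PID),
    -- comparing the bytes read by the MongoDB and Postgres processes.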
So it is not difficult to infer that, given this significant advantage in terms of IO required to answer the same query, performance in a relational system can be significantly better, of course when things are done in a relational way. I'll show some benchmarks later, but for now let's assume I'm right and performance is better. So, okay, let's migrate off of MongoDB. But how do we do it? Because basically there are two main problems that we need to figure out when doing a migration from MongoDB, or really from any NoSQL system, to a relational system like Postgres.

The first one is: how do we generate the schema? Because if I hand you a NoSQL database, a MongoDB dump, here it is, and I ask you, "tell me what's in your MongoDB database", how do you do that? How do you answer that question? It should be very simple, right? Well, it's not, because you can store anything. You would have to scan document by document, and that could be millions or trillions of documents, and any one of them could be different. So you don't know the structure of it. Even MongoDB has a tool called the BI connector, which is a proprietary tool that somehow bridges MongoDB with the SQL ecosystem, and that tool performs random sampling of your data set to infer the schema. So it's not complete; it's random sampling. It's a solution, but not a complete one. So this is the first problem: any tool that we might use to migrate off of MongoDB needs to infer the schema. The second problem is related to this one, but different: how do we react to schema changes? Let's assume that at some point we've been able to infer the schema present in the original MongoDB system. Now, what happens if someone inserts a new document with a new structure, with a new field? The schema has changed. How do we manage these dynamic schema changes?

So let me show you a brief overview of the tools available in the market for these migration purposes. The first one, and the best known in this regard, is called MoSQL. It is an open source tool developed by Stripe, the payments company. Yes, they store all the payment information in MongoDB, but then they ETL it to Postgres for performing analytics. Here, you need to decide the schema yourself: you write the schema in a YAML file, and then the tool performs the conversion. Needless to say, anything that is outside of this schema will not be transferred, and for the same reason it cannot react to changes in the schema, if a new field appears in MongoDB. Other than that, it creates the tables and the columns, but only the ones you have specified in your YAML file. So you have to provide the schema, which brings us back to the first problem I was describing, that you may not know what the schema is. It also doesn't support sharding, so if you have sharded MongoDB, that's a problem, and it's not maintained anymore. So it's probably not a good choice as of today, because it's been discontinued as an open source project; I don't know about internally.

Then there is the Mongo foreign data wrapper, developed by EnterpriseDB, and by Citus Data in the first place. This one uses the foreign data wrapper interface in Postgres, which is a great way of accessing external data, so it's a great way to present data that otherwise lives in MongoDB as a relational table, or relational tables. But again, we need to specify the schema here.
And if we want to do complex things, like presenting the user with a truly relational kind of schema, we need to do several conversions; we would need to present several foreign tables. And this has its own problems. Typically, people who use this end up flattening the structure, so you end up with a huge table with many columns, where the arrays might be exploded and the nested fields are all present there. That works, but it is not a very good, human-friendly relational schema: there are no relations, there are no separate tables, and you may need to perform self-joins to answer aggregate questions, which of course doesn't perform very well. So it's a good solution in the sense that it's embedded in Postgres and uses the foreign data wrapper interface, but from a human-consumption perspective it's not ideal. And again, you still need to specify the schema and react to changes in the schema yourself.

There's another project called Quasar, which comes from the team behind a project called SlamData, a BI tool for MongoDB, and they released this also as a Postgres foreign data wrapper. From that perspective it is very similar to the other foreign data wrapper, with the same advantages and disadvantages. It has some extended options, but other than that, it is more or less the same thing.

Then there's the MongoDB BI connector. First of all, this is a proprietary solution that comes only with the enterprise MongoDB version, and there are two versions of it. The first version was very popular, at least in the Postgres world, because it was based on Postgres: internally it used Postgres and a foreign data wrapper to map between MongoDB and Postgres. The main problem is that at that time they were using Postgres 9.4, and support for pushdown of query predicates through the foreign data wrapper interface in 9.4 was not very good. So the performance was not very good, on top of MongoDB's own performance; I'm going to demonstrate later that MongoDB performance for aggregate queries is not ideal, and if you add a connector on top of it that introduces some extra degradation, performance is worse. It has a tool called mongodrdl which samples your database to try to infer the schema; you typically need to tweak the result manually afterwards, so you edit the file and shape the schema the way you want it. Specifically, it does not handle type conflicts, where the same field is sometimes an integer and sometimes a text. It doesn't handle that; you have to do it manually, if you want to do something with that data. In the second version, they dropped the Postgres foreign data wrapper approach and instead switched to a pure in-memory proxy, and they implemented the MySQL protocol. So now it behaves like a MySQL database, but it's performing this translation live. And performance is... I can't speak about it, because their user license agreement prohibits me from speaking about performance, but I'll say there's no news in that regard compared to the first version.

Then, of course, you have the classical ETL tools; the whole ETL market is available and could also help migrate data from MongoDB to Postgres. But ETL tools are typically batch oriented. So, first of all, you need to specify the schema anyway; then you need to specify the transformation of that schema; then you still cannot react to changes in the schema; and it's not going to be real time, it's going to be batch oriented. So it may or may not work, depending on your use case.
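By the way, just to make the "specify the schema by hand" point concrete, a mongo_fdw setup looks roughly like this; a hedged sketch, option names can vary between versions and the fields listed are only an illustration:

    -- Hedged sketch of exposing one MongoDB collection through mongo_fdw.
    CREATE EXTENSION mongo_fdw;
    CREATE SERVER mongo_srv FOREIGN DATA WRAPPER mongo_fdw
        OPTIONS (address '127.0.0.1', port '27017');
    CREATE USER MAPPING FOR CURRENT_USER SERVER mongo_srv;
    -- Every column must be declared by hand; fields not listed here are
    -- invisible, and a new field in MongoDB means another manual DDL change.
    CREATE FOREIGN TABLE people (
        _id    name,
        name   text,
        salary numeric
    ) SERVER mongo_srv
      OPTIONS (database 'test', collection 'people');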
So, there are several options here and, at least in my opinion, none of them is ideal for fulfilling the two goals I was presenting, the two main challenges, which are inferring the schema and reacting to changes in the schema. Fortunately, there is an open source tool that we have built, and are still building of course, called ToroDB Stampede, and it is designed precisely for these goals. ToroDB Stampede is basically an engine that takes NoSQL documents, typically let's think of a JSON document, but it doesn't really need to be a JSON document, and turns them into a set of relational tables. The goal here is that the relational tables generated by the ToroDB engine are as close as possible to what a human would have designed as a DDL schema. So it's not just flattening everything out into a single table; it aims to be something like, if you want to call it that way, an automatic DBA, or automatic DDL generation. It should look like something like this: if you insert JSON information about books and authors, you end up with a books table and an authors table.

So how does it work? Because this is completely automatic and there's nothing you have to specify beforehand. How is ToroDB able to guess the schema? Well, it's not that difficult, because the schema is there. When people say schemaless, I say schema-attached, because a JSON document, or any other document in a similar format, is basically data plus schema. That's why it can change from document to document: the schema is attached, it is repeated there. If we look at this example, everything that is blue is schema and everything that is white is data. We just need to separate the two, and the schema information is all there. So the way it operates is that every incoming JSON document is analyzed and the metadata is separated from the data. Then ToroDB dynamically executes DDL to create tables, or to add columns to tables, in such a way that they match the incoming schema of the document. So for every document, one or more tables will be created if they don't already exist, and columns will be created as necessary to match the fields in the original JSON document. There's also a catalog that ToroDB creates in the database, which you can query, and it's very useful to query that catalog to find out how many tables, how many documents, how many fields, how many collections have been created, and what the mapping is between the original source and the results that have been created by ToroDB. Think of it like the Postgres catalog tables, but for ToroDB; there you can find the mappings between the original NoSQL data and how it has been transformed into the relational world.

To look at an example: if you look at this example document that comes from MongoDB, you can see on the left how it would be transformed by the ToroDB engine into these three tables. If we look at the document on the right, we need to specify, MongoDB works this way by the way, a database name and a collection name; in this case they are called torodb and people. Databases map to schemas, and collection names map to the prefix of table names; it's a very simple mapping. Then, basically, any property, any field, any key is mapped to a column; nested documents result in one-to-one relationships, and nested arrays result in one-to-N relationships. Which, if you think about it, is a very natural design.
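To give a flavor of the kind of DDL the engine executes on the fly, here is a hedged sketch; the actual names, type suffixes and internal id columns ToroDB generates differ in the details:

    -- Hedged sketch of the mapping for a "torodb" database with a "people"
    -- collection; identifiers are illustrative, not ToroDB's exact output.
    CREATE SCHEMA torodb;
    CREATE TABLE torodb.people (
        did       integer,      -- internal document id
        name_s    text,         -- "name" seen as a string
        surname_s text,         -- "surname" seen as a string
        surname_i integer       -- same field seen as an integer elsewhere
    );
    CREATE TABLE torodb.people_address (  -- nested document: one-to-one
        did      integer,
        street_s text,
        number_i integer
    );
    CREATE TABLE torodb.people_hobbies (  -- nested array: one-to-N
        did      integer,
        seq      integer,
        value_s  text
    );
    -- Conflicting types can later be folded together with a view:
    CREATE VIEW torodb.people_v AS
        SELECT did, name_s AS name,
               coalesce(surname_s, surname_i::text) AS surname
          FROM torodb.people;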
So in this case, for example, address is a nested document, so it's mapped as a one-to-one relationship: it will create a table called people_address, and it will have the same columns that are present in the original field, in this case street and number. And then hobbies, which is an array, will be exploded into a relational table and you'll have all the values there. There's no limit in terms of the document shape, the levels of nesting, or the complexity of the JSON document: arrays inside arrays, non-uniform arrays, any level of nested documents, there are no limits. Everything will be transformed into tables. This _id, by the way, is an implicit field in MongoDB, just in case you were wondering.

Now, how do we react to dynamic changes in the schema? Let's imagine we have these two documents. What do we do in this case? Well, we create an extra column: we basically do an ALTER TABLE, name of the table, ADD COLUMN, the new column. And that's it. In most databases, with some notable exceptions, adding a new nullable column is very cheap, possibly just a catalog entry. So this is not a problem for us, and we just create new columns dynamically when required, or new tables when required. How do we react to conflicting types? If you look at this example, surname in one document is an integer, and in the previous document it is a string. How do we handle this problem? Well, it's actually very simple: we just suffix every column name with a character that indicates the type. So in this case there's a surname_s, which is a string, and a surname_i, which is an integer, a number. If you think about it, this is like a sparse table. Bear in mind that null fields in Postgres take almost no space, they're almost free, sometimes free. So this takes almost no space, and you can think of it as a sparse table. And then, once this is done, you can very easily create a view and coalesce these data types together, cast them to a common supertype, do whatever you want with the information.

Okay, that's a good question. The question is whether we start without suffixes and add them later. No, we always add the suffix, so it's always consistent: the mapping pattern is always the same, and you're very welcome to create any views you want to rename the fields, reorder them, or anything else. The underscore? So, if the question is whether the underscore is a reserved character, the answer is no, you can still use it, no problem; it just gets escaped with an extra one.

So, the way this product works: ToroDB Stampede actually works as an extra node of a MongoDB replica set. Replica set is the name of a cluster in MongoDB, and ToroDB is just added as an extra node. You just plug it in there, it will start replicating all the data from MongoDB, and then you can query it with SQL, with Postgres behind. This is very simple to set up; I'm going to try to do that if we have some time, and you'll see it just hooks in there, starts replicating all the information, dynamically creates all the tables, and you'll have all the information there. So it's not intrusive in terms of adding it to an existing MongoDB replica set.

Internally, to give you a brief overview of how it works, it is structured in four layers. The topmost layer is the protocol layer. There is a second layer, the abstract document layer, or KVDocument as we call it internally, which is an abstraction of a JSON document, or of a document generally speaking.
So, this allows us to implement more protocols. As of today we only support MongoDB, but we're going to be adding others, like Cassandra, Couchbase, MarkLogic, such that we would be able to transform from these database protocols into our abstract layer and from there perform all the transformation to relational. So this is basically a general path from NoSQL document databases to relational. Then we also have a backend abstraction layer, and we can plug in other databases too. As of today it works with Postgres, but we're also working with some specific data warehousing databases, which are also very closely related to Postgres, to be able to scale out the database layer. By the way, ToroDB also supports sharding: a single ToroDB can connect to several MongoDB replica sets that belong to a shard, and it is shard-aware. So even if the sharding algorithms in MongoDB perform rebalancing, we will take all that into account, and you will end up with the whole data set in a single Postgres database. And Postgres, you'll see the performance right now, scales significantly better than MongoDB for BI queries, so this is not a problem; Postgres scales vertically very well. And if not, we'll have the backends to scale out underneath. So, yeah, that's covered too.

So, the question is about the sequence of data ingestion. Let's assume there's an already existing MongoDB cluster with data. The first time you run ToroDB, it will connect to the cluster, and it operates in two separate stages. The first stage is called initial sync: in that stage it will connect to the master, or to a secondary, it doesn't matter, and do basically a "select star from all tables" in Postgres terminology; it will copy over all that information. Once all the collections are copied over, and of course transformed to relational, it enters sync mode, which is like streaming replication: it starts replicating the deltas and applying them. So we'll always have a live feed after the initial sync, and any document that you create in MongoDB will automatically appear in ToroDB in real time.

Yeah, so the question is whether this node acts as a pure MongoDB node. As of today it behaves as a hidden slave: it does not participate in MongoDB elections, and it cannot be promoted. There's a lot of work to do that, but it will eventually be possible.

Okay, so I believe this presents an opportunity for Postgres users, and of course Postgres shops, to go to an extra market, to go a little bit out of our well-known, let's say comfort zone even, and offer services to the MongoDB market. It is a market that is growing very significantly, and sure, I've seen some reluctance in these circles about working with MongoDB, but I think it presents a clear opportunity. There are valid reasons why those users want to migrate, or have their data also copied to the relational world, there's a lot of difficulty in doing so, and here's an open source solution which you can use to help them. So go out and make some money out of it; I cannot put it more clearly.

So, let's show some numbers to at least demonstrate that this works and it's worth the effort. We took data from the GitHub Archive, which is an almost unlimited amount of data in JSON format, with proper JSON documents. I mean, they're complex: they have nested documents, they have arrays, they have stuff.
They're not the typical data set that you find out there, which is basically a CSV converted to JSON. That would be awesome for us, because it would map to a single table, but that's not what we're trying to show here. So this is a proper data set, almost unlimited, and we're going to compare running some typical BI aggregate queries on MongoDB on one node, on MongoDB on three nodes, and on Postgres with the table structure that has been generated by ToroDB Stampede. We ran six queries here. All of them are documented; there's a whole post on our blog at 8kdata.com where you can see all the queries, both in the MongoDB query language and in SQL. You can also compare how human-friendly those queries are in the MongoDB query language versus SQL, and decide which ones work better for you. There are many more queries we could run; this is just a sample.

So, with the small data set, the 500 gigabyte data set, here you can see what the performance looks like. The blue bar is, of course, Postgres; green is MongoDB, with better gamma correction here than Oleg had before; and yellow is MongoDB with three shards. This is execution time, so less is better. As you can see, Postgres beats MongoDB on a single server in every single query, and beats MongoDB with three shards in almost every query, basically except for the first one. Postgres is faster than MongoDB on three nodes, and by a huge margin. For example, on query D, Postgres is 57 times faster than MongoDB on a single node. It is not 57% faster, it is 57 times faster: a query that takes four hours, more or less, in MongoDB takes just four minutes in Postgres. If we take a smaller data set, surprisingly, the difference is even bigger. Here's query D, which takes 20 minutes in MongoDB and nine seconds in Postgres. That's 267 times faster; again, it's not 267% faster, it's two orders of magnitude faster.

Yeah, this is MongoDB 3.4, the latest version, with compression enabled, all those goodies in there. Durable? I mean, these are read-only queries, so it doesn't really matter, but yes, with durable settings. So, this is the GitHub data. Some people say, oh, you're not using proper indexes, or you're cheating with indexes. Well, indexes are typically not used in BI queries, because you're scanning a lot of data, so they're typically not used; and specifically, if you connect BI tools, they're almost never used, because you cannot predict the query patterns of users. But still, we wanted to try without indexes, and well, there are some changes, but it's still around 100 times faster. So, given all this, we don't really know how much faster to say ToroDB is compared to MongoDB, so we just settled on 100 times. But, you know, it's a rough number, it might be wrong; sometimes it's one order of magnitude, sometimes it's two, so we settled on 100. We're not too pretentious; we've seen 300 times, but let's say 100. Postgres parallel query was not benchmarked here, by the way, so yeah, it could be even faster than this. But, you know, we didn't want to push the benchmarking further, even though it would be a fair point; this was without parallelism, this was 9.5. And yes, it was SSDs. It's all documented on the blog, but by the way, this is a small instance: it was on Amazon, I think it was 16 gigabytes of RAM, two cores, local SSD disk.
So, I mean, nothing really fancy. So, yeah, this is the kind of performance you can expect. I think I still have some time, so let's try to do a demo. It will, of course, fail, because demos always fail, and it's not prepared, I promise. The reason I don't prepare it is because if it fails somehow, I want to show you how it fails, because that might happen to you too. The only thing I did was download ToroDB beforehand, because I don't want to mess with the network. So, let's open this. Is that okay? No. Maybe now? All right. So, probably, if I'm lucky enough, there's some ToroDB here. ToroDB Stampede, yeah, okay. There's nothing here yet, right? ToroDB, no, okay. So, let's uncompress this. This is the binary you get from the website; just go to torodb.com and download it. Then you uncompress it, it shouldn't take long, and here you have the torodb-stampede folder, with some folders inside.

So, the first thing is that ToroDB uses configuration by convention. You can generate a config file and use that config file to run it, but if you follow some conventions, you don't need a config file at all. First of all, to generate a config file, you just run it with this option, and it will generate a sample config file for you; you can use either YAML or XML config files. You can then edit it, and it's not very difficult to fill in: just specify the Postgres parameters, the Mongo parameters, and so on. That's one option, and then you just run it with the -c parameter, and it will use that config file to start ToroDB. But for this demo, I'm going to follow the conventions. They're documented on the website, but basically the conventions are that you have a Postgres database running on localhost, that there is a database called torod, owned by a user called torod, whose password has been set in a file called .toropass, which works exactly like .pgpass; and that it will connect to a MongoDB that is on localhost, with a replica set called rs1. If you follow all those conventions, then you don't need a config file.

These conventions are mostly already in place here: there is a MongoDB running with a replica set called rs1, so that's in place already. And let's create the database I was referring to. This is also to show you that it's going to be a completely empty database. So, let's create a database called torod, owned by the user torod. And there is this .toropass file in my home directory, which contains the entries to connect to this server. With all this, it should be able to start ToroDB without any config file; I'll use a screen session for this. So, let's just run it without arguments. And if all goes well, it should start. Here's the log info: we can see that it's already replicating some data that was available in MongoDB from all the other talks that I've done in the past, and you can see that right now it's already in secondary mode. So it did the initial sync first, replicating all the data that was available, and once that was done it moved into secondary mode, which is basically replicating the stream. So, let's look at this. If we connect to the database torod, there are going to be several schemas here, from the data sets that I've been using in the past at other conferences, right? So, these are the schemas.
And there's going to be some data in those schemas that has been replicated from MongoDB. But before showing that, let's do something else: let's import some data into MongoDB and see how it gets replicated to ToroDB live, and we'll look at what gets generated. I have a sample collection of small documents here, just to show that it works in real time without spending too much time on it. So, let's do a mongoimport. Let me just, yeah. So, let's do a mongoimport, because again, I'm going to import data into MongoDB. I'm going to create a database called pgconfus and a collection called bookmarks, and I'm going to use a data set from Delicious, which is a set of bookmarks. The file is here: datasets, delicious, 10k. This is of course very small, but it will work for this purpose. So, MongoDB has imported this. Let's look into Postgres right now. If we query for the schemas: yes, pgconfus is there, right? So there's a schema called pgconfus that wasn't there before. Let's look at it: yes, there are definitely some tables there that didn't exist before. And let's look at what this looks like, for example pgconfus.bookmarks, which is the root table. Yeah, there are 10,000 documents there. They were imported into MongoDB and replicated in real time, or almost real time, to Postgres; typically it takes one or two seconds to replicate this number of documents, so it's fast enough.

And now let's look at what they look like. This is how it looks here; because the font is very big, I'm presenting this in this columnar format. But you can see that everything looks much more ordered than it was before. I may have a nicer screenshot here: this is basically what the data looks like in MongoDB, try to make some sense of it; and this is how it looks in ToroDB, exactly the same tables I was showing just now with the small font. So, yeah, this is more or less the same thing we have here, and we can query all the information here. And if this is not enough, we can use tools like SchemaSpy to generate a graphical representation of the tables. Let me just do that very quickly. I may have some... no, I don't. Well, I'll just show the results. This is how it looks if we generate a graphical representation with a tool called SchemaSpy, and it shows us the whole table structure, all the fields, all the types, the number of rows; this is with a bigger database. And you know what the data looks like. So, just by using this tool, we've been able to infer the structure of our original MongoDB data. So, this is it. I think there are five minutes left. So, questions?

Okay, so the question is: why don't we use JSONB? If we go back to the beginning of the talk, where I was explaining that the problem with aggregate queries is that they have to scan all the data, all the time, to find whether the key exists, whether it's of a given type, whether it matches the query criteria: the same thing that happens in MongoDB would happen with JSONB, or with pure JSON. You still need to parse everything. Needless to say, JSONB is way better than MongoDB, but the problem remains the same: it is unstructured. What I'm showing here is the power of structuring data for aggregate queries, for analytical queries, versus having unstructured data.
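To put that JSONB comparison in concrete terms, the contrast is roughly the following; a hedged sketch with illustrative table and field names, reusing the game example from earlier:

    -- Hedged sketch: the same aggregate over unstructured vs structured data.
    -- With JSONB, every document is detoasted and parsed, and the key may or
    -- may not exist, with any type:
    SELECT avg((doc -> 'stages_complete' ->> 'level')::numeric)
      FROM games_jsonb;
    -- With the relational layout that ToroDB generates, only the small,
    -- already-typed table is read:
    SELECT avg(level)
      FROM stages_complete;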
So, I mean, even if a new field appears in a very nested part of the document, yes, we will do an ALTER TABLE ADD COLUMN, but that operation is almost free. No, because all the columns are always nullable by design, there's no need to rewrite the table; it's always just a catalog change. Yeah, yeah, but we never execute those: you basically only need to rewrite a table if you set a default value, or set a column not nullable, or change its type, and we're not doing any of those. First of all, all the metadata, all the DDL that we do is, let's say, append-only; it's immutable. We never change anything that we have created, we only create new stuff, and that stuff is always nullable, so a rewrite will never happen. Yeah, so if you remember, if you have what we call conflicting types, you'll have column names where the prefix is the field name and the suffix is the data type. So, yeah, you may end up with three or four columns referring to the same original field, but with different data types. Yes, yes, they're separate. You want to join them? Just create a view and coalesce them. Out of the columns that have the same name, the same prefix, only one will be non-null at a given time, so that's an invariant you can use to perform the queries.

Yeah, so, since we are inferring the schema and storing it in the metadata tables only once, the amount of space used compared to a pure MongoDB solution is significantly lower. Now, if you use MongoDB with compression enabled, which of course has its own trade-offs in terms of speed, the disk space used is typically a little bit lower in MongoDB, because their compression is pretty good; but that is only on disk, because in memory MongoDB still has everything uncompressed. So in memory, the memory consumption of MongoDB is significantly higher than this, and if you don't use compression, this uses significantly less disk space. Good, thank you very much.