All right, so let's get started. I'm sorry for the initial delay, the usual presentation glitches. I'm very happy to see you all here. What I'm going to be talking about today is an open-source solution that we have built. It's called ToroDB, and basically it presents a way of running NoSQL workloads on top of a relational database. So this is something you can use if, for instance, you're a DBA running a relational database and you would like to offer your users some NoSQL interfaces. And it's more than just storing unstructured data; that's already possible on many relational databases, especially Postgres. It goes far beyond that. So I'd like to talk about this. A little bit about myself first: I work for a company called 8Kdata, a research and development company in the database area. We try to come up with crazy stuff, and this is one of those crazy ideas we came up with. It happens to be our main line of development right now, our core product. I also work as a programmer, so I'm kind of both a DBA and a programmer, and I'm based in Madrid, Spain, where we also founded the Spanish Postgres community, which has become a really large Postgres community, more than 500 people currently. We're very proud of that. So that's a little bit about myself. If you want to connect with me, find me on Twitter or LinkedIn or anywhere else. If you have any questions after this talk I'll be around, but in case you want to write to me, those are my coordinates. So let's talk a little bit about the problem that you may face. The world has definitely changed a lot in the last five to ten years. A lot of things have changed, and databases are no exception. In general terms, technology has changed in so many areas, and I'm not going to talk about that. But in the database field, there's been kind of a revolution in the last five to ten years.
For some 40 years, relational technology and relational theory had been developing and advancing; it became better and better, but it was kind of stable. The players were more or less the same, the databases were more or less the same, the market was more or less the same. But in the last few years everything has changed. Many new databases have started to appear, many new technologies have been introduced. I don't know if you can see this picture. Me neither, and I'm quite close. Maybe we can turn off some lights, or those of you in the back may want to move to the front a little bit. Feel free to; you'll get some extra exercise too. Well, this picture is just an event: in 2005 there were no lights, no pictures being taken with phones and tablets, as in 2013, where everybody had their own phone and tablet. Things have changed lately, and also in databases, as I was saying. So let's imagine you were a happy DBA. You were running your stuff, you were serving your users. You were even happier if you were using Postgres, right? You were taking your time and you were BOFH-ing your users when needed. It was a happy life, right? It was easy. The only problem you had to deal with was basically those programmers who hadn't yet met Mr. Bobby Tables. Now what happened? At some point NoSQL came along and people started yelling at you: hey, we want NoSQL. Hey, we want MongoDB. Hey, my app is web scale. And the problems started. Our peaceful life was kind of ruined, right?
And so we start defending ourselves: you know, this is kind of crap, durability concerns, transactions, a lot of explaining. So rather than doing all this, why can't we just say to our users: you want Mongo? That's okay. You want NoSQL? Sure. You want web scale? Whatever that is, sure, you can have it. And this is the problem we are trying to solve today. The way to do this is not by installing MongoDB or Cassandra or any other NoSQL database, because that's a huge problem. Installing a new stack means you probably need new servers, new support contracts, people trained and certified, new backup procedures, new security measures, new firewall rules, new network rules. And once you have done all that, you still have a lot of problems. How do you synchronize data from one database to the other? How do you keep that consistent? How do you make sure that if one application is querying one data source, the other one is going to be up to date too? How do you move data between them? So satisfying these requests to have NoSQL in your data center, and probably you have experienced this, is a real problem. So fear no more: here comes ToroDB. You can think of ToroDB very simply as Postgres plus Mongo in the same place. No new servers, no new stacks, no new certifications, not even new backup procedures. You have all of that in one place, and it's open source. So let's put it very simply. What is ToroDB? ToroDB is a document database, a NoSQL database, a JSON database if you want to call it that way, which happens to run on top of a relational data store, which is Postgres, and it's open source.
And the most important thing is that it is compatible with MongoDB at the wire protocol level, which of course also means the API. Since we speak the same protocol MongoDB speaks, there's no need for different drivers or different tools: the same tools, the same drivers, the same programs that run on MongoDB work the same on ToroDB, with a small asterisk because compatibility is never 100%, but basically they all work the same; it just happens to have Postgres behind. So, first question. Since MongoDB is this unstructured, hierarchical data store, and Postgres is relational, how do you map that? How do we store the information in Postgres? Well, this is probably one of the most important aspects of what we've done with ToroDB: how we store the information. We don't just take the information and store it as a blob in Postgres. We could have done that, but we saw that there were a lot of benefits in not doing so, and instead transforming the data in some way from an unstructured form into a relational one. So, how do we do that? We take a JSON document and we split that document into pieces, where each piece contains at most one level of nesting. Because, you know, JSON documents can be nested; they can have other documents within them. So we split them into pieces in such a way that there are no nested documents inside a single piece. We call each piece a sub-document; not a very original name, but anyway. Then we analyze the type of each sub-document. Because MongoDB, or any NoSQL document store in general, doesn't have a schema, right? But documents have types. Indeed, MongoDB's internal format, BSON, is typed: it specifies, for each key, the type of the value associated with that key.
So we analyze the set of types within that sub-document, and that, if you think about it, pretty much looks like a table: we have an attribute name, an attribute type, and a value. And indeed, that's a table for us. So we take each of these sub-documents, find the candidate table which has the same attribute names and types, and store it there. Just the values. What happens if we don't have such a table? Well, we create the table dynamically. So you don't have to create a schema beforehand; you don't have to do basically anything. You just take an empty database, connect to ToroDB, start inserting documents with the MongoDB API, MongoDB tools, MongoDB programs, and ToroDB will take care of creating any needed tables, splitting the documents into these pieces, and storing the information there. Let's look at an example, because it's going to be easier. Can you read this in the back? All right. So this is a sample JSON document with some levels of nested documents. There is one field called name; I don't know if you can see the pointer, anyway, for those who can. There's a field name, a field data, and a field method. These three form the root level of the document, so this is going to be one of the pieces into which we split this document. Then there's another one here, with keys a and b: a is 42, b is "hello world". This is another level with no nested elements, so this will be another piece. Then there's another one with j and deeper, inside nested, which is another level; it's going to be another sub-document for us. And finally this one, a and b again. So basically, the first step is that we take this JSON document and split it into four pieces like this: the root level, which contains name, and then both data and nested, which are going to be kind of placeholders; they mark where the nested structures go.
Then we have a and b, we have j and deeper, which is another placeholder that basically says there's a nested structure there, and then the other a and b. Now, we find candidate tables for storing this information, matching the column names and the data types: column a is going to be an integer here, or a number, and column b a text field. If the table doesn't exist, we create it automatically for you, no problem. The placeholders don't need to be stored, because a placeholder is basically a pointer; we keep that information in another place. So this is how it looks once we store this information into ToroDB. We use some extra fields called did and index, which are not really important; they're basically the way we pull the documents back together. And then here's the information. MongoDB has an internal _id field, which is a 12-byte array, that is hidden but actually there, so it also gets stored. So this is the root level where, if you remember, we have the name, ToroDB. As I mentioned, the placeholders need not be stored, so we just get the name, ToroDB, in the root table. This is the data table; it's called p_3, and it's been created automatically by ToroDB, you don't need to do anything. Then we have the a and b, 42 and "hello world", and notice that this table is going to be reused: both sub-documents have the same column names and the same data types, so both get stored in the same table, as we have here. And finally the j, 42. This is what we call the data tables; they contain the data that the MongoDB document had before. Now notice a couple of things here. This is just one document. If we store many documents like this one, then in the JSON document, in MongoDB, all these a's and b's and their associated data types are repeated every time. That's meta information attached to every document.
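As a rough sketch of this splitting step (illustrative Python only; ToroDB itself is written in Java and works on BSON types, and every function name here is invented):

```python
# Sketch of ToroDB's split step: break a JSON document into sub-documents
# with at most one level of nesting, replace nested documents with
# placeholders, and derive each piece's "type signature" (attribute name
# and type), which is matched against candidate tables.

def split_document(doc, pieces=None):
    """Recursively split `doc`; nested dicts become placeholders (None)."""
    if pieces is None:
        pieces = []
    flat = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            flat[key] = None  # placeholder: real data lives in a child piece
            split_document(value, pieces)
        else:
            flat[key] = value
    pieces.append(flat)
    return pieces

def signature(piece):
    """A piece's table signature: sorted (column name, type name) pairs.

    Placeholders (None) are skipped, since they are not stored."""
    return tuple(sorted((k, type(v).__name__)
                        for k, v in piece.items() if v is not None))

doc = {
    "name": "ToroDB",
    "data": {"a": 42, "b": "hello world",
             "nested": {"j": 42, "deeper": {"a": 42, "b": "bye world"}}},
    "method": "insert",
}
pieces = split_document(doc)  # four pieces: root, data, nested, deeper
```

Note how the two {a, b} pieces end up with the same signature; that is exactly why the corresponding table gets reused.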
With our approach we store that only once, in the table definition. So this is the first advantage of doing things this way: we avoid a lot of metadata repetition. Now, these are the data tables, but how do we pull the document back together to form the original JSON document we had? Well, there's an extra table called structures, and this table, which is very important for us, basically holds a tiny JSON document that resembles the structure of the original document. It tells us where the root level is, what the nested documents are, and some other extra information. So this structure says that the root information is going to be in table number three. If we go to table number three, we see that this is effectively the root level. Then it says the root level contains another field called data, which is a nested document stored in table number one. So we go to table number one, and this is the data. This null means there was no index here, so this row is the data. And then there's a nested field which is stored in table number two. If we go to table number two, this is what was inside nested. And finally, nested contains a deeper element stored in table number one with index one. And if we go back to table number one, index one, this is the other field that was there. So with these data tables and this structures table, we are able to reconstruct the whole document back into its original unstructured JSON form. The good thing is that, again, we were already saving metadata that is repeated many times within a normal JSON collection. Now, this structure also happens to repeat itself: in a large collection, most documents have more or less the same structure, the same shape, because they are alike.
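The read path can be sketched the same way. This is a hedged illustration: the table names, the structure format and the (did, index) columns below are simplified stand-ins for ToroDB's actual internal layout.

```python
# Sketch of the read path: given the data tables and a structure entry,
# reassemble the original document. `did` is the document id; `index`
# disambiguates multiple rows of one document in the same table.

tables = {
    "t_root": [{"did": 1, "index": None, "name": "ToroDB", "method": "insert"}],
    "t_ab":   [{"did": 1, "index": None, "a": 42, "b": "hello world"},
               {"did": 1, "index": 1,    "a": 42, "b": "bye world"}],
    "t_j":    [{"did": 1, "index": None, "j": 42}],
}

# The structure records, for each nested field, which table (and which
# index) holds its sub-document.
structure = {
    "table": "t_root",
    "children": {
        "data": {"table": "t_ab", "index": None, "children": {
            "nested": {"table": "t_j", "index": None, "children": {
                "deeper": {"table": "t_ab", "index": 1, "children": {}},
            }},
        }},
    },
}

def fetch_row(table, did, index):
    """Find one document's row in a data table, minus the bookkeeping columns."""
    for row in tables[table]:
        if row["did"] == did and row["index"] == index:
            return {k: v for k, v in row.items() if k not in ("did", "index")}
    return None

def rebuild(node, did, index=None):
    """Walk the structure, fetching each sub-document and nesting it back in."""
    doc = fetch_row(node["table"], did, index)
    for field, child in node["children"].items():
        doc[field] = rebuild(child, did, child["index"])
    return doc

document = rebuild(structure, did=1)  # the original nested document, restored
```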
Otherwise it would be really difficult to consume the information in a NoSQL database. NoSQL means you can change the schema, but you're not going to change it per document all the time, because that would be crazy; how would you query that? So a lot of structures are repeated. And hey, in relational databases we have foreign keys, right? That means we can establish a relation. So we also establish a relation in this final table, the root table, which creates an association between documents and structures. If many documents have the same structure, we don't need extra entries in the structures table; they just point to that structure. That's why we have a field called sid, the structure ID. If you think about it more visually, this is more or less how data is stored in a NoSQL database; basically any NoSQL database stores data like this. Looks a bit like a mess, doesn't it? Well, it is, because documents are just stored one after the other. It doesn't matter what information they contain, it doesn't matter how many fields or nested documents they have; they're just stored one after another. Sure, there are indexes, but if for any reason you cannot use an index, then you're basically screwed, because you have to do a whole database or whole collection scan to find your information, because the information is stored like this. After the process we do in ToroDB, the information is stored more or less like this. Just intuitively, you can imagine it should be better, because we classify documents by their shape, or to be more precise, by the shape of their sub-documents. We have analyzed exactly the data types they have, the attributes, the columns, and we have put them in separate bins.
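Just to make the "separate bins" intuition concrete, here is a toy illustration (all names invented) of how a cache of table signatures lets a query touch only the matching bins:

```python
# Sketch: with each table's signature kept in memory, a query constraining
# a typed field only needs to scan the tables whose signature contains that
# field. A field no signature contains can be answered with zero disk reads.

# In-memory cache: table name -> set of (attribute, type) pairs it stores.
structure_cache = {
    "t_ab": {("a", "int"), ("b", "str")},
    "t_j":  {("j", "int")},
}

def tables_to_scan(field, type_name):
    """Partition pruning: keep only the tables that hold this typed field."""
    return [t for t, sig in structure_cache.items() if (field, type_name) in sig]

one_bin = tables_to_scan("a", "int")   # ["t_ab"]: scan one bin, not all
no_bins = tables_to_scan("z", "int")   # []: resolved from memory alone
```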
Now, this is what we also call partitioning by type, because we're taking the data, classifying it depending on its type, and putting it in separate tables. That's partitioning, right? So queries will be more efficient when we are targeting a given subset of document types, because we will only be looking at the partitions that refer to that data structure. Now, here's a very typical question. In Postgres there's JSONB. It was introduced in Postgres 9.4 and significantly improved in Postgres 9.5. So why don't we use the JSONB data type, the data type for storing JSON? You all use JSONB, right? Well, I'll take that as a yes. There are several reasons not to use it. Don't get me wrong, it's absolutely cool. I love JSONB, and it's one of the most important features that came to Postgres, but it's not enough for our purposes. First of all, we want to get the data normalized; we want the data partitioned by type, as I've just shown you, but JSONB does the same thing NoSQL does: it stores every document one after the other, no matter what types or structures they have. It also, of course, does not provide a NoSQL API. The problem a lot of people in the relational world are facing is that they are asked to provide a NoSQL API, so basically they have to install Mongo. JSONB will let you store unstructured data, but it will not run a MongoDB program. ToroDB will let you run a MongoDB program on top of your Postgres database. Also, JSONB is not compatible with MongoDB, and it cannot replicate or shard the way MongoDB, or NoSQL in general, does. The thing is that scaling, horizontal scaling, in Postgres and relational databases is very hard, because the set of use cases they represent is very wide, and building a general solution for such a wide set of use cases is very difficult.
NoSQL targets a smaller subset of those use cases, which happens to be easier to scale. It's not a free lunch, definitely; it has a lot of problems, but for whatever reason people like it, use it, and expect it. I'll talk about that a little later, but ToroDB implements the MongoDB replication protocol too, so it can replicate from a MongoDB; it can participate in a MongoDB cluster. That's something JSONB will not give you. And finally, JSONB is tied to Postgres. So far ToroDB runs on Postgres, but it will run on other relational backends soon; we want to make it run on different backends too. I already mentioned this: there's a lot of metadata repetition. If you look at a given collection, many documents have the same type, the same shape, and this creates an overhead. With JSONB, or NoSQL in general, you get a lot of this repetition, and it costs you disk space, I/O, memory, insert buffers, and so on. So this brings us to: what advantages does ToroDB have over MongoDB? The first one is related to this metadata repetition. Since we are classifying the data and recording the definition of those attributes and types at the table definition level, which means in only one place rather than in every single document, we're saving a lot of disk space. And honestly, disk space is not very expensive today, but I/O is, and you can trade disk space and price for I/O. If you're using less disk space, you can either pay less, or pay the same and get faster I/O. And of course you're also using less memory. So this is a really significant improvement. If you look at the numbers here, where we compare MongoDB with ToroDB, ToroDB requires just 30% to 68% of the disk space that MongoDB requires. And this is without compression; that's why we're using MongoDB here without compression. This is before compression, not because of compression.
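To see where the saving comes from, here is some back-of-the-envelope arithmetic. The key names and counts below are made up for illustration; the 30% to 68% figures come from real measurements, not from this toy calculation.

```python
# In a document store, every key name (plus its type tag) is stored again
# in every single document. In ToroDB the names and types live once, in
# the table definition, regardless of how many rows the table holds.

keys = ["customer_name", "order_total", "shipping_address"]  # invented keys
per_doc_key_bytes = sum(len(k) + 1 for k in keys)  # key name + a type byte, roughly
n_docs = 1_000_000

doc_store_overhead = per_doc_key_bytes * n_docs  # repeated in every document
relational_overhead = per_doc_key_bytes          # stored once, in the schema

saving = doc_store_overhead - relational_overhead  # grows linearly with n_docs
```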
This is just saving metadata, avoiding the repetition of metadata that occurs all the time in any NoSQL data store. All right, so that's the first advantage: basically, I/O. The second advantage. It is surprising that NoSQL is trying to get back to SQL. That's why NoSQL is a badly chosen name, right? The problem is that, as much as they are trying to get back to SQL, they're doing a poor job of it, which is even more shameful. First of all, because what they call SQL is not SQL, it's SQL-ish. It looks like SQL. If you look at Couchbase's N1QL, it looks a lot like SQL. So developers are fine with it, but it's not SQL, so tools are not happy. Tools basically don't work: SQL tools don't work, BI tools don't work, GUI tools don't work, because it's not SQL. Good for developers, definitely not good for tools. And even if they were absolutely compatible with SQL, that SQL is just a tiny subset of SQL. It's the basic select, where, join, offset, that kind of thing, but it doesn't go further than that. And we know that SQL is way more than that, right? Especially Postgres SQL, which is so advanced. As a friend of mine says, hopefully you're not stuck in the Windows 3.11 days. Windows 3.11 is from when the SQL 92 standard came out, and these NoSQL databases, with their SQL-ish languages, are trying to be compatible with a subset of what SQL was in the year 92, which means you're not getting a lot of power. But Postgres has one of the most advanced, most standards-compliant SQL implementations available in relational databases. And guess what? In our case it comes for free: since we use Postgres as the relational backend, you get all that powerful SQL. So even though we support the MongoDB API, and you can insert data with the MongoDB API, MongoDB programs, drivers, whatever, that's fine.
But then, if you want to query in a more sophisticated way, if you want the power of SQL for querying your data, sure: just go to the database. Don't even go through our layer; just connect to the database and run SQL. The tables may look a little weird; there are solutions for that which I'll present later. But anyway, it's SQL, it's tables, just go and do it. And it's not SQL-ish, not a subset: it's pure SQL. That's why, some time ago, we introduced what we call ToroDB views. These are machine-generated: ToroDB creates views which pull back together all those tables into which documents were split during insertion, so that they look like a single entity that is easier to query. So, for example, if we insert documents like these, with varying fields, they will create different tables in ToroDB; these two documents will require four tables in total. And if we look at the views created, it all looks like it's been put back together, which is very convenient for querying. To create these views, you just need to issue a MongoDB command called createViewPath, which we invented. You won't find it in MongoDB, but you will find it in ToroDB. You can call it from your MongoDB program, and then you can use these views in your queries. We also created a new MongoDB command, which you can run from the console or from the Java driver or whatever, called SQLSelect. You can just put a SQL select in there, and it will go to the database, issue the SQL select for you, and return the data, as in MongoDB, as JSON documents. So you have the full power of SQL in your hands with ToroDB. You can even use tools; tools are compatible because, again, it's SQL, not SQL-ish. Then we have what we call Toro query by structure. Remember that we are partitioning the documents by their type.
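Conceptually, such a machine-generated view just joins the split tables back together on the document id. The sketch below only illustrates that idea; the real views are generated server-side by ToroDB, and every table, column and view name here is invented.

```python
# Sketch: generate a view that stitches the split data tables back into one
# queryable entity by joining on the shared document id column.

def generate_view_sql(view_name, root_table, child_tables):
    """Build a CREATE VIEW statement left-joining child tables to the root."""
    joins = "\n".join(
        f"LEFT JOIN {t} ON {t}.did = {root_table}.did" for t in child_tables
    )
    return (f"CREATE VIEW {view_name} AS\n"
            f"SELECT *\nFROM {root_table}\n{joins};")

sql = generate_view_sql("customers_view", "t_root", ["t_ab", "t_j"])
```

A real implementation would also project out the bookkeeping columns and respect the index column for repeated sub-documents, but the joining-on-did idea is the core of it.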
Thanks to this partitioning, if you issue a query to ToroDB and that query cannot use an index, then whereas MongoDB would need to scan the whole collection, we will look at the structures of the documents that could fulfill your query and only scan those tables. So you can get a huge performance improvement. It can be as high as the inverse of the frequency of your document's structure; it varies from query to query, but it can be really, really high. Even negative queries, queries that return zero results, can sometimes be resolved just by looking at the structures. And the structures in ToroDB are cached in memory, so we can resolve negative queries purely from memory, compared to a whole collection scan in MongoDB, which is of course almost infinitely faster. There's also the possibility of mixing and matching relational and NoSQL data in the same database. Some people requested this, and basically there's nothing special we need to do: you can just do it. Put ToroDB data and your relational data in the same database, probably in different schemas just to keep things clear, and don't write to ToroDB's tables on your own, just in case, so you don't screw things up. But other than that, you can have your own relational tables next to the automatically generated ToroDB data tables, and then query both, doing joins between your relational and NoSQL data, no problem. Then there are atomic operations. MongoDB doesn't have atomic operations. Well, a single-document operation is said to be atomic, because the document is held under an exclusive lock, so document operations are atomic. But a batch operation is not: if you try to insert 10 documents at once, that's not atomic. In MongoDB you basically have to iterate through all the documents to check whether each one was inserted correctly.
If one fails, and you want the operation to be atomic, you'll need to delete the other documents that did get inserted. But then you'll need to check the results of that deletion, because it may have failed too, perhaps only for some of them, and then you'd need to reverse that operation as well. It's an endless chicken-and-egg problem, unless everything works well, which of course doesn't always happen. So, we know the value of atomic operations. In MongoDB there are no atomic batch operations. In ToroDB, it would have been hard for us not to support atomic operations: everything runs in transactions, and transactions are atomic units, so we get them for free. There's another point. There's a huge debate over whether MongoDB supports clean reads or not. Well, the reality is it doesn't. A clean read means a read on a consistent view of the data. I'm not going to get into the details, unless you want me to, in which case just raise your hand. But basically, MongoDB effectively runs at read uncommitted, which means new data can pop up in the middle of your queries. You'll see new documents appearing in the middle of a query; you can even see the same document twice in a given result set. And well, with Postgres it took us just two lines of code, you can see them at the bottom, to implement clean reads. We basically said: query transactions are going to run in repeatable read mode, read-only. That's it. Clean reads for free. Well, not for free: thanks to Postgres. It's a great database. Now, this is a fun one. You know MongoDB 3.2, which was released recently, now has support for a connector for BI tools. That's why I said they are trying to come back to SQL: they realized that SQL-based BI tools are better than the non-existent BI tools for MongoDB, or NoSQL for that matter.
So they announced this connector. Do you know how it works internally? This connector, proprietary by the way, is not available in the MongoDB open source version, only in the enterprise version. But do you know the critical piece that makes the connector work? Yes, it's Postgres, really. Behind the scenes, this connector uses Postgres foreign data wrappers to convert from MongoDB to relational tables. Guess what? That's basically ToroDB. And performance? Well, I can't speak about that, because I have signed a license agreement which prohibits me from talking about its performance. But it is... dot, dot, dot. Really. I mean, just imagine: there's not a lot of pushdown support in Postgres foreign data wrappers yet. So when foreign data wrappers in Postgres get better, the MongoDB BI connector will get better. Anyway, so they need this connector; it's proprietary, it's slow... oh, I already said that. Well, whatever. And it requires Postgres. So why not just use ToroDB? There's nothing else to add: with ToroDB, the Mongo BI connector is not needed, because it's Postgres, and most BI tools work with Postgres already. There's nothing else to do. All right. So, what about performance? Because after all, we're doing a lot of stuff. We're receiving the document, speaking the Mongo protocol, which is not native for us; we're transforming it, looking up tables, checking table accesses, creating tables, splitting the data up. There's no free lunch, there are no miracles in this world, although I think there are a lot of advantages that probably outweigh the cost. Now, I don't want to play the benchmarking game, because I hate it, so let me be quite upfront. When you look at MongoDB benchmarks, they don't benchmark what you're going to be using in production.
MongoDB has many tunables, and performance varies greatly from one setting to another. Most benchmarks you see run MongoDB in a completely unsafe way, in which you would never want to run it; you'd lose data, for sure, and the database may become inconsistent. So don't look at those numbers. Look at numbers from running MongoDB in what is called safe mode, with journaling enabled, because I don't see any scenario, outside some very particular use cases, where you don't want journaling on your database, and with replication enabled. Because if you use MongoDB on only one node, that's not a very interesting proposition; you'd be better off using Postgres or, of course, ToroDB. So, when you enable all those things and compare the performance with ToroDB, honestly, with a heavily patched version of ToroDB that was released, not published on GitHub yet, but released, like, six hours ago, the performance running iiBench, which is a fairly standard benchmarking tool for MongoDB, looks like this. Higher is better. The line on top is Toro. By the way, the reason this line ends here is that the test is just inserting 300,000 documents, so we required less time; the test finished earlier. In other words, we are faster. This is pure insertion. So, if you were wondering whether all this stuff is going to be really expensive: it's not. And the most important reason we are faster is that Postgres is fucking fast. It's not us; we're not that smart, it's Postgres. It's really, really fast. So yes, if you disable journaling or disable replication, MongoDB is going to be faster. But if you use MongoDB the same way you would use it in production, we are faster. That simple.
This is MongoDB 3.2, the latest version, so this is not an old version, compared with ToroDB running on Postgres 9.5, which is significantly faster than on 9.4. All right. Let's move on quickly; how are we doing on time? All right. Replication. I mentioned it before; I'll just put it simply: we support replication. So one very interesting use case for this solution, rather than replacing MongoDB, which I'd be happy for you to do, is to just replicate from MongoDB. Let's say some of your users, or you yourselves, have a MongoDB replica set already running with live data, probably used for OLTP. Now you can set up ToroDB to just hook into the replication protocol, connect to the MongoDB primary node, or any secondary node, which can also serve replication data, and have ToroDB start replicating all the data, live. It's asynchronous, but it's usually fast, and you'll get another copy. And then you can run SQL on that. You can of course run the MongoDB API too, but, you know, you can run SQL. So this is a very nice way of doing very cheap and fast ETL from MongoDB to SQL: just use Toro. This, by the way, is present; it's going to be released as ToroDB 0.4. We're currently on a 0.4 alpha one snapshot, but it's coming anyway. If you look at the development repository on GitHub, and anything that's on GitHub is open source, AGPL, all the code is in the development branch, except for this latest benchmark, which will be there on Monday. So, what about sharding? This is where everything becomes really interesting, because we want to implement sharding as in MongoDB, which means implementing the sharding part of the MongoDB protocol. This is not done yet. We want to do it in such a way that ToroDB will work exactly like MongoDB in a sharded environment.
It will talk to the mongos processes, which are kind of the coordinators in a sharded environment, and it will talk with the other MongoDB nodes to move data around, to participate in normal sharded clusters. It's not a very difficult thing to do; it will come in the next version. Now, apart from doing that, and again, we are going to do that, what if we try some of the ways of sharding data within the relational world, and shard below the ToroDB level? Well, we can do that. One option could be pg_shard, the sharding extension that Citus Data developed and open sourced. We're also considering databases that are already good at sharding at this level, like Greenplum, or Redshift, or Citus too. So if we pair this concept of sharding at the database level with the concept of replicating from a MongoDB replica set, what we're effectively building is a new way for NoSQL users to perform data warehousing, data analytics. So basically, this is a technology that will enable data warehousing for, as I say, those poor souls in the NoSQL world who are struggling to do it. If you try to do data warehousing in NoSQL, you're basically out of the game. It's so terribly slow you wouldn't believe it. It's basically unacceptable. That's why most NoSQL users are using tools to ETL from MongoDB or whatever into Postgres or other databases just to do the analysis. So with ToroDB, you could just replicate the data, because it speaks the replication protocol, get the data into ToroDB, and then use a backend which is already sharded at the database level. So we did some experiments and we tried Greenplum. It's a great data warehouse; it was presented here yesterday. So we did some benchmarks with it. Greenplum was open sourced late October last year, and we already hacked ToroDB to work on top of Greenplum. It's not released yet, but we did some benchmarks. The goal was exactly this.
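The idea of sharding below the document layer is just routing each row to a segment by hashing a distribution key. A toy illustration (simplified; not the actual distribution algorithm of Greenplum, pg_shard, or ToroDB):

```python
import hashlib

# Toy sketch of hash-based sharding: each row lands on a segment chosen by
# hashing its distribution key, and queries run on all segments in parallel.
NUM_SEGMENTS = 4

def segment_for(key):
    """Map a key to one of NUM_SEGMENTS segments via a stable hash."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_SEGMENTS

segments = {i: [] for i in range(NUM_SEGMENTS)}
for row_id in range(1000):
    segments[segment_for(row_id)].append(row_id)

# Every segment now holds a portion of the data; a distributed SQL query is
# pushed down to each segment and the partial results are combined.
```

When the backend already does this, ToroDB only has to generate the tables; the parallelism comes for free.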
Take ToroDB, connect it to a MongoDB replica set, replicate the information from that replica set into ToroDB; ToroDB talks to Greenplum; Greenplum shards the data across all the segments; and then we run SQL. Distributed SQL, we could call it: SQL that gets pushed down to all the shards, to all the segments in Greenplum terminology, and the results come back. And then we compare the results obtained with SQL against the same queries run on MongoDB. I already told you MongoDB is very slow, so you can guess the results; they won't be that surprising, but anyway. So we took a data set, the full Amazon review data set, which is a data set obtained basically by web scraping all the reviews of all the products on Amazon. We ran on the Amazon cloud, on a c4.xlarge server, which is not a big machine; we wanted to show what you might have at home. We set up four shards inside the same host so that network traffic wouldn't slow us down. In the MongoDB case, that requires three config nodes. Config nodes basically do no work, just metadata, so they don't add to the load in any way, but they are required to run the cluster, so you have to host them anyway. So: four shards in MongoDB, and four segments in Greenplum. Same size, same memory, same everything. The data set contains 83 million records, a 65 gigabyte plain JSON file. So, if we import this into both MongoDB and ToroDB running on Greenplum and compare the disk space required to store this data set, the results are quite surprising. This is comparing Mongo 3.0 with WiredTiger and Snappy compression enabled, which, in terms of disk space, is the same thing as MongoDB 3.2; we just didn't repeat the test, but it's the same thing. And on Greenplum, we were using columnar storage and compression level 2, to make the test fair. I don't know if you can read the bars over there, but basically this big bar is Mongo, and this small bar is Greenplum. And this is the same thing.
The index sizes are more or less the same. The table size is significantly, significantly smaller: roughly 70 gigabytes versus 20. That's more or less the difference. Very significant. And why is that? Because we are storing the metadata separately from the data, and we are not repeating the metadata all the time. And when you take that into account, and you store it in a columnar way, and then you compress it, it compresses very, very well. Because we have aligned the data, we have classified the data, we have put it nicely into its own bins, and now we have compressed those bins. So it compresses very well. What about the benchmark itself? So if we take some queries, and, if you remember, what we're trying to show here is a query through the MongoDB API versus a query in SQL, once those tables have been generated by ToroDB and imported via replication, right? So if we want to obtain, say, the distinct reviewers of products, the query in SQL looks like this: simple. The query in MongoDB looks like this: weird. But it basically does the same thing. So this is query one. This is another example query. Please let me know which one you can read and understand more easily, the one on the left or the one on the right. Especially nice is this "allowDiskUse: true" in MongoDB. This is because MongoDB has a limit of 16 megabytes on the result of an aggregate query. So if you have an aggregation on MongoDB that produces a result larger than 16 megabytes, like you're doing a string concat or whatever, you're out of luck. You need to spill to disk if it's bigger than 16 megabytes. And it's not even automatic: you have to explicitly say, please use the disk. Well, so this is the benchmark. I don't know if the ones in the back can see these tiny, tiny bars over there. Yeah, that's ToroDB on Greenplum.
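The space saving from storing the metadata separately, mentioned above, is easy to demonstrate. An illustrative sketch only (not ToroDB's actual storage layout): compare repeating the keys inside every document against storing the key set once and keeping only the values per row.

```python
import json

# 1000 documents that all share the same structure (as real data often does).
docs = [{"reviewer": "user%d" % i, "stars": i % 5, "helpful": True}
        for i in range(1000)]

# Document-store style: every record carries its own keys.
repeated = sum(len(json.dumps(d)) for d in docs)

# Metadata-separated style: one shared header with the keys, stored once,
# then value rows only.
header = json.dumps(sorted(docs[0]))
rows = sum(len(json.dumps([d[k] for k in sorted(d)])) for d in docs)
separated = len(header) + rows
```

The separated form is smaller before any compression, and since the values are now aligned by column, columnar compression works much better on top of it.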
Basically, the first query took 969 seconds to complete on MongoDB and 35 seconds to complete on Greenplum. That's a 28x improvement. The second query took 1,007 seconds on MongoDB and 13 seconds on Greenplum. That's a 75x improvement. The third query took 31 seconds on Greenplum to execute. We don't know yet how long it takes on MongoDB, because it crashes, consistently, very consistently. We don't know the error; we don't understand it. So, you know, this is the idea. This is a real enabler for data warehousing on NoSQL. So, please go to GitHub, download the source, try it, read about it. If you like it, please star it on GitHub. And of course, go and check out our FAQ. There are many questions there; probably some of them are going to be asked now, but the rest are going to be there, so just go and check them out. And that's it. Questions? Thank you. That's a good point. So, the way we have laid out the source code, it's very modular and has some abstraction layers. One of those abstraction layers is the incoming protocol. So, that's one layer. Right now we have one layer called MongoWP, which is the MongoDB wire protocol. Then it goes down to another layer called KVDocument, which is an abstraction of a document. We transform from the MongoDB representation to an abstract key-value document, and then we process it and do all the subdocument handling and so on. So, it's not hard to speak another protocol and transform into KVDocument, and then the rest of the stack stays exactly the same. Now, I don't think this would map well to pure key-value stores; it fits document stores better. Next on our list is going to be Couchbase. More questions? In the order they pop up. Okay, here. So, this is still in the development phase. We're going to release version 0.4 in mid-February, and the version after that is going to be 0.8.
We're going in powers of two. The first version was 0.1, then 0.2, so this one is going to be 0.4, then 0.8, and very soon after that it's going to be 1.0. So it's not production ready yet, but I hope it will be. And there are many people already trying it out on non-production loads. More questions? Over there? Okay. So, this is basically very straightforward. Because, first of all, in NoSQL, in unstructured data, there is no type change. If you remember what I was saying before, the metadata which specifies the type is associated with the document itself; it comes with the data. So it's just a different type. It's not that it changed; it's a different type. The way we process this is, if you remember, we analyze the subdocument and find a matching combination of keys, types and values. If there is no table for that combination, we create a new table. Now, next question: isn't that a problem? Are you going to end up creating thousands of tables? Well, sometimes, yes. That may happen. If there is a lot of combinatorial explosion of different keys and values and so on, which sometimes happens, we'll end up creating a lot of tables. But fortunately, this is not a big problem either. We did a previous R&D project some three years ago in Postgres where we created a billion tables inside a Postgres database. And it worked quite well. Table creation was very fast: we were doing 12k, 12,000 table creations per second. So, you know, it's not a big deal. More questions? Over there? I'm sorry, can you repeat that? We're a little bit screwed there: we just truncate that key. There's nothing else we can do. Yeah, there are some restrictions on identifiers. But, you know, we escape some values and we truncate keys. Not really; as I mentioned before, Riak is mostly key-value, so currently we're targeting document data stores more than key-value stores.
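The table-per-structure idea described above can be sketched in a few lines. This is a hypothetical simplification (ToroDB itself is written in Java and creates real Postgres tables); the point is that a subdocument maps to a table identified by its key-and-type signature, so a "type change" is simply a different signature, hence a different table:

```python
# Hypothetical sketch: derive a table identity from a flat document's
# (key, type) combination; an unseen combination triggers a new table.

def table_signature(doc):
    """Stable signature built from the document's keys and value types."""
    return tuple(sorted((k, type(v).__name__) for k, v in doc.items()))

tables = {}  # signature -> list of rows

def insert(doc):
    sig = table_signature(doc)
    if sig not in tables:    # no table for this combination yet: create one
        tables[sig] = []
    tables[sig].append(doc)

insert({"stars": 5, "text": "great"})
insert({"stars": 4, "text": "ok"})       # same signature, same table
insert({"stars": "five", "text": "?"})   # type differs: different table
```

With real-world data the number of distinct signatures is usually small, but as noted above, even a combinatorial explosion of tables is something Postgres handles well.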
Because a key-value store is definitely a completely different beast, even though some key-value stores can store documents, where the document is the value, right? But the other way around is not on our roadmap. I won't say it can't be done; maybe it would be great too, but I don't know. More questions? Sure. All right, anyway, I'm going to be around. I'm also going to be at the Postgres booth for the next two hours or so. Feel free to pop in if you want to, and of course if you have any question about Postgres. And just let me know if you try it. Let us know, give us feedback, and hopefully you'll like it. Thank you.