My name is David Becker and I'm presenting this session about why the database drivers have been moved to their own modules. I work for Finalist, a Drupal agency based in the Netherlands. For Drupal core I'm the subsystem maintainer for the database API, and I have over 500 core commit credits.

So why did we do it, and what is now possible? First, if you have a contributed database driver, life has changed a lot. In the past you first ran composer install, and after that was done you had to copy the database driver by hand into the drivers directory in the Drupal root, which is not very Composer-friendly. Now that drivers are modules, Composer can install them like anything else.

We also added the possibility for one database driver to extend another. That sounds very abstract, but it makes it possible to build a driver that uses another PHP extension and extends the base MySQL database driver, and that will make it possible to run database queries in parallel instead of in sequence, as we do now. When you have a page with multiple blocks on it, the queries for those blocks can run in parallel instead of in sequence, and that will speed up that page a lot. A sketch of that mechanism follows at the end of this section.

And last, and for me the most important: I want to make a database driver for MongoDB and use its document storage instead of the relational database we are all using now.

Why? I want to fix a couple of things in Drupal: Drupal is slow and Drupal doesn't scale. Why is Drupal slow? Because getting data out of the database is slow; the data is stored in such a way that it takes a lot of time to get it out. And Drupal doesn't scale because the database is the limiting factor, and a relational database is a single server; it has no horizontal scalability.

How does Drupal on a relational database, like we all use now, store data? It's divided up. Every entity instance is stored across a lot of different tables, and when you query it, Drupal collects a piece of data from each of those tables and puts them together, and then you have your entity instance, like node 4 or, in this case, a user. It queries the base table, the translation table, the revisions table, and when you use a first name, a last name, a LinkedIn link and a job title, it's all combined in one big join query. The more data you have in your database, the slower it gets, and the more fields you add to an entity, the slower it gets.

You can also store it in a different way, using JSON objects. With a JSON object you can store all the data for a single entity, like node 4 or user 2, in a single object. Everything goes into one JSON object, like a nested array: all the user data, the first name, the last name, all in one place. And if you store it like this, it's easy to get it out of the database, because it's a single row from a single table, and that's always faster than a complicated join query over multiple tables. With a relational database you query a lot of tables in a big join query to get your result; with JSON storage, because all data for a single entity is stored in one JSON document, it's a single row from a single table.
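First, the parallel-query mechanism mentioned above, as a minimal sketch. This assumes the extending driver would build on the mysqli extension's async API; the credentials and queries are placeholders, not the actual driver code.

```php
<?php
// Sketch: run the queries for two blocks in parallel with mysqli's async
// API instead of one after the other. Placeholder credentials and SQL.
$queries = [
  'SELECT nid, title FROM node_field_data LIMIT 10',
  'SELECT uid, name FROM users_field_data LIMIT 10',
];

$pending = [];
foreach ($queries as $sql) {
  // One connection per in-flight query; MYSQLI_ASYNC returns immediately.
  $link = mysqli_connect('localhost', 'user', 'pass', 'drupal');
  mysqli_query($link, $sql, MYSQLI_ASYNC);
  $pending[] = $link;
}

$results = [];
while ($pending) {
  $read = $error = $reject = $pending;
  // Wait up to one second for any of the connections to finish.
  if (mysqli_poll($read, $error, $reject, 1) < 1) {
    continue;
  }
  foreach ($read as $link) {
    $results[] = mysqli_reap_async_query($link);
    // A reaped connection must not be polled again.
    unset($pending[array_search($link, $pending, true)]);
  }
}
// Both result sets arrive in roughly the time of the slowest query.
```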
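And to make the single-document storage concrete, here is a minimal sketch using the MongoDB PHP library. The collection name and the document shape are my assumptions, not the final design.

```php
<?php
// Sketch: load node 4 as one document, one "row" from one collection.
require 'vendor/autoload.php';

$client = new MongoDB\Client('mongodb://localhost:27017');
$nodes = $client->selectDatabase('drupal')->selectCollection('node');

// Everything about the node comes back in a single read: all revisions,
// all translations, all field values.
$node = $nodes->findOne(['nid' => 4]);

// Assumed document shape, roughly:
// {
//   "nid": 4,
//   "revisions": [
//     {
//       "vid": 35,
//       "translations": {
//         "en": {"title": "...", "field_first_name": "...", "field_job_title": "..."},
//         "nl": {"title": "..."}
//       }
//     }
//   ]
// }
```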
You can also do a lot of different queries that are all just a single row from a single table: get the latest revision of node 4, or get revision 35 of node 4. That sounds easy, but if you try to do it with an SQL query, it's very, very difficult. You can also load all revisions: if it's stored in a JSON field, you just fetch the complete JSON document and take all the revisions out of it, a single row from a single table. Getting the Dutch translation is also a single row from a single table. Or get all translations of node 4. Or get the Dutch translation of revision 35 of node 4. Or get all translations of the latest revision. Or get all translations and all revisions. However many nodes you have in your table, it's still a single row from a single table. It's always fast, independent of how much data is in your database.

It also doesn't matter how many fields you add to your node; they're all stored in a single JSON document. So you can have 100 or 200 or 1,000 fields on a node, which is not possible in a relational database, because a query with that many joins will make the database give an error. When you store it in a JSON field, it's not a problem. And this is the way it's stored in MongoDB; it's called document storage.

But what is a node? We are used to loading a node and getting the current revision. But if you store it in a JSON document, you can also say: a node is all revisions and all translations and all field data. That's a node, not just the latest revision. And when you have that, you can get just the latest revision with an easy method call, and then you're back to what you had with a relational database. Or you can get the current revision minus one, the revision before the current revision. That is also extremely difficult to query in a relational database, which is why we don't do it, but sometimes it's really handy. You can also get a revision count: how many revisions are there for your node? That's also very difficult to get out of a relational database. Or the number of revisions between the current revision and the latest revision.

So why don't we use JSON storage with MySQL, MariaDB and PostgreSQL? The answer is simple and a bit sad. As an example of why it failed, I made this case: a custom entity with no revisions, no translations, and one field, and that field has a cardinality of 2. On the right you see a very simple entity query for my entity with a single condition on it, searching for 'BBB'. When you run this query, it does not only search for that field value; it also checks that the field item is not deleted. And you can see that the item with value 'BBB' has been deleted, so this instance should not be returned by the entity query. And yet when you run it on MySQL, MariaDB or PostgreSQL, it is returned, because the document has the value 'BBB' on one field item and deleted = 0 on the other. It's a very simple example, and yet on the core-supported databases it doesn't work, while with entity query we can do far more difficult things. This simple case is already too much. Yes, I forgot the access check in the query; that happens to me a lot.

So why MongoDB? MongoDB is all about JSON storage. It's what it's good at. It's the only thing it does, and it does it fantastically well. And it fits perfectly with Drupal's entity system.
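For reference, the failing example above as code, a minimal sketch. The entity type and field name are my assumptions.

```php
<?php
// Sketch of the entity query from the failing example: a custom entity
// "my_entity" with one multi-value field "field_test" (cardinality 2).
$ids = \Drupal::entityQuery('my_entity')
  ->accessCheck(FALSE)              // the access check forgotten on the slide
  ->condition('field_test', 'BBB')
  ->execute();

// Correct behaviour: only return entities that have a *non-deleted* item
// with value 'BBB'. With JSON storage emulated on MySQL/MariaDB/PostgreSQL,
// the value match and the deleted check can be satisfied by two different
// items of the same field, so the entity above is wrongly returned.
```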
Every entity we have in Drupal, nodes, users, everything, fits in MongoDB's document storage. MongoDB is a NoSQL database, not a relational database, and therefore it has horizontal scalability. It's not a single-server database: you can mirror and shard collections across different physical servers, and together they form the complete database. So you're no longer stuck on a single database server; you can spread your database load over many machines. Apple, for instance, has a database running on 100,000 machines. It's really, really big. Not that we should do that with Drupal.

The special thing about MongoDB that no other NoSQL database has is the aggregation pipeline. A normal NoSQL server only allows very simple queries; that's why it's fast. MongoDB has added a special query pipeline on top that allows you to do very complicated queries, the same complicated queries you can do with SQL. It is also very good at searching very deep inside JSON documents, in a way that no other database can, and it has the functionality to do everything Drupal needs it to do.

It also has a special storage for user-uploaded files. Now we store those on the web server; with GridFS you can store them in the database. And MongoDB can replicate and mirror itself to any other machine in the cluster, which doesn't have to be in the same physical location: if you have one server here and another in Tokyo, it mirrors itself in the background. The database does it; you don't have to do anything yourself. And because the user-uploaded files are stored in the database, they no longer have to live on the web server. That changes the web server from one that has persistent data, which needs to be rsynced when there are multiple web servers, to one that has no persistent data: you can switch web servers on and off as much as you want, without having to rsync all the user-uploaded files.

In the past, 10 years ago, someone else tried to do the same thing I did, to make Drupal run on MongoDB, and he failed. A lot of people think that because he failed back then, it's not possible to run Drupal on MongoDB. And yet it is: I've created a proof-of-concept site that runs Drupal on MongoDB. I looked in my code and listed all the MongoDB operators from the aggregation pipeline that the complicated queries need. These are the ones that were already in MongoDB when he tried it, and these are the ones I used that were only added to MongoDB after he had given up. There are so many of them that it's not really fair: he couldn't have done it. Ten years ago MongoDB simply didn't have the functionality that Drupal needed to make it work. For him it was mission impossible. And by the way, his name wasn't Tom Cruise. I don't think I'm allowed to say his name, but if you catch me outside I will tell you. He is, by the way, a far better programmer than I am.

What do we have to do to use document storage? When you develop a site, you have to use MongoDB. But there is an option to not use MongoDB in production and to use another NoSQL database, like Cassandra. Again, take the same simple entity query.
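To show where this is going: a minimal sketch of how that entity query could translate to an aggregation pipeline. The document shape, field names and collection name are assumptions, not the proof-of-concept code.

```php
<?php
// Sketch: the same condition as the entity query, expressed as a MongoDB
// aggregation pipeline. $unwind splits the multi-value field into one
// document per item, so value and deleted flag are checked on the SAME item.
require 'vendor/autoload.php';

$collection = (new MongoDB\Client('mongodb://localhost:27017'))
  ->selectDatabase('drupal')
  ->selectCollection('my_entity');

$pipeline = [
  ['$unwind' => '$field_test'],
  ['$match' => [
    'field_test.value' => 'BBB',
    'field_test.deleted' => 0,
  ]],
  ['$project' => ['id' => 1]],
];
// Returns a cursor over the matching entity IDs.
$ids = $collection->aggregate($pipeline);
```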
With MongoDB, that query runs through the aggregation pipeline as a complicated query. But you can also fix it in another way: you can create a materialized view. That works just like Views in Drupal, except that the result is stored in the database as a table, and that table can then be queried with a simple query. And that is something another NoSQL database can do as well.

How else can we speed up Drupal? When we change to document storage, getting data out of the database becomes quick, but something else is also slow, and that's loading Drupal itself. When a user request comes in, the autoloader loads all the files, then the Drupal kernel boots, and only then does the work on the user request start: it handles the request, sends a response back, and it's finished. This is, by the way, what the index.php file in the Drupal root directory does.

We can use another tool, called Swoole. What it does is the autoloading and the booting of the Drupal kernel up front, and then it sits and waits for a user request to come in. The moment a request comes in, it starts immediately: it handles the request and sends the response back. It doesn't have to do the autoloading and the kernel boot after the request comes in, because it has already done that before. That whole part of handling a request is skipped, which makes Drupal faster. Swoole is a bit like Node.js, and it's also multi-threaded. I didn't invent it and I didn't make it; I borrowed the idea from Laravel Octane, which uses exactly the same solution. A sketch of this follows at the end of this transcript.

We can not only preload the Drupal kernel into memory; we can also preload all kinds of other data from the database into memory, so that the moment the user request comes in, it's ready to be used from memory instead of fetched from the database. And it can run database queries in parallel.

I've made a proof-of-concept site for running Drupal on MongoDB. You can try it over here. It's not ready for production; it's just to show that it can be done. On the left side you see an overview of how user 1 is stored in MongoDB, as a document.

Does anybody have any questions?

One question: you mentioned that you can load everything about an entity. My immediate thought was, wouldn't that make it extremely memory-intensive in most cases, compared to the common use case where you only load what you need to display?

It depends on how you load it. If you say you only want the current revision, like we do now, the whole document comes in, you take the current revision and drop everything else.

So you still load the data you don't need, though?

Yes; if you only want the current revision, you take that and drop everything else.

I know this is a prototype and early in the effort, but can you imagine a way to make this work with both MongoDB and an ordinary database? Because there are a lot of modules that might not be perfectly entity-true.

A lot of modules will need to be made compatible, yes. But this is MongoDB-only storage; there's no relational database needed.

No, but what I was thinking: could you imagine a possibility where they co-exist?

I would rather not, but technically it would be possible, like everything else is possible and doable, even things you don't want to do. Anybody else? Then let's go to the closing session downstairs.
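Finally, the Swoole idea sketched as promised above. This is not the proof-of-concept code: the kernel helpers are hypothetical, and real bridging between Swoole and Symfony requests takes more work than this.

```php
<?php
// Sketch: autoload and boot once, then handle many requests from memory.
use Swoole\Http\Request;
use Swoole\Http\Response;
use Swoole\Http\Server;

require 'vendor/autoload.php';   // autoloading happens once, at startup
$kernel = boot_drupal_kernel();  // hypothetical helper: boot the kernel once

$server = new Server('0.0.0.0', 8080);
$server->on('request', function (Request $request, Response $response) use ($kernel) {
  // Per request, only the actual handling is left to do.
  $symfonyRequest = convert_request($request);   // hypothetical converter
  $result = $kernel->handle($symfonyRequest);
  $response->end($result->getContent());
});
$server->start();
```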