Okay, hello everyone, and welcome. I think it's about time, so we'll get started. Welcome again, and I hope that you are enjoying DrupalCon so far. Today we're going to talk about revisions everywhere, and specifically in Drupal core. More exactly, what are we going to talk about? We're going to start with a problem statement, where I'll explain and lay out what I think the current problem is. We're going to look at how other systems approach this particular problem when it comes to revisioning. We're also going to look a little bit at some Drupal history and past discussions on this topic. And then we're going to look into what we can do about this, and how we can approach it moving forward. We're also going to cover a very quick demo of how we approach this currently in Drupal 8 contrib. And then hopefully at the end, or even during the presentation if you want, we can have a discussion about these things. So again, the goal is to discuss some improvements to the entity API, and to discuss inclusion in core, perhaps over the long term. And again, feel free to interrupt with any questions. I want this to be an interactive session. It's a core conversation, right? So if you have any questions, please step up to the mic and feel free to interrupt at any point. Okay, first, before we start, quickly about myself: I'm a long-time contributor to core and the contrib space as well. I'm from Sweden, and I'm working for a pharma company, trying to make Drupal work, essentially. So that's really quick about me. So what is the problem? I think we've all been here: you're just trying to update a node and suddenly it's gone, or you deleted it. You want to undo, but you can't; you're stuck. And we all know that this is not necessarily the solution to the problem when it comes to undoing a delete, or regretting, right? This can't be undone. And another problem we have, since our revisioning system doesn't support concurrent editing, is that we have to do things like this.
Here we see that if two users modify the same node at the same time, we simply have to let one of the users know: sorry, we can't save your content, someone else already updated it. So now you have to jump through hoops and copy-paste or redo things, and it's a little bit messy. I mean, after all, Drupal is a CMS. We're supposed to manage our content. For instance, as we just covered, it's not possible to undo a delete. At the moment we are blindly overwriting content; when you're updating, your changes can be gone in a second. If you make the wrong edit, if you edit the wrong node, and you don't have revisions turned on for your node, they are gone forever. Your changes could be big changes that someone has worked on for a long time. And the answer is not to revert to your backups. We should do better than that. And again, concurrent editing is not supported. Again, we're a content management system. These are the sorts of things that perhaps Drupal should be able to support. Don't get me wrong, our entity API is good. It's really good. But we don't really use it enough in core. We don't even have revisions turned on by default for the content types that come with the standard installer. Perhaps we should do that. Another thing, and another problem I should say, is that Drupal is distributed. But we haven't realized that our content is also distributed, right? Content might be coming from local development if you're preparing a campaign, if you're preparing some content. Content might be coming from an editorial staging environment. It might be user-generated content in production. We might integrate with other systems that we pull in content from. And there might be things like API clients writing and pushing content. So our content is distributed. It's coming from everywhere, okay? And we don't necessarily consider that when it comes to core.
So some of the use cases that are impossible, or at least difficult to deal with at the moment: simply being careful about our users' data. That's difficult, because we don't do a very good job of taking care of our data and not losing it. Concurrent editing, workflows, content workflows, content staging, conflict management when it comes to conflicting revisions, and so on. These are some of the use cases that are really difficult to solve, yet, I think, very important for a CMS like Drupal to deal with. And these are really things that matter a lot to our end users. We simply want to manage our content and manage our workflow. We want to be able to move and share our content around different systems, or at least within the site, even. And it's about letting our users know that Drupal actually cares. We won't go and lose your data in a second just because you make a wrong update or happen to delete the wrong node. We need our users to trust Drupal that their content is safe. We simply want to relax and just know that everything is going to be fine. That's the bottom line, really. And a CMS should do this, I believe. So before we just go and revision all the things, let's talk a little bit about systems that do care. How have other systems solved this, systems that care a lot about your data and just don't want to lose it? There are two types of systems that I'm going to look at: distributed version control systems, and databases that are capable of true multi-master replication. These are two types of systems that have great capabilities to deal with and take care of data: not losing it, revisioning it, and so on. So let's take a look at them. To start with version control systems, we all know Git. Git assumes that your content is distributed, that content can come from everywhere. Every change in Git is a new revision. Even when you delete a file, that's a new revision. Because you don't lose files with Git.
Every revision has a universally unique hash so that we can identify it across our systems. Very important, because again, our content is distributed. We need to be able to identify not only the content in general, but also which revision is in which system, and so on. And what's also unique with Git is that we're tracking a revision history, a revision tree. Every revision has a parent. It's not just a coincidence that revisions come after one another in a database table; they actually have a parent. So those are some of the capabilities and features in Git that make it a good system for taking care of data. In the same way, we're going to look at multi-master databases. There's a list on Wikipedia of databases that are capable of multi-master replication. I'm going to strike out a few of them here. The two at the bottom are proprietary databases, so we don't care much about them. With MySQL and PostgreSQL you can do multi-master replication, but the revision management and conflict management in those implementations are not really good, so we're not going to look at them. And I'm also striking out Cloudant here, because it's just an enterprise version of the first database that we're going to look at: Apache CouchDB. What does CouchDB do to take care of your data? Well, it's very, very similar to Git, it turns out. Again, we assume that our content is coming from everywhere, that content is distributed. Every change in the database is a new revision of your document; it's a NoSQL document store. So even deletes are a new revision. And every revision can be identified universally across systems. And every revision also has a universally identifiable parent. So, very similar characteristics to Git, as we see. Another thing, which I won't touch on so much in this session today, is that Apache CouchDB also has a very nice and reusable HTTP protocol for doing replication between systems. Quite useful.
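The shared revision model in Git and CouchDB, a per-revision hash plus a pointer to its parent, can be sketched in a few lines. This is an illustrative sketch in Python, not CouchDB's actual hashing scheme: the `generation-hash` id format mirrors CouchDB's, but the hash inputs here are simplified assumptions for the example.

```python
import hashlib
import json

def new_rev(parent_rev, content):
    """Compute a CouchDB-style revision id of the form '<generation>-<hash>'.

    The generation counts steps from the root revision; the hash is derived
    deterministically from the parent revision id and the new content, so
    every system computes the same id for the same change. (The hash inputs
    are simplified here; real CouchDB hashes different inputs.)
    """
    generation = 0 if parent_rev is None else int(parent_rev.split("-")[0])
    payload = json.dumps([parent_rev, content], sort_keys=True).encode()
    return "%d-%s" % (generation + 1, hashlib.md5(payload).hexdigest())

# A tiny revision tree: every change, including the delete, is a new revision.
r1 = new_rev(None, {"title": "Hello world"})
r2 = new_rev(r1, {"title": "Hello LA"})
r3 = new_rev(r2, {"title": "Hello LA", "_deleted": True})
```

Because the id encodes both the position in the tree (the generation) and the content lineage (the hash), two replicas that apply the same change independently end up with the same revision id, with no coordination needed.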
We're not going to touch too much on that protocol here; we'll see a little bit of it, perhaps, in the demo later. Yet it's a very interesting piece for many reasons. So, just a little bit of Drupal history and some past discussions on this very topic before we go further. We've been discussing this quite a lot, for a long time, actually. Even back in 2011, when we started taking notes about how we should design the entity API in Drupal 8, there was a BoF with very interesting notes. And there was a feature in Drupal Watchdog that touched on a lot of the concepts that we're going to cover here. We also have Steve Ekter here in the audience, and there was a very interesting session from Denver, where revisioning and workflows and the struggles with editing nodes were discussed. These are things that touch on the same topic. I also had another session back in Denver, about content staging in core, which is one of the use cases here. Not the only one, just one of them. All the discussions that we've had touch on sort of the same concept, a quite interesting concept called CRAP. What is CRAP? I want to quote chx from the Watchdog feature where this was discussed. And this is just for some context: this is from when we were designing and discussing how to build Drupal 8. "In Drupal 8, there are two competing approaches: CRUD, which is create, read, update and delete, and CRAP, which is create, read, archive and prune. The latter would solve our issues with revisions, and thus is my preference." So that's a quick quote from the Watchdog feature that chx put out. So, to summarize the discussions very quickly: we do want to manage our content, and we want to be able to manage our workflow well. It is a very complex topic, and it has a lot of complicated use cases. However, the need for revisions is the lowest common denominator across all of these use cases: concurrent editing, content staging, distributing, moving and sharing content.
The need for revisions is the lowest common denominator across all of these. Other systems have solved this problem already: we touched on Git and on CouchDB, and how they are very good at revisioning and managing content. And then there's the concept of CRAP versus CRUD. CRAP is the preferred way to do this type of revisioning. It's an append-only way to work with data: create, read, archive and prune instead of create, read, update and delete. So what can we do about this, in core and in Drupal in general? Well, first we have contrib. We can play around as much as we want there, really. We can iterate on and mature the ideas that I'm going to show here. I'm going to give a demo of three, or I should say two, modules here; we're not going to demo the last one. We're going to demo the Multiversion module in Drupal 8, which implements a CRAP way of storing entities. And then we're also going to demonstrate the RELAXed Web Services module, which is used as the transport mechanism. We want to focus on the first module; that's what's central to this topic, really. The transport mechanism is just a side effect in this case. So without further ado, we're going to jump over to a quick demo, where I'm going to show how we could work with this and how it could be implemented. In this use case we have two sites, Drupal 1 and Drupal 2, as we can see here. We can't see the video, I see, so let's do it like this. So again, we've got two sites here, Drupal 1 and Drupal 2. Both sites have the Multiversion module and the Multiversion UI module installed, and these are what we want to focus on. We also use the RELAXed Web Services module, which is going to be used just to transport the content around and show off some functionality around revisioning. So we're going to start by adding an article on our first site.
"Hello world", simply. We put in the title, and we save and publish this one. We also have a simple drush command that we're going to use to replicate from site 1 to site 2, to move this content around. So we have a "drush replication start" command that we're going to run, with the URLs to each endpoint. We run that command, and the replication is successful. So we open up site 2 and refresh it, and we can see that the content has now moved over; it's been replicated. And just to show off the revisioning capabilities here, we're going to do bidirectional replication. We're going to make an edit now on site 2, and we're going to replicate that back to site 1. This could also be on the same site; it could be concurrent edits. This is just one use case where we need revisions, where we need a better revisioning system. So we replicate this back to site 1, and we can see that the title has now been updated, saying "Hello LA". We now have two revisions of this, right? So we can go in and look at the revision tree, which in this case is quite simple; it's just two revisions, as we can see there. And we have the same revision history on site 2. They are the same revisions, and we have a global revision hash here that we can identify across both systems. I'm sorry that the colors seem a little bit faint there, but I hope that you can see. We now make two incompatible updates on site 1 and site 2, so we're going to create a conflict in our entity here. We make one update on site 1 and one update on site 2; we're going to introduce a conflict. And then we're going to replicate from site 1 to site 2, so we're introducing the conflict now on site 2. Then we can open up the revision tree for this, and we can see that we have a conflict. We have a split in the tree. I'm not sure you can see the lines, but we have splits there in the tree. We can of course continue editing.
So we go back to site 1, make further edits, and we replicate that over to site 2, to continue the revision tree for our edit. We use the drush command to replicate our change over to site 2, and we can see now that the revision tree has been continued from site 1. And we have an open revision; that was the conflicting one. Further, we're now going to delete the entity on site 2, and we can see that it's gone. Then what we're going to do is go back to site 1, continue editing this, and show that the node is actually not gone; on site 2 it was just a new revision that was flagged as deleted. So we continue editing it on site 1 and replicate it over, thus un-deleting it, so to speak. So now we're replicating further changes from site 1 over to site 2, and we can see that the entity is actually not gone, just as in Git or any other version control system. We've now continued our revision tree for the entity. We can see that we had the first split further up in the tree, from earlier, and then we have another split where we actually deleted the entity on site 2. But then we continued editing the node on site 1 and replicated that. We can also see here that one of the revisions is missing, and that's because we only replicate open revisions. We don't need the whole revision tree when we replicate. So this is just an example of how a better revisioning system is very capable of handling these types of things. And again, this was content staging, but it could be concurrent editing, or it could be multiple API clients making conflicting changes. We're going to show another system integration now, another use case. What we're going to do in this case is replicate our content using that HTTP protocol I talked about earlier, to replicate from Drupal to another system, because it's all an HTTP API. And there are multiple different kinds of systems that use this same protocol.
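As a rough sketch of what triggering a replication over this protocol looks like: CouchDB-compatible endpoints accept a POST to the `_replicate` endpoint with a small JSON body naming a source and a target. The helper below just builds that body; the URLs in the example are placeholders, not the demo sites.

```python
import json

def replicate_request(source, target, continuous=False):
    """Build the JSON body for a POST to CouchDB's /_replicate endpoint.

    Any endpoint speaking the CouchDB replication protocol, whether CouchDB
    itself, PouchDB, or (per this session) Drupal via the RELAXed Web
    Services module, can act as the source or the target.
    """
    body = {"source": source, "target": target}
    if continuous:
        # Keep replicating as new changes arrive, instead of a one-shot run.
        body["continuous"] = True
    return json.dumps(body, sort_keys=True)

# Example: replicate from a Drupal endpoint to a CouchDB database.
# Both URLs are made-up placeholders for illustration.
print(replicate_request("http://drupal1.example/relaxed/live",
                        "http://couchdb.example:5984/live"))
```

The important point is that the trigger is just data over HTTP, which is why the same command can move content Drupal-to-Drupal, Drupal-to-CouchDB, or CouchDB-to-browser.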
CouchDB is one of those systems. We're not using CouchDB in any way in our Drupal site; we're just using the same protocol and the same HTTP API. So we have our CouchDB instance here, for instance. What we're going to do now, since we speak the same protocol, is replicate from Drupal 1 to our CouchDB instance. We're going to move the content around. And again, as I touched on earlier, CouchDB is also a very capable database for dealing with revisions and conflicts, and we're using the same protocol here. So we've now replicated the content over to CouchDB, and we have a JSON representation of our entity here. And to further show our bidirectional replication capability, we're going to update the entity in CouchDB, and then we're going to replicate it back to see if that works. So we're updating the content here, just updating the title, and then replicating back from CouchDB into Drupal. And then we're going to propagate these changes over to site 2, just to show that we can move the content around, and we can relax, because we know that our revisions and changes won't be overwritten. We take care of the content here; it's a revisioning system. So now we have the new revision, and we're going to propagate this over to site 2, just to show how capable this revisioning system is. We move it from Drupal 1 to Drupal 2 with one command, and when we reload the page we have our seventh revision on site 2. Further, we're going to touch on another use case: frontend applications. If you decouple your application, you have a separate frontend application, using Drupal as your backend. Perhaps you're building an offline-capable website, a website that you can use when you don't have a connection. That's a prime candidate for needing a revisioning system, because you will have clients, and you might have editors working on the site concurrently.
Everyone being offline, editing the same article. And when they get out of the tube or get a connection again, step off the train, then they upload their changes automatically, and they might conflict. We need a solid revisioning system to handle these conflicts. So I'm just going to give a very, very quick demo with a library called PouchDB, again using the same revisioning protocol, the same replication protocol, which is capable of dealing with revisions in this way. So we have a super simple frontend application here that will store entities in IndexedDB, in the local store. The app is just a blank page at the moment, and I'm just going to show the IndexedDB store here. When we reload this page, it's now going to pull down the content from our HTTP API. And there we have the content; it's just replicated down. And now we can edit it; we can manage it however we want. I'm not going to show that particular thing in this demonstration, but we have the full entity with all the fields. And again, if we push changes back, we can deal with that in a very elegant way when it comes to revisions and conflicts and these kinds of things. So that's about it when it comes to the demo. Let's see if we can switch around here. Before I continue, I just want to give some credit for the work that's been done for this demo. I want to credit Andrei. Andrei is a guy who's been working on a lot of the code here. He's also a new core contributor; he's been submitting and getting multiple patches into core. So please give a round of applause to Andrei, because he's done great work on this. Andrei is not here today, unfortunately, but he's doing awesome work on this module alongside me. OK, so what can we do when it comes to inclusion in core? Of course, baby steps: first, simple things that won't bring us this whole system, but that will get us a little bit along the way.
Perhaps we should just enable revisions by default in Drupal. We have a good revisioning API in core, but we're not using it by default in the standard installer for the content types. Perhaps we should also tick the checkbox for enabling revisions by default on the entity type form. Such a simple thing, just to use our revisioning API, because we're not using it by default anywhere. Some more baby steps that we can take in the longer term: being a content management system, or framework, whatever you want to call it, I think we should assume that our content is distributed. We need to assume that content is coming from everywhere and that there will be conflicts, so we need a revisioning system and an entity API that can deal with that. I would even say that we should enforce revisions for all content entities; they should always have to use revisions, and there shouldn't be a way to opt out of that. We shouldn't be able to just blindly overwrite or delete content with no turning back. With CRAP (create, read, archive, prune) we can, of course, prune away old deleted revisions in background jobs or things like that. We don't need to store everything forever; that's not what I'm saying. We can still prune away old deleted revisions. We should treat every change as a new revision, even deletes. Again, this is where CRAP, the append-only way of doing things, comes in. I think that we should assign a universally unique hash to every revision, so that we can do the things I just demonstrated, so that we can identify revisions and conflicts across multiple systems. We should also give our revisions a parent, not just coincidentally put them after each other in the table. We should make sure that every revision has a parent, so that we know where it's coming from. Then, perhaps, as another step, we should expose revisions over a RESTful API.
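The append-only CRAP model described above can be made concrete with a toy sketch: saves and deletes both append a new revision recording its parent, a delete just sets a flag, and pruning trims old history in the background. This is an illustration of the idea only, not the Multiversion module's actual storage handler; the class and method names are made up for the example.

```python
class CrapStore:
    """A toy append-only (CRAP) entity store: create, read, archive, prune."""

    def __init__(self):
        self.revisions = {}   # (entity_id, rev_no) -> revision record
        self.latest = {}      # entity_id -> latest rev_no

    def save(self, entity_id, fields, deleted=False):
        """Append a new revision; never overwrite an existing one."""
        parent = self.latest.get(entity_id)
        rev_no = 1 if parent is None else parent + 1
        self.revisions[(entity_id, rev_no)] = {
            "fields": dict(fields), "parent": parent, "deleted": deleted,
        }
        self.latest[entity_id] = rev_no
        return rev_no

    def delete(self, entity_id):
        """'Delete' by appending a revision flagged as deleted."""
        fields = self.revisions[(entity_id, self.latest[entity_id])]["fields"]
        return self.save(entity_id, fields, deleted=True)

    def load(self, entity_id):
        """Return the latest fields, or None if the latest revision is deleted."""
        rev = self.revisions.get((entity_id, self.latest.get(entity_id)))
        return None if rev is None or rev["deleted"] else rev["fields"]

    def prune(self, entity_id, keep=1):
        """Drop old revisions; history does not have to live forever."""
        cutoff = self.latest[entity_id] - keep
        for (eid, no) in list(self.revisions):
            if eid == entity_id and no <= cutoff:
                del self.revisions[(eid, no)]

# Mirrors the demo: create, update, delete, then un-delete by saving again.
store = CrapStore()
store.save("node:1", {"title": "Hello world"})
store.save("node:1", {"title": "Hello LA"})
store.delete("node:1")                          # load() now returns None
store.save("node:1", {"title": "Hello again"})  # the node is back
```

Note that `delete` here is just `save` with a flag, which is exactly why an un-delete is nothing special: it is one more revision on the same tree.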
Are all these steps you are mentioning here supposed to be something that is not going into Drupal 8, or could maybe some of them be possible even in Drupal 8? For instance, I think the step that suggests assigning a parent to revisions should be fairly easy to achieve right now. I do think so. I agree with you that there are definitely pieces of the system that we can bring into Drupal 8. So, for instance, the parent is something that we can do. If we do the parent, I also think that we should bring in the additional revision field that I've introduced in Multiversion, the revision hash, so that we have a sort of UUID for our revisions as well. Those are two things: the additional revision field and a parent, and then perhaps the deleted field as well. That might be a little bit more invasive, because then we suddenly need to start changing the logic around our save and delete operations. But perhaps the revision hash and the revision parent are something that we can do. It's really not that big of a change, to be honest, what we've done in Multiversion. We're not changing the APIs around too much; we're just using our APIs differently. Instead of deleting, we're just marking the deleted field, setting it to true, and then we save the entity. It's not necessarily an API change. And then, of course, there are some API changes we do need to make here, so we don't necessarily need to move the whole system; perhaps the whole system can move into Drupal 9, or Drupal 8.1 or 8.2. This is something that I'd love to discuss moving forward. And with those words, that's all I had to say, actually. I'm really looking forward to your feedback. Please step up to the mic. If you have questions or feedback or ideas, I'd love to discuss these things. Thank you. First of all, this is extremely impressive. It's great what you've been able to do.
I'm thinking through applications here, and what I've come up with so far is: the use case of single-node workflow or approval steps, basically the Workbench Moderation use case. There's the use case of needing to preview a group of nodes together, basically the magazine issue publishing all at once use case. There's the election night use case, where you're trying to stage what the site looks like if this person wins the election, or what it looks like if that person wins; that could be a whole bunch of things. There's the offline changes use case. There's the site-to-site syndication use case, basically the network of affiliated sites use case. And the synchronous editing use case, basically the Google Docs use case. Are there any others that you have in mind? I think that covers it pretty well. That was a long list. The follow-up question is: it's probably not possible, or necessarily a good idea, to try and solve all of those use cases in one big UI. How can UI tools be built on top of this API in such a way that these different use cases, which will get built on Drupal 8, coordinated or not coordinating with each other, are using the API well and not reinventing the wheel each time? It's a very good question, and I don't necessarily have the answer to it, but what I definitely know is that we have a lowest common denominator here. Using the same revision API makes a ton of sense. How can we build the UIs on top of it? I think they can be fairly separate. And from an implementation point of view, you just need to use the regular entity API. You don't need to do anything different; you're loading and saving entities. If you try to load a deleted entity, it won't get loaded; we have introduced an additional method for loading deleted entities. So that's perhaps the only addition that you might need to take into consideration there. It's a good question.
I don't necessarily have all the answers there, but it's not that different. You can just use the entity API as it is today. You don't even need to know that the underlying system is using a multiversion way of doing it. OK, one more detail, if you don't mind: what ends up in the entity's base table as the default? It seems like in each of these use cases you need to know which revision to grab. If you're asking for node one-two-three, well, which revision of node one-two-three? How do you know which revision to grab? So, when there is a conflict, obviously only one can be the default one. By default, a new revision becomes the default one. When there is a conflict, that's a trickier situation, and the way the system deals with it, and the way the replication protocol I talked about says you should deal with it, is that you need to have deterministic logic. If you replicate the same conflict to multiple systems, the result should be the same in all systems, without them needing to communicate with each other. So there is an algorithm, which essentially just does an ASCII sort on the revision hash: the longest revision tree wins, or if they are the same length, the one that sorts the highest. It's as simple as that, so at least the same deterministic logic is applied in all systems, and then it's up to your application to say whether that was right or wrong. You might want to create another revision to resolve the conflict differently if you want to. Great. Thank you. Thanks. Larry. Hi there. So this looks great. You mentioned, in terms of things to add, adding REST support for revisions. When we were speccing out the REST system in core, we actually did include how we would handle that, because that's an area where the IETF standards have a very clear set of links and standards to leverage. It's just a matter of us never getting around to it, so that is absolutely safe to do in 8.1 as maintenance for the REST module.
Yes, please, someone write the patch and I'll review it for 8.1, not for 8.0. But definitely, integrating that kind of stuff into our REST support is pretty straightforward; let's do it. Yeah, I agree. Thanks, Larry. Hi. Hi. So, a couple of questions about the whole headless Drupal thing, where you might have an offline application, as in your example, and it may store the data in a different way, not necessarily using node IDs and stuff like that. Have you thought about ways to adapt the API calls and everything to work like: okay, this is not called a node in my system, but it's still the same piece of content? How would you handle that? So that's the trick there, right? Because if you push things back to Drupal, they need to look and work like a content entity, a node in this case. So you would need to have some sort of transformation if your application needs it to look different. It needs to look the same when you push it back, right? So the easiest way is perhaps to use it as is, if you can; otherwise you'd have to have some custom mapping and transformation logic there. We don't push the local entity IDs; we only push the UUID and the revision hash. So you won't see the nid field in your JSON, for instance; you will only have the UUID, which can work like a universal ID for all your applications. Okay, so another thing, about the Google Docs scenario where you have a really cool WYSIWYG that accepts all of that: would separate revisions still make sense when, you know, two people are editing the document at the same time, and each keystroke would potentially generate a revision? Is that maybe too wasteful, or how would you handle it? It's a good question. I haven't thought of exactly how that would be implemented; I think I'd leave that to whoever wants to pick up the ball on that. But a new revision per keystroke, I think that would work.
The default storage, which is MySQL, can handle loads of rows, so I don't necessarily think that would become a problem. And again, you can prune away old revisions if you have the need to. Because whenever there might be a conflict, having the two revisions is a fundamental requirement; otherwise you won't be able to resolve conflicts from concurrent edits. It's a great question. Perhaps there is a middle way somewhere. Right, so about conflicts: say you have a big body field with a bunch of text, and there are two different edits at the same time in two different systems, where one edit is, okay, I'm fixing this paragraph, and the other guy's fixing that other paragraph, and they could be merged like Git does. Is there a way to implement an algorithm that merges that in a smart way? The system that we have implemented at the moment doesn't deal with merging at all. It's up to your application or your business rules to decide how you'd like to merge; the system can't know how you want it merged. It's the same thing in Git, right? Git can do simple merges, but when there's a merge conflict, Git simply tells you: open up the editor, manually merge this, and then commit back, creating a new revision with your fix. So it's left up to the implementation to decide how you want to do the merge, really. We just flag the fact that there is a conflict. Thanks. So, any ideas or plans for handling revisions for groups of entities? For example, in Commerce we frequently need to care about revisions for a product and its variations, or for an order and its line items. So how do you mean? If you have a collection of entities, you want to refer to a specific revision of the whole collection? Is that what you mean? We want to track all of those entities as one. I see. A Git branch. Yeah, basically. Yeah, it might come up in layouts as well. Just to repeat the statement here: I haven't thought too much about that.
I think that you could just implement some sort of event system or hook system, and create a new revision whenever there is a change to a parent entity of some kind. Yeah, but you're creating the same revision then. For example, if a variation changes and the parent product didn't, but you create a new revision for it anyway, that revision tells you nothing about what changed in the child. Yeah, I'd have to take a rain check on that one. Yeah, baby steps, exactly. In any case, great work so far. Thanks. More questions, thoughts, ideas? Yes, hello. Does this work at all with the concept of forward revisions and default revisions? Does it work with forward revisions, was that what you said? Yeah, do forward revisions work with all of this? So, you can definitely change what becomes the default revision. When you create a new revision, it doesn't have to become the default revision; it can just be a new revision. The default revision functionality is essentially what you would use to do forward revisions. And you could, with the normal entity hooks, just say: no, actually, this should not be the default revision. So it's up to you, really. There is logic, of course, that suggests which revision is going to be the default one, but you can alter that in a normal entity hook if your module doesn't agree with that, and say that this should be a forward revision, for instance. So yeah, it's definitely possible. My other question is: it sounds like some of this may make it into 8.1 or something like that, and possibly some of it is in a little bit more of a gray area. So if this is something that ends up living in contrib for a long time, I guess my question is: how hacky is it? Is this something that's going to cause lots of problems with other things, potentially? Or is it something that generally should work pretty smoothly with the rest of the Drupal ecosystem?
So, parts of it I definitely think we should move into core. As we just discussed earlier, there are parts that make sense to move into core, perhaps already in 8, or at least in 8.1 or 8.2; I think it might be too late for 8.0. And over the long term, sure, this whole revision system could definitely move into Drupal 9 or the future. How hacky is it, and will it cause trouble? It's surprisingly clean, I would say. It doesn't do any strange things; it's only using the awesome entity API that we have in Drupal 8. We are essentially swapping out the storage handler for another MySQL storage handler by default. It's all captured in a non-storage-specific trait, so you can apply it to a MongoDB storage handler if you want, to achieve the same system. And it's enabling revisions the way that you would enable revisions using the entity API, so it's just creating additional tables with the schema handler that we have in the entity API. We're not doing any trickery or hackery to achieve that, so it's very clean. As long as you use the APIs as they are meant to be used, you're going to be in good hands. For instance, you need to use entity queries for all your queries, because the entity query API will then take into consideration that the entity is using revisions; we need to query the tables in a slightly different way compared to if you didn't have revisions. So as long as you use the APIs as they are meant to be used, it's going to be good. And of course, if a module assumes for some reason that your entity does not have revisions, then it will cause trouble. But I can't come up with any reasons or cases that would actually cause trouble there; I can't see how you would make those assumptions, to be honest. So I think it's good. Okay, no further questions? Thank you everyone.
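A footnote to the Q&A above: the deterministic conflict-winner rule described there, where the longest revision tree wins and ties are broken by an ASCII sort of the revision hash, can be sketched like this. It's an illustration of the rule as described in the session, not the Multiversion module's actual code, and the sample hashes are invented.

```python
def pick_winner(open_revisions):
    """Deterministically pick the default revision among conflicting leaves.

    open_revisions is a list of (tree_depth, rev_hash) tuples, one per open
    leaf revision. The longest branch wins; ties are broken by an ASCII sort
    of the revision hash, highest first. Every replica applying this rule
    picks the same winner, with no coordination between systems.
    """
    return max(open_revisions, key=lambda rev: (rev[0], rev[1]))

# Two branches of depth 3 conflict with one of depth 2. The depth-2 branch
# loses on length; between the depth-3 branches, "d4c0ffee" sorts higher
# than "ab12cd34", so it becomes the default revision everywhere.
leaves = [(3, "d4c0ffee"), (3, "ab12cd34"), (2, "ffffffff")]
assert pick_winner(leaves) == (3, "d4c0ffee")
```

As noted in the answer, the rule only guarantees that all replicas agree; it is then up to the application to create a further revision if the mechanically chosen winner was the wrong one.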