Welcome. Welcome to content staging in Drupal 8, on the advanced site building track. Good to see so many of you here. Please feel free to come up to the front. I don't bite, I'm not dangerous, so don't camp out too much in the back.

What are we going to talk about today? We're going to cover, first of all, what content staging is and what it's all about. Then we're going to talk specifically about the solution that is being built for Drupal 8: what we want from that solution and how it's being built. It's currently being built; it's by no means ready. As we all know, Drupal 8 hasn't come out yet; the beta is very, very close, and the solution is still being developed. We're going to cover a little bit about reusable protocols; I'll dive into more detail on that later. And I'm even going to make a short note at the end about something called headless Drupal, which, believe it or not, actually is, or can be, relevant to the topics I'm going to talk about today.

But first, who am I? My name is Dick Olsson. I'm from Sweden, though I currently live and work in London as a digital engineering manager at Pfizer. But I'm not on stage as a Pfizer employee; I'm on stage as a community member. I'm dixon_ on Drupal.org and Dick Olsson on Twitter. I love everything about the web, and I'm very passionate about contributing to Drupal. I occasionally work on Drupal core; I've been working a lot on the entity API in Drupal 8. And I also maintain various contributed modules, including, of course, the modules that we're going to cover here today.

Okay, so let's dive into it. What is content staging? Content staging is actually quite a few different things to many different people and many different organizations. Looking at it technically, it can be as simple as the workflow that we see on screen here.
You have a stage site where your editorial team is working, writing their content and making changes, and then the content, and only the content, is pushed over to a production environment. So: two separate environments, just as you would have for any other development workflow.

You don't necessarily need to use it for editorial reasons, though; you can use content staging for other reasons as well. For instance, if you're dealing with sensitive or classified content that you can only publish securely, you can put your stage site behind a corporate firewall and do secure publishing. That's a use case I worked with back in the Middle East, at the Al Jazeera TV network. We dealt with a lot of sensitive content, so we put a stage site behind corporate firewalls and then published to the production site when the time came.

There are lots of other workflows. You can use a similar setup with multiple editorial sandboxes, where multiple projects, big and small, are worked on in separate sandboxes. They can then be staged and merged together, in a workflow similar to what we see here, first to a stage site and then on to production. Although we don't use any Drupal 8 solutions at Pfizer at the moment, we use a similar workflow for the kind of work we do there, with the Drupal 7 versions of these modules: all the changes are made in separate sandboxes and then merged together.

You can also set up a hub-and-spoke workflow; that's very common, actually. You have a central editorial site that powers different sites, in a fashion similar to what we see here. They don't need to be the same site; they can be completely different sites, but they might share, say, an article content type, news, something like that, powered by an editorial hub. Changes might go out.
Changes might also go back to the editorial hub and out to another site, to share that content. And again, another model: you can have a network of sites with no centralization. This is more of a sharing model, with sites sharing content and pushing it back and forth between each other.

So these are some of the setups people build and think about when they use content staging or do similar things. It's a lot of different things: it's content staging in the traditional sense, it's content sharing, and we can also look at it as content replication, or content syndication for that matter.

So what are we looking for in a foundation, a solution, to build all of these workflows on? We don't want to build just a content staging system; that would be too narrow. We need a loosely coupled content replication framework that is capable of moving content, in a single direction or bi-directionally, between systems. And it doesn't even have to be Drupal sites; it can be entirely different systems.

We also, of course, want to learn from the work we've already done in Drupal 7 with the UUID module, the Deploy module, and a module called WF Tools, Workflow Tools, a module we work a lot with at Pfizer and that builds on top of the UUID and Deploy modules. Those modules have gone through a lot of testing and a lot of usage, so we of course want to take that experience with us as well.

And to list a few more things that we want: we want revisions everywhere. Having revisions on everything that you stage is very important. It's important for editorial reasons, of course, but it's also very important to be able to detect and handle conflicts. Let's say you have a bi-directional workflow, and two changes are made to the same article at the same time and then replicated. Of course, we need to detect conflicts.
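To make that conflict case concrete, here is a minimal sketch, not code from any of the modules, of how a Git-style revision tree reveals a conflict: if two leaf revisions descend from the same ancestor, the entity was changed independently in two places. The data shape (a dict mapping each revision id to its parent) is purely illustrative.

```python
def find_conflicts(rev_tree):
    """Return conflicting leaf revisions in a revision tree.

    rev_tree maps each revision id to its parent revision id
    (None for the very first revision). More than one leaf means
    the entity diverged: two environments edited the same parent.
    """
    parents = {p for p in rev_tree.values() if p is not None}
    leaves = [rev for rev in rev_tree if rev not in parents]
    return leaves if len(leaves) > 1 else []
```

For example, a tree where revisions `2-b` and `2-c` both have parent `1-a` is in conflict, while a simple linear history is not.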
Another big problem we face when we move content around is dependency management; we want to make that easier. A short note on dependency management: if you have a node and you want to move it around, you also need to move its dependencies. For instance, its author's user account, its tags, other node references, and so on. So we want to make dependency management easier.

We want to be able to do continuous replication, ad hoc pushes, and bi-directional replication; I already mentioned that. And of course, a REST API is what everything should be based on; that's the best way to design a system like this. These are our requirements, if you will, for the solution.

So how is it being built? I'm going to dive into technical details here, which should give you a good overview of which modules you would look at if you needed to build a similar system. We start with a few core modules and core APIs. The Serialization module is very important for doing what we want with the REST API, and we base things on the RESTful Web Services module in core. Everything also relies heavily on the entity API. The entity API is very nice, much improved in Drupal 8 compared to Drupal 7, and it enables us to do a lot of good new things. The contrib modules we're going to cover here today are the Multiversion module, then the Relaxed Web Services module, which is an extension of the RESTful Web Services module, and, last but not least, the Deploy module. We're going to cover what each of these modules does separately.

So, the Multiversion module first. The main purpose of the Multiversion module is to track update sequences, in order to make dependency management easier. In Drupal 7 we tried to detect dependencies, and it's very, very difficult to do. Lots of contrib modules add things onto the entity object, and they don't necessarily track things with UUIDs.
It's very difficult to detect dependencies. We made it work, sort of, in Drupal 7, but in Drupal 8 we approach the problem differently. We basically track the update sequence and treat it more like traditional replication, like a database would do; we more or less replay changes. That way we don't need to deal with dependencies.

The module also provides revision support for all content entities. We go in and alter things in the entity API to enable revisions for everything, or rather for content entities only; we don't do this for configuration entities and the like. And with both of the above in place, we can then track revision trees, in quite a similar way to how Git does it, in order to support conflict detection. That's the main purpose of the Multiversion module.

We extend, as I said, the revision model in the entity API. Revisions are not tracked with a universally unique ID; a revision is actually a hash calculated from the actual changes. This makes it easier to detect conflicts across environments without needing to call over or do deep inspection of other nodes or other environments in the system. So the revision itself is a hash based on the changes; you can see an example on the slide of how the hash is calculated.

And looking at an example of an entity serialized to JSON, you can see that we add a few metadata fields, if you will. We track additional revision information; you can see the revision hashes there. We also track a local sequence number for each entity, so that we can accurately decide in what order entities were saved. We can't necessarily rely on the entity ID, since different content entity types might have different IDs, so we need to be able to detect locally the sequence in which we save. And then lastly, you see another field called deleted. It's a Boolean field, false or true. That maybe sounds a little bit strange.
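As a rough illustration of the idea, not the Multiversion module's actual algorithm, here is a sketch of CouchDB-style revision ids of the form `generation-hash`, together with the kind of metadata fields just described. The field names (`_rev`, `local_seq`, `_deleted`) follow the CouchDB convention the talk builds on; the module's exact keys and hash inputs may differ.

```python
import hashlib
import json

def next_rev(prev_rev, content):
    """Compute a CouchDB-style revision id: '<generation>-<hash>'.

    The hash is derived from the previous revision id plus the new
    content, so the same change made in two environments yields the
    same rev id, while diverging changes yield different ones.
    """
    generation = int(prev_rev.split("-")[0]) + 1 if prev_rev else 1
    payload = json.dumps([prev_rev, content], sort_keys=True).encode()
    return f"{generation}-{hashlib.md5(payload).hexdigest()}"

# A serialized entity then carries revision metadata alongside its
# ordinary fields -- roughly the extra keys described in the talk:
doc = {
    "title": "Hello world",
    "_rev": next_rev(None, {"title": "Hello world"}),
    "local_seq": 42,    # per-environment save order, not the entity ID
    "_deleted": False,  # entities are archived, never hard-deleted
}
```

Because the rev id is deterministic, two environments can compare revision lists directly, without inspecting each other's full content.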
But we actually need to do CRAP instead of CRUD, and that's where the deleted field comes in. CRAP stands for create, read, archive and purge, as opposed to create, read, update and delete. We never delete entities with this new revision system. We need to avoid deleting entities so that we can handle conflicts and accurately replicate changes, very much like Git does it. If you delete a file with Git, it's never actually deleted; it's just saved as a new revision with a deleted flag set to true. This way we can revert deletes, and if two separate changes are made on two systems, one a delete and one an update, we can actually handle that conflict, because we haven't deleted the entity itself.

To save space in the database, we can run a compaction job that actually purges old revisions if needed; that functionality is there. But we still keep the sequence index and the revision metadata, so that we can track the revision trees and make decisions based on them when handling conflicts later on. To sum it up: CRAP instead of CRUD. Sounds a bit funny.

To go into some technical details: in order to do this, we switch out the storage controller for all content entities. By default we switch it to a separate SQL content entity storage controller, but we consolidate all of our generic changes into traits, so you can apply this to a MongoDB controller, or wherever you want to store your entities. It doesn't depend on SQL storage; that's just the default. We need to change the storage controller in order to change the semantics of deletes and so on. We have a service for tracking the sequence index, which stores the index in the key-value store, so it's fast. And we have other services, a conflict manager and a compaction manager, to handle conflicts and to run the compaction job that removes old revisions if needed.
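The CRAP semantics can be sketched like this. It's a toy in-memory store with illustrative method names, not the module's actual storage controller, and it simplifies compaction down to "keep only the latest revision":

```python
class CrapStore:
    """Minimal sketch of create/read/archive/purge semantics.

    A delete is just a new revision flagged _deleted=True, so the
    tombstone can still be replicated and conflict-checked; purge
    (compaction) is a separate, explicit cleanup step.
    """

    def __init__(self):
        self.revisions = {}  # entity uuid -> list of revision dicts

    def save(self, uuid, doc):
        # Create/update: append a new revision, never overwrite.
        self.revisions.setdefault(uuid, []).append(dict(doc, _deleted=False))

    def archive(self, uuid):
        # "Delete": append a tombstone revision instead of removing data.
        self.revisions[uuid].append({"_deleted": True})

    def read(self, uuid):
        latest = self.revisions[uuid][-1]
        return None if latest["_deleted"] else latest

    def compact(self, uuid):
        # Purge: drop old revision bodies, keeping only the latest.
        self.revisions[uuid] = self.revisions[uuid][-1:]
```

An archived entity reads as gone, but its full revision history stays available for replication and conflict handling until someone explicitly compacts it.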
The next module is the Relaxed Web Services module. It provides a RESTful, or "relaxed", JSON API, as an extension of the core REST module. It provides endpoints for all content entities and for file attachments, because we need to replicate those as well. Put simply, the biggest difference from the core REST API is that in this API we identify each entity by its universally unique ID. We also have endpoints for comparing revisions, to work out which revisions are missing in this environment and which are missing in another environment, so that we can communicate and replicate efficiently. We also have endpoints for starting and stopping replications and some other administrative tasks. And there's an endpoint for listening to changes in another environment, so that we can do real-time replication between systems if needed, as opposed to just ad hoc replication where someone pushes a button.

There's also going to be a separate Drush plugin for running these replications, so you can run them as a background task and not block users working on the site.

And how do we implement this? To dive into some technical detail: the REST module in core provides a very nice API, so it's easy to define additional resources for your own API. There's a quick example on the slide: a resource for all the documents, or entities. We also have an endpoint for doing bulk updates, the changes resource, an endpoint for listening to changes on another site, and of course the revisions-diff endpoint I talked about.

And then lastly, we have the Deploy module. The Deploy module provides just a simple UI focused on content staging. If you find a use for Multiversion and the Relaxed Web Services API without necessarily doing content staging, you don't need to use the Deploy module.
The Deploy module is going to be a UI specifically for content staging workflows, handling replications, conflicts, and so on.

So how does the actual replication work? There's a protocol around this, and this is roughly what it looks like. Step one is to identify the source and the target environment; we do that with UUIDs as well, so each environment has its own UUID. Then we get the current checkpoint from the target site: from what point in time did we replicate last time? Then, from the source site, we get all the changes since that checkpoint. The checkpoint here is basically the sequence number we talked about before, stored in the sequence index. The result of those changes is then passed to the revisions-diff endpoint on the target site, so we can compare: which revisions do you have, and which are missing? Revisions might have been replicated to the target site by another route, not necessarily through this source-and-target replication, so revisions might have ended up on the target site for various reasons. We pass the changes so that we only get the missing revisions back. We then collect the missing revisions and post them to the bulk operation API on the target site. That's roughly how it works.

For those of you familiar with the Drupal 7 version of Deploy, you can see that we do a lot less communication over HTTP here. We do that because HTTP is inherently a very unreliable protocol, so we limit the number of calls over the network to make the replication more solid and less fragile. And then lastly, we save the new checkpoint to the target site, so the next time we come around and do a replication, we only take what has changed since then.

I did promise in my session description that I was going to do a demo today. There's been a lot of work on the Drupal 8 core APIs.
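The steps above can be sketched end to end. This is a simplified in-memory simulation of the checkpoint-based protocol, with an illustrative `Site` class standing in for one environment's HTTP endpoints (changes feed, revisions diff, bulk docs, checkpoint storage); none of these names are the modules' real API.

```python
class Site:
    """In-memory stand-in for one environment's replication API."""

    def __init__(self, uuid):
        self.uuid = uuid
        self.seq = 0            # local sequence counter
        self.docs = {}          # entity uuid -> {rev: doc}
        self.log = []           # (seq, entity uuid, rev) change feed
        self.checkpoints = {}   # replication id -> last seen source seq

    def save(self, entity_uuid, rev, doc):
        self.seq += 1
        self.docs.setdefault(entity_uuid, {})[rev] = doc
        self.log.append((self.seq, entity_uuid, rev))

    def changes(self, since):
        """All changes after a given sequence number, plus the new high mark."""
        tail = [entry for entry in self.log if entry[0] > since]
        last_seq = tail[-1][0] if tail else since
        feed = {}
        for _, uuid, rev in tail:
            feed.setdefault(uuid, []).append(rev)
        return feed, last_seq

    def revs_diff(self, incoming):
        """Report which of the offered revisions this site is missing."""
        missing = {}
        for uuid, revs in incoming.items():
            have = self.docs.get(uuid, {})
            gap = [rev for rev in revs if rev not in have]
            if gap:
                missing[uuid] = {"missing": gap}
        return missing

    def get_doc(self, uuid, rev):
        return (uuid, rev, self.docs[uuid][rev])

    def bulk_docs(self, docs):
        for uuid, rev, doc in docs:
            self.save(uuid, rev, doc)

    def get_checkpoint(self, rid):
        return self.checkpoints.get(rid, 0)

    def set_checkpoint(self, rid, seq):
        self.checkpoints[rid] = seq


def replicate(source, target):
    """One pass of the checkpoint-based protocol the talk outlines."""
    rid = source.uuid + "->" + target.uuid          # 1. identify the pair
    since = target.get_checkpoint(rid)              # 2. last checkpoint
    changes, last_seq = source.changes(since)       # 3. changes since then
    missing = target.revs_diff(changes)             # 4. what's still missing
    docs = [source.get_doc(uuid, rev)               # 5. bulk-send only those
            for uuid, info in missing.items() for rev in info["missing"]]
    target.bulk_docs(docs)
    target.set_checkpoint(rid, last_seq)            # 6. save new checkpoint
    return len(docs)
```

Running `replicate` twice in a row transfers nothing the second time, because the checkpoint means only newer changes are fetched, and the revisions-diff step skips anything the target already has by some other route.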
Unfortunately, I'm very sorry to say that I won't be able to demo here today. But I'm very optimistic that during the sprints on Friday and Saturday, if you grab me in the corridor or at the code sprint, I will have stuff to show you; very exciting things. We have good test coverage, and lots and lots of the functionality is already there. The last missing pieces are actually the changes endpoint, for listening to changes, and the replicator itself, the Drush plugin. Everything else is already there.

So instead, I'm going to move on and talk a little bit about reusable protocols. The revision and conflict detection model that we use is more or less taken straight from Git and from a system called CouchDB. CouchDB is a document database that is based entirely on HTTP; it does its replication over HTTP. We don't borrow any technology from CouchDB other than the protocol and the API specification itself, so this has nothing to do with CouchDB storage or anything like that; we just use the same protocol and the same API specification. So when you install the Relaxed Web Services module, your Drupal site actually looks exactly like a CouchDB database.

And why do we do this? Because someone already figured out how to do replication over HTTP. Why invent our own protocol and try to solve the same problem? They've already figured out how to do HTTP replication efficiently. In the end, when we talk HTTP, your Drupal entities are just JSON objects, exactly as CouchDB stores them, so we can reuse the same way of doing replication. And reusing an API specification opens the framework up to exciting and unexpected use cases that otherwise would not have been possible. For instance, in the Drupal 7 version of Deploy we have our own way of doing replication, our own API. Only Deploy and its companion modules know about that system.
No other systems can interact with it naturally without custom development against that API. So reusing a well-known API can be quite exciting. There are a few exciting projects that implement this API. CouchDB, of course; I already mentioned that. PouchDB, a portable CouchDB: it's a browser database for the client side, so you can write documents on the client side and replicate back. There's also TouchDB for smartphones, which again implements the same API.

And this is where I'll just make a quick note on headless Drupal. With this system you could do something, I shouldn't say completely different from content staging, but something quite separate from it. You could use a completely separate frontend. You could write an AngularJS application that writes to PouchDB on the client side; PouchDB understands this protocol and can replicate from the client side back to your Drupal 8 site. And you can also do a pull replication, to pull in content from the backend as well. So you could build quite an exciting headless solution with these modules too, because the API we've designed here is a lot richer than the standard core API. Just a quick note on that.

So, to come to some conclusions. As you can see, this is not a straight port of the Drupal 7 versions; we are redesigning the protocol, and how we go about content staging, in quite a big way. The Drupal 8 APIs that we have, the new entity API, allow us to do things in a much better way, so why port the old modules straight across when we can improve things so much? We're creating a loosely coupled system, so we cover more use cases, and we probably haven't yet discovered all the ways these richer REST APIs can be used. But I think it's a good way to build a system like this. We're also implementing battle-tested protocols, so we know the system already works.
It's being tested already. As I mentioned, the solution is already being built, and we're going to work a lot on it during the code sprints on Friday and Saturday. I want to say a quick special thanks to Andre, who I think is here in the audience. Andre has been helping me build this solution, and I need more people like Andre to help me on Friday and Saturday. We have the community code sprints on Friday; please grab me, I'll be there. Come and say hi. You can write documentation, you can work on the implementation, you can test, you can write tests, or you can just talk to me and ask questions, maybe present some of the problems you have with content staging. Just come by and we'll figure things out together. There are going to be mentors at the code sprint, so they will be able to help you get your environment set up and everything like that.

And it's with these words that I would like to say thank you. I'm very sorry that I couldn't show a demo today, but please come by on Friday, or even grab me in the hallway afterwards, and I can show you the test cases we have; we can browse through some code if you want to. So thank you all. We do have two mics up in the corridors here, so please feel free to step forward and ask any questions about content staging, editorial workflows, APIs, headless Drupal, anything like that. Please. No questions? It was clear. Okay, a question up there? Please step forward to the mic.

Using the hub workflow, can you push to Drupal 7 sites as well?

You could potentially push to a Drupal 7 site, but it would require a lot of custom development. We haven't implemented this API in a Drupal 7 module yet. You could potentially do it yourself; I have no plans to, but since it's just an API over HTTP, you can of course do that. The tricky part with Drupal 7 is that only nodes have really solid revision support.
So that is where you're going to run into problems. But potentially you could, yes: with the RESTful Web Services module for Drupal 7, you could extend that and implement something similar to push to a Drupal 7 site. There's no solution for that today, but I would say it's possible with some work. Yes, another question.

Do the content types need to be structured identically?

Can you repeat the question? I didn't fully understand.

If the sites are different, they may have different fields. Can this solution be used to push between slightly different structures?

Yes, it can definitely be used for that. Obviously, only the fields that exist on the source site will get over to the target site. But as long as the additional fields on the target site are not required, and don't have some other required business logic around them, you can definitely save entities with different fields; the entity API handles that. There might be a slight problem if the entity type has a different name, but you can probably hook in there and do some custom mapping if you want to. So that's possible, with some tweaking, for sure.

I think we have another question here. Yeah: when you stage data and move some updates, on the target site, do you run all the hooks, like the save hook and the update hook, so they do their work?

Yes. The save on the target site is like any other entity API save, so the update hook, the insert hook, all of them are run. For your target site, it makes no difference whether the content comes through the API or is saved by a local editor.

And I assume you have handled the situation where the target site pushes back to the source site: you update one site, push the change, and then it feeds back. But I think you have worked that out.

Yeah. So we're tracking the revisions, right? So it will have the same revision ID.
So we know that this revision ID already exists in the source database. It will actually appear in the changes feed, but the replication protocol always compares revisions before it replicates and only replicates the missing ones, so it won't actually get into the replication payload. It will appear in the changes feed for any other clients that might be listening to changes, but it won't be pushed through the same replication again. So the protocol is already designed to handle this. Thank you. Yeah, thank you.

Any more questions? To repeat the question from the audience: when do I consider this production ready? First of all, we need to release Drupal 8; some APIs need to stabilize, so that's obviously the first step. I don't think many people would say that the Drupal 8 beta is going to be production ready; we probably need a release candidate for Drupal 8 core to be production ready first. But I do think that these modules will be ready when Drupal 8 is ready for production. As I said, we're actually not far away; there are a few critical pieces missing to make the workflow work, which is why I couldn't show a demo here today. But it will follow Drupal 8 core's production readiness; I'm pretty confident about that.

Okay. Thank you, everyone, and feel free to grab me in the corridor and ask me questions if you want to. Thank you.