So, hi everyone. We are going to start; it's just a little bit past one. We picked a really wrong title for this session, so don't be too afraid of that. We are basically going to talk about the way forward for entity storage in Drupal 8. I'm Damien Tournoud, the CTO of Commerce Guys, and I'm going to leave the floor to Lukas Smith for the first part. We are going to try to do that in half an hour, a quarter of an hour each, and keep the floor open for questions for half an hour more. We do have a strategy coming forward here, but first I'm going to let Lukas talk about the stuff he has been working on for the past couple of years.

So I'm going to talk to you about something called PHPCR, which is a specification for Content Management System storage APIs. Just bear with me: I'm going to explain a little bit how it works and what you can do with it, and then Damien is going to talk about how all of that could potentially start to trickle into Drupal. Normally this talk takes me about one hour, so I'm going to do it in 15 minutes and skip a couple of slides. We didn't throw them out, because we do want to give people the opportunity to stop me if they think something should be elaborated on, and also to make people aware that these slides exist, so they can go back and check them out later. Actually, if you have your laptop open right now, you can go to phpcr.github.com, and there is a link to most of the slides you'll see in this talk, except for the stuff Damien is going to cover.

So without further ado, let's start. If you're designing a Content Management System, you have very specific needs for your data storage. One of them is that your data is going to be unstructured, so it will not be a natural fit for a relational database.
And obviously, today there are now a lot of so-called NoSQL solutions which are sort of a perfect fit for that, and with Drupal 7 we already have a little bit of that, with the option of putting at least some of your data into MongoDB, for example. So that could be an obvious choice. The problem, however, is that most of these NoSQL solutions are not really meant to also maintain a tree structure, let alone a graph, of these different pieces of content. There's a sub-community within the NoSQL world, called graph databases, which tries to address this; however, for the most part they're optimized for mapping something like a social graph, not really for managing larger pieces of data.

Now, the other things you want from a Content Management System: you want to be able to version your data, you want to be able to closely manage the permissions on your data, and in the end you want a system that is still workable for a normal human being without being confusing. That's actually one of the historic issues with many Content Management Systems, especially in the PHP world: their storage APIs are more like accidents that you fix, and then you keep going, and you fix, and you keep going. There was never really a moment where people said: this is the problem domain we're trying to solve, and this is the API we're going to use to solve it. Instead it was: okay, we have this, and then we add that; okay, now we have to refactor this. And that's why these APIs become very complex. Again, I'm not a Drupal expert, but what I hear about having an entity API and a field API and all of that sounds a little accidental, rather than really designed with a clear concept in mind.

Now, this is what PHPCR tries to solve. And I highlighted something here: it's a standardized API. PHPCR is not an implementation of a storage API.
It's just the definition of the API, and that is very, very significant, because one of the biggest traps with a storage API is trying to be the perfect solution for everybody. It's simply not possible. What you can do is define an API that addresses a specific problem domain, and then let people build different implementations of that API, optimized for different use cases. For some use cases it might make sense to have a separate table for each node type, but if you do that, certain questions become really hard to implement while others become very easy. If you're trying to build a general-purpose solution, you can only go one way or the other. Or you go in the direction of an API, which allows you to have different, optimized implementations, while the higher-level code using that API remains the same.

So, quickly: PHPCR is actually based on the Java Content Repository (JCR) specification, which has been around for over a decade now. The Java guys are really, really happy with what we're doing. We actually met up with the lead of the JCR specification, and it's been decided that the next version of JCR will include PHPCR, which I think is unprecedented: a Java specification containing PHP interfaces. As a European, I don't have to worry that much about IP issues or software patents, but becoming part of the JCR specification also means that we will be under the same legal umbrella, so people in the US who unfortunately do have to worry about that are covered as well.

One of the key things with a specification like this is that you want multiple implementations, and we already have multiple implementations being worked on. One is Jackalope, which is implemented purely in PHP. The other one is called Midgard, which is implemented in C.
With Jackalope, we have an additional concept of transports, and we have multiple transports being worked on. The reference implementation is the one using Apache Jackrabbit. The great thing is that, because Jackrabbit already has all the features of JCR implemented, it was very easy to get all these features working quickly, to play around with them, and to scale up to, say, a setup with 100 gigabytes of data and work with that sort of data set. But we do need a pure PHP implementation that is just PHP and some relational database; that's what the Doctrine DBAL transport layer is for. And in theory, once we have that working, it should be easily possible to do one based on DBTNG as well.

Now, here's a high-level overview of the features. With PHPCR you get tree traversal: you can look up a path, get a node's parent, get its children. You can optionally get a UUID for any node, then find nodes directly by UUID, and reference other nodes by UUID. There's a full-text search API. There's versioning. Some of the features are optional, so there's a capability discovery API that lets you figure out what is actually supported, like whether versioning is supported in a given implementation or not. There's XML import and export, and as a matter of fact, PHPCR essentially specifies an XML database: you can take any XML document and drop it into PHPCR without having to make any changes to the data itself. There's also support for locking and transactions, for permission control, and finally for observation, which is sort of like triggers in a relational database.

I think I have to go faster. So really, you have a document storage API that is a perfect fit for tree structures, actually graphs. Nodes are identified by their name and their parent path, and I give a little example here.
So if you have a node stored under a path like /water/fish, that path is constructed from the parent path, /water, and the node name, fish. What is important to understand here is that every node must have a parent, and every node must have a name. You can actually have multiple nodes under the same path, which is something that is necessary for XML compatibility, but in practice I usually disable that, because it's not something you want to use.

On a node, you can have any number of properties. Here are a couple of examples of the types that are supported: string, boolean, long, double, date, and so on. References and weak references are very important concepts. The difference between the two is that with a reference you're maintaining referential integrity, so you cannot remove a node that is being referenced; a weak reference, on the other hand, can point to nothing. So you can pick and choose which you want to use. What's great, though, is that in PHPCR the referred-to node has a list of referrers, so you can always go bi-directional.

There's another concept called primary node types. Every node must have a primary node type, and a node type is sort of like a schema definition: you can mandate specific properties, mandate the node types of children, and things like that. The important bit is that you must choose a primary node type when you create a node, and you cannot change it afterwards. However, there's another concept called mix-ins, and those you can add and remove later on. One example of this, which you'll see on the next slide, is mix:referenceable: once you make a node referenceable, it automatically gets a UUID, but you can also remove that mix-in later if you decide the node shouldn't be referenceable anymore, and then the UUID is removed. In the same way, you can work with versioning.
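The node, property, and mix-in concepts just described can be sketched in code. This is only an illustration: it assumes an already-bootstrapped PHPCR `$session` from some implementation, and the path `/water/fish` is the made-up example from the talk.

```php
<?php
// Illustrative sketch only: assumes a bootstrapped PHPCR session ($session)
// obtained from some implementation (e.g. Jackalope). /water/fish is a
// made-up example path.

// Every node is addressed by its parent path plus its own name.
$fish = $session->getNode('/water/fish');
$parent = $fish->getParent();            // the node at /water
echo $fish->getName();                   // "fish"

// Nodes carry any number of typed properties.
$fish->setProperty('title', 'Trout');            // string
$fish->setProperty('weight', 2.5);               // double
$fish->setProperty('caught', new \DateTime());   // date

// Mix-ins can be added (and removed) after creation; mix:referenceable
// gives the node a UUID that other nodes can reference.
$fish->addMixin('mix:referenceable');
$session->save();

$uuid = $fish->getIdentifier();                  // the node's UUID
$same = $session->getNodeByIdentifier($uuid);    // direct lookup by UUID
```

The method names (`getNode`, `getParent`, `setProperty`, `addMixin`, `getIdentifier`, `getNodeByIdentifier`) are from the PHPCR interfaces; exact behavior depends on the implementation you log into.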
You can just make a node versionable after it's been created, and you can remove that later on as well. And you can add your own mix-in types and primary node types, extending from existing ones or from your own. There's another concept called workspaces; I'm going to jump over that, because it's not so critical.

Here's a very bird's-eye overview of how you work with PHPCR. Basically, you have a repository; you log into that repository, which gives you a session; and on that session you can get node instances, get the properties of those nodes, and interact with them. Via the workspace, you can get some other objects as well, for the full-text capabilities and versioning. I have a bunch of code examples; I'm going to gloss over them, but they exist. The key thing I want to say here is that the first seven lines are all that is specific to one implementation. All the code examples after that are generic, so they should work with any implementation of the standard. So: CRUD operations, tree traversal, versioning, restoring previous versions, searching. There are different APIs for searching: there's one that is sort of SQL-ish, there's one that is object-oriented, and we have a fluent interface on the object-oriented model as well.

The other important thing is that if you have a standard, you need to ensure that people are actually compliant with it. So this is the test suite running against Jackrabbit; this is the test suite running against the Doctrine database abstraction layer implementation, which, as you can see, doesn't have all the features yet; and the Midgard implementation also runs against the same test suite, to really ensure compliance. I think that's actually critical to have. There's also, similar to what you have with an ORM, a simple solution that basically lets you map objects to PHPCR nodes, but I'm not going to go into details on that.
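To make the "first seven lines are implementation-specific" point concrete, here is a hedged sketch of what a Jackalope-over-Jackrabbit bootstrap looked like around that time (the factory class, option key, and credentials are from that era of Jackalope and may have changed since), followed by code that is plain PHPCR and therefore portable:

```php
<?php
// Implementation-specific bootstrap (Jackalope over Jackrabbit).
// Class names, the option key, and the server URL are assumptions from
// the Jackalope of that era; check the current docs before relying on them.
$factory = new \Jackalope\RepositoryFactoryJackrabbit();
$repository = $factory->getRepository(
    array('jackalope.jackrabbit_uri' => 'http://localhost:8080/server')
);
$credentials = new \PHPCR\SimpleCredentials('admin', 'admin');
$session = $repository->login($credentials, 'default');

// From here on, everything is generic PHPCR and should work with any
// compliant implementation: here, the SQL-ish query API (JCR-SQL2).
$qm = $session->getWorkspace()->getQueryManager();
$query = $qm->createQuery(
    "SELECT * FROM [nt:unstructured] WHERE CONTAINS(*, 'fish')",
    \PHPCR\Query\QueryInterface::JCR_SQL2
);
foreach ($query->execute()->getNodes() as $node) {
    echo $node->getPath(), "\n";
}
```

Swapping the first block for another implementation's bootstrap is, per the talk, the only change needed; the query code below it stays the same.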
But that's a pretty, pretty nice solution that I use quite extensively in my daily work. So the conclusion is that PHPCR really tries to address the problem domain of a content management system; it doesn't try to be a general-purpose database. As an example, if you're creating a web store, you may not want to put your orders and inventory into PHPCR, because you might be doing aggregate queries on that: questions like, how many t-shirts has this person ordered in the last month? Those are the type of queries you can answer effectively with a relational database, but not so much with PHPCR. However, when you're talking about organizing your products into categories and things like that, that's a perfect fit for PHPCR. So again, the key point is that PHPCR is focused on the CMS problem domain, and you can mix and match different database solutions; key-value stores might make more sense than PHPCR in some situations. Really, just use the tool that is ideal for whatever use case you're trying to solve.

More information, more information. Contributors: we have quite a number of people working on this, although I should admit that one third of them are actually coworkers of mine. A bunch of companies are involved, and a bunch of projects are looking into it. Midgard and Symfony2 have adopted PHPCR. TYPO3 has something that diverged a couple of years ago, but they might come back. Nooku is a re-implementation of the Joomla core, and they're also looking at PHPCR for the future. eZ Publish is looking into it, and I hope Drupal will as well. More links, more links.

And, sorry, I was a little too fast. So maybe we should just open the floor for questions right after this part, because you might have some, or maybe you are dead already. Any questions for that part? So I'm going to go into trying to figure out what this means for Drupal.
This all basically started in London, where I gave a core conversation about document-oriented storage. It was a first approach to what needs to happen to actually refactor the way we store entities in Drupal, because from experience, right now we have an architecture that we never really planned, in terms of storage. The architecture really looks like this; the color scheme goes from messy to less messy. As you know, we have this concept of an entity controller, which in Drupal 7 only handles loading entities; in Drupal 8, hopefully, it supports the full CRUD operations. And this entity controller stores some of the properties of an entity, but not all of them. It is responsible only for the part of the entity that is stored in the entity's base table: it's the system responsible for maintaining that data in the node table (and incidentally in the node revision table), in the user table, et cetera.

On the other side we have the field API, which is responsible for storing and managing everything that is a field. It has been nicely designed in the sense that the way we store fields is actually pluggable: you have this notion of a field storage, which is per field and is responsible for storing that field. The problem is that those are not two independent things. There is one system in the middle, entity field query, our querying language for entities, which actually needs to know about both. Depending on the use case and on the entity field query you are trying to execute, entity field query will either execute the query directly against the SQL database or delegate it to the field storage.
And the field storage needs to know about both, because it needs to be able to execute a query that applies both to properties of the base table and to fields. The problem is that it's not designed to do that. So what happens is that the implementations of field storage we have in contrib, especially the MongoDB implementation, are basically both a field storage and an entity storage at the same time. In the MongoDB implementation, we don't store only fields; we store the whole entity object, so that we can actually execute an entity field query on both base-table properties and fields. So that's the current state of the game.

During my talk in London, I was convinced that we would have to reimplement a field storage by ourselves. Now that we have looked a little bit into PHPCR, I'm much less convinced of that, and I think that PHPCR could, long term, satisfy most of our needs for entity storage. Probably not all of our needs, but we can work with the PHPCR community to get the rest implemented. So PHPCR is a potential long-term solution for the entity storage mechanism. The problem is that we don't have much time: the code freeze for Drupal 8 is something like eight months away, basically the end of the year. So we don't have a lot of time, and we especially don't have time to work on both the PHPCR side and the Drupal side at the same time to make sure they satisfy our needs.

So what I suggest is a strategy with a set of steps, and the first step is to clean up what we have right now. The architecture I suggest is that we remove that dual storage and unify it into a single entity storage: remove the field storage part and move both properties and fields behind a single unified interface, an interface that is pluggable so you can replace it with other implementations. But the idea of this first step is absolutely not to touch the data model.
We are not going to touch anything about how stuff is stored. With the SQL implementation, data will still be stored in a base table and in one table per field for all the field data. The only thing I suggest is that we unify that behind a narrowly focused storage interface, specific to storing what looks like a Drupal entity: something where some values are properties of a base table and some are fields. I'm not suggesting we implement a generic document-oriented storage, not at this step. I'm just suggesting we clean up the mess we have from Drupal 7 and unify all that behind a consistent interface.

The problem is that to do that, there are a couple of prerequisites, especially because the way the taxonomy term entity is currently implemented is so messy it hurts. There are two things we really need to fix before we can unify the storage. First, the term hierarchy actually needs to become a field. In Drupal 7 we learned that anything that wants to extend an entity and doesn't store its data in a field is wrong. We learned that, right? In Drupal 6, every module was trying to implement its own storage separately; the idea in Drupal 7 is that fields are the primary way of storing something that extends an entity. Except that core doesn't do that: core stores the hierarchy between terms in a separate table that is not a field. So we need to make that a field, which has some interesting consequences. Second, for taxonomy terms we currently store the VID of the vocabulary, not the vocabulary's machine name, which means that the storage for core is actually completely inconsistent.
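The "single unified storage" idea of step one can be sketched as one pluggable interface per entity type that handles both base-table properties and fields through the same calls. Everything below is hypothetical: none of these interface or class names are real Drupal APIs, and the in-memory backend only stands in for the SQL one that would keep the Drupal 7 data model unchanged.

```php
<?php
// Hypothetical sketch of a unified, pluggable entity storage interface.
// These names are NOT real Drupal APIs; they only illustrate the idea that
// base-table properties and field data go through one controller.
interface EntityStorageControllerInterface {
  public function load(array $ids);
  public function save($entity);   // persists base properties AND fields
  public function delete(array $ids);
}

// A trivial in-memory implementation, standing in for the SQL one that
// would keep storing a base table plus one table per field.
class MemoryEntityStorageController implements EntityStorageControllerInterface {
  private $entities = array();
  public function load(array $ids) {
    // Return only the requested entities, keyed by id.
    return array_intersect_key($this->entities, array_flip($ids));
  }
  public function save($entity) {
    $this->entities[$entity->id] = $entity;
  }
  public function delete(array $ids) {
    foreach ($ids as $id) { unset($this->entities[$id]); }
  }
}

$storage = new MemoryEntityStorageController();
$node = new \stdClass();
$node->id = 1;
$node->title = 'Hello';            // a base-table property
$node->field_tags = array(5, 7);   // field data, saved through the same call
$storage->save($node);
$loaded = $storage->load(array(1));
```

The point of the sketch is the shape of the interface, not the backend: a contrib module could swap in a MongoDB or PHPCR implementation without higher-level code noticing.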
The storage for the taxonomy term is completely inconsistent, and it means that you cannot actually do an entity field query on a taxonomy term filtered by its bundle, because that's not supported. That just doesn't work, because taxonomy terms are not stored properly: they should store their bundle name directly in the entity's base table, and instead they store the ID of the vocabulary. So we need to fix that couple of things before we can implement step one.

Step one is really the only thing that I want for Drupal 8. The rest is: if we have time, let's try to do something like it. The possible step two is to adopt PHPCR, and when you look at it, the architecture is really, really similar to this one. The boxes are basically the same; the idea is just to move the responsibility for each box elsewhere. In step one we had this box, the entity storage interface; that maps to a PHPCR session or workspace. We have this box, our entity field query language, which maps really well to PHPCR's query types, or to the fluent interface provided by the query builder. And the thing we have that is a document store in SQL maps really well to PHPCR implementations, or Jackalope transports. Jackalope has an SQL-based implementation we could use; it's going to store the data differently than our current implementation, but that should be all right for most of the small use cases.

But to be able to do that, there are a couple of prerequisites, some of which might make it impossible for Drupal 8. On the Drupal side, we are going to need to come up with a mapping between our concepts, entities and fields, and a PHPCR schema: the PHPCR notion of nodes versus properties. Also, the Jackalope SQL implementation is currently built on top of Doctrine DBAL.
So either we adopt Doctrine DBAL as a replacement for DBTNG, which is a large endeavor, or we re-implement the SQL transport of Jackalope on top of DBTNG, which should actually be easier to do. That's for the Drupal side, but there is also some work we might want to consider helping with on the PHPCR/Jackalope side. First, the Jackalope SQL implementation is not very scalable; it's even less scalable, in most use cases, than our current SQL implementation, which is a challenge. So we probably need to work on improving the scalability of that implementation. My ballpark figure is that for a site with less than 10,000, or a couple of tens of thousands of, nodes or entities, the SQL implementation should have the same scalability characteristics as the current SQL storage we have. That's my benchmark: do no worse than what we have in core right now for sites with less than tens of thousands of nodes.

Another thing we might consider is an implementation of PHPCR that stores nodes of different types in different tables, meaning we could have a user table, a node table, et cetera. Right now the implementation stores every object you put in it in a single table, which is going to freak people out. It's not necessarily bad in terms of performance, but it's definitely going to freak people out. And generally, as with everything we adopt from elsewhere, we should really consider contributing heavily to those external projects. That applies here, but it also applies to all the components we have included from Symfony: Drupal core contributors really need to become Symfony core contributors, because if we don't do that, it's never going to work.

There are a couple of other steps we can take, and one I would like to emphasize is search. One of the nice things about PHPCR is that you get full-text search across all the fields of your entity by default.
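As a side note on the query-language mapping mentioned above: here is a real Drupal 7 EntityFieldQuery next to a roughly equivalent JCR-SQL2 statement. The node type `[drupal:node]` and the way fields map onto PHPCR properties are invented for the example; defining exactly that mapping is the prerequisite the talk describes. This is a sketch, not runnable outside a Drupal 7 site.

```php
<?php
// Drupal 7 EntityFieldQuery: conditions on a base-table property and on a
// field both go through one query object (requires a Drupal 7 bootstrap).
$efq = new EntityFieldQuery();
$efq->entityCondition('entity_type', 'node')
    ->propertyCondition('status', 1)
    ->fieldCondition('field_tags', 'tid', 5);

// A roughly equivalent JCR-SQL2 statement. The node type [drupal:node] and
// the flattening of field_tags into a property are hypothetical -- exactly
// the entity/field-to-PHPCR mapping that Drupal would have to define.
$sql2 = "SELECT * FROM [drupal:node] AS n
         WHERE n.[status] = 1 AND n.[field_tags] = 5";
```

The appeal is that once the mapping exists, the EFQ-style API could be kept on top while the statement is handed to whichever PHPCR implementation is configured.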
The problem is that full-text search is specified in the PHPCR specification, but it's not currently implemented in the SQL transport. And the nice thing is that we have a super nice SQL-based full-text search implementation called the search module. We could take that implementation and contribute it back as a full-text search implementation for Jackalope, the PHPCR reference implementation. That's definitely something we could do, and we could then remove most of the code from the search module and stop maintaining it just by ourselves, having a larger community of people maintain it instead. There's a couple of other fun stuff we could do if we have time.

But in a nutshell, the thing I want to emphasize here is that there is absolutely no way we manage to adopt full PHPCR and migrate everything to it in Drupal 8. I'm not even dreaming of that. But if we can all come together and clean up the API so that this type of thing becomes possible for Drupal 8 in contrib, I think that would already be amazing. So that's about what I wanted to say. We have a couple of resources: for Jackalope, the two IRC channels, the GitHub project for PHPCR, and the reference to the specifications.

One last thing I wanted to say at this point: I would like a bunch of us to come together on Friday and start working on the first step. I think that if we have the right people in the room, the first step could actually be implemented by Friday night, and that would be amazing, so that we can move on with our lives and start building the rest. So if you are willing to join me on Friday to sprint on that part and get a set of patches together, that would be amazing. It's now 1:30, and we open the floor for questions.

Hi.
So it sounds like having this kind of implementation in place would make it trivial to export content from one website and import it into another, even if that website wasn't necessarily a Drupal site but supported JCR. Is that something you have on the radar?

That's not true. Okay. That's not true. JCR and PHPCR are storage specifications: they specify an API to access a storage layer. So it would basically be the same as saying that because two systems are implemented on top of a MySQL database, it's trivial to exchange data between the two. That's not true. The way you store data doesn't really have an impact on the way you exchange data, but there are specifications that could be used for that, and one of them is CMIS.

That's what?

CMIS. That's a specification for exchange between external systems. The fact that two systems are built on top of JCR doesn't mean that they are going to be able to exchange information.

Right. But once the interface is in place, you only would have to write that part.

You would have to build an interface for CMIS, yes. So that's more in the scope of CMIS. We were initially planning to talk about CMIS, but we finally decided it's too boring.

I don't know, it's kind of interesting to me. This looks very good. One of the key questions I'd have, though: one of the limitations with Drupal sites that use Mongo as their backend right now is that there are a lot of cross-entity queries that become very complicated to do. The canonical example is, you know, artist, album, and track for music CDs: you want to pull data from all three entity types, and you can do that fairly easily by just hitting SQL directly in Views, but if you have a Mongo backend, you can't actually do that kind of query. How does JCR handle that kind of problem space, where you have these complex multi-entity queries?
So I think what you are describing is the notion of joins.

Yeah, joins, or induced relationships.

And PHPCR has joins.

Okay, but how does it emulate that for a NoSQL database? That's the problem with NoSQL databases. So it doesn't fully abstract that, then?

Well, it does, but you don't have to concern yourself with it, because you're using the API. How joins are implemented is the job of whoever builds that specific implementation. And yeah, it's possible to do, and there might be some work involved, and again, it might mean that some PHPCR implementations are better at answering specific questions than others. But the key thing is that you can choose which one is the right one for your use case, and that is something that is currently not possible.

Okay, thank you.

Hi, I'm curious about workspaces. I think one of your slides compared workspaces to Git or SVN branches, and the way I've been thinking about how content editors have to deal with content in Drupal right now is more like Dreamweaver check-in/check-out, which I had used for managing code years and years ago: when you edit a node, you go in and edit, and if someone else goes in and edits too, those two forms both can't submit. Is there an implementation currently that uses that Git or SVN concept you mentioned, to make that part of content editing easier?

Okay, so workspaces basically give you a separate data tree that you can use, and you can merge data from one workspace to another. So you could have one for draft and one for production, staging, development, and so on, and move data between them. So yeah, in that sense it is similar to Git or Subversion, but really just in being able to merge data from one workspace to another. Then there's the versioning concept, and there are actually two different types of versioning there.
There's simple versioning, which is just a linear version history, and there's full versioning, which actually allows you to have branches and tag versions. But I wouldn't really use that for modeling something like a draft workflow; it's probably not the right tool for that, because only the latest version, the top version, would be indexed for the full-text search API, so when you were looking at the draft data, it wouldn't really play well with full-text search. But if you have separate workspaces, each has its own full-text search index, and then it works just fine.

Okay, and what about branching for multiple editors? Say a newspaper site is preparing for two possible outcomes of an election; would it be appropriate to use one workspace for one potential outcome and another workspace for the other?

That would probably not be practical. In theory, it's up to the implementation how it implements workspaces. However, most implementations of JCR, and I think that's also going to be our reality, will, when you create a workspace by cloning an existing one, actually copy the entire data. In theory you could have copy-on-write, but without it this wouldn't really be practical: if you have, say, 10 gigabytes of data and you clone all of it just to handle one election, that wouldn't work well. For that use case, it might make more sense to have a draft workspace and use full versioning to create two different branches; when the election is done, you basically say, okay, now I'm going to put this version at the top, and declare it the current active version.

Okay, thank you.

Yeah, so I guess a way to rephrase that is: as far as the API is concerned, a workspace is the right object to do that, but it's not magic.
So right now, most of the implementations of that are not going to be really practical for any of those use cases. I even believe that the specification doesn't describe merging data between workspaces.

It does.

Okay, I'm mistaken then. Okay, next.

Okay, first of all I'd like to mention that I really like the proposal; it sounds like a really good plan, in particular aiming to enable this for Drupal 8 contrib, which really sounds reasonable with regard to storage. I think unifying entity and field storage is really something that needs to happen, and it has some consequences: for example, we won't have any mixed entity storage anymore. We basically have to live with the assumption that we have a single storage backend per entity type, and I really think this is a sane assumption to make, and we really should go that way. It also partly plays into some discussions we recently had at the entity API BoF, where we were thinking about unifying fields and properties at the entity API level, such that the entity API only has a notion of properties, which the field API then builds on to implement fields. So what do you think about the relation between properties, fields, and all that with regard to storage?

To reply to the first point: yes, the assumption we are making here is that storage is going to be per entity type, not per field anymore. That said, I don't know if there is really a use case for having a storage per field, so I don't think dropping it has any practical consequence. On the other hand, having the ability to store entities completely in a different storage, including a remote storage, has a lot of practical implications, and that's really desirable. So yeah, that's a compromise we are going to need to make. For the second question: I believe there is a lot of work to do in terms of the data model.
And yes, unifying everything, removing the difference in concept between properties in the base table and fields, is desirable long term. No doubt about that. And I don't believe that this cleanup is related to any of that; in a way, those are two independent discussions to have. But yes, I think it would be desirable to have only one concept, which is the field. I mean what we currently call a field: it has a way to be stored, it has formatters, it has widgets to edit it, et cetera. If we have all of that under a single API, it would make everyone's life way easier. There are a couple of things to consider around data consistency and the ability to have unique keys on fields and stuff like that, which is currently not possible but which we would need to talk about during that migration. But I think overall it's desirable. There is other stuff that could be desirable too. We could remove the notion of fields completely, for example. We could make everything entities, in a way, and have a way to say that this entity actually has this other entity as a parent, which means the nesting we do for fields right now could be done by nesting entities. That's also something we could consider. There is a lot of fun stuff we could do. The idea is that there is stuff that is realistic to do before Drupal 8 is released, and there is stuff that we are going to dream about for generations. So I would like to have something really solid in core about all of this, so that we can build fancy stuff in contrib later on.

I had a question regarding the database backends. Have you guys considered having multiple simultaneous backends, so an SQL one and a MongoDB one, with a denormalizer sitting maybe at the entity controller level, so that you can have really, really fast queries and also all the complexity of the joins that you need? I just wanted you to speak a little bit to that.
So that's exactly the vision. The way I see it, you are going to pick a storage per entity type, and the storage can be as specific to your use case as you want. So you could have a very specific storage that denormalizes the data in a way that is adapted to your use case. The idea is that there is no generic way of storing the data that is going to work for every type of query, so we should not care about that at all. The idea is that we are unifying the storage behind a consistent API, so that you can use whatever type of storage is the correct one for your use case. Okay, thank you.

Maybe I can quickly elaborate on that. So really the vision is that right now we have Jackalope, which has three implementations, three different storage systems, integrated. There's Jackrabbit, which is a Java implementation of JCR with clustering and all that; it's really fancy, and I mean it scales awesomely to 100 gigabytes of data. We have the Doctrine DBAL one to store into a relational database. And there is very early work on a MongoDB one; there can be one for CouchDB; there can be a hybrid one that uses MongoDB for some stuff and a relational database for others. And then there's the Midgard one, which is implemented in C and also uses an SQL database. Now, really the thing is, I wanna see a bazillion implementations for every specific use case. When you look at performance issues in your installation, or you look at your setup, you look at the things you can offer (obviously a shared-host user will not be able to run Jackrabbit), so you have some constraints, some features available, and some requirements in terms of performance, and you will look for the implementation that most closely matches those. And if you then still need to do more work, these are open source implementations.
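The idea of picking an implementation behind the same API can be illustrated with Jackalope's repository factories. This is a sketch: the class and option names follow Jackalope as it existed around this talk, while the URI, credentials, and workspace name are made-up examples.

```php
<?php
use PHPCR\SimpleCredentials;

// Backend 1: Jackalope talking to a Jackrabbit server over davex/HTTP.
$factory = new \Jackalope\RepositoryFactoryJackrabbit();
$repository = $factory->getRepository(array(
    'jackalope.jackrabbit_uri' => 'http://localhost:8080/server',
));

// Backend 2: the same API on top of a relational database via Doctrine DBAL.
// $factory = new \Jackalope\RepositoryFactoryDoctrineDBAL();
// $repository = $factory->getRepository(array(
//     'jackalope.doctrine_dbal_connection' => $doctrineConnection,
// ));

// Everything below here is identical regardless of the backend chosen.
$session = $repository->login(new SimpleCredentials('admin', 'admin'), 'default');

$root = $session->getRootNode();
$node = $root->addNode('hello');
$node->setProperty('message', 'world');
$session->save();
```

Only the factory call changes; the application code on top keeps working, which is exactly the separation argued for next.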
So you can still fork them and change them, and all the modules on top keep working, right? Right now, a lot of modules do optimizations by running SQL queries directly. That's what they do, and that basically kills the entire idea of allowing people to do optimizations at a lower level. When you start changing your data model around to fix things, modules break; so you don't do that, or you rewrite the modules. When we have those clean separations, all of this becomes possible. You get a huge new dimension of optimization potential.

On the topic of unifying properties and fields, does that mean changing sticky and promoted-to-front-page into fields that could perhaps be disabled for a given content type?

Yeah. Great. Right now, in the conceptual data model of Drupal, there is absolutely no point in those properties we store in the base table. I agree; conceptually there is absolutely no point. So if we can unify that, we should. And it's going to be slightly hard.

And perhaps a little fuzzier: what about URL aliases and menu links? With the concept of revisions, there could be a case where you have a published version of a node and a version that you want published at some time in the future, which could result in the URL alias changing. Right now there isn't a clean way to do that, because the relationship is between the URL alias and the node, not the revision. So should things like URL aliases and menu links have a storage that relates to the revision, basically a field-based storage, in addition to the storage in the actual path alias table that does the work of aliasing, separate from the storage of the data?

There are two different questions there, I think. The first one is where those things should be stored in the first place. Currently they are stored in the database, and they don't map really well to either the configuration storage or the entity storage.
So those are part of the fuzzy things we don't really know where to put. There is actually an issue about moving menu links to entities, which would make a lot of sense. The other question is about how we relate those to something else. And yeah, in many ways, storing the URL alias of a node inside the node, as a field, would make a ton of sense; but then how do you query across that, and how do you make sure that there is only ever one unique URL alias? So there are questions around all of that. The easiest way would probably be to use references in that case, because nesting everything under a single document is kind of gross too. At some point, you really want different documents if the things you are manipulating are different in nature. So I don't have a precise answer for any of those; it really needs to be studied, but those are really, really good questions. Especially URL aliases, because as you point out, those are dependent on the content, which means it makes absolutely no sense to store them as configuration.

So, I didn't explain this before, but I'm actually also the competition: I'm the lead of the Symfony2 CMF initiative, and we're trying to blow away Drupal now. But we are also using PHPCR there, and one of the concepts we're working on is this: one, you have your content, and you organize the content in a tree structure that makes sense for the content; and then second, you have any number of reference structures that reference that content, which can be representations for your different websites. So for different languages, for different output formats, you just reference the content nodes, but you separate that completely. And this is a very minimal amount of data, because really you're just building a tree of references to the actual content. So it's very easy to move that around and maintain it, and also to prepare different versions of those trees to be used at will.
But really, the idea is that on your French website, say, you may not have all the content translated yet, so you may change the tree structure a little bit here and there, and that's possible. Same thing for your mobile website: you might have a flatter structure, leave out some pieces, and that's all possible. So that's how we're using PHPCR. I don't know how that maps to what Drupal is doing today, but it could be a model that you use in the future. Okay, thank you.

So, one thing I was unclear on with PHPCR: you were saying that we'd lose per-field storage. I'm not entirely sure I agree that it's such a worthwhile trade-off, but that's for another time. Does it still support per-type storage? Like right now, you can put users in SQL and nodes in MongoDB. Is that something you would still be able to do?

So that's independent of PHPCR; it's really a question of how Drupal plans to adopt it, and what you're asking about is this plan. In this plan, my vision is that storage should be per entity type. So you have possibly a different storage per entity type, including remote storage for some types of entities, and we move from per-field storage to per-entity-type storage.

Okay, so you'd still be able to say: I've got five entity types and I store them in five different types of data stores?

Yeah, you could store some of them in SQL, some of them in a read-only file, some of them by querying Twitter or Flickr directly, some of them in MongoDB, et cetera. Okay, good.

Hey, quick question on JSR-283. Do you mean PHPCR storage can be swapped with any storage that supports JSR-283? Like Alfresco; I think it's JSR-283 compliant.

So yeah, there are a bunch of different implementations of JCR. Jackrabbit is the reference implementation, there's another one called ModeShape, and so on and so forth. However, and here's where it differs from CMIS:
JCR and PHPCR define an API that you use inside your applications, but they do not specify a transport protocol. So the integration we have, with Jackalope talking to Jackrabbit, uses an open source yet proprietary protocol, in the sense that it was developed by the Jackrabbit guys, called davex, which works over HTTP. Alfresco does not implement that same protocol, so we can't just put Alfresco behind Jackalope; we would have to do a little bit of work to expose that. It's possible, and maybe CMIS can actually help there, so we can look into these sorts of things.

And since we kind of glossed over CMIS, I'll explain a little bit about it. CMIS really, to me, is a specification for the protocol, whereas JCR is an API that you use inside your applications. One of the interesting things is that there's a project called Apache Chemistry, which was created for the JCR implementations: it can basically take any JCR store and expose it as a CMIS repository. We could do something similar for PHPCR as well. And then maybe, based on that, we could have some generic solution to call into any PHPCR or JCR implementation. So things like that can happen, but they're probably not going to be super performant. There could still be ways of saying: I just wanna pull some of this data in as well, so you can call into stuff as needed. But in the end, the key data that you wanna use all the time, you probably want a little bit closer together.

Adobe CQ5 really successfully uses a JCR repository backend. Have you guys looked into any other big CMSs that are successfully using it, to see a model, and how they scale the workspaces, and the other issues they're running into?
Because I can see CQ5 running into many similar issues, since it's the same business problem we're solving, right?

Yeah, we are. There are a couple of Java content management systems built on top of Jackrabbit. Adobe CRX is one of them, Magnolia is another, and Hippo CMS is another. The last two are open source; CRX obviously isn't. And we're actually talking really closely to these people. We've pushed a couple of patches into Jackrabbit; we're improving Jackrabbit as part of what we're doing, improving the HTTP interface, the davex interface. And we're also talking about adding faceting support to Jackrabbit and things like that, which Hippo CMS actually already provides. In theory, if Drupal adopts PHPCR, you could have an installation where CRX talks to Jackrabbit and Drupal talks to the same Jackrabbit instance. However, and that's something Damien already mentioned, the way CRX stores translated content might be very different from how Drupal decides to, so it might not work perfectly and there might be some work to do there, but in theory it's sort of possible. And of course, and I think this touches on your question a little bit, we can look at how they're using Jackrabbit, how they are storing translated content, and learn from there. One of the resources I linked is a blog post by David Nuescheler. He's the lead of the JCR specification and also, obviously, works for Adobe, so he's active on CRX. The blog post is called "David's Model"; it has seven points where he talks about things he learned working with JCR in the real world. So there's a lot of experience that Drupal could also draw upon.

Any other questions?

No big conceptual questions, just: how far along in development is this?
Are there working PHP examples that you can go out and download and run?

So, the Symfony2 CMF project aims to have a first release this summer. There's already a sandbox, which actually already has inline editing that's way cooler than anything Drupal has, and some back-end editing capabilities and things like that. So yeah, you can play with that. However, PHPCR really also works standalone. We have a tutorial that you can go through, linked on phpcr.github.com. Again, the best implementation at the moment uses Jackrabbit, but running Jackrabbit is really easy: you download a jar, you type `java -jar` followed by the name of the jar, and Jackrabbit is running. So it's easy to get going. We're finishing up some of the APIs at the moment, but the thing we're working on most right now is the Doctrine DBAL implementation. And actually, just the other day, we finally managed to get our sandbox working with the DBAL implementation. So if you insist on not using Java at all, you can already play around with the Doctrine DBAL implementation as well.

How far along is the MongoDB implementation?

Very, very, very, very rough. The guy who worked on it did a lot last summer, and he was pretty much on par with the DBAL implementation back then. Since then he hasn't done much, but he told me he wants to work on it again over the spring. Many of the things we have to solve as part of the Doctrine DBAL implementation are things that will actually also help the MongoDB one. But yeah, right now it's not something we're really pouring our resources into; we are more focused on the Doctrine DBAL one. Okay, thank you.
You already mentioned translation, and I wonder whether the PHPCR interfaces and the API already handle storing translated content, such that it's part of the interface?

So that's actually one of the things that really surprised me: JCR does not cover internationalization, which I think is an oversight. I think it should be in there. But they don't have it, and therefore PHPCR does not have it either. Now, I mentioned that there's this PHPCR ODM, so like an ORM for PHPCR, and there we've integrated internationalization. It's a pluggable system, so you can implement different strategies, and we provide two strategies out of the box. I'll try to explain without showing a slide. With PHPCR ODM, you basically have plain PHP objects, and then you write mapping information: which fields should be stored and which type they are. When you're specifying this mapping, you can optionally also say: this field is translatable, and this one is translatable. Then you pick a strategy, and based on the strategy it will either store all the translated fields under a child node, so when you're storing an object at path X, for every locale you've translated this node into you get a child like /en, /fr, /de with all the translatable fields; or, with the other strategy we've already implemented, it stores them on the main node, so if you have a field title, it would be stored as title_en and title_fr. Now, the cool thing with the PHPCR ODM is that there's a method findTranslation, for example: you give it the path and the locale, and it gives you back the object populated with all the properties for that locale.
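The mapping and lookup just described can be sketched roughly like this. The annotation names are modeled on the Doctrine PHPCR-ODM of that era; the Article class, field names, and repository path are made-up examples, not code from the talk.

```php
<?php
use Doctrine\ODM\PHPCR\Mapping\Annotations as PHPCRODM;

/**
 * translator="attribute" stores translated fields on the main node
 * (title_en, title_fr, ...); translator="child" would store them
 * under /en, /fr, ... child nodes instead.
 *
 * @PHPCRODM\Document(translator="attribute")
 */
class Article
{
    /** @PHPCRODM\Id */
    public $id;

    /** @PHPCRODM\Locale */
    public $locale;

    /** @PHPCRODM\String(translated=true) */
    public $title;

    /** @PHPCRODM\String(translated=true) */
    public $body;
}

// Assumes a configured DocumentManager ($dm). Fetch the object with
// its properties populated for one locale.
$article = $dm->findTranslation('Article', '/cms/articles/hello', 'fr');
echo $article->title; // the French title, if that translation exists
```

Switching between the two storage strategies is a one-word change in the mapping, which is the pluggability being described.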
Now you can manipulate that data and then say bindTranslation with locale "en", and when you persist it, you're going to persist it as English. So it's really a cool system for working with translations. However, this ODM approach obviously really wants you to have fixed PHP classes, which wouldn't work so well if you want to give somebody an interface where they can say: this is my new node type and these are its properties. You would have to generate a PHP class, which is, I guess, a no-no in Drupal land. So maybe there could be a middle ground for still doing something similar. But that's basically the strategy we have at the moment for the Symfony CMF project: we have these two different strategies for storing translations, you can really easily pick the one that works best for you, and you can plug in your own if you want to do something different.

So, in a nutshell, it's not that different from what we have right now in Drupal. You have the same problem in every document-oriented database: there are different ways of nesting the language, you have to decide where you put the localization, and there are different best practices. Solr has the same issue when you handle translations and multilingual content, et cetera. So there is stuff we could take inspiration from in terms of API in the ODM project, but in terms of storage, we are basically on our own; we just need to pick our practice. Okay.

So thank you very much, everyone. There is a mandatory slide I have to show. Good thing. So, well, have fun.