It's five o'clock, so I might start; I've got lots to get through. My name's Tarrant, from SBS. Also here today is Ari Cohen, put your hand up, who's been a fundamental part of a lot of this work as well. So if I don't get through everything, he'll be available at the end for questions.

A bit of background on SBS, mostly for the international visitors. We're the Special Broadcasting Service for Australia, predominantly a TV and radio broadcaster. We offer multicultural and multilingual content, not just for Australians, but for anyone in Australia. We're government funded: about 70% of our money comes from the government, and because of that we're driven by a charter, so there are specific things we need to meet for all of Australia. We're also a commercial station; that's where the other 30% of our money comes from.

SBS has a huge online presence. It's relatively new: the online department has been around for five years, and it's had a lot of rapid growth in that time. Over those five years we've built up a wide range of websites and applications. You can get iPhone apps for our news service. We've got on-demand TV available on pretty much all the mainstream devices, set-top boxes, you name it. We were also the first in Australia with a Windows 8 catch-up service. Clicking through, these are just a couple of our main websites, like our cycling site and our world news site. Pretty much anything under the sbs.com.au banner is ours. I'd say we have maybe 10 main websites, but up to 60 other sites related to TV shows that we air.

What runs our sites at the moment? It's actually not Drupal. We have a bespoke CMS written in Zend. It's very hard and messy to maintain, and it takes a long time to make changes, especially network-wide. Say Facebook changes how they implement their share widget: if we actually changed that across all of our websites, it would take a long, long time. We also have no consistent look and feel across the current network. So the decision was made that SBS needed a new platform, and Drupal was chosen. As I said, where's Drupal right now? Nowhere.

So the goals we've set for ourselves: we want sites that are easy to maintain and extend, especially from a developer point of view. We also want a better user and editor experience. The editorial experience in our current system is rubbish; even editing a basic Drupal node is a better experience than our current one. For users, the workflow through our websites isn't as good as it could be either. We also want to be able to repurpose our content across the whole network of websites. As I said, we've got a lot of websites, predominantly news articles, and these articles can appear in many locations across the network. Currently it's virtually copying and pasting and re-uploading images, which is not ideal. Another goal is new ways to explore the content, and we need rich metadata to back that up: exploring the content not only from a user perspective, but internally in the office as well, editorially.
Editors want to be able to find content really quickly if they need to repurpose it, say on the news desk. We also wanted sites that are multi-device compatible, and, along with all of this, to use more open standards, and most of all a system that ends up modular and decoupled, unlike our current one. Again, Drupal screamed all of this. But I'm not really here to talk about Drupal so much, or why SBS has switched to Drupal; it's more about the back end behind what our Drupal sites will be running.

Knowing that we have hundreds of thousands of articles, we were trying to work out: does this fit in one Drupal site, or in many? Do we have one site that serves out all of it? Basically we said no, we're not going to have one Drupal site. It's too big a site, it's too much risk, and it's not decoupled, which is one of our goals. Drupal multi-site? Yes, we are using Drupal multi-site, but that again raises the question of how we share content between all these sites.

So we've ended up using Drupal multi-site installs. We've got a base installation profile that runs across everything, and a lot of components are the same across all of our new Drupal websites: common modules, editing interfaces, even a common theme that all of the new sites will use. Then each site or subsection of our sbs.com.au domain has an extended theme with slight variations on the design. There's been a lot of work around the UX and design for the new platform.

We did evaluate some of the existing multi-site options out there. I looked at things like the Deploy module and how it goes about pushing content around, and there are a few others that can do shared-domain kinds of things. Basically, none of them could do what we actually wanted when it comes to sharing content around the network. So we had to put our thinking caps on and come up with a method that would actually work for us. And we discovered we needed two parts to the solution, and it's not just Drupal: we needed Drupal as the website platform, but we also needed a common content repository.

So we went through the motions of finding out what's involved in a content repository, and we found this great picture, which shows pretty much every content repository out there and all the different standards. You've got things like JCR. You've got CMIS. You've got PHPCR, which is the PHP version of JCR. You've got things like Midgard, which is more C-based with PHP modules. We could have used Drupal itself as the content repository. And then you've got all these expensive proprietary products as well.

There were many to choose from, so we had to come up with a list: how do we differentiate between them? What exactly do we need from our content repository? They all offer something slightly different. So I'll just step through the list we more or less came up with. We started by looking at what our content actually looks like: how is it being stored, and what kinds of content do we have? We've got articles, but we've also got a food portal with things like recipes and restaurant data, so it's not as straightforward. We've got scoring data on some of our sports websites. We've had to really think about how we can store all this and repurpose it on other websites. Then we looked at how it actually gets stored behind the scenes. Does it sit in a MySQL database?
That takes me to the next point: is it MySQL? Is it Tomcat running JCR? What kind of architecture sits behind this content repository? With a lot of these solutions there'll be an application, but then there's a whole plethora of hardware and software you've got to run behind it to actually support it. It just seemed overwhelming, and it really shouldn't have been that hard, as we realized once we started looking down that path.

Then we had other questions, like: what content types exactly need to be stored, and how will the repository shape them? Things like Alfresco can store content; admittedly they're better for documents like Word docs. They can store articles, but in my opinion Alfresco isn't really good for other kinds of content. Then there's image storage: a lot of these content repositories don't handle file assets that well, or any file storage really.

The next question is: how do I search this repository once all my content is in it? We didn't want to just put the content somewhere, struggle to get it out, and have to build our own crazy search on top of an off-the-shelf product.

What about the communication layer? A lot of those applications out there have varying standards for how they communicate with each other. CMIS is a commonly known one; JCR has its own implementation. They all use some form of XML or JSON or PHP modules, they're all slightly different, and they vary from very simple to very, very complex. You could sit down and read the specs for some of them for a long, long time, especially when you start delving into version control and wanting to implement that over an API service. So it was all very complex, and probably too complex for what we really wanted to achieve in a shorter time frame.

So we went down the custom path, and there are many good reasons why. We aren't a Java house, so we didn't choose JCR; that one seemed pretty straightforward.
We are a PHP house, but PHPCR was too heavy. The standard for how it stores its content is very complex: it uses things like MySQL, but it ends up storing most of its content in a key-value format in the database, which raises scalability issues and all sorts of things. Alfresco is not really for a real website. Midgard, too, had some good ideas in theory, but if we wanted to extend it, it was potentially going to give us problems, because it's more of a C application. Drupal itself was a good choice that we looked at; the OpenPublish distribution in particular had a lot of features we really liked. But at the end of the day, Drupal won out more on the front end rather than for content storage. Drupal, and especially Drupal 7, would not have been up to speed, in performance and in its API services, for what we needed around our content repository.

So then we came up with a plan for exactly how we wanted our systems to interact with each other. What we have is a whole bunch of websites sitting at the top of our stack. Drupal serves each of these websites as individual Drupal installs. Each install has an API that can interact, through a messaging service or directly as a client, with the content repository, and there are Apache Solr indexes in there to assist with searching the repository. So now I'm going to go through many of these components: the client API, the Drupal integration, the messaging service, the exact schema we ended up using in our repository, and the software we used to build it.

The content repository itself we built in Symfony 2, of all things. Even though SBS has predominantly been working with Zend, we chose Symfony 2 for the content repository and moving forward. It's well supported, it has great coding standards, it's fast, and Doctrine is awesome. We also have existing Symfony 2 skills in-house, so we're really just upskilling towards Drupal 8, looking ahead. So that was our base framework.

We then looked at how we were going to interact with Mongo. We chose Mongo over MySQL mostly because of how the data gets stored, and I'll touch on that a little more soon. But interacting with the database is always a big part of a content repository, and how you go about doing it matters. As I said, Doctrine is awesome, and alongside their MySQL ORM they have a MongoDB ODM, which fits perfectly. We have documents; we have articles, and articles are documents; a document goes into our repository. It all just made sense at that point.

We did, however, start out without Doctrine. We were using Mongo and Symfony, but we wrote our own methods of interacting with Mongo, approaching it from a schemaless point of view. And it's really cool: you can take some JSON data, decode it, and just shove it straight into Mongo. That's how easy Mongo is; we prototyped that in about 30 minutes. But when it actually comes to things like validation, making sure your content stays good quality for many years, you've still got to have some validation and abstraction in there. And if Mongo goes away in a couple of years, or, I don't know, another product comes onto the market that's even better than Mongo, we'd want options.
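That abstraction is what Doctrine gives us. As a minimal sketch, not our actual schema, with class and field names that are purely illustrative, a document class with Doctrine MongoDB ODM annotations looks something like this:

```php
<?php

use Doctrine\ODM\MongoDB\Mapping\Annotations as ODM;

/**
 * A hypothetical article document, for illustration only.
 *
 * @ODM\Document(collection="documents")
 */
class Article
{
    /**
     * UUIDs are assigned by the client sites, not the repository,
     * so the ID generation strategy is NONE (user-assigned).
     *
     * @ODM\Id(strategy="NONE")
     */
    protected $id;

    /** @ODM\Field(type="string") */
    protected $name;

    /** @ODM\Field(type="string") */
    protected $articleBody;

    /** @ODM\Field(type="boolean") */
    protected $isPublished = false;

    /** @ODM\Field(type="collection") UUIDs of taxonomy terms. */
    protected $terms = array();
}
```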
We still needed that abstraction layer to be able to shift away from Mongo to another system, and that's one of the main reasons we ended up using Doctrine MongoDB ODM rather than direct access to Mongo.

Why Mongo and not MySQL? It's very easy to use; if anyone has played with Mongo, you'll understand what I'm on about, and if you haven't played with MongoDB, it's fantastic. You can spin it up easily on a VM, and it's very fast to use. I was shown a chart at a MongoDB conference comparing MySQL and Memcache: MySQL is super slow, Memcache is super fast, and MongoDB is much closer to the Memcache end in terms of speed, because it stores everything in memory and then slowly commits it to the file system as it needs to. It's a very memory-hungry application, but it is very, very fast. Replication in MongoDB is amazing. You can do it all on the command line within Mongo's shell, and within about five minutes you can have 10 or 20 slaves. They're not really slaves in Mongo; they're primaries and secondaries, and they can hand over to each other very quickly and easily. Replication in MySQL is a pain in the ass.

We have many, many content types, and this was another reason we chose MongoDB. We know they look like articles, we know we're going to end up with an XML or JSON API, and we really thought about the conversion process, from whatever the content ended up as in Drupal, all the way through to how it gets stored in our content repository. MongoDB seemed to simplify this because it stores BSON, which is just like JSON, so it made more sense for how we store the data. We wanted to keep things simple; that's the emphasis across this whole repository.

Many of our content types have a lot of fields. We have recipe and restaurant content types that we ingest from third parties. I think our recipe content type has maybe 30 or 40 fields in our legacy system, maybe even 50. It's a lot of data: not just things like addresses, but opening times, and prices of entrées, mains, desserts. There's so much small data that we actually need to show to the user on the front end, and we weren't quite sure how we should store it all.

So we looked towards open standards for how to represent this data, and we took schema.org as a strong guide here. Actually, show of hands, who knows about schema.org? Yeah, that's good to see. Looking at it answered a lot of questions really quickly up front. Are we going to call our content type an article? Yes, we are. Are we going to call it a recipe? Yes, we are. We're not going to call it something crazy like "SBS article"; we're going to stick with something out of the box. Then we took it a little further and looked at all the fields, and thought: the field names have already been defined for us, so why not take a lot of our field names from schema.org as well? It makes the naming conventions much easier across all of our products, and less developer mess in the long run. Having said that, the fields in schema.org are limited. I know that on the recipe content type, the ingredients field is just one massive text field; that's how Google loves it. They just look at that field and work out what all the ingredients are.
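As a rough illustration of that naming idea, a document shaped along schema.org lines might look like this. This is a hypothetical payload, not our actual recipe schema; the property names are drawn from schema.org's Recipe type as it stood at the time:

```json
{
  "type": "recipe",
  "name": "Chicken laksa",
  "recipeYield": "4 servings",
  "cookTime": "PT45M",
  "ingredients": "2 chicken breasts, 400ml coconut milk, 100g laksa paste",
  "recipeInstructions": "Fry the paste, add the coconut milk, simmer the chicken."
}
```

Note the single free-text ingredients field, exactly the schema.org shape I just described.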
That's not how ingredients are generally stored in a content repository, though. You'd probably use taxonomy terms for ingredients, or have extra metadata with them, like quantities, and you'd store them all separately in different fields. So schema.org didn't answer all the questions of field storage, but it really helped a long way in standardizing how we name things across different applications.

We also used Freebase as a source and a guide; we've actually ingested a number of taxonomy terms from Freebase to enhance our system and make it a lot richer. We've also used OpenCalais: we've passed every single news article SBS has ever had online through OpenCalais, and now we have a massive vocabulary of everything, which will make our content really, really cool soon. But we also have SBS-specific content types. Besides recipes, we have an electronic program guide with a lot of custom fields that come from the TV scheduling software that runs at SBS. So again, schema.org wasn't the answer for everything, but it's been a good guide in helping us shape a nice, strong system. Those are some of the content types SBS uses across the network; it's a nice set of common types.

So the next question we needed to answer: we've got Drupal running all our sites, we've got Symfony 2 and MongoDB in our content repository, but what's in the middle? As I mentioned before, there are a lot of standards for how you can communicate with a content repository, and CMIS was a big one. It's open, and we really did look into it, thinking maybe we should implement CMIS. We didn't. We looked at XML and SOAP and thought, well, XML actually carries a lot of overhead. So we went down the path of JSON, and we even started on JSON-LD, linked data, but we haven't had the time to properly implement all of that yet.

What we've ended up with is a JSON API that's nice and lightweight. It can easily be used on mobile devices, its structure looks close to MongoDB's, as I mentioned before, and we can easily evolve our JSON API into JSON-LD later. JSON-LD puts a lot more schema around the data; it gives the DTD feel you've come to expect from XML or SOAP APIs, and that schema data could be used to generate pretty much all your content types across your network from one single content repository. In theory anything could be dynamic: you create a content type in the content repository and it just ends up in Drupal. Say you had a cron job that checked the content repository and automatically created new content types. The ideas were just overflowing when we looked at JSON-LD, but we haven't implemented it yet because we're taking baby steps.

So our service looks pretty straightforward. We use the usual GET, PUT, POST, and DELETE verbs you'd expect from a RESTful service. We have a document path, and we also have vocabulary and term paths, and we use UUIDs in our content repository. We use UUIDs because our content can be created on any Drupal site at any time. It's not created in the repository, and it doesn't get created in Drupal and then ping the repository for an ID; it just gets created in Drupal, and at some point it ends up in the content repository. I'll come back to that process soon. Our JSON service is straightforward: it looks very familiar if you know schema.org; we just have extra fields like uuid and isPublished that help out the content repository and the Drupal sites.
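So a document fetched from the service looks roughly like this. Again, this is a hypothetical payload, with illustrative paths and field names; a GET to /document/{uuid} might return:

```json
{
  "uuid": "1d2e4a6b-0c3f-4e88-9a21-7f5d9b0c4e10",
  "type": "article",
  "isPublished": true,
  "name": "Headline goes here",
  "articleBody": "Body copy goes here.",
  "terms": ["c0ffee00-aaaa-bbbb-cccc-123456789abc"]
}
```

Schema.org-style names for the content fields, plus the repository's own uuid and isPublished bookkeeping.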
Right, so the next big thing was dealing with how we push this content around our system. At this point we'd worked out that we can create nodes in Drupal, and we can put content into our content repository through a REST API. What we hadn't worked out was how we manage changes. Our end goal is to have Site A push content through to Site B, somehow, and we didn't quite know how. So we came up with the idea of a messaging queue, which is quite straightforward really; messaging queues have been around for a while. I like to think of this one as more like a binary log, if you think about MySQL.

Actually, I'll just skip ahead to this slide and come back. We have a stack of UUIDs in our messaging queue. Site A has a pointer to one position in the queue, Site B to another, and they just constantly process through this JSON feed. We did look at things like PubSubHubbub, which kind of does this as well, but if your site goes offline it misses notifications. We needed something decoupled and modular, so that if Site B goes offline while events keep getting added to the message queue, there's a way for that site to catch up.

So the messaging queue was crucial to the design of the content repository, and it has a basic structure. There's a uuid for the message itself. There's the source uuid it refers to, which could be a document, a taxonomy term, a vocabulary, an image, anything in the repository; we represent that with a source type. There's a timestamp of when the message got created: not the document's create date or publish date, but the event date. And there's an action, so the queue can have creates, updates, and deletes, which pretty much translate to what the REST service does.

The only knowledge our content repository has of all the other websites is this one field that says source site. That's the only knowledge. There's no knowledge in that application that says "this is the IP address of this website, this is its database, this is how I should push content through to Drupal". It doesn't happen in that direction: content doesn't get pushed from our Symfony service to Drupal, because that's not decoupled enough. It would require more maintenance: if we brought another site on board, we'd have to go back to the repository and make sure it's all working. It just added a lot more complexity and a lot more risk. We wanted something that could be written at the start of this project, the messaging queue, and then just left alone to work, while we add sites as we go.
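An event in the queue, then, carries just those fields. Roughly, with hypothetical values:

```json
{
  "uuid": "3f8b2c1d-9e4a-4b6f-8d2e-5a7c9b1e0f3d",
  "sourceUuid": "1d2e4a6b-0c3f-4e88-9a21-7f5d9b0c4e10",
  "sourceType": "document",
  "sourceSite": "food",
  "action": "update",
  "timestamp": 1369300000
}
```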
So the messaging queue again has a JSON API, just like the others. We call them events. It's basically read-only: we say "events since" a given UUID, and we get 200 events back from that point, and we can process them. We take the last event, pass its UUID back to the API, and get the next 200 events if there's more to process. If we don't know a UUID, say we're onboarding a brand-new website, we had to come up with a way to say "give us events since this timestamp". That one's relatively straightforward. I'll come back to the messaging queue when I talk further about how we integrated Drupal with the CR and the modules we wrote for that.

But first, a sidestep into taxonomy. Taxonomy is a long-time strength of Drupal; it's really awesome, and not many content management systems have it down pat quite like Drupal does. So the CR naturally required a way to store terms and vocabularies. Here's the problem that comes up: say I tag an article on Site A with, I don't know, "cheese". And at the same time I tag an article on Site B with "cheese". What we end up with is two individual terms that have had UUIDs generated in two separate Drupal installations, because everything's detached; say the queue hasn't processed yet and nothing's ended up in the CR. We end up with two UUIDs for the same term. So there's a bit of complexity in our system around synchronizing taxonomy terms between sites, and they're a little different to how articles work. But at the end of the day, taxonomy terms are unique by their name rather than their UUID, so that one had an easy resolution.

Here's how it goes. Say Site A tags "chicken"; chicken instead of cheese, I must have been eating chicken at the time. Site A pushes its term to the CR, and the CR ends up storing the term "chicken". At some point, whether it's in the next minute, five minutes, or two days, whenever Site B is online and ready to check the messaging queue, Site B pulls the queue from the CR, processes the list of messages, sees there's a message there that says "create" with a UUID, goes off and fetches that term, and all of a sudden our chicken has ended up in Site B. At that point it does the synchronization: "I already have a term called chicken; the CR says it's this UUID; so I'm going to ignore the UUID I've got in my Drupal and just replace it with the CR's." The CR is treated as gold in terms of UUIDs, even though it doesn't create them. It is the one true repository for UUIDs, and that was an important point in the design. It came to us not so quickly, surprisingly.
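In Drupal terms, that reconciliation step looks roughly like this. This is a sketch: it assumes a hypothetical uuid property on terms and an invented function name, not our actual module code, though the taxonomy API calls are standard Drupal 7:

```php
<?php

/**
 * Reconcile an incoming CR term against local terms by name.
 * Hypothetical helper; assumes terms carry a uuid property.
 */
function sbs_cr_sync_term($cr_uuid, $cr_name, $vid) {
  // Terms are unique by name within a vocabulary, so match on name.
  foreach (taxonomy_get_term_by_name($cr_name) as $term) {
    if ($term->vid == $vid) {
      // The CR's UUID is gold: overwrite whatever we generated locally.
      $term->uuid = $cr_uuid;
      taxonomy_term_save($term);
      return $term;
    }
  }
  // No local term yet: create one carrying the CR's UUID.
  $term = (object) array('name' => $cr_name, 'vid' => $vid, 'uuid' => $cr_uuid);
  taxonomy_term_save($term);
  return $term;
}
```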
So the next big thing around our content repository was search. At this point we had a nice little API; we could push in articles, with taxonomy as well. We were still working off mock-ups, without much Drupal integration yet. So: how do we find this content, and how will Drupal find it long-term? Mongo is okay for basic searches; it's got some nice free-text and regular-expression matching, which is all right on basic fields, array fields, keyword fields. But ultimately we needed something a lot stronger.

So we ended up using Apache Solr with Symfony. I used the Drupal Solr schema.xml, because it's actually a really awesome schema: it has a lot of dynamic fields already defined, which saved me scanning the Solr documentation and building my own. I also used the Solarium library, which is totally different to what you'll find in the Drupal community with the Apache Solr module or the Search API module; those are their own implementations. Solarium is on GitHub and easy to obtain, and I think it's the best Solr integration library I've ever used, having used both of the Drupal ones as well. It's much more class-based, and it can easily swap out different HTTP connectors.

We ended up writing annotation drivers for this library, for Symfony, specifically alongside Doctrine. So in the class where we define an article in Symfony, just below the Doctrine Mongo ODM annotation, we put the same kind of annotation for Solr, mapping that field in the class directly to either a dynamic or non-dynamic field in Solr. It made the mapping really easy: we knew what our content types were going to look like, so we just add, say, a dynamic string annotation on a field, it uses that field name, and by way of the annotation drivers it ends up in Solr. I'm really keen to contribute that back to the Symfony community, and I'm wondering if there's a way to get it into some Drupal 8 contrib.

The CR indexing process itself actually subscribes to the event queue, and again, this is why the messaging queue, the event queue, became so important in the design. When it comes to processing batch jobs in the CR, it was the perfect candidate. We have a full list of creates and updates of content, all these articles; what better thing to subscribe to it than a search indexer? It made it very straightforward to process a batch list of content, throw it at Solr, and all of a sudden it's indexed.

Another benefit of using Apache Solr is that Drupal's integration with it is really great. There's a module in the community called Sarnia. Sarnia works with Search API, which obviously works with Drupal. Sarnia allows you to use a non-Drupal Solr schema with Drupal and Views, and it creates all those entity relationships for you. That's allowed us to use a non-Drupal search engine, even though it's using the same schema, and easily provide the mapping through to our front end.
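As a sketch of that indexing step, here's roughly what pushing one repository document into Solr with Solarium looks like, using the dynamic field suffixes from Drupal's schema.xml. The field and class names are illustrative, and this is the Solarium 3.x style of the era, so treat the exact API as an assumption:

```php
<?php

// Index one repository document into Solr via Solarium (3.x style).
$client = new Solarium\Client(array(
    'endpoint' => array(
        'default' => array('host' => 'localhost', 'port' => 8983, 'path' => '/solr'),
    ),
));

$update = $client->createUpdate();
$doc = $update->createDocument();

// Dynamic field suffixes come from Drupal's schema.xml:
// ss_* is a single string, ts_* is indexed full text, bs_* is boolean.
$doc->id = $article->getId();
$doc->ss_type = 'article';
$doc->ss_name = $article->getName();
$doc->ts_body = $article->getArticleBody();
$doc->bs_is_published = $article->isPublished();

$update->addDocument($doc);
$update->addCommit();
$client->update($update);
```

In our setup the annotation driver generates this mapping from the class metadata rather than it being written by hand, but the effect is the same.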
Now, moving on to Drupal. We've ended up with one main Drupal 7 install on our servers. We have one install profile, which we've called the SBS distribution, and we have one main theme called Global for all of our websites. As I mentioned before, our new design is much more standardized across the network; I have a screenshot at the end of the presentation for a bit of a sneak preview. What we've ended up with is sub-themes that more or less just replace some imagery and CSS colours and things like that, as we roll out more sites across the network. And we've built a fully responsive system into this global theme as well, to meet one of those earlier goals of everything working on a mobile, a desktop, a really massive desktop, and so on.

We've got multi-site installs happening too, so some sites will naturally have their own modules. Not many so far, but when it comes to things like Features exports, they can definitely live in their own sites rather than in the base distribution; our food site has a very different structure and very different listings and views compared to other sites. So that's a little bit on our Drupal setup.

To get Drupal to connect to the CR, we needed three parts: a Drupal module, or a couple of modules in fact; a client API that's a bit more standardized and can connect to the repository; and our CR service. We've got the CR service at this stage, and we have a client API in progress.

The client API we ended up building as a standalone library that you can just drop into Drupal. The reason is that our legacy sites still actually need to use this and put content into our repository. It also keeps things decoupled; we can upgrade things at different stages. And some of the functionality in that API pushes the limits of what you'd want to do inside the structure of Drupal anyway: it's more like a Drupal 8 structure than a Drupal 7 one. It uses PSR-0 standards, which Drupal 7 does not, unless you use something like xautoload to help you out.

The API handles the conversion from the JSON that comes from the content repository into a strongly typed object that you can then hand off to Drupal. Built into this client API is the idea of build handlers and an object translator; it has some concepts very similar to Drupal's Migrate module. It needs a bit more work. I'd hoped it would be able to connect to the Symfony service, get pretty much the schema from Symfony, generate a lot of these classes automatically, and then push them through to Drupal, but time is the only thing constraining us there.

So at this stage we had a client API that could connect to our Symfony service. It does a lot of the error handling and throws exceptions when there are problems communicating with the repository. This was a point we had to really fine-tune and write a lot of unit tests around, because the content repository was going to be such a core part of our new network. It just has to work; we couldn't risk one little thing messing up everything, because all of a sudden we'd have 60 sites that can't connect to our repository.

At this point, though, it's still not connected to Drupal, so we needed a bunch of Drupal modules. We ended up with about five main ones. We have a server provider, or an API wrapper: it imports the library and has some basic methods for pushing and pulling content to the repository; we actually call them push and pull in the system.
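To give a feel for it, usage from Drupal looks something like this. All the class and method names here are invented for illustration; the real library differs:

```php
<?php

// Hypothetical client library usage; names are illustrative only.
$client = new SbsCr\Client('https://cr.example.internal');

try {
  // Returns a strongly typed Document object, not raw JSON.
  $document = $client->getDocument('1d2e4a6b-0c3f-4e88-9a21-7f5d9b0c4e10');
  $title = $document->getName();
}
catch (SbsCr\Exception\TransportException $e) {
  // Communication failures surface as exceptions; the Drupal-side
  // wrapper catches them and logs to watchdog instead of breaking the page.
  watchdog('sbs_cr', $e->getMessage(), array(), WATCHDOG_ERROR);
}
```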
We have an entity integration, or field, module, and I'll show some screenshots of that in action shortly. We have a module for an event queue processor: Drupal's queue is mostly good enough, but we've had to extend it to put a bit more data into it, so it's just class extensions and queue definitions of how we want them. We have a queue that can push content, a queue that can pull content, and a queue that can subscribe to content and then process things in Drupal later. It's another queue module that has kind of been backported from Drupal 8. We also have modules that help out with Search API, Solr, and Sarnia. We didn't write those, but we've had to do a fair few patches on some of them, because they're pretty raw; they're new in the community and need a bit more work. We've contributed those patches back as well. And there's some custom code around facets, for things like filtering on dates: when you use Search API with Sarnia, it's not as easy to map the structure of a field into what it should render out in Views, so it needs a bit more hand-holding.

So our server provider module implements push, pull, and delete methods. It's hooked into parts of the Entity API. It more or less takes the entity object, with an entity metadata wrapper, and pushes it through to the client API. It handles extra things like more error handling: it takes the exceptions that get thrown by the client API and puts them into Drupal's watchdog, or whatever logging system we're using. And it provides a nicer, easier way of using the entity metadata wrapper for setters and getters.

The next one was the field, or entity field, module, which is probably the coolest part of the integration. It sits on every single content type we have, taxonomy vocabularies included, and images; we use File Entity as well. It's just one little field that we call "CR status", though you could call it whatever you want. It has a drop-down for choosing the content source, so we can hypothetically have multiple content repositories; we don't, but we can. Our client API gives us a bit of schema around what content types, or collections, are available; we've tried to use terminology common with what gets stored in MongoDB as well. This is a field on a node: when you're editing the node in the administration section, this mapping just appears. So I can be on a node of our article type, and in here I can say it's going to map through to "article" in the content repository. We've tried to keep this fairly decoupled, so that in theory we can map any content type to any other content type in Drupal; it doesn't have to adhere to strict one-to-one mapping.

On the left are the fields that come from the content repository; the middle column is the target fields in Drupal; and on the right we have the field handler interface I mentioned before. Some fields can map directly one-to-one, like the type field in Drupal, or isPublished; that's just an integer, with no fancy arrays or multilingual metadata around it. For other fields we do need handlers, and a lot of the fields use what we've created as a generic Drupal field handler, which covers all those cases: Is it an array? Do I have multiple elements? Am I multi-value storage or a single element? Do I have different languages inside me? That kind of thing.
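The handler idea boils down to an interface along these lines. A sketch with invented names, not our actual code; EntityMetadataWrapper is the Entity API module's wrapper class:

```php
<?php

/**
 * Sketch of the per-field handler concept; names are hypothetical.
 */
interface CrFieldHandlerInterface {

  /**
   * Flatten a Drupal field value (which may be multi-value and
   * per-language) into the plain structure the repository stores.
   */
  public function toRepository(EntityMetadataWrapper $wrapper, $field_name);

  /**
   * The reverse direction: hydrate a Drupal field from a CR value
   * when pulling content into a site.
   */
  public function toDrupal(array $cr_value, EntityMetadataWrapper $wrapper, $field_name);
}
```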
Some of the other fields need a bit more work around them. We've got some custom fields we created for our restaurant content type, which show things like opening times. That kind of data is stored much more generically in the content repository; in Drupal it's quite different, because it's a custom field. We still wanted to use the awesome power of Drupal and its fields, but we also wanted to keep the data tidy in the CR and not have it dictated by Drupal's structure. This was the key part of keeping that separation. So the server wrapper basically looks at this field whenever it gets to a node that needs to be pushed to the repository, inspects each mapping, works out what it should be mapping to, maps it to the strongly typed class in the client API, and then pushes it through to the repository. It amazingly works really well.

So the next big module in Drupal is our queue processor. I've kind of already mentioned it: it exposes the update, create, and delete queues for our entities. It uses the native Drupal queue, but extends it with a few extra fields we needed for things like debugging. If we end up with a site that has maybe 20,000 items to push through to the repository and one of them is jammed in the queue, we needed more insight to actually fix that issue. The standard Drupal queue just stores all your data as blobs in a column and doesn't give you much insight unless you load up each item and inspect it, which is annoying if you've got a lot of items in a queue.

Our queue processor also subscribes to the event queue. On cron, or drush cron, whatever we want to run, it processes the event queue in the content repository, steps through it, and stores each item from the CR event queue in its own Drupal queue, which it then acts on. Part of the reason for this is, again: what if the CR goes down at some point, or what if there's a problem importing the content into Drupal? We needed a reference point where it could go, "hey, this event had an issue; we're not going to run any more, but here's the one that had the problem", and then we can go and inspect it. It's all about: if we have a problem, how can we quickly resolve it? Again, this is a central part of our system, so it was important we had this worked out.

The queue manager at the moment is fairly straightforward, though it could have much more of an admin interface around it. You can push a queue or process a queue, and under the hood it calls Batch API to run through the items. It has a lot of logic tied in with the server wrapper for how it handles failed items. When you push an article, things like its taxonomy terms potentially haven't ended up in the CR yet, so it has to work out the dependencies, add them to the queue beforehand, and then process them; there's some clever logic in the processor as well.

And there's a good screenshot of it actually working, which we only got to in the last couple of weeks of our build. We've started ingesting content from our legacy site into the new Drupal build; we're working on the food site at the moment, and we're testing pushing all this content through to the repository. We indexed three or four thousand recipes in an afternoon; it was just happily pushing the items through the queue. It was a really good moment to see that happen.
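For reference, the Drupal 7 wiring for a cron-driven queue like this looks roughly as follows. The module and callback names are invented; the hooks and queue API are standard Drupal 7:

```php
<?php

/**
 * Implements hook_cron_queue_info().
 * Hypothetical push-queue wiring, not our actual module.
 */
function sbs_cr_cron_queue_info() {
  return array(
    'sbs_cr_push' => array(
      'worker callback' => 'sbs_cr_push_worker',
      // Spend at most 60 seconds on this queue per cron run.
      'time' => 60,
    ),
  );
}

/**
 * Worker: push one snapshotted entity to the content repository.
 */
function sbs_cr_push_worker($item) {
  // $item is the snapshot taken at node-save time. If the push throws,
  // the item is not deleted: its lease expires and it is retried later.
  sbs_cr_server_wrapper_push($item);
}

// Elsewhere, at node-save time, the CR field enqueues a snapshot:
// DrupalQueue::get('sbs_cr_push')->createItem($snapshot);
```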
So I'll go over the big picture of how we connect to Drupal now. I've said a lot, and it's probably quite confusing, but I'll just step through it. We have a node. We save the node. The CR field takes a snapshot of this node in its current state and puts it in the Drupal queue, and that's all that happens at the point when the user is interacting with that node. We didn't want to push it directly to the content repository at that time, because what if the repository is down, or there are other problems that would block the user's work? The site can continue working without the content ending up in the repository. It might not be accessible on other websites yet, but at least the particular site the content is on can still function properly. Then, on a timer, on cron, we push the queue in Drupal through the server wrapper, which converts the node by referencing the CR field, does the mapping and the translating, and pushes a strongly typed object that the client API likes and can validate, via JSON over cURL, to the CR service. The CR service then either persists it and comes back with a success response, or it fails; if it fails, it ends up back in the queue, and it'll either work next time, because some of its dependencies have resolved, or we'll have to go and inspect why it's failing. So it's all simple.

I'm going to open it up to questions now, because there might be lots. There's a lot I wanted to cover in this hour session, but at the end of the day it's only an hour. So, questions, please.

[Audience question about how editors find and import content from other sites]

Yep, okay, I haven't really covered that because I knew I'd run out of time. Because Views is integrated with our search, we have a page which allows the editor to type in whatever the hell they want. They can click a button, and it'll show, say, the title of the article. They can inspect it on the site it currently exists on, because it's a link, and they can just hit import. That gets added to a queue, a queue that is immediately run. In fact, in the event of failure, it stays in an import queue so that we as a developer team can inspect why it wasn't working. But it's the same process, just reversed. It goes from the CR, as a GET request on that /document/{uuid} URL, through the client API, gets converted from a strongly typed object to a node through the server wrapper, and commits as a node through the entity metadata wrapper. It's Drupal pulling and Drupal pushing: pulling is grabbing stuff out of the CR, and pushing is putting it into the CR.

[Audience question about editing imported content]

Yep, absolutely. We had a number of criteria set out for us by our editors, with scenarios where they'd create content on one site and want to use it on another site, but where they might also want to change that content and not necessarily have a direct clone. So we came up with a few methods. We can import the content; the CR field keeps track of its source, so we can deal with canonical URLs quite well and not get penalized on SEO. And if the editor wants, they can detach it from the CR. The relationship is still there, so we know it's still got a loose connection, but it's its own entity at that point, and the CR field will then actually start to push that new content into the CR as brand-new content. Back there, thanks. Do you want to speak on that, Matt? Put you on the spot.
So that's Matt, the technical director at SBS; he's my boss.

Yep, good question. So the question was: how do we manage updates on other sites and let editors know? Our CR field has a plug-in system that we've written into it, and it allows us to write any plug-in, starting from "send an email". As Drupal processes that event queue, it'll see that it already has that UUID in its system, and it goes: "hey, I've already got this article; as I'm processing this queue, do I need to do anything with it?" It hands it to this plug-in system, which will potentially send an email to the editor saying "this article has been updated at its source; do you want to change yours?" Then the editor can decide their copy needs updating now, or it can be automatically updated, they can tick that box as a plug-in, or they might not care at all. So we've kind of captured all of those.

[Audience question about editorial workflow]

Believe it or not, we haven't actually started working out much of our editorial workflow. It varies from department to department, but it's fairly lightweight. Regardless of the workflow, though, content won't get pushed to the CR until it's ready for publishing, or finalized; when we get to actually building that in, that's probably what will happen.

[Audience question about Domain Access]

Domain Access, in terms of Drupal? Horribly messy when you go down that path, in my experience. It didn't feel decoupled or decentralized enough, and it wouldn't have given us enough flexibility around our APIs for what we want to do long-term. Having our repository as an API just opens up so many doors for how we deal with content in mobile applications as well.

[Audience question: why build a separate CR at all, rather than have the Drupal sites share content directly? The CR is nice and enterprisey and fully thought out, but the other option would mean you wouldn't have to go to that effort]

So it comes back to the point of being decoupled. Our CR benefits from not knowing about any other Drupal website. Our Drupal websites don't know about the other Drupal websites at all; they just know they can get content from somewhere, they can import it, and they can react if there's an update to it. And the point of it being decoupled is that, potentially, our food site could, you know, get sold off to someone else, or maybe SBS merges with the ABC or Channel 10. Who knows? But we wanted to keep that option open.

[Audience question about effort and team size]

Building the CR itself probably took Ari and myself a good three or four months; so that's two devs, three or four months. We had a little bit of time before that where we were conceptualizing it with the greater team. In terms of the Drupal implementation, we've had some assistance there from PreviousNext, just as Drupal resources, and now it's all in-house. Our in-house dev team is actually around, was it 12 people? Yeah. They're all Drupal people now.

Do you have any concerns with MongoDB falling over, losing data? Not yet, and we haven't seen any issues with its stability yet. We've got a good relationship with 10gen here in Sydney now as well, so if we do run into any of those problems, we can ask them at that point. But MongoDB, whichever version it's at, 2.2 or 2.1-point-something, is fairly stable now with all of its replication. It had some bugs in the PHP drivers connecting to MongoDB, which caused some issues with replication. The whole replication area went through rapid development: it used a master-slave method, I think back in version 1.6, but now it's a lot cooler with primaries, secondaries, and failover.
And you can set up off-site backups as hidden nodes that come online if they need to, so that's great for redundancy as well. And maybe one day, if we didn't want to use Drupal anymore, that would be pretty easy too; we'd be able to build up a new site in something non-Drupal.

[Audience question: how does the queue know about a new site?]

The CR queue doesn't really know much about that. If you create a new site, how would the queue know? It doesn't. It just gets one text value in there; sorry, I'm just trying to find it. The site which is pulling? Yeah, so every site has a unique ID for itself, and it looks at the queue and asks: is this my content, or is this another site's? That also lets us index content per site, so we can actually see what site it comes from without relying on specific domains, URLs, et cetera.

[Audience question about open-sourcing the work]

Yep, I would love to open source this stuff. It's not quite ready for that. The CR stuff potentially is; the Drupal stuff, not quite, because we need to do a lot more testing. But anyone who wants to help contribute to this, or wants to use it themselves, definitely get in contact with us at SBS. We're very keen to contribute back to the community. Yeah, part of it; or all of it, yeah, we'd want to contribute that back.

[Audience question about how individual sites are built]

Yep, it's just like a standard Drupal install. They have their own content types; you just build it out like a normal Drupal site. And then through that field mapping, that's where the connection happens, where content gets mapped through to Drupal.

[Audience question about front-end search]

Search on the front end is GSA: we have a Google Search Appliance that crawls the majority of our network. But things like our Views are backed by Solr, and it's the same Solr index that the content repository is using. So Drupal is using Search API with the Sarnia plugin to read the index; it never does any writing. The CR itself does the writing to the search index. And not just for an editor choosing content: even for display, this potential content could come straight from the search index and be represented on the page. Anything you'd normally get with a Solr implementation is what we can do now.

[Audience question about images]

Images are stored in the repository; we built an image API path as well. An image goes in as a JSON object. We use File Entity in Drupal, and on push it actually base64-encodes the image, which then gets decoded on the other side. That was the easiest way to push the metadata around the image along with the image itself. So if any of you went to the services session this morning next door, where they were talking about how we're going to get services working with Drupal 8, I think that's the solution: use something like base64 to put and get your items. We still have a normal path where you can browse the image as well, not just pull it out through JSON-encoded files. We haven't built plain files in yet, but it'll be the same code; it's the exact same code in our repository. Anything attached to a node as a file entity can be pushed through to the CR. And we do have CDNs at SBS.

Any more questions? Awesome, that's six o'clock. Thanks for coming.