So hello, everybody. Thank you for coming. My name is Kelly Lucas. I am a lead digital architect at Pegasystems, and the topic of today's presentation is the promises and pitfalls of Drupal's cache system.
A little background about me: I've been working with Drupal since version 6. This is my ninth DrupalCon, which is kind of bonkers, but my first time ever speaking, so keep that in mind as we go forward. Somewhat ironically, the only DrupalCon of recent years that I missed was DrupalCon Los Angeles, which I attended remotely by watching the sessions online. The conference organizers probably don't want me to mention that you can do that, but it's a great way to catch up on the conference later. I got excited about the Drupal 8 cache system after watching presentations from Wim Leers and Fabian Franz at DrupalCon LA, because at the time I was working on a client project, an intranet website where the bulk of the traffic was authenticated, and in Drupal 7 authenticated users couldn't get any benefit from Drupal's cache system.
For the last two years (for me it was two years; for some people it was only six months) we've been rebuilding our sites at Pega in Drupal 8 from Drupal 7. Among the great things Drupal 8 added, like Twig support, the configuration management system, and CKEditor, the Drupal 8 cache system was to me one of the biggest value propositions and one of the biggest reasons to move to Drupal 8. There's no other CMS, open source or otherwise, that provides such granular content caching and related dependency management, and thus such support for automated CDN (content delivery network) invalidation.
I'm going to go over the promises of the Drupal 8 cache system in upcoming slides, but just to set a baseline: why is caching important? Essentially, it's to prevent repeated work.
PHP processing, database calls, and calls to external services are all expensive, and any time you can save that work you reduce page load times, which enhances the experience for users. It seems kind of obvious, but it's worth reiterating.
As mentioned, in Drupal 7 (and I apologize ahead of time that some of these charts are a little hard to read, so I'll try to explain them) we had the anonymous page cache, which meant that pages for anonymous users, who weren't logged in, got cached. The problem was that the invalidation wasn't particularly smart. When you updated a node, for instance, the anonymous page cache for that node would get cleared, but if that node appeared on other pages via Views or blocks, those other pages wouldn't get invalidated, so you could potentially have stale content on some pages.
Drupal 8 added a much smarter anonymous page cache: when you update, say, a node in one place, not only does its page representation get invalidated in the page cache, but any other page that happens to include that node also gets invalidated. And that invalidation works not just for Drupal's internal anonymous page cache but also for content delivery networks or reverse proxies like Varnish, using modules like Purge.
But probably the biggest addition in the Drupal 8 cache system is that now, for the first time, authenticated users can get significant benefits from it through the Dynamic Page Cache. The way the Dynamic Page Cache works is that Drupal generates a page minus any uncacheable bits (typically the personalized bits), caches that page without the personalization, and then, in a separate step right before sending the page to the browser, fills the personalized bits into that cached page. BigPipe takes that a step further: the injection of the personalized content actually happens on the browser side.
So Drupal can send that cached page down to the browser and then, in subsequent flushes, tell the browser to fill in the personalized bits of the page.
Drupal 8 also added what's known as the render cache, or component cache. This is a cache for parts of a page that can be reused across multiple pages: a view that appears on several pages, for instance, or rendered media used on several pages, goes into the render cache. That makes generating either the anonymously cached page or the dynamically cached page faster, even when that page hasn't been built yet.
What made this possible was the Drupal 8 Cache API, which added properties to all those rendered bits, whether it's the anonymously cached page, the dynamically cached page, or components in the render cache. First, they get cache tags, which define their data dependencies. A piece of rendered content can get tagged with nodes 1, 2, and 3, media 4, 5, and 6, and users 7, 8, and 9, and when any one of those entities is updated, Drupal knows to invalidate those parts of the cache. Second, Drupal 8 added cache contexts, which tell Drupal what kinds of variations are possible for those rendered pieces of content. A very common one is user role: you might have a differently cached version of a dynamic page per user role, and the same goes for the render cache. Lastly, cache max-age is a property of rendered content that essentially defines the lifetime of that content in the cache. Things that can use cache tags are typically marked as permanent, because you rely on cache tag invalidation. But for things where you don't have much insight into data dependencies, for instance content coming from an external service, or things that are simply time-bound, like the weather, max-age is a kind of fallback.
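To make the three properties concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not Drupal's Cache API: the class and method names are hypothetical, the caller passes the cache contexts in explicitly (real Drupal discovers them during rendering), and only the PERMANENT value of -1 deliberately mirrors Drupal's Cache::PERMANENT.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional, Set, Tuple

PERMANENT = -1  # same sentinel value as Drupal's Cache::PERMANENT


@dataclass(frozen=True)
class CacheEntry:
    data: str
    tags: FrozenSet[str]
    expires_at: Optional[float]  # None means "rely on tag invalidation only"


class ToyRenderCache:
    """Hypothetical model: contexts pick a variant, tags invalidate, max-age expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._entries: Dict[Tuple, CacheEntry] = {}
        self._clock = clock

    @staticmethod
    def _key(cid: str, contexts: Set[str], request: Dict[str, str]) -> Tuple:
        # One stored variant per combination of context values, e.g. per user role.
        variant = tuple(sorted((c, request[c]) for c in contexts))
        return (cid, variant)

    def set(self, cid, data, tags, contexts, max_age, request):
        expires = None if max_age == PERMANENT else self._clock() + max_age
        self._entries[self._key(cid, contexts, request)] = CacheEntry(
            data, frozenset(tags), expires
        )

    def get(self, cid, contexts, request):
        entry = self._entries.get(self._key(cid, contexts, request))
        if entry is None:
            return None
        if entry.expires_at is not None and self._clock() >= entry.expires_at:
            return None  # past its max-age
        return entry.data

    def invalidate_tags(self, tags):
        # Drop every variant of every entry that depends on any of the given tags.
        stale = [k for k, e in self._entries.items() if e.tags & set(tags)]
        for k in stale:
            del self._entries[k]


now = [0.0]  # fake clock so the max-age behaviour is easy to see
cache = ToyRenderCache(clock=lambda: now[0])
editor, anon = {"user.roles": "editor"}, {"user.roles": "anonymous"}

cache.set("teaser", "<article>v1</article>", {"node:1", "media:4", "user:7"},
          {"user.roles"}, PERMANENT, editor)
print(cache.get("teaser", {"user.roles"}, editor))  # <article>v1</article>
print(cache.get("teaser", {"user.roles"}, anon))    # None (no variant for this role yet)
cache.invalidate_tags({"media:4"})
print(cache.get("teaser", {"user.roles"}, editor))  # None (a dependency changed)

cache.set("weather", "<p>Sunny</p>", set(), set(), 900, anon)
now[0] = 900.0
print(cache.get("weather", set(), anon))  # None (max-age of 900 seconds elapsed)
```

The permanent teaser only disappears when one of its tags is invalidated, while the weather fragment, which has no useful tags, simply ages out.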
When it comes to the cacheability metadata of things in the render cache, components in particular, it's important to keep in mind that this metadata bubbles up from the inner components on a page through to the page and the response itself. This slide demonstrates that: let's say I update image media 125, and it's the hero image on a blog post, the thumbnail on a blog listing page, and it also appears on the homepage in a view block. When I update media 125, Drupal knows to invalidate all those pieces of the page cache.
To go back to the lazy-build step I mentioned with respect to the Dynamic Page Cache: I mentioned single flush and I mentioned BigPipe. Originally, until I think Drupal 8.6, or maybe 8.5, single flush was the default. This graphic comes out better on that screen over there, but in steps one and two you see that BigPipe sends down the skeleton of the page and then sends multiple flushes that tell the browser to fill in the personalized parts of the page, whereas with single flush all that replacement happens on the server side and the browser gets the page in one fell swoop. The advantage of BigPipe, maybe somewhat obvious, is that the time to first interaction for the user is significantly shorter, so they can interact with the page sooner.
I went over the promises of the Drupal 8 cache system pretty quickly, but I hope they're somewhat evident. Before I get into the gotchas, I want to emphasize that they aren't comprehensive. You may have encountered your own pitfalls, and some of these may not be a problem for you at all. They're just things to keep in mind as you consider working with Drupal 8, or as you've already started working with it.
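The bubbling behind the slide's media example (modeled here as media:125) can be sketched as a recursive merge. This is a hypothetical Python model, not Drupal's CacheableMetadata class: tags and contexts are unioned, and max-age takes the minimum, with PERMANENT (-1) losing to any finite value, which is the same rule Drupal documents for merging max-ages. Page paths and element names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

PERMANENT = -1  # same sentinel as Drupal's Cache::PERMANENT


def merge_max_ages(a: int, b: int) -> int:
    """The shortest lifetime wins; PERMANENT only survives if both sides are permanent."""
    if a == PERMANENT:
        return b
    if b == PERMANENT:
        return a
    return min(a, b)


@dataclass
class Element:
    name: str
    tags: Set[str] = field(default_factory=set)
    contexts: Set[str] = field(default_factory=set)
    max_age: int = PERMANENT
    children: List["Element"] = field(default_factory=list)


def bubble(element: Element) -> Tuple[Set[str], Set[str], int]:
    """Merge an element's own metadata with everything nested inside it."""
    tags, contexts, max_age = set(element.tags), set(element.contexts), element.max_age
    for child in element.children:
        child_tags, child_contexts, child_max_age = bubble(child)
        tags |= child_tags
        contexts |= child_contexts
        max_age = merge_max_ages(max_age, child_max_age)
    return tags, contexts, max_age


# One media item appears on three different pages.
hero = Element("hero image", tags={"media:125"})
blog_post = Element("/blog/post-3", tags={"node:3"}, children=[hero])
teaser = Element("teaser", tags={"node:3"}, children=[Element("thumbnail", tags={"media:125"})])
blog_listing = Element("/blog", tags={"node_list"}, children=[teaser])
home = Element(
    "/",
    children=[
        Element("view block", tags={"node_list"},
                children=[Element("card", tags={"node:3", "media:125"})]),
        Element("weather block", max_age=900),
    ],
)

pages = [blog_post, blog_listing, home]
print([p.name for p in pages if "media:125" in bubble(p)[0]])  # ['/blog/post-3', '/blog', '/']
print(bubble(home)[2])  # 900: the weather block caps the whole homepage's lifetime
```

Because the merge runs all the way up, a single short-lived or per-user fragment deep in the tree changes the cacheability of the entire page.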
At Pega, one thing we're doing is integrating with a Drupal-agnostic design system called Bolt. Luckily it uses Twig, so we're able to integrate at that level. But Bolt thinks in terms of components, say a card, a feature band, or a collection, so we have to translate, using some advanced theming and a little bit of chicanery, to map Drupal-isms to those design components. We use the UI Patterns module along with some Twig and a lot of preprocessing to make that happen. I like to call that server-side progressive decoupling; progressive decoupling is a term Dries was using for a while, though he hasn't used it much lately.
The key takeaway I want to emphasize is that, out of the box and by design, the Drupal 8 cache system works best when you're controlling the display and layout of content with the block system, display modes, and Views. In other words, it's very site-builder-centric, which makes complete sense, but that's the expectation the Drupal 8 cache system has for how you're going to use it. When you start deviating from that, for good reasons or for expedient ones, that's when you can negate its advantages.
One completely legitimate advanced theming technique is, of course, customizing Twig templates. This is a very simple example of a theoretical entity Twig override where we pick and choose which fields to output and wrap them in some fancy HTML. On the left side, though, you see I noted that the cache tags in that case won't bubble, and that's because the content variable, common to a lot of Twig templates, is what actually carries the pre-rendered cacheability metadata for that template.
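The hand-picked-template problem can be modeled the same way. This Python sketch is a hypothetical stand-in for Twig rendering, assuming each field's render array carries its own cache tags and that metadata is collected only from render arrays that actually get rendered. The throwaway_render branch corresponds to rendering the whole content variable into an unused variable (in Twig, something along the lines of {% set _ = content|render %}). The category label is printed from raw entity data, which is exactly the kind of output that goes stale when its tag never bubbles.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class RenderArray:
    markup: str = ""
    tags: Set[str] = field(default_factory=set)
    children: Dict[str, "RenderArray"] = field(default_factory=dict)


def render(element: RenderArray, collected: Set[str]) -> str:
    """Render an element; only what is rendered contributes cache tags."""
    collected |= element.tags  # mutates the caller's set in place
    return element.markup + "".join(render(c, collected) for c in element.children.values())


def fancy_template(content: RenderArray, category_label: str,
                   throwaway_render: bool, collected: Set[str]) -> str:
    """Hypothetical custom entity template that hand-picks what it prints."""
    if throwaway_render:
        render(content, collected)  # output discarded; only the metadata matters
    image = render(content.children["field_image"], collected)
    # The label comes from raw entity data, bypassing field_category's render array.
    return f"<div class='fancy'>{image}<span>{category_label}</span></div>"


content = RenderArray(children={
    "field_image": RenderArray("<img src='hero.jpg'>", {"media:125"}),
    "field_category": RenderArray("<a>Vegetarian</a>", {"taxonomy_term:7"}),
})

without_trick: Set[str] = set()
with_trick: Set[str] = set()
html = fancy_template(content, "Vegetarian", False, without_trick)
fancy_template(content, "Vegetarian", True, with_trick)
print(sorted(without_trick))  # ['media:125']: renaming term 7 would leave this page stale
print(sorted(with_trick))     # ['media:125', 'taxonomy_term:7']
```

The visible HTML is identical in both cases; only the collected dependencies differ, which is why this bug is easy to miss in review.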
If you're picking and choosing which fields to render and never ultimately render that content variable, its cacheability metadata won't bubble up to the page, which means you could have stale content displayed on your site. The trick, straight from Stack Overflow, is to set a throwaway variable to the rendered version of the content variable. That causes Drupal to see that you've rendered the content, and the cacheability metadata bubbles up.
One thing I wasn't really aware of until we were a good way into our Drupal 8 development process was the really tricky problem of caching lists of content, a.k.a. Views. Out of the box, Drupal ships with entity-type list cache tags, like node_list or taxonomy_term_list. If you build a view of nodes, by default it gets the node_list cache tag, and then when you update any node in the system, Drupal invalidates every view with the node_list cache tag. Just think about that: if I update any node of any type, every list of nodes gets invalidated. If I update, say, a blog post on my site, the list of events gets invalidated. If I update a press release, the list of case studies gets invalidated. That is not good; ultimately, it helps defeat the purpose of the Drupal 8 cache system.
Luckily, contrib has some good modules to help you work around that. One is the Handy Cache Tags module, which adds some narrower cache tags you can use, essentially lists by content type: an article list, a blog post list, or a press release list. Then there's the Views Custom Cache Tags module, which adds the ability to attach those Handy Cache Tags, or your own custom cache tags, to a view.
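A small Python model shows why node_list is so blunt and what narrower per-bundle tags buy you. The narrow tag names (node_list:blog_post and so on) are placeholders chosen for illustration; the exact strings Handy Cache Tags uses may differ. The model assumes that saving a node invalidates its own tag, the broad list tag, and a narrow per-bundle list tag, and that the "narrow" views no longer carry node_list at all.

```python
from typing import Dict, List, Set


def tags_invalidated_on_save(entity_type: str, bundle: str, entity_id: int) -> Set[str]:
    """Tags cleared when an entity is saved (the per-bundle naming is hypothetical)."""
    return {
        f"{entity_type}:{entity_id}",    # the entity itself
        f"{entity_type}_list",           # broad list tag, e.g. node_list
        f"{entity_type}_list:{bundle}",  # narrower per-bundle list tag
    }


def invalidated(views: Dict[str, Set[str]], cleared: Set[str]) -> List[str]:
    """Names of the views whose cache tags overlap the cleared tags."""
    return sorted(name for name, tags in views.items() if tags & cleared)


# Default: every node view carries the broad tag.
default_views = {"blog listing": {"node_list"}, "event listing": {"node_list"}}
# Narrowed: the broad tag is swapped for a per-bundle tag on each view.
narrow_views = {"blog listing": {"node_list:blog_post"}, "event listing": {"node_list:event"}}

cleared = tags_invalidated_on_save("node", "blog_post", 3)
print(invalidated(default_views, cleared))  # ['blog listing', 'event listing']
print(invalidated(narrow_views, cleared))   # ['blog listing']
```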
The Views Custom Cache Tags module also removes the broader node_list cache tag from your view. So now, if I update blog post 3, only the blog listing page and maybe the home page get invalidated.
You can take that a step further. Say you had a recipe website with a list of vegetarian recipes, or a couple of views listing vegetarian recipes. You could write a little custom code that, in this case on node presave, invalidates a custom cache tag based on that taxonomy term, and then, again using Views Custom Cache Tags, configure your views to use that cache tag.
Another gotcha we ran into, related to unnecessary cache invalidations, involved automated processes. On one of our websites we were pulling in content from an external feed, and we were doing that pretty blindly: we took the list of all the external content, pulled it in, and updated a bunch of nodes regardless of whether that content had actually changed in the source system. That meant we were invalidating the cache of a bunch of nodes unnecessarily and, relatedly, potentially invalidating a bunch of other pages. In our case this was a custom solution, so we implemented hashing of the source data on import so that we could detect when it changed, and then only update and save the content when that hash had changed. The Migrate API actually has that mechanism built in: if you run recurring migrations, you can configure your migration source with track_changes, and then when you run drush migrate-import with the update flag, it only pulls in the items that have changed in the source system.
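The hash-on-import idea fits in a few lines. This is a minimal Python sketch, not the custom importer described above and not the Migrate API: the Importer class, the record shape, and the saved list standing in for a node save are all hypothetical. Hashing a canonical JSON serialization means a change in key order from the feed doesn't count as a change.

```python
import hashlib
import json
from typing import Dict, Iterable, List


def content_hash(record: dict) -> str:
    # Canonical JSON (sorted keys) so field order in the feed doesn't change the hash.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Importer:
    """Hypothetical recurring importer that only saves records whose content changed."""

    def __init__(self) -> None:
        self.hashes: Dict[str, str] = {}  # source id -> hash at last save
        self.saved: List[str] = []        # ids saved; each save would invalidate cache tags

    def run(self, feed: Iterable[dict]) -> None:
        for record in feed:
            new_hash = content_hash(record)
            if self.hashes.get(record["id"]) == new_hash:
                continue  # unchanged upstream: skip the save and its invalidations
            self.hashes[record["id"]] = new_hash
            self.saved.append(record["id"])  # stand-in for updating and saving the node


importer = Importer()
importer.run([{"id": "a", "title": "Intro"}, {"id": "b", "title": "Setup"}])
importer.run([{"title": "Intro", "id": "a"}, {"id": "b", "title": "Setup, revised"}])
print(importer.saved)  # ['a', 'b', 'b']
```

On the second run only record b is saved, so only b's cache tags (and the pages depending on them) would be invalidated.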
Going back to the idea of advanced theming, or going outside the guardrails: another thing we encountered early in development was pretty slow page load times for authenticated users, and we thought, what's happening? The Dynamic Page Cache is supposed to help us with that. In this case we narrowed it down to a user menu that was included on every page. It's personalized: it says "Hello, Kelly" and has a link to my profile. We'd actually almost done this the right way. We created the render array and injected it into the page outside of the block system, I should add (that's key), and we did tag it with the user cache context. But that was a trigger for Drupal to say: that means I can't cache the output that would go into the Dynamic Page Cache. I know it's a little hard to read, but on the right side there is just an HTTP header you can enable on your site, and all it's saying is that, for the Drupal Dynamic Page Cache, this page was uncacheable.
What we had to do, in constructing the render array for that dynamic content, was move the actual creation of the personalized content into what's called a lazy builder callback. Then we added a #lazy_builder property that points to that callback (you can also pass it arguments); this is all documented in the Render API, which I'll have links to. You also tell Drupal to create a placeholder. We already had cache contexts on that render array, and they're part of the formula Drupal needs to replace that content in the lazy-build step rather than in the first rendering step.
Another gotcha we ran into was related to large navigation menus. Does anybody here use the Book module in core? Yeah, that's about three or four people out of 200.
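Returning to the user-menu fix described above, the lazy-builder idea can be modeled in a few lines of Python. This is a hypothetical sketch, not Drupal's Render API: Element, PER_USER_CONTEXTS, and the placeholder token format are stand-ins for #lazy_builder, #create_placeholder, and Drupal's own placeholder markup. It shows why an eagerly built per-user menu makes the whole page uncacheable, while a placeholdered one leaves a cacheable skeleton that is filled in per request.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

# Contexts treated as "per user"; an assumption modeled on the idea behind
# Drupal's auto-placeholdering, not its exact configuration.
PER_USER_CONTEXTS = {"user", "session"}


@dataclass
class Element:
    markup: str = ""
    contexts: Set[str] = field(default_factory=set)
    lazy_builder: Optional[Callable[[dict], str]] = None  # stands in for #lazy_builder
    create_placeholder: bool = False                      # stands in for #create_placeholder


def build_for_dynamic_cache(elements: List[Element]) -> Tuple[str, Dict[str, Callable], bool]:
    """Return (skeleton, placeholders, cacheable) for a hypothetical dynamic page cache."""
    parts, placeholders, cacheable = [], {}, True
    for i, el in enumerate(elements):
        if el.lazy_builder and (el.create_placeholder or el.contexts & PER_USER_CONTEXTS):
            token = f"<placeholder-{i}>"  # hypothetical token format
            placeholders[token] = el.lazy_builder
            parts.append(token)
        else:
            if el.contexts & PER_USER_CONTEXTS:
                cacheable = False  # the per-user context bubbles up to the whole page
            parts.append(el.markup)
    return "".join(parts), placeholders, cacheable


def fill_placeholders(skeleton: str, placeholders: Dict[str, Callable], request: dict) -> str:
    # Runs on every request, after the (possibly cached) skeleton is fetched.
    for token, builder in placeholders.items():
        skeleton = skeleton.replace(token, builder(request))
    return skeleton


def user_menu(request: dict) -> str:
    return f"<nav>Hello, {request['name']}</nav>"


# Before: the menu is built eagerly for the current user.
_, _, before_cacheable = build_for_dynamic_cache(
    [Element("<main>Article</main>"), Element(user_menu({"name": "Kelly"}), {"user"})])

# After: the same menu is deferred to a lazy builder and placeholdered.
skeleton, holders, after_cacheable = build_for_dynamic_cache(
    [Element("<main>Article</main>"),
     Element(contexts={"user"}, lazy_builder=user_menu, create_placeholder=True)])

print(before_cacheable, after_cacheable)  # False True
print(fill_placeholders(skeleton, holders, {"name": "Kelly"}))
```

With single flush, the fill-in step happens on the server before the response is sent; BigPipe instead streams the skeleton first and sends each replacement afterwards.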
One thing those people may or may not be aware of is that the Book module's navigation in core is currently not cacheable. Obviously there isn't a huge amount of demand for that, but it's good to know Pega isn't the only company still using the Book module nowadays. We use it for software documentation that we import from an external system, and some of these books are thousands of pages long, so the book navigation becomes huge. Until we fixed that, we were seeing 40- to 50-second initial page load times, and those pages weren't cacheable.
Benji Fisher of Isovera, who has a really detailed blog post on this subject and a really great presentation from NERDSummit, helped us solve this along with his colleagues at Isovera. First, they tuned the menu generation itself so that it would happen faster. Then they made it possible to cache that navigation once per book instead of once per page.
One thing to keep in mind is that caching menus in general is actually really tricky, because they have to vary by page, since Drupal keeps track of the active trail in the menu. There are also complex permissions in Drupal that control what people can and can't see, which makes caching menus even trickier. And the work that handles this for regular menus just wasn't ever ported over to the Book module. In this case we had a purpose-built solution, because we could make compromises on some of those things: we didn't really care if somebody saw a link to a page they didn't necessarily have access to; that wasn't a use case relevant to us. To make the navigation cacheable per book, we moved the logic that sets the active trail to the client side, so expanding the current page the person is on happens after the page loads. The result was that the navigation became BigPipe-eligible.
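The per-book caching plus client-side active trail can be sketched like this. It's a hypothetical Python model, not Isovera's implementation: BookNavCache and mark_active are illustrative names, and mark_active stands in for JavaScript that would run in the browser after the page loads. It also bakes in the compromise mentioned above: the shared navigation isn't access-checked per user.

```python
from typing import Dict, List, Tuple


class BookNavCache:
    """Caches one rendered navigation per book instead of one per page."""

    def __init__(self, books: Dict[str, List[Tuple[int, str]]]):
        self.books = books
        self.cache: Dict[str, str] = {}
        self.builds = 0  # how many times the expensive render actually ran

    def _build(self, book_id: str) -> str:
        self.builds += 1
        items = "".join(f'<li data-page="{pid}">{title}</li>' for pid, title in self.books[book_id])
        return f"<ul>{items}</ul>"

    def nav(self, book_id: str) -> str:
        # No active-trail or per-user access logic here, so every page can share it.
        if book_id not in self.cache:
            self.cache[book_id] = self._build(book_id)
        return self.cache[book_id]


def mark_active(nav_html: str, current_page: int) -> str:
    """Stand-in for the client-side step that highlights the current page after load."""
    needle = f'<li data-page="{current_page}">'
    return nav_html.replace(needle, f'<li data-page="{current_page}" class="active">')


navs = BookNavCache({"admin-guide": [(1, "Intro"), (2, "Install"), (3, "Upgrade")]})
pages = [mark_active(navs.nav("admin-guide"), page) for page in (1, 2, 3)]
print(navs.builds)  # 1: three pages, one build
print(pages[1])
```

Moving the per-page part (the active trail) out of the cached fragment is what lets one copy serve every page in the book.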
So even on cold caches, the person at least got the content of the page, even if the menu appeared on the page later. With cached pages we were seeing about two- to three-second load times, which is still not great, because that navigation is still huge. We're looking at future optimizations, for instance offloading more of that rendering to the client side, and potentially what I'm calling pump priming, which means generating that navigation at the time the book gets created.
I've talked a bit about BigPipe. One thing to keep in mind is that it has some prerequisites. Your hosting service and web server need to support streaming, that is, flushing output to the browser as it's produced instead of saving up all the rendered content and sending it out at once. Similarly, if you use a content delivery network or a reverse proxy like Varnish, it has to be configured to support streaming. In our case we also had to work through some bad Drupal.behaviors implementations in custom code. In some cases developers had used Drupal behaviors as a proxy for document ready, which they really aren't. They sort of are, but they're not: when BigPipe injects the personalized content into the page, it calls Drupal.attachBehaviors, so if you have custom behaviors you have to make sure they're compatible and don't do things like attach the same click event to content on the page multiple times.
This next slide is more of a call to action, for myself and for the community. There isn't a lot of tooling in core or contrib to measure hit rates or miss rates on any of the Drupal caches, which makes analyzing where you could address problems difficult. This is just a screenshot of our Fastly dashboard; Fastly is our content delivery network.
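Until better tooling exists, one low-tech option is to log Drupal's cache status headers and compute rates yourself. This is a minimal Python sketch, assuming you already capture response headers per request; X-Drupal-Cache (page cache: HIT or MISS) and X-Drupal-Dynamic-Cache (HIT, MISS, or UNCACHEABLE) are headers Drupal core sends, while how you collect them is up to your own logging setup.

```python
from collections import Counter
from typing import Dict, Iterable


def cache_rates(responses: Iterable[Dict[str, str]], header: str) -> Dict[str, float]:
    """Share of responses per cache status for one header (case-insensitive lookup)."""
    wanted = header.lower()
    counts = Counter()
    for headers in responses:
        normalized = {name.lower(): value for name, value in headers.items()}
        counts[normalized.get(wanted, "ABSENT").upper()] += 1
    total = sum(counts.values())
    return {status: n / total for status, n in counts.items()} if total else {}


log = [
    {"X-Drupal-Dynamic-Cache": "HIT"},
    {"X-Drupal-Dynamic-Cache": "HIT"},
    {"x-drupal-dynamic-cache": "MISS"},
    {"X-Drupal-Dynamic-Cache": "UNCACHEABLE"},
]
print(cache_rates(log, "X-Drupal-Dynamic-Cache"))
# {'HIT': 0.5, 'MISS': 0.25, 'UNCACHEABLE': 0.25}
```

Bucketing the same computation by hour or by release would give the hit-rate-over-time view described here.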
It would be really cool to get to the point where we can see miss rate or hit rate over time for the Drupal 8 cache system. In Drupal 7 there was a module called Heisencache, which is probably a good candidate for porting to Drupal 8.
Here are a couple of development and debugging tips to keep in mind when working with the Drupal 8 cache system. Develop with caching enabled as much as possible, and test as different users and roles; that will hopefully let you catch problems sooner in the development cycle, before your customers or users encounter them. In the development.services.yml in your local environment, you can enable the cacheability debug headers. I know this is a little difficult to read, but that causes Drupal to output HTTP headers in the response that tell you whether there was a cache hit and which cache contexts bubbled up to the response, so you can see whether there are variations, or a cache context that's preventing the response from being cached. It also adds a cache tags header, which shows you all the data dependencies: all the cache tags that bubbled up to the response. The one thing to be careful of, particularly in a local development environment, is that this header can get quite large, and if your web server is configured to limit the size of HTTP headers, you might run into a very difficult-to-diagnose white screen of death.
Also, Wim Leers has developed the Renderviz module, which takes all the metadata you can get out of these HTTP headers, puts it on the page, and lets you visualize it and use console commands to interrogate it: which components on this page are using node_list, for instance, or which components are using the user.roles cache context. I encourage you to check that out.
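Those debug headers are also easy to check automatically. This is a hypothetical Python sketch, assuming the cacheability debug headers are enabled (via the http.response.debug_cacheability_headers parameter in development.services.yml), so responses carry space-separated X-Drupal-Cache-Contexts and X-Drupal-Cache-Tags values. The flagged sets and the 8 KB threshold are illustrative choices, not Drupal defaults; check your own web server's header limits.

```python
from typing import Dict

PER_USER_CONTEXTS = {"user", "session"}               # illustrative: contexts that defeat caching
BROAD_LIST_TAGS = {"node_list", "taxonomy_term_list"}  # illustrative: blunt list tags


def audit(headers: Dict[str, str], max_tags_bytes: int = 8192) -> Dict[str, object]:
    """Flag cacheability smells in one response's debug headers."""
    h = {name.lower(): value for name, value in headers.items()}
    contexts = set(h.get("x-drupal-cache-contexts", "").split())
    raw_tags = h.get("x-drupal-cache-tags", "")
    tags = set(raw_tags.split())
    return {
        "per_user_contexts": sorted(contexts & PER_USER_CONTEXTS),
        "broad_list_tags": sorted(tags & BROAD_LIST_TAGS),
        "tags_header_too_big": len(raw_tags.encode("utf-8")) > max_tags_bytes,
    }


response = {
    "X-Drupal-Cache-Contexts": "languages:language_interface theme url.path user",
    "X-Drupal-Cache-Tags": "config:system.site node:3 node_list rendered",
}
print(audit(response))
# {'per_user_contexts': ['user'], 'broad_list_tags': ['node_list'], 'tags_header_too_big': False}
```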
It's still in alpha, so it's another great candidate for contributions.
So, to wrap up: the Drupal 8 cache system is a powerful feature and a key value proposition, I think one of the biggest benefits of Drupal 8, certainly when it initially shipped. But if you sidestep its guardrails, that can lead to unanticipated and suboptimal results. I'll fully admit that what those guardrails are is a little hazy; they're more like painted lines on the road, actually, which are fairly easy to veer over.
Custom theming requires awareness and controls: awareness on the part of your theme developers, and potentially controls. For instance, using those HTTP headers, you might be able to run monitoring to see whether the cacheability of key pages changes between releases. Site builders and developers also need to proactively manage cache tags on views. I think that's one big one people should be aware of: to really get the benefits out of the Drupal 8 cache system, you have to be careful about those list cache tags. And ultimately, and this especially goes out to your clients or your business people, tuning takes work and maintenance. That's not just true of the Drupal 8 cache system; it's true of performance in general, and projects need to budget for it.
I just wanted to mention quickly what's next for the Drupal 8 cache system. JSON:API is going into core with 8.7, the REST module is already in core, and I've heard talk of GraphQL potentially going into core. JSON:API and REST in particular already leverage the Drupal 8 Cache API, both to cache responses and to invalidate them in a smart fashion. So I can see how, as people adopt decoupled architectures more and more, they can still leverage the Drupal 8 Cache API. It's not just a benefit to the standard theme system or the standard Drupal render pipeline.
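The release-to-release monitoring idea could start as a simple diff of those headers for a handful of key pages. This is a hypothetical Python sketch: it assumes you captured the debug headers for the same URL before and after a release, and it collapses numeric entity IDs (node:3 becomes node:*) so that ordinary content changes don't show up as regressions, while named tags such as a per-bundle list tag are kept intact.

```python
from typing import Dict, List, Set


def tag_families(raw: str) -> Set[str]:
    families = set()
    for tag in raw.split():
        prefix, _, suffix = tag.rpartition(":")
        # Collapse entity IDs ("node:3" -> "node:*") but keep named tags intact.
        families.add(f"{prefix}:*" if prefix and suffix.isdigit() else tag)
    return families


def cacheability_diff(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the debug cacheability headers of one page across two releases."""
    b = {k.lower(): v for k, v in before.items()}
    a = {k.lower(): v for k, v in after.items()}
    b_ctx = set(b.get("x-drupal-cache-contexts", "").split())
    a_ctx = set(a.get("x-drupal-cache-contexts", "").split())
    b_tags = tag_families(b.get("x-drupal-cache-tags", ""))
    a_tags = tag_families(a.get("x-drupal-cache-tags", ""))
    return {
        "contexts_added": sorted(a_ctx - b_ctx),
        "contexts_removed": sorted(b_ctx - a_ctx),
        "tag_families_added": sorted(a_tags - b_tags),
        "tag_families_removed": sorted(b_tags - a_tags),
    }


release_1 = {"X-Drupal-Cache-Contexts": "url.path user.roles",
             "X-Drupal-Cache-Tags": "node:3 node_list:blog_post"}
release_2 = {"X-Drupal-Cache-Contexts": "url.path user user.roles",
             "X-Drupal-Cache-Tags": "node:4 node_list"}
print(cacheability_diff(release_1, release_2))
# {'contexts_added': ['user'], 'contexts_removed': [], 'tag_families_added': ['node_list'], 'tag_families_removed': ['node_list:blog_post']}
```

In this made-up example the diff surfaces two regressions at once: a new per-user context and a narrow list tag that was replaced by the broad node_list.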
For instance, I know Gatsby is working on the possibility of partial or incremental site builds, and one could imagine the expiration of something by cache tag triggering a partial build in a static site generator.
I have a bunch of resources listed here, and of course I'll be posting the slides. I want to thank everyone for coming. I especially want to thank Wim Leers, who took time out of the 8.7 release to go over my slides with me and validate and vet them. I want to thank the people at my NetCamp presentation, who provided invaluable feedback, and the folks at Pega who have seen this presentation several times already. It looks like we're right up against the wall in terms of time, but if anyone has one question, I think we can take it. Just remember Friday is contribution day; I hope people can stick around for that. And please go to the site and add any feedback. Thanks.