OK, seeing as it's already a couple of minutes past the start, I'll kick things off, because I want to leave time for questions. I recognize many faces in here, but for the people I don't: I'm David Strauss. I work on a lot of performance and scalability challenges with Drupal, particularly on the Pantheon platform, where we run a whole bunch of Drupal sites, so we get to run into all sorts of caching and scalability challenges. This caching model was born out of those.

I want to start with the challenges we're running into today, specifically with Drupal's object cache. The object cache is the API, historically, in Drupal 7 and earlier, behind cache_set() and cache_get() (sketched below), and, as we move into the Drupal 8 era, increasingly an object model around cached assets. It caches everything from entities to fields to user-related data to Views configurations, so the application has rapid access to data without going all the way back to the canonical data in the database and rebuilding from scratch every time. Drupal uses the cache really heavily: up to hundreds of cache reads per page, and on average often 10 to even 40 cache writes per page. It's a very busy part of Drupal core.

Historically, people have scaled this, decoupling it from the database, by using things like Memcache and Redis. That's the traditional answer: I want to scale up my Drupal site, I have multiple web servers, so I deploy something like Redis or Memcache, use a module for it, and configure Drupal's caches to use it. This is mostly pretty good, and it has gotten us really far, because it unlocked us from all of these reads and writes happening on the database. The database treats data very, very seriously: when it says it has written something, it has really written it, all the way to disk; it has tried to sync out that data. That makes the database very reliable, but not very performant for data we don't care much about or can regenerate.

But this solution only takes us so far, because it creates a bottleneck: the network link into the caching box simply gets capped out. We see this regularly on Pantheon; even when we've provided dedicated Redis instances, we sometimes see them max out the network link. We're talking gigabits of traffic being read out of these caches. On the left here, the units are megabytes per second, so multiply by eight to get some idea of the network links getting saturated.

Redis isn't the only way to do this, though. You can take other approaches to scaling this out. At Acquia, for example, they use Memcache, which you can deploy across multiple servers, distributing some of the cache reads. But these all have their own issues. In the Redis world, if you start using multiple boxes and replicating, you have your own replication topology, possibly a multi-master setup, and anyone in this room who's dealt with that for MySQL can multiply that problem over again. You have your own latency issues, and the connectivity between those boxes may get severed.
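Backing up to the object-cache API itself: here is a minimal sketch of the Drupal 7 cache_get()/cache_set() pattern mentioned above. The mymodule_* names and the cache ID are hypothetical; the cache_* functions and CACHE_TEMPORARY are the real Drupal 7 API.

```php
<?php
// Typical Drupal 7 object-cache usage: try the cache first, rebuild and
// write back on a miss. (The mymodule_* names are hypothetical.)
function mymodule_get_expensive_data() {
  $cid = 'mymodule:expensive_data';
  if ($cache = cache_get($cid, 'cache')) {
    return $cache->data;
  }
  $data = mymodule_rebuild_expensive_data();
  // CACHE_TEMPORARY marks the item as removable on general cache wipes.
  cache_set($cid, $data, 'cache', CACHE_TEMPORARY);
  return $data;
}
```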
You could also go with the Memcache approach, where the boxes don't talk to each other and the cache is sharded across them. But then, as boxes appear or disappear, which is what gives you the high availability, you end up with consistency issues. Even with modern consistent hashing, Memcache is still choosing a new box for a cache item, or taking an item away from a box, as boxes appear and disappear, and the web servers may not even agree on which boxes exist. So both approaches have their issues for scaling, whether it's administrative complexity or the actual consistency they provide. I've never found these to be particularly satisfying answers. And ultimately, these are all still network-bound, in the sense that every cache item you read goes over the network. If you're saturating those links, you're dividing the problem, increasing the denominator of how much network throughput goes to each box, but you're not changing the fact that everything ships over the network.

So I looked at this problem, and I wanted to aggregate all the solutions I could from other implementations: not necessarily using something implemented elsewhere directly, but cribbing from solutions I've seen work.

One thing I've studied ever since undergrad is processor architecture, and this is not an uncommon problem when you're designing a multi-core processor. Every time you add another core, you add a whole bunch of local computational capacity, but one of the biggest problems in scaling processors is that computational capacity isn't worth much if it can't actually work on the data set, what's known as the working set. For a processor core to keep getting pumped full of data, and to execute as fast as it really can, the data needs to be local to it. You want to bring the working set as close to the computation as possible. So modern multi-core processors use what are called L1, L2, L3, however many levels of cache. The slide here shows a local L1 and a shared L2; a lot of multi-core processors have a local L2 as well and then a shared L3. But the design is ultimately the same: some data is cached locally and not shared, and some data is cached in a shared way. The data close to the core is very fast to access, but updating data in the local cache can require complex coherency management so that it's visible to other cores if you have, say, a multi-threaded or multi-process application. Even while doing all this, the processor has to preserve the illusion that all these caches are one main memory. When you program on these processors, you just address memory as if it were the whole set of RAM, and the processors juggle the data down into these different caches, even the local L1. If you write data to the L1 and another core tries to read the same area of memory, the processor still has to make it appear consistent. You pay a little performance penalty when that happens, but it still works.
And so processors have developed coherency management algorithms: they mark regions of memory as owned by a certain core, or lock them in certain ways, or write through in a way that makes writes more expensive, so that if core zero writes to its L1 for a particular address, the write effectively reaches the other caches as well. There's no one way this is done, but the message is: bring the data close to the computation, but preserve the illusion of consistency, so the developer doesn't have to care about all this juggling.

I also pulled from what we've done for the file system on Pantheon, a system we internally call Valhalla. One of the big things this system does is solve that same kind of problem: pulling the data local to the web server while still managing coherency across the nodes. The real lesson is that we didn't build a traditional networked file system where you write the data and then invalidate. We implemented a cache coherency algorithm where, as you write data to the file system, it sends those events back down to all the web servers, so the write propagates and eventually lands consistently in all the caches. That's another model, a little more relaxed than a multi-core processor in terms of consistency, but achieving the same goal: bring the data close to the computation while still maintaining coherency. I'll put the slides online in case you want them. We've had a lot of experience developing this, and we've run it in production for years, so we have some idea of what scales in coherency management and what doesn't.

The other lesson I pulled from is what MySQL does in its modern replication model. Traditionally, MySQL replicated to multiple servers using what's called statement-based replication: you write some SQL to the master server, and it sends that same SQL out to each replica. That's actually no longer the favored model for MySQL replication. The modern method is either a hybrid of that with the new way, or purely the new way, which is what we use at Pantheon: when you send SQL to the primary server, it runs the query, modifies the data, and then figures out which rows it changed. It computes that into a change set that is very concrete: this primary key has these columns updated to these values, and this primary key has those columns updated to those values. It always looks exactly the same. You can even take a binlog from this style of replication and translate it into pseudo-SQL, basically saying, for rows matching this key, set these values to this; that's what all the changes look like when they replicate. The lesson here is that you can have complex models for altering the state of a system and still replicate them by materializing the changes, so the replication model itself doesn't have to be complicated. That's what we do here as well: there's one very simple form in which events go back down to the client, even though there are many ways to manipulate the data on the server itself.

So all of these got pulled into the model for LCache. It's inspired by multi-core processors: get the working set close to the actual work being done.
It's inspired by our file system, in the sense that we do a write-through cache: you update your local data, update the remote server, and then events replicate back down. And it's inspired by MySQL row replication: handle all the complex stuff on the server, then send a digested, simple set of events back down to all the clients to simplify coherency management. That's a lot of theory to dump, but I wanted to take known-good designs and pull them into this.

I also want to contrast this with what has landed in Drupal core so far: ChainedFastBackend. They're actually quite different designs, even though both have the goal of bringing data close to the client. One of the biggest differences is how they handle incremental changes to the cache. ChainedFastBackend takes a very blunt approach: write one item, and the whole bin gets invalidated. That works fine if you almost never write to a bin, but it's very hard in practice to guarantee those conditions, and it certainly doesn't make for a general-purpose cache we can use across more of Drupal core for things that might change fairly often. They also have very different philosophies about changes. LCache has events that can alter individual items and that replicate down to the local web head, whereas ChainedFastBackend has invalidation counters that provide a simple but blunt mechanism: it puts down the hammer and says, something has changed, everything you know is wrong, go back to the database to figure out what's true.

How many people in here have heard of ChainedFastBackend, by the way? Okay, so this is pretty new to a lot of people. ChainedFastBackend is in Drupal 8 core at this point. It's a cache where you have local storage, often APCu, an in-memory store that persists across PHP requests but doesn't replicate beyond, say, your PHP-FPM pool. If you have multiple web heads, it's fine for the local web head, but it isn't automatically shared across the cluster in any way. ChainedFastBackend handles that by falling back to the database to manage coherency: when you write data to a bin, it records that in the database, and every request checks the database to see whether a bin has changed; if it has, it treats everything it knows about that bin as wrong and goes back to the origin.

LCache, in contrast, uses an event model. It still uses APCu and a database, but instead of every write becoming a bin invalidation, it keeps an event stream of the things that have changed. Every time you write a cache item, delete one, invalidate one, or do anything with a tag or a bin, it records a set of events server-side in the database, which the clients then pull back down to freshen their local caches. This is something we do with Valhalla, too: it freshens the local cache, it doesn't just invalidate it. And that means the amount of network communication is proportional to the changes happening to the underlying data set, not proportional to the size of the data set.
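To make the event model concrete, here is a minimal sketch of the freshen step just described, assuming a hypothetical lcache_events table and APCu key layout. This is an illustration of the idea, not LCache's actual code; a real implementation would also skip events that originated from its own pool ID.

```php
<?php
// Pull events newer than our local high-water mark from the central store
// and apply them to APCu, so the L1 is freshened rather than invalidated.
// (Table, columns, and key names are hypothetical.)
function lcache_sync(PDO $db): void {
  $last = apcu_fetch('lcache:last_applied_event_id') ?: 0;
  $stmt = $db->prepare(
    'SELECT event_id, address, value, expiration, deleted
       FROM lcache_events WHERE event_id > ? ORDER BY event_id');
  $stmt->execute([$last]);
  foreach ($stmt as $event) {
    if ($event['deleted']) {
      apcu_delete('lcache:' . $event['address']);
    }
    else {
      $ttl = $event['expiration'] ? max($event['expiration'] - time(), 0) : 0;
      apcu_store('lcache:' . $event['address'], $event['value'], $ttl);
    }
    $last = (int) $event['event_id'];
  }
  apcu_store('lcache:last_applied_event_id', $last);
}
```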
So this model works great even with just a trickle of changes coming in, and it works great with a moderate number of changes. About the only time it degrades is when you're changing things more than you really should for a cache.

The model we implemented didn't solve everything out of the box, of course. We did a lot of production testing: taking real-world Drupal 7, Drupal 8, and actually WordPress sites, putting this cache into them, and checking our assumptions with load tests and click-through tests, just hopping around the interface checking that things work. And that revealed a lot of mistaken assumptions about how Drupal works with its cache, even for someone who's worked with Drupal for basically ten years, like I have.

Drupal writes to caches very, very often. We often saw it average 10 to 40 cache_set() calls per page. That's not evenly distributed among bins, but it confirms our assumption that something like ChainedFastBackend is not going to be a great general-purpose cache. LCache's initial model for processing a cache write also turned out not to be ideal. I had assumed going in that I could make writes really expensive and it would be okay as long as reads scaled really, really well. That's a pretty common assumption with caches, but it turned out not to be true here; I'll show some benchmarks of the different storage models we tried in the database for the cache data.

We also found that most modules assume a cache miss is a good reason to push whatever they eventually construct into the cache. That's not always a good assumption: a lot of the time the item gets written and then never read again. A lot of modules overuse the cache in that sense, assuming their code path is the common case when it often isn't. Even worse, some cache items are set more often than they're read, which means successive sets happen without the item ever being read in between. We also initially implemented bins using cache tags: the relational model has a tag table tracking which tags are on which cache items, and we initially used that for bins. That didn't work either, because clearing full bins in Drupal turned out to be quite common, and we needed bin clearing to be really cheap. Those were the things that caused us to modify the design.

But this was one of the most surprising findings: in the database, for processing writes, an insert-focused model was the clear winner. By default, Drupal's cache uses an update-or-insert model: it tries to insert the item or update it, falling back to the other as needed, which at best is equivalent to ON DUPLICATE KEY UPDATE, which you can see on the right-hand side of both of these graphs. And the original model for LCache was: insert the new event for what's changed with the cache item, then delete the obsolete events right after inserting it.
So if I set cache key A to the number three, I would delete any prior events involving cache key A, because the latest thing done to that key supersedes everything previously done to it. What we ultimately moved to is a model where we insert the new event recording that cache key A changed, and then do a batched delete of the obsolete events in the destructor of the caching system. That not only aggregates the deletions; it also runs after PHP-FPM has closed the request to the user, so the user doesn't wait on it either.

I tested this in two very different scenarios: low-splay and high-splay. In the low-splay case, there were 64 possible cache keys, and each of ten PHP processes randomly chose among the 64 for each of its 40 writes, wrote them, and exited. I ran this quite a few times, and the results were very similar no matter what combinations it chose, because it was sufficiently random. In the high-splay case, the 40 writes randomly chose among over 4,000 cache keys, which means a write was very unlikely to stomp on an existing one. So the left-hand side represents a cache item that is frequently updated, and the right-hand side represents a whole bunch of cache items that are each written basically once.

It's no huge surprise that in the high-splay case we didn't see much variation between the models: ON DUPLICATE KEY UPDATE, insert-and-batch-delete, and insert-and-delete all effectively just inserted at write time, because you were very unlikely to write to a cache key that had already been written. But that's not the case for a lot of cache keys in Drupal. A lot of them are like the left-hand side: a key that is continually being rewritten and updated. And it turned out that our standard approach, basically ON DUPLICATE KEY UPDATE, or actually a little worse than that, is not that great. So one thing we discovered is that we could optimize the data model for cache storage purely in the database; the write path is sketched below.
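Here is a sketch of that write path, insert-always with the batched delete deferred to a destructor, over an events table shaped like the one described later in the talk (auto-increment event ID, an address index, an expiration index). The schema and class are illustrative, not LCache's exact implementation.

```php
<?php
// Illustrative event-log write path (not LCache's exact code). Assumed schema:
//   CREATE TABLE lcache_events (
//     event_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
//     address VARBINARY(255) NOT NULL,  -- packed bin + key
//     value LONGBLOB,
//     expiration INT UNSIGNED,
//     KEY lookup_miss (address, event_id),
//     KEY expiration (expiration)
//   );
class EventLog {
  private array $obsolete = [];
  public function __construct(private PDO $db) {}

  public function set(string $address, string $value, ?int $expiration): void {
    // Insert-only hot path: the auto-increment primary key means no
    // uniqueness check at insert time, unlike ON DUPLICATE KEY UPDATE.
    $stmt = $this->db->prepare(
      'INSERT INTO lcache_events (address, value, expiration) VALUES (?, ?, ?)');
    $stmt->execute([$address, $value, $expiration]);
    // The newest event supersedes all older ones for this address; queue
    // cleanup rather than deleting inline.
    $this->obsolete[$address] = (int) $this->db->lastInsertId();
  }

  public function getEntry(string $address): ?array {
    // The (address, event_id) index makes "latest event for this key" cheap.
    $stmt = $this->db->prepare(
      'SELECT value, expiration FROM lcache_events
        WHERE address = ? AND (expiration IS NULL OR expiration >= ?)
        ORDER BY event_id DESC LIMIT 1');
    $stmt->execute([$address, time()]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
  }

  // Called from a destructor or shutdown function, after the response has
  // been sent, so the user never waits on cleanup.
  public function collectGarbage(): void {
    foreach ($this->obsolete as $address => $eventId) {
      $stmt = $this->db->prepare(
        'DELETE FROM lcache_events WHERE address = ? AND event_id < ?');
      $stmt->execute([$address, $eventId]);
    }
    $this->obsolete = [];
  }
}
```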
The other thing we did confronts the case of cache keys that are constantly written but not actually read. By the way, don't treat these particular cache keys as real data about, say, CTools; this is just an old snapshot. This is a tracking system I implemented to track how many times a cache key is written versus read, which would normally be far too expensive to do in the database. But if you do it in the local APCu, you can track it pretty cheaply with just counters; I'll show a sketch of this tracking in a moment. Each of these snapshots covers data from only one web head, but if the web heads receive randomly distributed requests, it's a good sample of your data. So one of the things we added to LCache is tracking this data and eventually concluding, for some cache keys, that they're too expensive: they're written more than they're read, at a very high ratio, and they're worth ignoring.

At a certain threshold, it decides: enough of this cache key. It deletes all existing copies of the key and puts a moratorium, at least temporarily, on further writes to it. It will black-hole the writes, and the reads will miss, but it already knows that the reads are vastly outweighed by the writes for those keys. In practice, this identifies a decent number of keys on a typical production Drupal site that are just getting written more than they're read, and it pushes them aside, which is very nice when you have an expensive write path, like writing to the database and then replicating out to the web nodes. Pardon? Oh, "I have so many questions." Okay. Yeah, there's a lot of material, and I want to leave a lot of time for questions, because we also have a lot of core maintainers and such in the room.

Even with all of these optimizations, there are certain cases LCache works better for. It's really, really good for things that are frequently read. I'm talking about those one-megabyte, huge cache objects that are constantly being pulled down from something like Redis or Memcache in a lot of people's setups today. Those items get replicated to APCu and read from local PHP memory, without even a network round trip. That completely eliminates network saturation as a bottleneck when really heavy traffic is reading cache items on a website. And that's also why it's really good for items that are rarely written or that are large: you're multiplying the benefit of not shipping those items over the network.

There are cases that don't make sense for it, though. In the Drupal 7 world, things like the form cache don't make sense to put in it, because those items pretty much get written once and read once in the typical scenario, and on a real production site a lot of them get written once and never read, because you displayed a form that the user never submits. Fortunately, in Drupal 8 that's no longer treated as a cache, because it isn't one: it actually breaks your site if you clear it. Things that are handled earlier in the stack, like the page cache, also don't make sense to put into this; they'd just clog up your cache data. Almost anyone here running major production sites is probably already pushing page-level caching to something like Varnish or a CDN anyway. With a cache like this, you'd probably want to turn off Drupal's internal page cache entirely and black-hole it, and at that point you can make further optimizations, like telling Drupal not to expect that its page cache requires a database connection, because with a null cache it doesn't. We also found that keys that update often just cause a lot of replication overhead, and clearing a lot of keys at once with something like a tag also puts a burden on replication. In this new Drupal 8 era of being able to tag hundreds of items with something like the node_list tag and then clear it, that can be a little expensive in a system like this, but I'm working on that as well.
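Returning to the read/write ratio tracking: a minimal sketch of how such per-key counters can be kept cheaply in APCu. The key names and the threshold are hypothetical, not LCache's actual values.

```php
<?php
// Per-key overhead tracking in APCu (hypothetical names and threshold).
// Overhead rises on writes and falls on reads; once a key's overhead
// crosses the threshold, further writes to it get black-holed.
const LCACHE_OVERHEAD_THRESHOLD = 5;

function lcache_should_write(string $address): bool {
  $key = "lcache:overhead:$address";
  apcu_add($key, 0);           // initialize the counter if it doesn't exist
  $overhead = apcu_inc($key);  // a write adds overhead
  // Written far more often than read: tell the caller to skip the write.
  return $overhead === false || $overhead < LCACHE_OVERHEAD_THRESHOLD;
}

function lcache_track_read(string $address): void {
  // A read pays down the key's overhead: the cached value earned its keep.
  apcu_dec("lcache:overhead:$address");
}
```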
We've taken a very serious approach to the implementation. Literally every single line of LCache is unit tested as a library. It's tested against both mock and production configurations for both the L1 and the L2 caches, all of which ship with it. You'll notice the structure here: there's an APCu implementation of the L1, which is local to the web head, and a database implementation of the L2, which is used for coherency across the cluster. Those are the primary production configurations. There's also a null L1, which it uses when invoked in a CLI configuration where there's no useful APCu; it just bypasses the L1 and uses the L2 directly. I have reason to believe this is still faster than Drupal's built-in cache, because the L2, with its batched deletes and insert-always model, is faster on the server side than Drupal's cache data model. So there's an interesting opportunity to explore this even without replicating to local web nodes. And then, mostly for testing purposes, there's a static L1 and a static L2, which literally just use static variables in PHP. They're mostly there to verify the data models and to let you mix and match: APCu L1 against static L2, static L1 against database L2, and so on. The test suite tries almost every permutation of production and mock implementations against each other, and they should all work, and they do. So that's good.

The data model is actually fairly similar to PSR-6, whose author, Larry Garfield, is in this room: an entry from the cache is returned as an object, and the cache itself is an object you interact with. It doesn't quite use PSR-6, though, for reasons I'll explain in a moment. It's also a Composer-based library: the Drupal 7 and Drupal 8 modules and the WordPress plugin we wrote all pull in this Composer library, which provides a high-level cache interface supporting everything each of those frameworks needs to bridge its local cache API over. We went with lightweight adapters for each framework; the Drupal one has zero state of its own, it's only a wrapper around the LCache library. We've published the modules and extensions for Drupal 7 and Drupal 8, and I have ambitions of getting this into core, possibly as a default cache, because it can fall back to using no local data at all and is still faster.

And we've gotten amazing results, so I've kind of saved the best for nearly last. This is a major production site running on Pantheon, and this is from a load test; I'll also go to the production data in a moment. What we did here is flush the entire cache and warm up Redis; you can see "Redis cold" on the far left. Josh Koenig ran these benchmarks. On the left middle, you can see "Redis warm". With Redis, the cold numbers don't matter that much; those are initial cases and don't reflect the common case. You really only care about the cold case to the extent that it shouldn't be terrible or infrastructure-breaking. But on the right-hand side of the Redis case, it's averaging a little under 300 milliseconds per request.
And on the right-hand side, we have LCache. There's a bit of a spike here from web external time, probably something that misses in the cache and then gets inserted into it; that's specific to this particular site. But you can see it warms up and then hovers at just above 200 milliseconds once LCache is warm. That's because it's no longer making network trips to Redis to fetch cache objects; it just talks to the database at the beginning of the request to synchronize its local cache. You'll also notice that this dark yellow color, if you can make out the tiny text at the bottom, is Redis, and it basically disappears as time spent in the request: all the time spent on the network waiting for Redis, fetching items from Redis, writing to Redis, and so on. And the database time has not gone up much from before, when the site was heavily relying on Redis, even though it's now using the database for all of its cache synchronization. It only makes trips to the database for its writes and for a very quick synchronization at the beginning of the request; if there are no cache items to replicate, the SELECT returns zero rows from a cached query that takes about a millisecond.

Concurrency also went way up. These were time-boxed load tests, not tests pinned to a fixed concurrency, and we only managed about 225 concurrent users once Redis was warm, while we easily made it to over 350 once LCache was warm. That's because it scales better horizontally: since cache items are stored on the local node, each web head can handle a lot more traffic and data without being bottlenecked by a central cache.

We went live late last night with the same site. You can see when LCache got enabled by when the tan color, which is Redis, goes away; that's when LCache got deployed. It had more stable and faster performance by not making those network round trips for cache objects. And it didn't really hit the database very hard either. This here is the last 24 hours of the site in New Relic, from the actual host machine running the database for this website, and you can't really even tell from it when LCache got deployed, even though the site started relying on that database for managing cache coherency.

But this obviously isn't the end; it's still in the early stages of production use, and I've gotten some pretty awesome suggestions. One is to use mysqli in its asynchronous mode to fetch the events to synchronize: at the very beginning of the request, ask for all the events that have happened since the last time you looked, let the query run, and block on processing those events only when you first actually access the cache. Assuming the query has returned and shipped its data before your first cache read, and there's probably a bit of PHP that runs before that happens, the event data is already local to the machine, ready to be read into the local cache. The upside is that you have no synchronous wait on obtaining events.
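A minimal sketch of that suggestion using mysqli's MYSQLI_ASYNC mode. The connection details, watermark, and table are hypothetical, and a real implementation would loop until the query is reaped rather than polling once.

```php
<?php
// Fire the event-fetch query asynchronously at the start of the request
// (connection details and table are hypothetical).
$mysqli = new mysqli('127.0.0.1', 'user', 'pass', 'site');
$mysqli->query(
  'SELECT event_id, address, value, expiration, deleted
     FROM lcache_events WHERE event_id > 12345 ORDER BY event_id',
  MYSQLI_ASYNC);

// ... the rest of bootstrap runs while MySQL does its work ...

// At the first cache read, block (briefly, if at all) on the result.
$read = $error = $reject = [$mysqli];
if (mysqli_poll($read, $error, $reject, 1) > 0) {
  $result = $mysqli->reap_async_query();
  while ($row = $result->fetch_assoc()) {
    // Apply each event to the local L1, as in the earlier sketch.
  }
}
```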
If we have a lot of events to synchronize, waiting on them synchronously delays, say, Drupal getting into bootstrapping. The catch is that this would require yet another database connection, because everything in LCache is written with PDO right now and uses its own connection. As I understand it, Drupal wants to move in that direction for cache management anyway, to take the cache outside the bounds of the transaction layer, so that items don't roll back in ways people wouldn't expect. A lot of our cache implementations already live outside the transaction layer, in things like Memcache and Redis, and since so many production sites use those external caches, it probably doesn't make sense to assume the cache operates transactionally. Running the cache inside the transaction layer also creates a lot of deadlocks on sites that don't carefully order their locks in the database; I've seen it take down sites.

Another suggestion was to synchronize with the central cache again at the end of the request, in the destructor or a shutdown function, so that after the request has closed and sent its response, it takes care of any additional writes that are ready to process, saving someone else's request from having to process them. And I'm also looking at using SQLite instead of APCu as an L1 cache, because the newer locking systems in SQLite are granular enough that it might be totally viable as a node-local cache for a web server. That would also let the CLI take advantage of the local cache. I feel like we're probably headed toward a future where we configure sites to have some node-local persistent data anyway, especially since PHP 7's opcache can now store opcode caches on disk as well, which would accelerate the CLI too. So we have a few opportunities to make the command-line experience with Drupal a lot faster, in terms of both cached objects and cached opcodes.

I would really like to get this into core, because I think it functions quite well as a general-purpose cache, and our existing option for pulling data local to a web head just isn't viable for most admins. Most admins wouldn't even know how to decide whether a bin is a good candidate for ChainedFastBackend, because the cost of a cache write there is so high that you basically have to be perfectly sure almost no writes to that bin happen before you deploy it, and we don't have good tools for admins to know when that's the case. Even I was fooled early in this process: I thought a lot more bins would be stable, in the sense of not receiving many writes, than actually were. And as I mentioned earlier, the benchmarks show that the insert-only, batch-delete event model is actually faster than Drupal's built-in cache.
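On the SQLite idea mentioned above: a minimal sketch of a node-local SQLite L1 opened in WAL mode, which allows concurrent readers alongside a single writer. Everything here, the path, table, and upsert pattern, is hypothetical rather than an existing LCache backend.

```php
<?php
// Hypothetical node-local SQLite L1. WAL mode lets readers proceed while a
// writer commits, which is what makes SQLite plausible as a per-node cache
// shared between PHP-FPM and the CLI.
$db = new PDO('sqlite:/var/cache/lcache-l1.sqlite'); // hypothetical path
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->query('PRAGMA journal_mode=WAL');   // readers don't block the writer
$db->exec('PRAGMA synchronous=NORMAL');  // acceptable durability for a cache
$db->exec('CREATE TABLE IF NOT EXISTS l1 (
  address TEXT PRIMARY KEY,
  value BLOB,
  expiration INTEGER
)');
// Upsert semantics are fine here: this is a node-local cache, not the
// central event log, so there's no insert-only requirement.
$stmt = $db->prepare('INSERT OR REPLACE INTO l1 (address, value, expiration)
                      VALUES (?, ?, ?)');
$stmt->execute(['cache:node:42', serialize(['title' => 'Example']), time() + 300]);
```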
I think there's some potential to use this even where APCu isn't available in a robust way, whether in terms of its allotted size or whether the extension is present at all. It still makes sense to use this model with something like a null L1 or a SQLite L1, because you still get the database-performance benefit at the central cache even if you don't get to bring the data local to the web head. And we already rely on Composer-based libraries for a lot of Drupal 8, so it's not that weird.

I looked into doing this with PSR-6 and PSR-16, which come from the PHP Framework Interoperability Group: PSR-6 is a cache interface, really a cache object model, and PSR-16 is a higher-level cache interface. You can follow up with Larry after this if you want to learn more about them. PSR-6 has been formally ratified; 16 has not. But I don't feel they're quite in the right spot for me to use them as the backbone of this implementation, because Drupal 8 heavily relies on two concepts: cache tags, and briefly retrieving already-invalidated items while they're possibly being regenerated. PSR-6 doesn't yet have interfaces for either, and even though they could be bolted on in a way, I'd really like to see more of that standardized before rolling this out. I do like PSR-6's concept of deferred persistence, though, because there are a lot of cache writes where I could decide, as a developer, that I'm not going to rely on rereading this item during this request. As long as you can defer the write, you can batch writes, and a batch insert beats multiple inserts because you have fewer round trips to the database. PSR-16 is sort of almost a superset of 6, but it largely seems to provide a counter interface, which would be useful to WordPress but not necessarily to Drupal.

So with that, I'll open the floor. We have a microphone for questions if you can use it; otherwise I can repeat them.

You said one of the problems you faced is that there were a lot more writes than you expected. But would you say the majority of the network output is still generated by reads?

Absolutely. When we look at the data links — let me pull up that graph. The distance from the purple line to the red line is inbound traffic to the node, and the green line is outbound traffic. You can see that the cache reads, the green area, massively outstrip the cache writes for something like Redis.

Okay. So is the bottleneck on the machine's network cards or on the router?

Either way, we see large sites with lots of containers or web heads hitting two, even three gigabits of traffic.

But only one of those will actually be the bottleneck: either the router or the network card on the Redis box.

Well, a lot of these are deployed on clouds anyway, so they're using virtualized network equipment.

Okay, so it's the network cards, and that leads to something I wanted to ask. For instance, the way you said this is solved, at least partially, in MySQL is that you send writes to a master and reads to some replicas, right?

Correct.
What part of the solution wouldn't be covered if, say, we had a separate, modified Redis library, or a proxy, where you send writes to the proxy, but when you need to read something or establish a connection, you ask the proxy where to read from?

You could implement a somewhat similar model by having a primary Redis instance, replicating it onto each of the nodes, and talking to the local replica. The biggest issue you'd run into is the complexity of the setup: you're running another daemon, both as the central instance and on each local server. But more concerning is consistency, because Redis replication is asynchronous. Drupal assumes you get read-after-write consistency for caches, at least between page loads. LCache guarantees that a write that occurred on a page load will be visible on any subsequent page load after the write completes, regardless of whether that page load lands on the same web head or a different one. With Redis, your replication latency could go up and down with the volume of writes, and the only way to fix that would be checking again against the primary Redis instance.

Cool, but you're talking about Redis replication. What I'm suggesting is that if you used, say, a proxy, then when you write something, you'd be reasonably sure the write happened on all the instances, because you're only writing to the proxy.

That gets into overhead for writes, because if a proxy guarantees it writes to all of the local Redis instances, your write time is proportional to the number of web heads. And then say one of your web heads goes offline: you have a complicated partition-handling problem. What do you do? Fail the write, because one web head is down and you can't guarantee replication to it? Blacklist that web head and run a special reintegration protocol for it? You have a lot of topology and administrative issues once writes have to reach all the other systems.

Yeah, but that's the thing: you're only writing to one place; it doesn't matter what the web heads are doing. Because either way, even with LCache, when they write to the L2 they could run into this problem, except it also has the L1, right?

Well, the L2 happens to be the database server, which you already have to keep online. So while it relies on something that can fail, it's something we already have to manage and maintain.

Yeah, so what I'm suggesting is...

Do you mind if we take this offline? I just want to make sure we get through other questions.

Yeah, absolutely.

Hey, David, Ken here. Two quick questions. Given that different hosting platforms support different options, have you looked at trying Redis or Memcache as the L1 cache option?

I haven't. It would be totally possible, but I'm not sure it would benefit, because with APCu you store the data in-process, in local memory, without any sockets to cross. That said, accessing Redis or Memcache over a Unix socket is very low overhead.
The only real benefit would be smarter LRU behavior, say, better handling under memory pressure, because APCu is fairly famous, even today, for not having the finest behavior when it's under memory pressure within its allocated cache size. But it would take literally an hour or two to write an L1 that works that way, because it's a fairly simple interface.

Since this is kind of interesting: the L1 interface just looks like this. It has to expose a pool ID, which identifies which node originated an event so that events don't get re-replicated back to the PHP-FPM pool that originated them. It has to manage the high-water mark for replicating events from the central system, which is what getting and setting the last-applied event ID do. It has to have a set function, and a function to check whether an item was negatively cached. It has a concept of negative caching, where if it has verified that an item doesn't exist, it caches the fact that it doesn't exist; that's more of an issue on the WordPress side than in Drupal, because WordPress has all sorts of configurations that are configured by virtue of a cache item not existing. There's a method for the key's overhead, which is essentially writes minus reads, a sort of subtracted ratio; the L1 is responsible for tracking that. And it has to have set-with-expiration and delete. That's all you have to implement for an L1 (a rough sketch of this interface appears below); the static L1 isn't much more complicated than that, it basically just works with a local array.

Other quick question: have you seen any pattern in contrib or core code issues that led to more writes than there should have been?

The most common issue I see is just that assumption that because you missed on a cache read, you should write the item back; I don't think a lot of code thinks about caching beyond that. When I've analyzed production sites with the overhead data, which is how it tracks that ratio and eventually decides, with the learning it has done, to stop accepting writes, the majority of the poorly performing cache items don't have that high an overhead ratio. They're not terrible; they're mostly written once after one miss, so they have an overhead of around zero, which means they've never provided any benefit to the site. A lot of them also have an overhead of one, which means they've been written without anything ever reading them, or at least an equivalent ratio. The only advice I'd give module authors is to not assume a cache miss makes a write worthwhile.

Any idea on an ETA for the 1.0 release?

Well, I'm being very conservative. I'm going with the Google beta kind of thing, where Gmail was beta for years. That's mostly because when this sort of system breaks, it can be extremely confusing: you end up with something like inconsistent data. But I will say that in our load tests against the Drupal 7 and WordPress implementations, and some Drupal 8 ones, we haven't seen a site-breaking issue in weeks and weeks of testing. Mostly what we've been doing over that time, before deploying to production, is optimizing the code paths we found were more heavily used than we expected.
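Putting that description together, here is a rough reconstruction of the L1 interface as described a couple of answers back. The method names approximate what was said in the talk and are not necessarily LCache's exact API.

```php
<?php
// Rough reconstruction of the L1 interface described above; the names are
// approximations from the talk, not necessarily LCache's actual signatures.
interface L1Cache {
  // Identifies which PHP-FPM pool originated an event, so events are not
  // re-applied to the pool that produced them.
  public function getPoolID(): string;

  // High-water mark for replication from the central L2.
  public function getLastAppliedEventID(): int;
  public function setLastAppliedEventID(int $eventId): void;

  // Reads and writes, with expiration on set.
  public function getEntry(string $address); // entry object, or null on miss
  public function set(int $eventId, string $address, $value, ?int $expiration): void;
  public function delete(int $eventId, string $address): void;

  // Negative caching: remember that a key was verified not to exist.
  public function isNegativeCache(string $address): bool;

  // Roughly writes minus reads for a key; the L1 tracks this locally.
  public function getKeyOverhead(string $address): int;
}
```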
So in the L2 implementation you have, you're inserting all the new cache events and then batch-deleting the old ones at the end. But in at least the default database schema that ships with Drupal, the cache key is the primary key, so you can't actually do that.

It's not using that schema; it implements its own tables.

Okay. So you have an extra index, or a counter column? Is that the idea?

I'll just pull it up. It doesn't use Drupal's own database abstraction layer, because it runs on a separate connection; it uses the traditional schema installation method. The main thing is this cache events table, which has an auto-increment event ID column. That means the database doesn't even have to check that the primary key is unique at insert time, because an auto-increment key is unique by construction. It only has two other indexes: an expiration index for doing cleanup, and a lookup-miss index, which lets it find — let me blow this up, it's not very visible — the latest event affecting a cache item. The lookup-miss index covers the address of the cache item, which is a packed structure of the bin and key (in Drupal's case, bin and CID), plus the event ID, so a query can say: give me the latest event that has affected this bin and CID. Because indexes are tree structures, it's an extremely efficient query to find the latest event affecting a cache item, so when it misses in the L1 and goes to the database asking, do you have anything about this key, it can use this index to pull it very quickly and ignore the older events for that item.

So that's just a SELECT ... WHERE ... ORDER BY ... LIMIT 1 kind of trick. Is that the idea?

I'll pull it up; it's in the database L2, in getEntry. There's the query. And actually, before I'd even read PSR-6, it already implemented the idea that the cache should never fail, even if the schema is broken, so I'm already on board with that. You'll notice it catches this exception, and there's a whole thing in here to test whether the exception is schema-related or something more fundamental. If it's just schema-related, it raises a warning; otherwise it re-throws the exception. A syntax error in the query is a different kind of thing from, oh, the table's missing. So there's a whitelist of certain types of exceptions PDO may throw that are considered acceptable to gloss over and handle semi-silently as a miss. But basically it does this: a SELECT of those values from the events table where the address matches and the expiration hasn't passed, ordered by event ID, picking the first one.

Okay, cool. And on the PSR-6 front, let's talk; I've got some ideas for you.

Okay, let me make sure I'm not going over time. Okay, that is actually time. Yeah, Larry?

Auditorium, in 35 minutes.

Okay, thank you. Thank you.

Hey, very nice. Thank you. Thanks. Are you going to be at the sprint on Friday?

No, I'm actually flying out tonight to get to Berlin for the systemd conference.

Sorry, they have another Symfony?

The systemd conference. For the C code I work on.

Systemd, okay. What's your time slot today?
No. I need to be there to give a tutorial tomorrow morning, so I'm actually flying to Frankfurt in about an hour and then flying from Frankfurt to Berlin tomorrow morning.

Okay, never mind then. So we'll talk online about this.

Okay.

I think we can solve your PSR-6 issues. It was designed to handle the kind of stuff you're talking about there, and we've had a lot of discussions about the stale-data question. So let's talk online about that and see if we can make it happen.

Yeah. With respect to that, I think one of the most important things to address is what tags are supposed to mean for caches: is it batch invalidation, or, as Fabian has put it in some of my discussions with him — he actually had a lot of the recommendations I talked about, here it is — is it also an issue of causality management? Say you have one cache item derived from another, and you invalidate the tag. If you derive a cache item from a now-obsolete one, how do we make sure the derived one doesn't stick around? I'd propose possibly handling that at a different level with something called vector clocks, which track causality among items. But if PSR-6, or a successor, starts supporting the idea of cache tags, then we need to know how extensive the meaning of tags is. Right now, PSR-6 doesn't support tags.

I know it doesn't. That was done deliberately, but a mechanism to make that an extension is built in, and the way we designed it was specifically to allow that. PSR-16 is going to be totally useless for you.

Okay. It might be worthwhile for our WordPress support, but...

16 is specifically for: I don't want to deal with the complexity of PSR-6, I just want a dumb key-value store, so I get a simplified interface for that.

Oh, okay. I misinterpreted the relationship between them.

It's simply a utility wrapper for people for whom PSR-6 is too complicated to deal with.

Then no, this needs to operate at the PSR-6 level at least. We can talk through how to deal with the tagging.

Is there any discussion of supporting PSR-6 natively in Drupal?

Until tagging happens, I don't see that happening. catch is very against it; Mark Sonnabaum is very against it. So I didn't bother pushing it.

Okay. I would love for it to be in Drupal; we think it would be advantageous to get it adopted. But we need the tagging first.

Oh, thank you. Thanks. Okay. Thanks.

I was thinking about the size of the overhead this imposes on the database. It's going to be small, but it's going to be present. And you mentioned there's still going to be some logging, because you're writing to the database.

Correct.

And you could implement an alternative L2, or something like...

Yes. If even that amount of database overhead is a concern for you, and you want to get rid of it so badly that you're willing to run a separate system, then that totally makes sense, and I would be happy to take a pull request that implements, say, a Mongo L2 or a Redis L2. Memcache would not be able to support the necessary structures.

It has the concept of...

You don't really need search; you just need to pull a range until you get to an item older than your watermark, or even some notion of a sequence you can easily iterate through. And you can't have good items go missing from the L2.

That's true too. And this handles the hot-key problem quite well.
Exactly. For example, I had this exact same issue, and I was thinking of solving it with the intermediary-proxy proposal he suggested. Have you heard of Facebook's mcrouter?

That works with Memcache, I believe.

Yeah, it's for Memcache, and you can set it to write everywhere and read from the local instance. So it's a workaround for the same problem.

Yeah. Although even then, a single big hot key can bottleneck the node that holds it. I often see that on Drupal sites: cache values that are literally a megabyte or two.

The schema cache.

Yeah, the schema cache, the registry; I see a lot of them for Views. Those get so big that you don't need that many requests to a single cache key before it's a hot key in terms of bandwidth. When an item is four megabytes, that's 32 megabits per read, so on the order of a hundred reads per second saturates a few gigabits of network. It's not that many, especially if you read multiple multi-megabyte cache items on a request.

Okay. So the modules for Drupal 7 and 8, are they already public on Drupal.org?

Yep. And they mostly work; by "mostly" I mean I'm not aware of any open bugs on them in terms of real issues.

So anyone who has hot-key issues or whatever can try them out and open bugs?

Yes, I would love to have people try this out. The only issue I'm aware of, which Fabian raised and may or may not actually affect your site, is with tag clearing: it doesn't quite implement the same concept of tag versioning that Drupal 8 core may expect. I haven't seen problems from this yet on sites we've tried it on, but for extremely nuanced things, if you were doing e-commerce, say, I might hold off on the Drupal 8 version right now. If you're mostly just managing content, you're going to be fine. Drupal 7 is fine; Drupal 7 doesn't have a concept of tags.

Actually, the only caveat I'd mention for Drupal 7, which would only require a tiny patch, is that it looks for the database connection information in the environment the way we do it on Pantheon: it's just looking for some server environment variables. You can export those, or I'd happily take a patch that changes it to look in the same place Drupal does; it was something I did just to get it done, and there's a note on the project page. Other than that one thing, it's pretty much drop-in and run; there's zero configuration for it.

Okay. Yeah, there's more work to do, but it's a lot of positive work.

And I've tested this with Drupal 7 and 8 as the only cache, handling every bin. So it's not like it'll break your site if you use it for every single bin; it's just that there might be a bin, like the form cache or the page cache, that you don't want to put into it, that would be better off not in a cache like this at all.

I've seen that done with Memcache; it's very nice.

That's fine; you're welcome to put whatever you want in it. It's just that the benefit of replicating cached form data out to all of your web heads is nil, because it's only ever going to be read once.

Yes. And this is why I was making my earlier suggestion, and why I'd question that line of thinking.
We have network caps between nodes that depend on the clouds we deploy to, and those bind at some level: maybe 1.5 gigabits; the most we've ever seen is 10 between nodes on a cloud. Eventually you saturate it. We already have sites saturating three- and four-gigabit links to a cache. It's not saturated on the web head; it saturates at Redis. The Redis box sending out all the cache items is what gets saturated.

What I was trying to say is that it wasn't a drop-in replacement until you developed it, and since it's a really smart solution, I think it would be helpful well beyond that case.

That was my point in showing the performance graphs against Redis: even when you're not saturating Redis, this is still faster. And I have evidence, based on the database schema tests, that even without the L1, it's still faster than the built-in cache. My ultimate goal would be something that installs with Drupal core: if you have APCu, it uses it; if you don't, it uses a null L1. And if you run it on a single-node server, it has no events to synchronize. If you wanted to, the data is on a separate database connection, and really modern versions of MySQL are highly competitive with Redis in terms of performance.

I've looked at that a little bit, and it would be interesting to write an L2 against it. But the problem is that almost all the cloud database offerings now only open up the MySQL protocol socket, so it would only be useful if you deployed your own database and wanted to maintain that additional thing. I'd rather build an L2 against something like Redis than against a special socket on MySQL.

Nice meeting you.

You too. My name is Felipe.

Oh, hey. Where are you based?

London.

OK, I'm actually going to be in London early next week. Yeah, thanks. Yeah, it's just a lot of material.