Okay, hello everyone. My name is Janne Salo, I'm a senior developer at Exove, and I'm going to talk to you a bit about the Cache Control module: how you can use it to help you build high-performance sites on Drupal, and how we have used it at Exove. First I'm going to spend a couple of moments talking about who we are as a company. Then I'm going to go into some details about the Cache Control module itself and how it works, though not in too much depth, because there are some complexities involved with the module. Then I'm going to present three cases we've used the module in. The first one is really easy, a perfect use case for the module: a hockey site called jatkoaika.com. Then we have a hard case, demi.fi, a community site for teenage girls, which has a lot of authenticated users generating a lot of content, and it really puts the servers under strain. The last case is a bit different in that we didn't have to deal with high loads so much as with geographical distribution. After that we'll have some Q&A.

So, a few words about Exove. We believe in open source, which I guess everyone here does. We come from the north of Europe: our headquarters is in Helsinki, and we also have a presence in Estonia and the UK. We were founded in 2006 and we're about 60 people now, most of them developers; we don't have that much other staff. Since 2006 we've served more than 120 clients. Some of you might have met our CEO Janne Kalleola, who is the chair of the business and strategy track here at DrupalCon. And that's really all I'm going to say about us; if you're interested in hearing more, you're welcome to visit us at booth number 38.

So, what is Cache Control and how does it work? On this slide you can see the address of the project page. You're welcome to check out the code if you like.
If you're technically oriented, you might want to take a look, because I'm going to skip some details in this presentation. Okay, so on a conceptual level Cache Control is really simple: it's a module for integrating your site with Varnish or some other external HTTP cache. I'll mostly be talking about Varnish today, because it's the one we have the most experience with, but we also have some experience with Nginx caching. It's pretty much the same, except that Nginx doesn't support purge requests out of the box, as far as I know. So whenever I say Varnish, you can replace it with your favorite external HTTP cache if you want.

How it works is that it manipulates the Cache-Control headers in the HTTP responses that Drupal sends out, and some of you might already have guessed that this is where the module's name comes from. We also have support for purging content from the cache: we do automatic purges when, for example, nodes are updated or nodes get comments, things like that, and we provide a hook system for those who want to purge some content on their own. Actually, just before lunch I went to see the session by Columbia Law School where they presented their own way of doing cache purging, cache tagging, and I kind of like it. It's a direction I would like to see Cache Control going someday.

Cache Control also comes with an admin UI where you select which menu router paths you want to cache, and you can specify different TTLs, times to live, for different paths. So you can have some content that expires fast and some content that just hangs in there until purged, and so on. And we ship a VCL file for Varnish that you can use to configure your cache.

So how does it work? At a very high level, it goes like this: whenever a request comes in to the server, Varnish checks it and sees whether it can be served from the cache. If it can, Varnish just sends the response straight to the user's browser.
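To make the header manipulation and per-path TTLs described above concrete, here is a minimal sketch of the idea in Python. The module itself is PHP, and the paths, patterns and TTL values below are made-up examples, not the module's actual configuration:

```python
# Sketch of per-path TTLs driving the Cache-Control response header.
# A shared cache like Varnish honours s-maxage; browsers see max-age=0.
import fnmatch

# Assumed example rules; the real module stores these via its admin UI.
TTL_RULES = [
    ("frontpage", 30),          # fast-expiring front page
    ("node/*", 3600),           # node pages live longer
    ("taxonomy/term/*", 600),
]

def cache_control_header(path):
    """Return the Cache-Control header value the external cache should see."""
    for pattern, ttl in TTL_RULES:
        if fnmatch.fnmatch(path, pattern):
            return "public, max-age=0, s-maxage=%d" % ttl
    # Paths not marked cacheable must never be stored by the shared cache.
    return "private, no-cache"

print(cache_control_header("node/123"))    # public, max-age=0, s-maxage=3600
print(cache_control_header("user/login"))  # private, no-cache
```

In Drupal 7 terms, the module would emit this value with something like `drupal_add_http_header('Cache-Control', ...)` during page delivery.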
And here's where things get interesting: Varnish does this for authenticated users too. It can do that because, if the request isn't found in the cache, Varnish of course passes the request to Drupal, and Drupal then executes the page load, but it does so as an anonymous user, generating a response that Varnish can cache safely: there's no personalized data in it. Of course we don't do that for all requests. First of all, if the page isn't set cacheable, we don't do anything. And if the page can't be seen or executed by an anonymous user, we don't do anything either; we never want the cache to fill up with sensitive data. There are also some minor cases where we might want to bypass the cache, but those aren't really worth going into here.

Okay, so now the user's browser has finally received the response. If the user happens to be anonymous, we just show the page and we're done with it. If the user happens to be authenticated, we need to generate the personalized parts of the page in a separate Ajax backend, which we call the get components backend, and then we inject the results into the page.

You might wonder what personalized content actually means in this case. First of all, you can enable Cache Control support for any block, meaning that all blocks you enable it for are tagged as personalized and will be generated for authenticated users in the get components Ajax backend. We also provide an API so you can tag content other than blocks if you need to. And for those who are interested: in order to generate personalized content in the Ajax backend, we need to store the function that's going to be used to generate the content and that function's arguments, plus the HTML ID, which we use to replace the content in the user's browser. So what benefits does this sort of approach have?
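Before the benefits, the mechanism just described — store the generating function, its arguments, and the HTML ID, then run the function later in the Ajax backend — can be sketched like this. This is Python for illustration only (the real module is PHP) and every name here is made up:

```python
# Toy model of the personalized-content mechanism: the cached page contains
# only empty placeholders; a later Ajax request rebuilds the real content.
_registry = {}

def tag_personalized(html_id, callback, *args):
    """Record how to rebuild one personalized region later, and return
    the placeholder markup that is safe for Varnish to cache."""
    _registry[html_id] = (callback, args)
    return '<div id="%s"></div>' % html_id

def get_components(requested_ids, account):
    """Simulate the get components backend: run only the callbacks for
    the regions actually present on the requested page."""
    return {hid: _registry[hid][0](account, *_registry[hid][1])
            for hid in requested_ids if hid in _registry}

# Hypothetical personalized block: a greeting that differs per user.
def welcome_block(account):
    return "Welcome, %s!" % account

placeholder = tag_personalized("block-welcome", welcome_block)
components = get_components(["block-welcome"], "alice")
print(placeholder)                  # <div id="block-welcome"></div>
print(components["block-welcome"])  # Welcome, alice!
```

In the browser, JavaScript would then replace each placeholder's contents using the HTML ID as the key, which is why the module has to store that ID alongside the callback.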
Well, first of all, and I think this is kind of obvious, only the needed parts of the page are generated. We avoid some bootstrapping; actually, for anonymous users we avoid all bootstrapping, or almost all of it, since they get the page directly from Varnish. For authenticated users we still need to bootstrap in the get components backend, but we only generate the parts that are actually needed, so we avoid generating the whole page all over again. And we do this in a single request: all the personalized parts are executed in one request. From the user's point of view, one of the benefits is that the user instantly gets something to look at while the heavy parts of the page are being loaded, so the site just feels faster.

Okay, but there's a catch. First of all, building high-performance sites is a complex matter, so Cache Control is not going to solve all your performance problems. If you only remember one thing from this session, let it be this: I'm not presenting a magic bullet here. Why is it a complex matter? Simply because putting Varnish into use is probably not going to solve your performance issues by itself, as we will see later. Another catch is that whenever you develop a site with Cache Control, you have to keep in mind that you're working with Cache Control, or you might end up doing a lot of extra work. What this means is that you need to think about which parts of your site are going to be personalized for the users, and what the performance impact of those parts is. And the sad truth is that you will most likely end up writing at least some custom code when you're using Cache Control. Another sad truth is that you'll probably spend some time wondering why the site behaves differently when Cache Control is enabled. And the reason the site sometimes behaves differently is that, for example, CSS and JavaScript files aren't handled the same way.
Not all of them are loaded when an anonymous user makes a request, so you may end up having to load some of the JS and CSS files yourself in a hook that Cache Control provides.

Okay, so you might have heard about Edge Side Includes, ESI, and you might wonder why you should use Cache Control instead of ESI. ESI is a partial-loading technique supported by Varnish and also by some content delivery networks. With ESI you basically write special markup that gets cached in Varnish, and before serving the page from the cache, Varnish goes through the ESI markup and loads the ESI-marked parts of the page from the cache or from Drupal, usually from Drupal directly. Cache Control does have some benefits over ESI. One is that with ESI you have to wait until the whole page has been assembled by Varnish before it passes it on, and while doing so, Varnish causes Drupal to bootstrap several times per page request to generate those parts. To my knowledge this might change in Drupal 8; I hope we're heading in a better direction there. In the worst case, ESI can even end up burdening the server with more bootstraps than it would see without any caching at all.

So, on to the cases. The first one is jatkoaika.com, the leading ice hockey site in Finland. We get about 200,000 unique visitors and 1.6 million page loads per week, and the good thing about this is that almost all of those page loads are made by anonymous users. Actually, the only authenticated users on the site are the site administrators, and we have disabled Cache Control for them; Cache Control allows you to disable it for certain roles. What makes it even easier is that content on the site is read a lot more often than it's written, so we don't really have to worry about cache purging that much. Okay, here's a screenshot of the site. It basically offers the Finnish hockey fan everything he needs.
News from the different leagues, results, statistics, teams, whatever. So, how do we achieve this? We have a basic Drupal and MySQL setup, with Boost, Solr, Memcached and Varnish, and all of this is running on one server. We have Cache Control enabled for all content pages: node pages, taxonomy term pages, the front page, whatever we have. We use different TTL settings for different pages to control cache freshness, and we're actually not using any custom code at all. We do still have some requirements about content propagating to the cache: the site administrators want to be able to update the site so that the updated content reaches the users fast. For that we use low TTLs on, for example, the front page. We don't purge the front page; it's enough to use a 30-second or one-minute TTL there. Because of all this, the server loads are really minimal, and we're able to handle pretty much everything the users can throw at us.

So, this is how things work in a perfect world: we just enable Cache Control and the site starts working like a charm. Unfortunately, that's not always the case, as you will see with demi.fi. Demi.fi is the community around the Demi magazine, which is really popular among Finnish teenage girls. The site has been around since 1998 or so, and the version we did is maybe the fourth incarnation of the site; we had to do a huge migration for it. Currently the site has about 250,000 registered users and millions of nodes, including discussion threads, community pages and blog posts, and we get 2.8 million weekly page views. I can tell you that teenage girls are probably the hardest demographic to please; they will really let you know if something's wrong. The good thing about them is that they also forget quite fast. So, what makes this case hard is that most of the page loads are made by authenticated users.
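As a back-of-the-envelope illustration of why a short TTL is enough on a site like this: Varnish rebuilds a page at most once per TTL window, no matter how many visitors request it, so the backend load for a hot page is bounded by the TTL rather than by traffic. Using the figures from the talk:

```python
# Worst-case backend rebuilds of the front page with a 30-second TTL:
# at most one Drupal page build per TTL window, regardless of traffic.
WEEK_SECONDS = 7 * 24 * 3600
ttl = 30
page_loads_per_week = 1_600_000           # whole-site figure from the talk

max_backend_builds = WEEK_SECONDS // ttl  # one rebuild per 30 s, at most
print(max_backend_builds)                 # 20160 rebuilds per week
print(page_loads_per_week // max_backend_builds)  # ~79 page loads absorbed per rebuild
```

So even if every one of the 1.6 million weekly page loads hit the front page, Drupal would build it at most about 20,000 times, which is why purging the front page isn't needed at all.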
And during the busy hours we might have 1,500 of them logged in at the same time. The user base is pretty fanatical: they hit refresh all the time, they start new discussions at an alarming rate, they comment on each other's discussions at an alarming rate, and thus they generate a lot of content. This poses a challenge for keeping the cache up to date, because they also want to see the content they post on the site immediately. And furthermore, almost every page on the site has a lot of personalized content that's different for each user. So we might be in some trouble with this one.

Okay, this is a screenshot from the site. It features the forum listing, which is one of the most requested pages on the site. On the right-hand side you can see some blocks that are personalized for each user, and furthermore, the users can arrange the discussion topics to their liking. As I said, new discussion threads are being created all the time, so this page also needs to be refreshed often. I'll go into the details a bit later, but the way we dealt with this is that we offloaded almost all the theming of the moving parts to the user's browser and just served JSON feeds from the backend.

Okay, a little bit about the setup. Here, too, we have a Drupal and MySQL setup; we've upgraded MySQL to Percona Server for performance reasons. We're using Solr for searches and as a storage, MongoDB as a storage, and we have Nginx, PHP-FPM, Memcached and Varnish. All of this is running on almost one server: it used to run on one server, but then we added another server to partly offload the PHP-FPM processes. This is something I'm a bit proud of, that we can run this kind of site on one server, or one and a half servers. How we used Cache Control here is that we enabled it for almost all user-facing pages.
At least the ones that are requested the most. For some pages Cache Control can't really be used: webform pages are one of those cases where it's not advisable, because the forms will simply fail. Since a lot of the users are authenticated and there's a lot of personalized content on the pages, the Ajax backend, the get components backend, is under a lot of stress. We've also written quite a lot of custom code to keep the cache fresh, so that purges happen when they should happen, and there was quite a lot of JavaScript and CSS tweaking needed to make the site look good when used with Cache Control. This is actually one case where cache tagging would benefit us a lot. After all this, the server loads are still significant, but they are mostly within tolerable levels; we do get some rush hours where the loads go up a bit. It's working properly, but we're still kind of struggling to push the loads even lower, so that we could just leave the server alone and never look at it again.

Okay, a little bit about the strategy we approached this kind of monster with. We want to avoid Drupal bootstraps and theming as much as we can; luckily, that's exactly what Cache Control does, trying to keep as much content in the Varnish cache as possible. And as an example of the fact that using Cache Control by itself usually isn't enough: we built a fast JSON-based backend for data that changes often, like the forum topic list I was talking about. We distribute the content as JSON and let the user's browser handle the theming, and we've created a little module called Front Thever to help with this. We also use Cache Control for those Ajax requests: we cache the results with a very short TTL, 30 seconds or so, so we don't actually purge them at all. Thirty seconds seems to be a short enough period that nobody really notices. And furthermore, we use fast storage.
We use Solr for views, MongoDB for field storage, and Memcached as the Drupal cache backend. And since most Drupal developers probably aren't that comfortable fiddling with Varnish or optimizing database engines and things like that, my professional advice would be to get a good sysadmin on your team who can do these things for you.

We learned quite a lot of lessons in this project. One of them is that the get components backend really needs to be fast; this caused us to rewrite the whole thing to use MongoDB as its storage backend. When the module was using MySQL as the backend, the server loads skyrocketed in some cases. Once we got that under control, we noticed that Cache Control's front end needs to be fast too: there's a lot of JavaScript magic going on in the front end, and with all the mobile devices on the market these days, it needs to be pretty fast. And cache purging can itself become a performance issue, meaning that the users generate so much content that Varnish doesn't take well to the number of purge requests it gets. We overcame this by using purges instead of bans, in Varnish 3.0 terminology. In Varnish 3.0, bans are implemented using regular expressions, and if you have an extensive ban list, it takes Varnish some time to go through all the entries in the list to see whether an incoming request matches any of them. So we switched to purge, which just looks up the entry in the cache and deletes it.

And just as an example of how, when you solve one performance issue, you run into another: after we had done all this, we noticed that Memcached started to show some worrying symptoms in the way it handled the form cache. Our form cache grew to 10 gigabytes or something like that, causing Memcached to lose some of the cache entries, because it simply ran out of the space allocated to it.
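To illustrate the purge-versus-ban difference mentioned above, here is a toy Python model; this is not Varnish itself, just the shape of the problem. A ban is a regular expression that every cache lookup has to be checked against for as long as the ban is on the list, while a purge removes exactly one object by its key and leaves nothing behind to scan:

```python
import re

# Toy cache keyed by path, plus a Varnish-3-style ban list.
cache = {"/node/1": "page one", "/node/2": "page two", "/forum": "topics"}
ban_list = [re.compile(r"^/node/1$")]

def lookup_with_bans(path):
    """Ban model: every lookup scans the whole ban list, so cost grows
    with the number of outstanding bans."""
    if path in cache and not any(b.match(path) for b in ban_list):
        return cache[path]
    return None  # miss or banned: the request goes to the backend

def purge(path):
    """Purge model: O(1) removal of a single object; no list to scan later."""
    cache.pop(path, None)

print(lookup_with_bans("/node/1"))  # None: banned, goes to the backend
purge("/node/2")
print("/node/2" in cache)           # False
```

With thousands of content updates per hour, the ban list in the real setup kept growing and every request paid for scanning it, which is why switching to plain purges helped.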
So after trying different options, we had to move the whole form cache to MySQL, which in turn caused the MySQL loads to go up. There's really no silver bullet here; there's always something to tweak in these high-traffic environments. And if we didn't know it before, we know it now: building high-performance sites is hard, and it gets harder if you don't take performance into account from the very beginning. By the very beginning I mean the design phase, too. You need to work with the designers to have a plan for the performance cost of the different options: showing a certain piece of content on a page, a certain listing, showing personalized content, giving the users a lot of options. All of this makes caching harder; it probably makes the users happier, but not the developers. It's best to mitigate potential performance killers at this point: if you can avoid some obvious performance trouble just by influencing the design phase, you might save yourself a lot of trouble later in the project. Another lesson is that Cache Control is pretty far from perfect, and it doesn't solve your problems on its own, as I hope I have stressed enough in this session. You may actually end up doing a lot of work ironing out small glitches with cache purging. You might get user feedback like "I made some changes to the site and they don't show up yet", and you can end up spending quite a lot of debugging time on things like that.

So that's the hard case. Then we come to the different one, which is Tekla Campus. It's an online learning tool and a community for engineering and construction students, where they can learn how to use the structural engineering tools Tekla provides. Here our challenge is that there are not that many users, but they come from all over the world, and almost all of them are authenticated.
There's a moderate amount of personalized content per page for logged-in users, not that bad, and they don't really generate that much content of their own. There's a support forum on the site, but it's nothing like the forum the teenage girls use. So, again, a screenshot of the site. This one is very simple: basically, it has lessons on using the tools.

Okay, so how did we approach this problem? The site is hosted in Finland, but the user base is spread all over the world, and we wanted to mitigate the latency for users in, for example, Asia and Australia and other remote locations. So we figured we needed some sort of content delivery network. We tried out a few, until it turned out that the Fastly CDN actually uses Varnish as its backend, so we decided to give it a go. And it turns out that Cache Control plays pretty nicely with Fastly: pretty much everything works straight out of the box, even cache purges. And if you're having some trouble with Fastly, they even allow you to upload your own VCL configuration file, so you can affect how their Varnish actually works. Okay, this case also seems almost as happy as the first one, but it's not quite as happy, because we still needed some custom code due to the fact that a lot of the users are authenticated and they see some personalized content.

So, let me quickly cover all of the topics again. Cache Control is a module for integrating your site with an external HTTP cache. I've been talking about Varnish, but you could really try it with any HTTP cache. It works for both anonymous and authenticated users, and the fact that it works with authenticated users is the justification for its existence. The way it integrates your site with an HTTP cache is that it manipulates the Cache-Control headers in the HTTP responses sent out by Drupal.
For authenticated users, it simply tags some parts of the page as personalized, and those parts are executed in an Ajax backend at a later time. It can make your site a lot faster than vanilla Drupal, and it can be easy or hard depending on the complexity of your site. I wish I were presenting you with a bulletproof solution that always makes your site blazingly fast, but unfortunately that's not what I'm doing here. What we mean by an easy case is one with mostly anonymous users and a high read-write ratio, meaning that there are a lot of content reads compared to content writes, so only a few purges are needed. The hard case is one where we need to serve a large number of authenticated users who generate a lot of content. And we also saw that Cache Control can help you achieve geographical distribution for your site. At this point I would like to invite you to ask questions, if you have any.

Yes. The question was about Ajax requests and whether and how we cache those. The answer is that you can use Cache Control to cache Ajax requests if you want; you just have to be careful. You can't really do it if the Ajax request serves something that's only meant for a single authenticated user, but for anonymous users, or for Ajax requests that return the same thing for every user, you can, as we have done in the demi.fi case: we cache the Ajax request that serves the JSON output for the forum listings.

Thank you. Great presentation, thanks a lot. Do you deal with out-of-band issues, like multiple Varnish servers? One of the things we've noticed in dealing with any sort of HTTP proxy cache is that we usually have multiple Varnish servers, and when our users are invalidating or purging content, whether we're using ban or purge, it still takes a while for that request to actually happen.
So we started doing it in the actual request thread, then we moved it to a spawned-off thread, and then we moved it to the Queue API, because we had to clear out a bunch of Varnish servers. Have you addressed any of that, or come across it? Because we still don't know what the best solution is. Well, this is actually a point where I would really like to have our sysadmin here to answer you, but Cache Control does support multiple Varnish servers. In the case of purges, it sends the purge request to each one of them, and it can be configured to do that in a non-blocking way, where you just send out the purge requests and continue executing, and, well, you hope it gets done before the user loads the page the next time. But as for how to do that on the Varnish side, and what the implications are there, I would really have to consult our sysadmin. Thank you.

Anyone else? Do you mean our own Ajax backend? Yes. The question was whether we can cache the results sent out by the get components backend that Cache Control uses, and the short answer is no. In the previous version of Cache Control we tried an approach where you could actually do that, if the page only has components that don't need to be regenerated for the user on every page request, such as, say, the box in the upper right corner of your site that says "welcome, username". That's an example of a block that really doesn't need to be generated over and over again. But it made the backend more complex than it should be, and that was one of the reasons for the performance problems we had, so we decided to simplify it, and now the Ajax backend responses are simply not cacheable.

You mentioned that you use both Memcached and MongoDB. Why is that? For example, in this get components function, why do you use MongoDB instead of Memcached?
We use MongoDB because we want non-volatile storage, which Memcached isn't. We need to be sure that when we tag some parts of the page as personalized, we will find the content we stored at a later time, and with Memcached you don't really have that guarantee. Okay, so is that the same purpose as the form cache? Sorry? Is this the same as the form cache? We don't use MongoDB for the form cache. In this hard case we used to use Memcached for it, and as it turned out, that wasn't really a good idea, because Memcached ran out of space. It's still kind of a standing problem with the site that we haven't found the optimal caching solution for the form cache at these volumes.

Did you do some tests to measure the speed gain? Did we do tests to measure the speed gains? Yes, we have done some tests. I wish I had some figures with me to show, but let's say that in a happy case, where you have a lot of anonymous users and the content can be served directly from the cache, you can get to several hundreds or thousands of requests per second, whereas with vanilla Drupal you get to tens, or maybe hundreds in optimal cases. So in the best case we're talking about orders of magnitude; in the worst cases it's not that significant.

If you have a site that doesn't use it right now, is there some sort of whitelisting or blacklisting of specific features that should be enabled or disabled in Cache Control? What's the proper way to start with it? The question was about configuring Cache Control and disabling it as needed.
There's a global switch in Cache Control that allows you to disable the whole thing on your site, if you're doing development or if it's somehow malfunctioning, or for whatever reason. You can also configure, per menu router path, which paths are cacheable and with which TTL, and for nodes you can enable or disable caching by node type; for example, you usually want to disable caching for webform nodes and maybe enable it for the other types.

A question about Varnish: did you have to play a lot with the VCL file, or did you just use the default? Did we have to play with the VCL file a lot? Cache Control requires its own VCL file to work properly, because we use some special cookies to denote whether the user is authenticated or whether Cache Control has been disabled for a user. We don't grant any permissions based on these cookies; we just use them as flags to let Cache Control know what to do. So we do need some custom VCL, and we have had to iterate on it, but I think it's pretty much stable now.

I'm not seeing many people reaching for the mic, so I guess we're done here. Thank you.