Welcome to the last session of the day. I hope you enjoyed the conference. I hope the food outside is good. I miss that. I'll try to make this conversation less boring, because every time people talk about performance, the audience tends to drift away from the conversation. So I'll try to keep it really simple. Drupal developers, this is not a hardcore technical session. It's more of a foundational introduction to how a website works and why we think performance matters. And we're talking from the enterprise level, not from a personal blog perspective.
Let me get started. My name is Josh, and I'm based in Canberra. I've always been based in Canberra. I've been working in the government for my entire life. I started with Drupal 5.7, which is quite old.
The agenda for today: we'll talk about web performance, what it is and why we care about it. Then we'll dive into the journey of a request, to see how things work in the background after you hit enter in your browser and where your request goes. And surely when we talk about performance there is cache involved, so we'll look at how caching works in different layers, the browser layer and the Drupal layer. Then we have a working Drupal website. Then people will talk about CDN, and we all know Varnish, and then what's the difference between Varnish and Redis? Then we all know the meme, or maybe you don't, it's from the old Drupal days, with people saying don't panic and clear the cache. I believe developers are still saying that. We will worry about that later. And after everything, hopefully we come up with the best practices for a performance-oriented architecture. Decoupled Drupal is quite a popular topic these days. It's been around for years, and we will talk about it from the performance perspective. All right, what is web performance?
I got this from Google Gemini, because these days if you don't know the answer, you just use the AI, right? It told me there are pretty much two aspects. One: you request a website, so how long does it take for you to actually see the web page? And how smooth is that experience? Do you see a working web page, or something that gets stuck and doesn't move? Two, which I think is more important, is the perceived user experience: how you feel about the website. The truth can be that the website is not performing well by objective measurement, but the user somehow feels like it's performing well. That is quite important, and I think the perceived user experience takes a higher priority. Why? If you have poor website performance, you have a bad user experience, and then you have customers leaving. According to the stats, one in four visitors will leave a site if it takes more than four seconds to load. And what does four seconds to load mean? You can find out with Google: it has a really good tool called Lighthouse, in the developer tools in Chrome, that tells you how long the web page takes to paint and how long it takes to become interactive. And every second of delay reduces user satisfaction by 16%. That was from last year's stats; I'm not sure about this year yet.
So we will talk about the journey of the web request, to figure out where we can improve the user experience and the web performance at different layers. Everything starts with a click: you have a browser, and someone clicks a link or presses enter, and the browser thinks, what should I do about it? Sometimes the browser decides it doesn't need to do anything. In most cases, the browser generates a request with headers.
The headers include the information that tells us where to fetch the content the user is asking for. And the browser will do some logic checks. I will talk about the detail later, but the browser keeps locally cached versions of responses. It checks whether this web page has already been downloaded locally. If it has, it checks some conditions to decide whether it should fetch a new version of the page. If everything is fine and it doesn't have to fetch a new version, it just loads the old version, the cached version. If you inspect the network tab in Chrome or your browser, you may see there's actually no traffic going out.
Before I talk about the details of the logic checks, I'm going to introduce a concept called ETag. A lot of people know what an ETag is, but just in case: it's an entity tag for an HTTP response. We can think of it as an ID for a response. In other words, it's something like an MD5 of the response, a hash of the response. If anything changes in the response, the ETag changes. So imagine you have the last updated time in your response, in your web page or in your HTTP response; it doesn't have to be in the content, it can be anywhere. If the last updated time changes, the ETag will be different, which means the browser will treat the response differently.
Now, let's talk about the detail. The browser first asks: do we have a cached version? If we don't, send the request to the target and get a new response. If we do, the browser checks the previously stored response: does it have a header saying please do not cache this page? If it does, that is, Drupal is telling the browser no-cache, the browser follows that. It won't just reuse its copy; it sends a request to the target with the ETag.
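To make the ETag idea concrete, here is a minimal Python sketch. It assumes the tag is simply an MD5 hash of the response body, as described above; real servers are free to derive ETags in other ways. `make_etag` and the two sample pages are hypothetical, made up for illustration.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Return a quoted ETag built from an MD5 hash of the body (an assumption)."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


# Two versions of the same page; only the "last updated" text differs.
page_v1 = b"<html><p>Last updated: Monday</p></html>"
page_v2 = b"<html><p>Last updated: Tuesday</p></html>"

etag_v1 = make_etag(page_v1)
etag_v2 = make_etag(page_v2)

print(etag_v1 == make_etag(page_v1))  # same content gives the same tag
print(etag_v1 == etag_v2)             # changed content gives a different tag
```

Because the tag depends only on the content in this sketch, any change to the response, however small, produces a new tag.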
So there's a value in the request headers called If-None-Match, which asks the target whether the ETag is different. I will talk later about what happens if it's different and what happens if it's the same. If Cache-Control doesn't say no-cache, that means okay, we can use the cache, and the browser checks the cached page. I'm pretty sure you're familiar with this too, because on the performance settings page in Drupal there are settings for it. To be clear, those settings are not for your web server or Drupal to control their own cache; they tell the browser, or any CDN or other layer in between, how to handle caching. The browser then looks at max-age: how long can we cache it for? If we're still within the cache lifetime, it loads the cached version. If not, it sends the request to the target with the ETag.
Here we have to talk about another concept in the headers, If-Modified-Since, which goes with the Last-Modified value. That value indicates the last time the web page was updated on the server, and the browser can use it together with max-age when deciding whether its copy has expired. This is only the high-level logic of how the browser handles caching. There are many other things; I don't have enough space to draw the whole diagram here. So you can imagine your browser is doing a really good job of caching the website. It does that by default, unless you tick the option to disable the cache in the developer tools for debugging, but we reckon you just keep it on.
Now the response. When the server receives the request, it is able to compare the ETag to see whether this resource has changed. If it hasn't changed, it returns HTTP 304. A 304 is a really small response telling the browser: okay, this resource, this web page, hasn't changed at all, so I'm not giving you the full response. There's no content, there's no web page in the response.
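The decision steps above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how any real browser or server is implemented: freshness is measured only from the time the copy was stored, and `CachedResponse`, `browser_decision` and `server_response` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CachedResponse:
    body: str
    etag: str
    max_age: int      # seconds, taken from Cache-Control: max-age
    no_cache: bool    # True if Cache-Control contained no-cache
    stored_at: float  # time (in seconds) when the browser stored this copy


def browser_decision(cached: Optional[CachedResponse], now: float) -> str:
    """Decide what the browser does for a URL (simplified)."""
    if cached is None:
        return "fetch"        # nothing stored: plain request to the target
    if cached.no_cache:
        return "revalidate"   # ask the target, sending If-None-Match: <etag>
    if now - cached.stored_at < cached.max_age:
        return "use-cache"    # still fresh: no network traffic at all
    return "revalidate"       # expired: ask the target with the ETag


def server_response(current_etag: str, current_body: str,
                    if_none_match: Optional[str]) -> Tuple[int, str]:
    """Answer a request: 304 with an empty body if the ETag still matches."""
    if if_none_match == current_etag:
        return 304, ""
    return 200, current_body


cached = CachedResponse(body="<html>v1</html>", etag='"abc"',
                        max_age=300, no_cache=False, stored_at=1000.0)
print(browser_decision(None, now=1000.0))    # fetch
print(browser_decision(cached, now=1100.0))  # use-cache
print(browser_decision(cached, now=2000.0))  # revalidate
print(server_response('"abc"', "<html>v1</html>", if_none_match='"abc"'))  # (304, '')
```

The point of the 304 branch is that the server sends back only a status, so the expensive part of the response, the page itself, never crosses the network when nothing has changed.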
The 304 simply indicates that nothing has changed. If it's not changed, the browser renders the cached version. If it has changed, the server sends back the full updated version, just as if there were no cached version. So this is already quite complicated. Imagine developers debugging a change: we haven't seen the change, and people say, I already cleared the cache but I still don't see it. Maybe try incognito mode; then your browser will actually fetch it for you.
All right, enough with the browser. In the best case, the browser sends the request out. It goes to the DNS service, DNS knows which IP that URL points to, and the request is routed to the web server. So what happens within the web server? This is a really simple case. We haven't got a CDN or anything else in the middle yet. It's simply browser to Drupal, right? On the server we have Nginx or Apache, with Drupal running behind it. What will Drupal do? Again, this is from a really, really high level. I'm not going into detail; I'm going to show you a really complicated diagram after this slide.
What Drupal does is check the database. We have a number of tables in the database whose names start with cache underscore. Drupal fetches from the cache tables first: it checks whether the entity we're trying to render is already cached in a cache table. If it is, fantastic, it just renders from the cache table. If it's not, it runs a really, really complicated SQL query joining tables and grabs the content from the field tables, the node tables, whatever tables. Drupal does this by default. You can't really disable it; Drupal does it regardless.
This is what I was talking about: the real Drupal render flow diagram. It's for Drupal 8. That's the version I could find, but I'm pretty sure it will be similar in Drupal 10.
The real flow has much more complicated logic checks in it, but from a high level it's checking the cache table. Now we have a simple architecture. The browser is caching. If the page isn't cached there, the request goes to the web server. Then the web server goes to the database server, and the database server is doing caching too. So we have two layers of caching, Drupal with its database and the browser, which is fine. We're happy, unless we have too many requests from different browsers. If we're talking about enterprise level, we have a lot of requests for different URLs, which means they all go to the web server and then on to the database server to fetch, even if it's from the cache tables. We're still hitting the database. I think you may need to find a friend from outside to talk about autoscaling and managed services for Drupal hosting. They will handle the autoscaling; otherwise too many requests will simply crash your server.
Then people start to talk about CDN. What is a CDN? I'm pretty sure you all know. It's a content distribution network: cloud computing at the edge, close to your network. It stores web pages at the edge, which means that when you make a request it reaches the CDN, and the CDN gives you what you want. The traffic doesn't go to your origin web server if the page is cached. In the diagram there is a green bar between the browser and the web server. With the CDN properly configured you can cache up to something like 95% of traffic. The browser only reaches the CDN, and most of the content is served by the CDN. Only a small part of the content comes from the web server, which helps you. A CDN is always a service, and all the big companies offer one. If you use GovCMS, Akamai is the CDN you're using. What the CDN can do apart from serving content is behave like a firewall. You can configure traffic rules in the CDN's firewall.
If there is a DDoS, for example, the CDN will detect it and ban the IP, so that traffic doesn't come through the CDN. You can also create custom rules: if there is a request for a particular URL and you have some custom logic for that URL, you can make the CDN redirect that request somewhere else. Or, since the CDN sees the cookies, the tags, or any other information carried in the request headers, it can check the headers and do some customization to massage the request. The last thing: the CDN can store files such as media files at the edge. If you have some static images or icons that you don't want to serve from the web server, you can upload them directly to the CDN and they will be served by the CDN.
Now, Varnish. I'm pretty sure everyone already knows Varnish. Varnish is an application sitting in front of Nginx. It can be on the same server as Nginx, or on a dedicated server in front of Nginx. It's caching software that receives requests from upstream, which can be the CDN or the browser, and checks whether the response is already cached. It caches in memory, which means it's really fast, because the response comes from memory. Similarly to the CDN, you can configure Varnish with custom logic: how you want to distribute requests, whether to pass or block them, or whatever proxy work you need. You do that in VCL. If you want to talk about details, I'm pretty sure there's another presentation about Varnish for that.
Then where does Varnish sit in the end-to-end diagram? Varnish sits in front of Nginx. If it's configured properly, around 90% of the traffic can be served from Varnish, so there's another layer before requests actually reach your web server.
Now the last component in the diagram: Redis. Redis is an in-memory NoSQL data store. People can use it to store anything. In the general web
app or website world, that might be SQL query results: you run whatever SQL query, the result goes into Redis, and Redis gives you the result instead of running the query again in the database. In the Drupal world it's even better: the Redis module moves the contents of the Drupal cache tables into the Redis NoSQL database. In the previous diagram Drupal contacted the database and fetched content from the cache tables; now it fetches the cached content from Redis, which means less traffic to the database server.
Now we have an end-to-end diagram, and you can see one, two, three, four, five potential places where we have caching. With that, and with scaling enabled on the web server and the database server, our website is pretty safe against all kinds of incoming traffic.
Now, do we think we don't need to panic if we clear the cache? Say we run drush cr. If we run drush cr in prod, all the cache is cleared, which means suddenly all the traffic comes into your web server and database server. All the structure we put in front of the database server stops working, because nothing is cached. So we don't want to clear the cache, and we do need to worry if we clear the cache in prod.
But how do I see the change? Say the developer has a new function on the page and I want to see the change on the page. If you don't clear the cache, how do I see it? Normally that question comes from the testers. That's why we have to talk about cache tags. I'm really glad that Dries mentioned cache tags in the keynote, because cache tags are something I was going to repeat multiple times in my presentation. Cache tags were introduced in Drupal 8, which is a long time ago, but developers from Drupal 7 or even earlier have no idea what a cache tag is, and they program the way they like. So it tends to be forgotten by long
time Drupal developers.
What does a cache tag do? A cache tag is kind of like a flag on your Drupal entity: it adds dependencies to the render array. Everyone knows that everything is a render array in Drupal; whether it's a small entity or a whole page, everything is a render array. The tag adds a dependency to the render array saying: okay, this entity depends on the content behind this tag. If this tag changes, or someone explicitly invalidates the tag, the cache of the tagged render array or entity needs to be cleared. Drupal does that by default. There's no custom module doing that, or contrib module; Drupal core is doing it, which is really, really awesome. Drupal already has this smart solution for handling the cache, and we need to leverage it. I'm pretty sure other people have been talking about this for a long time.
Because rendering happens at the entity level and we can have multiple entities on the same page, the tags on each entity bubble up to the URL level. I will give an example later of how the bubbling works. With this we enable selective cache clearing, which means we don't run drush cr; the cache is cleared automatically when it needs to be, and only for the particular pages affected.
This is an example of cache tags. Say the current page is node 1: node:1 will be the tag. Whenever node 1 changes, Drupal clears the cache of everything tagged with node:1. There's also node_list:article as a tag: if any article on the website, anything of the article content type, changes, the pages with this tag are cleared from the cache. Similarly, the entity type and ID pattern applies to all kinds of entities.
Let me give you a real-life example. We have a page, it's a landing page, with a featured news item in the middle, a view of news articles at the bottom, and the user information on the right-hand side. The featured news is node 5, and the news article view is actually
showing nodes 1, 2 and 3, and the user information is uid 2. What happens with the tags? The minimum set of tags we will have is node 4, node 5, all the article nodes, because we have a view in there, and user 2. That means if node 4 changes, if node 5 changes, if any one of the article nodes changes, or if the profile of user 2 changes, the cache for the current page is cleared by Drupal automatically. We don't have to do anything; this is all default behaviour from Drupal core.
When we're developing a custom page or a custom solution, we need to be careful about adding too many tags, because there's a limit on the CDN side, for example on the size of the cache tags. If you put way too many cache tags in there, the CDN won't be able to handle it. We just need to keep that in mind.
We also have the Purge module, and a bunch of other purge-related modules that depend on it, to integrate with Varnish and the CDN. With the Purge module configured, we can automatically clear the required pages from Drupal: Drupal initiates the cache clearing and triggers it in Varnish and the CDN. The Purge module sends a signal to Varnish and the CDN saying please clear this particular page, because it's tagged. It's all started by Drupal, and we don't have to do any other customization.
Debugging. As I mentioned, we need to use cache tags; Drupal will clear the cache automatically, and the Purge module will work with Varnish and the CDN. To debug whether a page is cached or not, check your browser console and check the response headers, to see whether your browser is actually sending a request out. If it's not, you need to clear the browser cache. Also remember that the CDN treats a different query string as a different URL, so if your query string is different, it triggers a new request. For example, if you think your page is cached by Akamai, you just add a question mark and something
equals something, and that will bypass the cache.
Some tips. Please use Drupal to render as much as possible. For content like news or other content, please use Drupal to render, because Drupal already caches quite smartly and you have all the software in the stack to cache for you, so why not leverage it? Developers, always add cache tags to your render array. When you render, keep the cache in mind: you need to understand how long this needs to be cached for, and when the cache needs to be cleared for this particular content. Otherwise it will be cached forever.
Only use a front-end solution, for example a React call, when there's a global value that is constantly changing. Say someone wants to display the server time on the web page. You don't want to do that, but if the requirement really is like that, you don't want to use Drupal for it. You'd rather have an API that serves the dynamic value, and have React or whatever JavaScript in the browser call the API directly without going through Drupal, because your Drupal is busy enough. And if you want to integrate with a third-party API that serves dynamic values, you also want to go from the browser directly to the API endpoint, not through Drupal.
Nowadays personalization uses cookies, and front-end JavaScript fetches different content based on different cookies. You don't want to cache pages separately for different cookies, so personalization needs to be done on the front end. If you go through Drupal, it's likely to be cached, so it has to be front-end personalization.
One of the challenges, after all that, is when you have really dynamic URL arguments, like location, country, region, district, suburb, and you want to show the details for each. The CDN and Varnish will be bypassed because the URL is different, and there can be millions of different URLs, so nothing gets cached. That is where a front-end solution comes in: you have to move the
details into an endpoint and use your browser to call the endpoint. Then you can cache the endpoint, and you only evaluate the endpoint result when it's needed.
Right, decoupled Drupal. Definitely yes, or no. The initial response size may be smaller, larger, or similar, and the response size after that is much smaller than with coupled Drupal. But what you need to be careful about is that if you change the layout of the website in decoupled Drupal, you need to clear the cache for the entire site, and then all the traffic comes in. And if you have all the values pulled directly into your browser, you need to be careful about CPU usage in the browser, because there will be crazy state management going on, all the things React does to manipulate the data locally in your browser.
There's something missing here that I'm not going to talk about. References: some of the diagrams are from there. Questions? Find me outside.