Hello everyone, and welcome to a more technical talk. First of all, I wanted to ask a question: hands up in the air, who in the room loves cooking? All right. That has nothing at all to do with the talk, but I wanted to start with an active element to get you warmed up. So now we get into the boring bits.

I'll talk today about a few concepts, but I want to start off with telling you what I will not be talking about. This will not be about infrastructure performance, like how to set up your Kubernetes clusters and things like that. It will not be about frontend performance, like how to lazy-load your images and similar things. And it will also not be about network performance, like how to reduce the latency in your DNS round trips or something like that. Those are the things we will not cover. However, we will cover some things. Everything I'm talking about is meant to be backend-only, so I'm not talking about frontend performance, which is a huge topic, of course, but it's not the topic of this talk. And it is either algorithmic in nature, meaning we change our approach to solving the problem at hand, or implementational, meaning that with the given approach we have, we try to make it faster or cheaper to execute.

First of all, the talk has a pretty long title, and it contains "performance" and "scalability". Let me talk about these two terms, because oftentimes they are confused, or people think they are the same thing, or they are not clear on the difference. Performance is how fast you do a single unit of work. Making something more performant means that, for a given type of work, you aim to execute it faster, so that it takes less time. Scalability is the ability of your system, of your application, to easily support a high load, a high usage, without degrading in performance.

While these two are directly related, they are not the same, and oftentimes optimizing for one degrades the other. If you have a system that does something a very high number of times and you improve the performance, then in total it will take a lot less time. But if you multiply that at internet scale, it might mean that the time it takes is still too much, and with the finite resources of the servers that you have, it might just mean that your servers crash. So scalability does not necessarily mean faster. Oftentimes it actually means making a single operation slower, but it changes the approach so that the work can more easily be done multiple times within the same request, for example: it can be parallelized, you can do only a fraction of the work and try to guess the rest, and things like that. So performance and scalability are not the same thing, and sometimes optimizing for one degrades the other.

With that being said, how can we visualize that relationship? We have a diagram here, a canvas with two axes: one is how fast or slow a given operation is, and the other is the distribution of our load. At the left we start with a very low load, a single visitor: our grandma does a request on our private blog. It's a single request, the blog does not do much, and if we optimized for performance, this will actually be very fast. We can ensure that the single operation executes almost instantly, and we're good to go. However, if all of a sudden our blog is shared by an Instagram influencer, what happens if our blog is not scalable?
The performance will slowly degrade the more usage our blog gets, and under high load it might even end up not being able to serve these requests at all anymore, because our server just crashes under the load. With a scalable approach, first of all, a single request might actually be slower, because a scalable approach usually involves much more logic. It is not the simplistic way of doing it as fast as possible; there's much more involved. We have heuristics to decide: when do we do it, do we do it at all, maybe we only do it every tenth step, and so on and so forth. So we have much more logic, making it slower. But as the load increases, we are still able to serve these requests; the degradation is either non-existent or, most of the time, happens in such a way that we can still serve requests over the entire spectrum of load we need to handle. That is the relationship between performance and scalability. It's a very important distinction, and some of the mechanisms and concepts I'm talking about improve performance while others improve scalability.

With that being said: first of all, if you think that something is too slow, measure it. Figure out: is it actually slow, or is that just perceived? How slow is it? Measuring is important to pinpoint problems and to have a frame of reference, so that if you make changes, you can actually find out whether you made an improvement or maybe introduced a regression. With measuring you can find the hotspots, so that you can focus your work where it matters most, because optimization is costly and time-consuming. You don't want to optimize something that is never used at all; you want to optimize the one point that has the highest impact on your performance problem.

You optimize against requirements, because optimization is always a trade-off (I'm probably standing in front of the slides here, so sorry about that). It being a trade-off means that you don't optimize indiscriminately: you figure out what your business requirements are, what is the core thing you need to achieve as well as possible, and you optimize for that. That might mean making something else slower.

And then, finally, measuring in itself has an effect on your platform. The more tools you add to measure what is happening, the slower your platform becomes as well, because you add the overhead of measuring on top of everything else. That is why it's important to always use adapted tools. There are tools that give you very, very precise measurements about what your code is doing, but you wouldn't run those on the production server; the production server would just crawl to a halt.

So there are three general categories of tools that you need for measuring performance and scalability. Profiling is the act of running your code through a measurement system where every single line of code is measured and analyzed for its impact. With a profile you can see: okay, my code spends most of its time in this function, it uses that much memory, it has so many calls of this function, and so on and so forth.
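To make that concrete, here is a minimal sketch (an illustration added for this write-up, not from the talk) of the kind of timing a profiler such as Xdebug, XHProf or Blackfire automates per function; expensive_operation() is a hypothetical placeholder:

    // Crude wall-clock timing around a suspect code path. A real profiler
    // records this for every function automatically, which is exactly the
    // measurement overhead you would not want in production.
    // expensive_operation() is a hypothetical placeholder.
    $start   = hrtime(true);
    $result  = expensive_operation();
    $elapsed = (hrtime(true) - $start) / 1e6; // nanoseconds to milliseconds
    error_log(sprintf('expensive_operation took %.2f ms', $elapsed));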
Then we have benchmarking, which basically takes a function or subsystem, executes it multiple times, and computes average times. This serves a different purpose, because some code, for example, will not run with the same performance on every iteration, so getting average numbers is something else than profiling the actual code (a small sketch of such a harness follows below).

And then, finally, we have load testing. Load testing is trying to figure out how scalable your system actually is by using it from the outside, by just hitting it with as much actual load as you can and seeing where it breaks, because pretty much every system will break if you put enough load on it, if you put enough stress on it. So those are the three main ways of measuring both performance and scalability.

Whenever you measure something, keep in mind that measuring itself has an impact as well. For the WordPress system, for example, it is common to use Query Monitor to get insights into your code. But remember that using Query Monitor makes the code slower. The measurements you get give you a relative measurement of how your code compares to a different version of the code, but the absolute numbers don't mean anything, because the absolute numbers will be completely different as soon as you switch Query Monitor off. Always keep that in mind.
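As an illustration (again not from the talk), a minimal benchmarking harness might look like this; function_under_test() is a hypothetical placeholder:

    // Benchmark: run the same code many times and average the timings,
    // smoothing out per-iteration variance (warm caches, autoloading, etc.).
    // function_under_test() is a hypothetical placeholder.
    $iterations = 1000;
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        function_under_test();
    }
    $average = (hrtime(true) - $start) / 1e6 / $iterations;
    printf("Average over %d runs: %.4f ms\n", $iterations, $average);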
I will now talk about a lot of higher-level concepts, and sometimes there's PHP code as well, but this talk will not include that much PHP code, even though the title says it's a PHP talk. It's more about the server side of the application, meaning PHP in WordPress.

One of the main concepts you should be aware of when you think about optimizing your application, optimizing your code, is that there is a degree to how real-time your code happens to be. I invented a few terms for this talk, so hopefully you don't mind: I came up with the "recency spectrum". On the left we have pure dynamic code. That is the default mode in which WordPress operates: you have a template that is executed every time you do a web request. If someone clicks a link in the browser and you just have a default WordPress installation, every single time, WordPress will go through an entire cycle of dynamically generating that result.

Then, as we move further towards static code, we add interruptions into that real-time cycle. Micro caching, for example, means going near real-time. If you run a news page with a live ticker and things like that, you want it to be pretty much real-time, but actual real-time is too hard on the resources. Micro caching would mean you cache the result for five seconds, for example. A five-second delay is still real-time for most people; a lot of people even take longer than that to download your web page if you add enough ads to it. So five seconds is nothing in terms of recency, in terms of how real-time it feels. But in terms of server resources, if you're getting millions of page requests per second, five seconds is a huge savings potential for your server. Adding that type of micro caching still feels real-time but is way more scalable than just being dynamic.

Then we have regular caching. That usually means you decide which parts of your site should have which time to live: this should be cached for 30 minutes, this should be cached for two days, and so on and so forth. Every time something that was cached expires, it gets regenerated. So we have control over how long the different subcomponents of the page are cached. That goes beyond micro caching, so it saves even more server resources, but it is not real-time anymore. If you have one part that is cached for a week, well, you will not get updates for a week; you can hardly say that feels like real-time.

Then we have long-lasting caches, with the main difference being that they don't directly rely on a time to live. Usually the time to live is very high, or the cache doesn't expire at all, and you have a trigger that invalidates the cache instead. For example, you can have a cache that stores how your post is rendered, and only when you save an update to that post does the cache get invalidated (there's a small sketch of this below). For the rest of time, as long as nobody changes that post, you don't need to regenerate the cache. It is still current; it doesn't grow stale over time. As long as nobody changes the post, it stays current.

And then, finally, at the other end of the spectrum, we have static code. That means there is no runtime way of updating the representation. This is something like a static site generator, where the entire rendering is moved from being done at runtime to being done at compile time, or at deployment. As long as you don't deploy a new version of your code, it is static.

So this is the recency spectrum. Feel free to use that term if you like it. I don't know if it already exists; I didn't find it, but it made sense to me.
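As an illustration of that trigger-based invalidation (my sketch, not the speaker's slide code; the 'myplugin' cache group and render_post_body() are assumed names):

    // Long-lasting cache: no TTL at all; a trigger invalidates it instead.
    function get_rendered_post_body( int $post_id ): string {
        $html = wp_cache_get( "rendered_body_$post_id", 'myplugin' );
        if ( false === $html ) {
            $html = render_post_body( $post_id ); // hypothetical expensive render
            wp_cache_set( "rendered_body_$post_id", $html, 'myplugin' ); // no expiry
        }
        return $html;
    }

    // The trigger: saving a post invalidates its cached rendering, so the
    // cache stays current without ever expiring on its own.
    add_action( 'save_post', function ( int $post_id ): void {
        wp_cache_delete( "rendered_body_$post_id", 'myplugin' );
    } );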
So, with that said: as we saw, the more caching we add, the more server resources we save, all the way up to being fully static at the end. The obvious approach is "cache all the things". But also: curse all the caches, because everybody who has ever dealt with caches knows that these things can really be a pain to debug. You should always be aware, as I already said, that everything is a trade-off, and it is not only a trade-off between performance and scalability; it's also a trade-off between how optimized something is and how maintenance-heavy it is.

Different types of caches come into play when we work on a PHP application server. The most PHP-specific one of them is the opcode cache. The opcode cache is used by the PHP runtime itself to store the results of the PHP compiler (it's not technically a real compiler, but let's stick with that for simplicity's sake). When you execute a PHP file, the file is loaded from the file system, it is lexed, it is parsed, it is compiled into bytecode, and that bytecode is then processed to generate your result. The opcode cache caches the result of some of these steps; we'll go into that in more detail a bit later.

Then we have the server-side data cache. I don't know if that is a technical term; it's basically everything in your custom code where you store something, hopefully not in the options table but in your Redis cache and the like. That is just pure data that you store, and what that data represents might be very different things: it might be generated HTML, it might be information about the user that you retrieved from an API, whatever.

And then we have the browser-side cache, which is the cache that the user of your site has on their end. That is the most front-end we will get here, but basically, everything that ends up in the browser cache means your server is not being hit anymore. That is a way of not doing the work at all, which is way faster than doing it in a cheap way.

Let's talk about cache expiry. As I said, with some of the ways you can cache, you need to define what the time to live is and how the cache behaves when it is invalidated, and cache invalidation is the important bit. That is the thing that is hard to get right, and it makes all the difference: caches are perfectly fine if your cache invalidation is perfectly fine, but if you have an issue with your cache invalidation, you get random output and you don't know where to start debugging. So cache invalidation is very important to get right.

The most simplistic way of dealing with cache expiry is a hard time to live (TTL). It basically means you cache something for, let's say, 10 minutes. The first time, you generate a result and store it in the cache, and for 10 minutes we always serve it from the cache. After these 10 minutes, the cache entry has expired, so the next request figures out "the cache is expired, I cannot use that", goes and regenerates the result, stores it back into the cache, and serves that cached result again. That is the most basic form of caching, and it is easy to implement, but it comes with a few drawbacks. You can imagine that most of the time, you use a cache for an operation that is expensive to do. So if you have a hard TTL and your cache expires, the user requested the result, but you don't have it yet; you need to regenerate it, and because it was an expensive operation, it takes a long time, and all the while the user is still waiting to actually get that result. That's why a hard TTL is easy to implement, but oftentimes it's just not a good experience, because you have a synchronous lock on the user until the result has been regenerated after it expired.
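In WordPress, a hard TTL is essentially what the transients API gives you out of the box. A minimal sketch (an illustration; build_expensive_result() is a hypothetical placeholder):

    // Hard TTL: cached for 10 minutes. After expiry, the next request pays
    // the full regeneration cost synchronously while the user waits.
    function get_expensive_result() {
        $result = get_transient( 'myplugin_expensive_result' );
        if ( false === $result ) {
            $result = build_expensive_result(); // hypothetical slow operation
            set_transient( 'myplugin_expensive_result', $result, 10 * MINUTE_IN_SECONDS );
        }
        return $result;
    }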
An improvement on that is a soft TTL. You cannot always use a soft TTL, but when you can, you should prefer it. It basically means that once you figure out your cache has expired, instead of letting the user wait, you continue to serve the stale result from the cache. You don't erase the cached result; you keep serving the stale version, but you trigger a background process to update the cache in the meantime. You might still serve thousands of users while this regeneration happens; they just get stale content out of the cache, which in some instances is perfectly fine. And when the cache has been updated, you replace the stale version with the new, updated one. So that is usually preferable, but it's not always usable. It depends on whether the result of your request needs to be precisely accurate or whether it just needs to be good enough. Let's get back to the news example: if you continue to serve the last version of the live ticker for two more seconds, that's not a big deal. But if you answer an authentication request with a stale result, that might be more problematic, of course.

And then, finally, we have expiry via the cache key. Whenever you cache something, you use a key to reference that entry in the cache. When you generate that cache key, you can include the data point that defines whether your data is fresh or stale. So whenever there is a reason for your data to be stale, your cache key will change, you will have a cache miss, and that forces the cache to be regenerated without you needing to manually invalidate anything. This makes your cache reactive, in the sense that some outside source can have an effect on your cache key, which creates a cache miss, and therefore the cache is invalidated automatically. You can only use this with a cache storage that cleans up after itself automatically, though. So don't do this with the WordPress options table, for example, because it will just fill up after each cache miss, and as the cache key changes automatically, you have no way of manually going in and cleaning up the old entries. That's an important distinction.
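A small sketch of expiry via the cache key (my illustration, not slide code): unlike the save_post trigger sketched earlier, nothing is deleted explicitly; the key itself changes when the post changes.

    // The post's last-modified time is part of the key, so saving the post
    // changes the key and the next read is automatically a cache miss.
    // Requires an evicting store (Redis/Memcached object cache), not the
    // options table, since old entries are never cleaned up manually.
    function get_rendered_post( WP_Post $post ): string {
        $key  = 'rendered_' . $post->ID . '_' . md5( $post->post_modified_gmt );
        $html = wp_cache_get( $key, 'myplugin' );
        if ( false === $html ) {
            $html = apply_filters( 'the_content', $post->post_content );
            wp_cache_set( $key, $html, 'myplugin' );
        }
        return $html;
    }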
The act of caching can also be done in a hierarchical way, and oftentimes this is referred to as Russian doll caching. You know these dolls where there's one inside another inside another, and you don't see the inner dolls as long as you don't open the outer one. Russian doll caching means that for a given result, let's say our home page, we might have full-page caching which caches the entire result. That's fine, but now (which one did I pick for the slides? I think I picked headline B) let's say headline B needs to update. The simplistic way of doing that is just scrapping our page cache and regenerating the entire page. With Russian doll caching, we can instead have a hierarchy of caches in place with different cache keys, and these cache keys can use the expiry-via-cache-key mechanism, by the way, which is a very smart way of combining Russian doll caching with an automated optimization. If we now want to update headline B, we invalidate headline B, which invalidates the entire collection of posts, which invalidates the entire page. But the navigation, the previews of the other posts, the title, whatever else, they don't need to be regenerated; they stay fresh. So that's a nice mechanism that optimizes itself. You need to be aware of the granularity, though: always make sure you don't go into too fine a detail, because there's overhead. Use these auto-invalidating cache keys, so expiry by cache key, and remember the cache limits. The more caching you add, the greater the risk of running into something like your Memcached running out of memory, which actually disables the cache, and that doesn't make things faster.

Also, caching is a cross-cutting concern. What does that mean? Our application is usually built out of several different layers, and normally we want to stay strictly within the limits of these layers. But some concerns, like caching or logging or security, cannot easily be kept within these limits, because they are what is called cross-cutting concerns. They transcend the layers, and that makes them very hard to implement properly. If you do object-oriented programming, for example, it's always a pain to figure out how best to structure them. That is not because you fail to reason about how best to put them into objects; it is because they are inherently problematic by their nature. There are three ways of dealing with that, and this is important for caching because it tells you how best to implement caching in your application: through dependency injection, meaning that you inject a cache object everywhere and then use that cache object; through decorators, which is a design pattern where you wrap whatever you have with a cached version of itself, which works well if you have interfaces; and through aspect-oriented programming.

To accelerate a bit here: using dependency injection unfortunately creates a lot of noise, so oftentimes that is the least preferable option. The option I usually prefer is using decorators, which means that if a service needs a dependency to execute some work, and executing that work is heavy, we can wrap that dependency with a cached version of itself, and neither the service nor the dependency needs to be changed. You don't need to change their code; you can just wrap them from the outside.

Here's an example of a caching decorator. You can see our code without caching, and the naive way of adding caching would be to mix the caching logic in with the actual logic of your code, which unfortunately creates a convoluted mess. What we want to do instead is have the caching be encapsulated in a separate class. This code didn't need to change, and this code didn't need to change; the only thing we changed is our bootstrapping code, where we assemble our application: we wrap our fragment with a cached fragment.
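The slide code itself isn't reproduced in this transcript; a minimal sketch of what such a caching decorator might look like (the Fragment interface, RecentPostsFragment and build_recent_posts_html() are assumed names):

    // The existing abstraction and implementation stay untouched.
    interface Fragment {
        public function render(): string;
    }

    final class RecentPostsFragment implements Fragment {
        public function render(): string {
            return build_recent_posts_html(); // hypothetical expensive render
        }
    }

    // The decorator: implements the same interface and wraps any Fragment,
    // so callers cannot tell a cached fragment from an uncached one.
    final class CachedFragment implements Fragment {
        public function __construct(
            private Fragment $inner,
            private string $cacheKey,
            private int $ttl
        ) {}

        public function render(): string {
            $html = wp_cache_get( $this->cacheKey, 'fragments' );
            if ( false === $html ) {
                $html = $this->inner->render();
                wp_cache_set( $this->cacheKey, $html, 'fragments', $this->ttl );
            }
            return $html;
        }
    }

    // Bootstrapping: the only place that changes.
    $fragment = new CachedFragment( new RecentPostsFragment(), 'recent_posts', 300 );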
That's the only part that needs to change; in the rest of the code, we still have the original, unchanged code. Aspect-oriented programming I will not go into in detail. It's a method that is like hook-based programming in WordPress, just at the PHP method level: you can say "before this method is executed" or "after this method is executed, do this instead", and that lets you easily wrap all methods with logging, or all methods with caching.

Now, a last concept: immediate versus deferred. Like the recency spectrum, we have a scale here: we can do something immediately, or we can not do it at all, and there are several steps in between. When we think about whether we do something immediately or not, the important part is that the web server is not a task processor. What I often see is something like this: for a checkout, you add action after action after action to your checkout process or to your form submission, and your web request times out. That's not the way to do it. Instead, the checkout just stores the fact that something was ordered, and that's it; then you have background workers that pick up the orders and do the heavy work outside of the web request. With this out-of-process execution, you need to always make sure that you serialize the context correctly, that you can scale the web server and the task processor independently, and that you use optimized infrastructure for the message queue.

And then, don't preload and pre-instantiate the entire application. With normal instantiation, you just instantiate everything all at the same time. Instead, you can use proxy objects. Proxy objects are an approximation of an object that is way cheaper to instantiate, and this automatically optimizes everything, because only when you actually hit a proxy does it turn into the actual object; it optimizes by itself. Sorry for rushing through this; it was taking way longer than I expected. Proxies are not always the exact same thing as the object they stand in for, so be aware of that; again, granularity is important to consider. And you can automatically generate proxies with a library called ProxyManager.

And then the last bit: code generation. Code generation means that you turn something from being executed at runtime into something executed at compile time, so that it doesn't need to be executed at runtime at all, which gets us into the "skipped" part of the execution scale. With the opcache, if everything is compiled ahead of time, the cached bytecode is just read and processed instead of the file being lexed, parsed, compiled and so on for every request. And as a bonus, if you keep everything as static as possible, the static result ends up in memory and is just served as-is. That is the fastest way of doing your logic, because you're mostly not doing the logic at all.

So this is it. Sorry for rushing; it was way shorter than I expected.