This is the 34th lecture in the course Design and Engineering of Computer Systems. In this lecture, we are going to study the concept of caching in more detail. In the previous lecture, we said that if IO is your bottleneck, that is, if fetching data from some device is your bottleneck, then caching is a common technique used to improve performance. In this lecture, we will study what caching is and how caches work. The basic principle of caching is this: you have some data located at some far away location, and you need it here. If you have to fetch that data from the far away location every time, whether it is a disk or some other node, that is going to hurt your performance. So what you do is store the recently fetched data in a nearby location called a cache, so that the next time you need the same data, you do not have to go all the way to the source; you can simply get it from the cache. Note that a cache has limited capacity. If the cache could accommodate all the data, you could just keep everything in the cache and the problem would be solved. But typically caches can store only very limited amounts of data, not all the original data, so they are used to store only the recently used data. Also, a cache improves performance only if fetching from the far away component is your bottleneck; if something else is the performance bottleneck, adding a cache may not be useful. But if this IO is your bottleneck, and most of the time you can fetch from the cache instead of going all the way to the source, then the cache will improve your system's performance. We have seen many examples of caching multiple times in the course so far. 
And across all of them the principle is the same. You have CPU caches: the CPU needs data from DRAM, whether instructions or program data, and instead of going to DRAM every time, the CPU has multiple levels of caches. It stores data fetched from DRAM in these caches so that in the future it can access the data directly from a cache. Similarly, you have the disk buffer cache: whatever you read from disk is stored in memory in the disk buffer cache so that you can avoid going to the disk most of the time. Similarly, TLB caches store recently used virtual-to-physical address translations. The MMU checks the TLB first, and only if it cannot find the translation in the TLB does it go to DRAM and walk the page table. These are all examples of caching we have seen so far. In this lecture we are going to see more examples of caching and, more importantly, try to come up with some general principles for designing caches, so that you can design them in any system. One popular cache is the HTTP cache. Whenever a client makes an HTTP request to some server and gets an HTTP response back, whether it is a web page or something else, you can store this response in a cache so that in the future, if this client or somebody else wants the same URL again, you do not have to go to the server; you can simply get it from the HTTP cache. So HTTP responses are frequently cached. These can be shared caches: an organization can have what is called a proxy server, so that all clients in the organization go through this proxy server to access the internet. 
If some web page is already cached at this proxy server, say one client accessed the page and it got stored in the proxy cache, then when another client wants the same page, it can simply get it from this cache without going all the way to the server. So you can have such shared caches for multiple users in an organization; if a page is not in the cache, the proxy server fetches it from the remote server and serves it to the clients. You can also have private caches within browsers, just for one user, so that if the user wants the same page again it is immediately available; these are not shared with anybody else. All of these types of caches are frequently used in computer systems today. One thing to remember is that you can only cache plain HTTP; you cannot cache encrypted HTTPS content that easily, because an intermediate cache cannot see what is inside the encrypted content. Now the question comes up: I have stored some web page in a cache, but what if the server changes the page? Say it is a news website and the news has been updated. I do not want to keep looking at old, stale news; when the news is updated I want the fresh news, not the cached old news. How does caching work in such cases? Normally, the server indicates how long caching is allowed. When the server sends a response back, it can say the maximum age of this response is, say, 10 seconds. That is, for the next 10 seconds you can use this response, but after that the server may update it, so do not use the cached copy beyond that maximum duration. 
So there is an expiration that the server puts on HTTP responses. There is also a header called Cache-Control in the HTTP response, where the server can say: do not cache this at all, I am constantly updating this page. So there are HTTP response headers like Cache-Control and max-age that the server sets; the HTTP response consists of these headers plus the actual content, and the headers control the behavior of the cache. Now, what does a cache do with expired content? Suppose a client requests a web page and the cache finds that its stored copy has expired. The server need not have actually updated the page; the cached copy might still be valid. How will the cache know? It performs what is called a conditional GET: it fetches the page from the server only if required. It tells the server, look, I have a copy of this page that was last modified at such-and-such time; is this the most recent version, or do you have an updated one? If the server has an updated version, it sends the new page; otherwise it sends a short response indicating that what the cache has is still okay. So the HTTP protocol has ways to do these conditional GETs, where you fetch the full HTTP response only if your cached copy has expired. HTTP caches frequently do this. All of this is done to avoid going to the server repeatedly, because the server might be far away and slow, and that would hurt your application's performance. The other type of cache is the DNS cache. 
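The conditional-GET logic described above can be sketched in a few lines. This is a toy illustration, not real HTTP: the server and cache are plain in-memory dictionaries, and the path, contents, and timestamps are made up for the example.

```python
# Hypothetical stand-ins for a server and an HTTP cache (no real network).
server = {"/news": {"body": "old headline", "last_modified": 100}}

def conditional_get(path, if_modified_since):
    """Server side of a conditional GET: 304 if the cached copy is current."""
    page = server[path]
    if page["last_modified"] <= if_modified_since:
        return 304, None        # "Not Modified": the cached copy is fine
    return 200, page            # otherwise send the fresh page

# The cache holds a copy last modified at time 100; revalidation succeeds.
cached = {"body": "old headline", "last_modified": 100}
status, page = conditional_get("/news", cached["last_modified"])
assert status == 304            # no body transferred, cached copy reused

# The server updates the page, so the next revalidation fetches it afresh.
server["/news"] = {"body": "new headline", "last_modified": 200}
status, page = conditional_get("/news", cached["last_modified"])
assert status == 200 and page["body"] == "new headline"
```

The real protocol works the same way, with the `If-Modified-Since` request header carrying the cached copy's timestamp and the server answering either `304 Not Modified` or a full `200` response.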
We have seen that DNS is the mechanism by which a domain name, for example nptel.ac.in, is resolved to an IP address, so that your client can actually send data to the server at that IP address. These DNS records are stored at authoritative name servers arranged in a hierarchy: at the top level you have name servers for .in, .com, and the other top-level domains; under .in you have ac.in for academic websites; and under that you have nptel. So you first talk to the top-level domain's name server to get the IP address of the next name server down, then talk to that name server to get the address of the one below it, and so on until you get the final IP address. This is the hierarchical name resolution that DNS resolvers perform. Now, having found this IP address after this long process, you do not want to repeat it every time. Therefore these DNS records, which map a name to an IP address, are cached. They can be cached locally on a machine, or, once again, shared at the DNS resolver: if multiple clients in an organization all use the same DNS resolver, the resolver itself can cache the records, and each machine can also cache them. And note that along the way you learnt not just the final resolution but also the IP addresses of many intermediate name servers. All of these can be cached too, because tomorrow if you want some other name under .in, say some government site, you have already cached the IP address of the .in name server and can go there directly instead of starting the resolution from the top. 
In this way, the various intermediate IP addresses as well as the final IP address, all of these DNS records, are cached. Caching is very important here because, as you can see, DNS resolution involves multiple network round trips and can take a few tens of milliseconds. Now again the question comes up: what if the server's IP address is updated while I have an old value in my cache? Will I suffer by using a stale cached DNS record? That is why, as with any cache, DNS responses carry a validity period: there is a field in a DNS response called time to live (TTL) that the server sets, saying this record is valid for so long. After that time, you have to do the resolution again and fetch the latest DNS record. So the TTL field allows the DNS record to be updated after some point in time. Apart from HTTP and DNS caching, in general any application can do its own caching. For example, we have seen the typical web application architecture: a front end, various application servers, and various databases; this is how a real-life application looks. At every stage you can have caches. If the application servers are sending the same queries to the database all the time, they can keep a small cache of recently received database query results. The front end can keep a cache of responses received from the application servers: if one user has just searched for a certain product and another user searches for the same product, you can simply send the cached response back. In this way, between any two components you can place a cache to reduce the communication between them and improve performance, and anything can be cached: application-level data objects, database query results, and so on. 
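The TTL mechanism just described can be sketched as a tiny cache whose records expire. This is a toy model, not a real resolver; the domain name, address, and times are made up, and time is passed in explicitly so the behavior is easy to follow.

```python
class DNSCache:
    """Toy DNS cache: each record carries a time-to-live, as in real DNS."""

    def __init__(self):
        self.records = {}            # name -> (ip, expiry_time)

    def put(self, name, ip, ttl, now):
        self.records[name] = (ip, now + ttl)

    def get(self, name, now):
        entry = self.records.get(name)
        if entry is None:
            return None              # miss: must resolve from scratch
        ip, expiry = entry
        if now >= expiry:
            del self.records[name]   # record expired: force re-resolution
            return None
        return ip

cache = DNSCache()
cache.put("nptel.ac.in", "10.0.0.1", ttl=60, now=0)    # valid for 60 seconds
assert cache.get("nptel.ac.in", now=30) == "10.0.0.1"  # still fresh: cache hit
assert cache.get("nptel.ac.in", now=90) is None        # TTL expired: re-resolve
```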
All of these caches are separate software components that sit between various other components. So you are adding extra complexity into the system with caches, because you are bringing in new components, but they improve performance. Frequently, for these application-layer caches, you use what are called in-memory key-value stores: simple databases that keep a mapping from some key to some blob of data; for this key, this is the data. Such stores are called key-value stores, and they keep the data in memory. There is popular software like Redis and Memcached that serves as an in-memory key-value store, and these are often used as caches. Note that it makes sense to keep the cache in memory: if you go to disk to fetch something and then store it in a cache that is also on disk, accessing the cache is again slow, whereas the whole point of the cache is fast access. Therefore these caches usually store all of the recent application data in memory. For example, if you want to cache recently fetched images from an image database, you can use the image name as the key and the image contents as the value; or the key can be your database query and the value the result of the query, all the rows of the database table that were returned. In this way you use some key to identify the value. The next time a request comes for the same key, instead of going all the way to the next component, you can simply look it up in the cache. We have also seen other examples like CDNs: a web server with some web pages pushes this content out to CDNs, content distribution networks, which have replicas geographically spread throughout the globe, so that any client, instead of going to the origin server, can directly fetch a web page from a nearby CDN replica. 
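The key-value caching pattern above can be sketched as follows. This is a minimal illustration where plain dictionaries stand in for the real database and for an in-memory store like Redis or Memcached; the SQL string and result are hypothetical.

```python
# Toy application-layer cache: the key is the database query, the value
# is the query result. "database" and "cache" are dicts standing in for
# a real database and an in-memory key-value store.
database = {"SELECT name FROM products WHERE id=1": ["Laptop"]}
cache = {}
db_hits = 0   # count how often we actually go to the database

def query(sql):
    global db_hits
    if sql in cache:          # 1. check the cache first
        return cache[sql]
    db_hits += 1              # 2. cache miss: go to the database
    result = database[sql]
    cache[sql] = result       # 3. populate the cache for next time
    return result

q = "SELECT name FROM products WHERE id=1"
assert query(q) == ["Laptop"] and db_hits == 1   # first call hits the database
assert query(q) == ["Laptop"] and db_hits == 1   # repeat is served from cache
```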
So CDNs also cache various application-layer objects, web pages, images, so that clients do not have to come all the way to the system; they can get the data directly from the CDN. In this way, at every level, between application-layer components, between clients and the system in the form of CDNs, HTTP proxy servers, DNS caches, you have various levels of caching happening in computer systems, all to avoid expensive communication. Now, across all of these caches there are certain common design principles, common things to keep in mind, that I would like to point out in the next few slides. The first question is: when do you cache? Just because you are fetching something from another component, you do not automatically put it in a cache. There are certain guidelines to help you decide whether it is worth adding an extra cache component between two components. What are these guidelines? You should only cache when the workload will lead to high cache hit rates. For example, you have accessed some piece of data from a remote database and this data will be needed again soon; only then is it worth putting in a cache, when there is high locality of reference. If you have accessed some data that you will never use again, or are very unlikely to use again, there is very low locality of reference and no point caching it. Similarly, caching is worthwhile when there is a skewed popularity distribution: some items are very popular, some images are being repeatedly fetched from the database, so it is worth putting them in the cache. But if, once you access an item, you will never use it again, then what is the point? When there are such popular items, heavy hitters, then you can cache them. 
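The hit-rate argument can be made concrete with a small back-of-the-envelope check. This is a toy sketch with a made-up request stream: a few hot keys dominate, so even an unbounded cache is enough to estimate the best possible hit rate.

```python
# A made-up skewed workload: key "a" is a heavy hitter.
requests = ["a", "b", "a", "a", "c", "a", "b", "a"]

cache = set()   # unbounded cache: measures the workload's best-case hit rate
hits = 0
for key in requests:
    if key in cache:
        hits += 1
    else:
        cache.add(key)          # first access to a key is a cold miss

hit_rate = hits / len(requests)
assert hit_rate == 5 / 8        # 3 cold misses, 5 hits on repeated keys
```

If the same experiment on your workload gives a hit rate near zero (every key is accessed once), a cache buys you nothing; a high number like this 62.5% suggests caching is worthwhile.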
So you have to think through whether your workload leads to high cache hit rates or not: if I take the extra effort to add a cache, will I get good performance gains? Only then should you cache. The second guideline is that the cache has to be faster to access than, or somehow closer than, the original copy of the data. If your cache is as far away as your original data, or is slower than accessing the original data, there is no point in caching. Therefore, caches are usually in-memory caches, so that instead of going to disk you can get the data from memory; CPU caches use a different technology called SRAM that is much faster than the DRAM used for main memory. Caches are useful only if they are faster and/or closer. The third thing caches need is a good eviction policy. Of course, if your cache were big enough to store all the data, you could just store everything, but that will never happen: caches use a different, more expensive technology, so you can never store all the original data, only a subset of it. So which subset will you store? When the cache is full and you want to store a new item, you have to throw away some older item, so you need a good eviction policy to decide which one. The eviction policy should ensure that you do not evict a very useful item: if, as soon as I throw something away, somebody asks for it, that is a bad eviction policy. A common good policy is LRU, least recently used, which is used in many of these caches: if something has not been used for a long time, most likely nobody will need it again soon, so throw it out. 
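LRU eviction can be sketched compactly with Python's `OrderedDict`, which keeps items in insertion order and lets us move a key to the end on each access, so both lookup and eviction are constant-time operations. The capacity and keys below are made up for illustration.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used item when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()      # oldest (least recently used) first

    def get(self, key):
        if key not in self.items:
            return None                 # cache miss
        self.items.move_to_end(key)     # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used

c = LRUCache(capacity=2)
c.put("a", 1)
c.put("b", 2)
c.get("a")                 # "a" is now the most recently used
c.put("c", 3)              # cache full: evicts "b", the least recently used
assert c.get("b") is None and c.get("a") == 1 and c.get("c") == 3
```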
And you should be able to implement this eviction policy easily, without too much overhead: if finding the least recently used item takes, say, 10 milliseconds, the cache is very inefficient. The other requirement is that your cache has to be large enough to accommodate the working set size. If the application is using some subset of the data frequently, that subset should fit in the cache; if your cache is so small that whatever you want is never in it, it is not very useful. So these are the guidelines that tell you, for a given application, workload, and system, whether it is worth using a cache or not. Now the next question is: where is the cache located? Suppose you have a client and a server from which you are fetching data, the server holding the original copy of the data. Note that these are general terms: the client could be the CPU and the server could be DRAM; the client is simply whatever is requesting data from a server. Now you have a cache in between, and the question is where this cache sits relative to the client and the server. There are two types of caches. One is what is called an inline cache: a cache that sits directly on the path between the client and the server, between the source of the data and whoever wants the data. What does this mean? The client never talks to the server directly. The client always checks the cache; if the item is in the cache, the cache returns it, and if it is not there, if it is a cache miss, then the cache itself talks to the server, fetches the item, and gives it back to the client. 
So the cache is always in the middle; that is an inline cache, and the client never talks to the server directly. Examples are most of the caches we have seen so far: the CPU cache is an inline cache, since the CPU always checks the cache first, and whatever is fetched from memory is put into the cache and then returned to the CPU; the disk buffer cache works the same way. The other kind of cache is what is called a look-aside cache: the client first checks the cache, and if the item is there, well and good; if it is not, the client talks to the server directly, gets the data item, and later updates the cache. So the cache is not in the middle, it is on the side; it is not on the direct path, and the client and server talk to each other directly, with the client or the server updating the cache whenever some data changes. For example, if the server changes the data, it tells the cache to update or invalidate its copy: the cached copy is either marked invalid or replaced with the new value. Similarly, the client can update the data at the server and then tell the cache to update its copy; but the cache is not involved in the communication between the client and the server. What is an example of a look-aside cache? We have seen one before: the TLB. The TLB sits on the side; the MMU checks the TLB, and on a TLB miss the MMU directly accesses the page table, gets the address translation, and then updates the TLB. That is why it is called a translation look-aside buffer: it is not on the path between the MMU and the page table. So these are the two types of caches, and when you look at a cache you should be able to tell which type it is. 
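The two placements can be contrasted in a short sketch. This is a toy illustration where the "server" is just a dictionary; the point is only who talks to the server on a miss.

```python
server = {"x": 42}   # stand-in for the component holding the master copy

# Inline cache: the client ONLY talks to the cache; on a miss, the cache
# itself fetches from the server (like a CPU cache or an HTTP proxy).
class InlineCache:
    def __init__(self):
        self.store = {}
    def get(self, key):
        if key not in self.store:
            self.store[key] = server[key]   # the CACHE talks to the server
        return self.store[key]

# Look-aside cache: on a miss, the CLIENT goes to the server itself and
# then updates the cache (like the TLB, or a typical Redis deployment).
lookaside = {}
def client_get(key):
    if key in lookaside:
        return lookaside[key]
    value = server[key]                     # the CLIENT talks to the server
    lookaside[key] = value                  # the client updates the cache
    return value

inline = InlineCache()
assert inline.get("x") == 42 and "x" in inline.store
assert client_get("x") == 42 and lookaside["x"] == 42
```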
Then the next question comes up: how do you populate the contents of the cache? You have a master copy of the data at the server, and some clients need cached copies; how is the cache populated? Again there are two ways. First, there is what is called a demand-filled cache: the content is populated only when needed. Only when a client requests some data is that data fetched from the server and put into the cache; that is, the client pulls the data into the cache on demand. Things like CPU caches and HTTP caches are demand-filled caches. With demand-filled caches, if you have multiple caches, their contents can diverge: one client is requesting some data and another client is requesting different data, so the copies differ. For example, one CPU core has requested some memory locations and they are in its cache, while another core has requested other memory locations, which are in its own cache. Now if one core updates some memory location, the update does not automatically appear in the other core's cache. Why? Because it is a demand-filled cache: only if that core requests the location will it come into its cache. So with multiple demand-filled caches, the contents can diverge, because you update each cache only based on demand, not continuously. Then you have proactive caches, where the server actively pushes: whenever any data changes, the server takes responsibility for populating all the caches and keeping them consistent. For example, in some CDNs, the server that has distributed its files to the CDN will update the CDN replicas whenever a web page changes, pushing the new version to the replicas. 
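The difference between the two filling strategies can be seen in a toy sketch: two per-client demand-filled caches naturally diverge, while a proactive push keeps every replica in sync. The page names and versions are made up.

```python
# Stand-ins for the source of truth and two per-client caches.
source = {"page": "v1"}
cache_a, cache_b = {}, {}

# Demand fill: each cache pulls only when its own client asks.
cache_a["page"] = source["page"]        # client A reads the page (gets v1)
source["page"] = "v2"                   # the source then updates the data
cache_b["page"] = source["page"]        # client B reads afterwards (gets v2)
assert cache_a["page"] != cache_b["page"]   # the two caches have diverged

# Proactive fill: the source pushes every update to all replicas,
# as a CDN origin might push updated files to its replicas.
def publish(value):
    source["page"] = value
    for replica in (cache_a, cache_b):
        replica["page"] = value

publish("v3")
assert cache_a["page"] == cache_b["page"] == "v3"   # replicas stay in sync
```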
So in such proactive caches it is easy to maintain consistency across the multiple caches. But with a demand-filled cache, if some CPU core has updated an item in its cache, another core may not have the updated copy of that item; it may not even have that cache line at all. In the case of CPU caches, where every core has private caches, if they are demand filled, it is harder to maintain this consistency. So now the important question with any cache: once you have multiple copies of data, your server has the master copy and then you have multiple caches throughout the system, each caching some subset of the data. How do you keep all of these cached copies in sync? If some data changes at the server, it has to be updated in all the caches; otherwise, if the caches hold old values, it will be incorrect for the application to use them. So with caches this challenge comes up: instead of one copy of the data, you have created multiple copies throughout the system, and somehow you have to take responsibility for keeping all of these copies in sync. That problem is called cache consistency. Any time you have multiple caches and clients accessing data from them, you have to ensure that everybody accesses the same consistent version of the data. So how do we guarantee cache consistency? We will see some of the ideas for doing that. Let us take the example of CPU caches, something we have seen a lot. 
So in a CPU, if you have two different cores, each core has its own private caches, like the L1 and L2 caches, which are private to that core; then you have a shared L3 cache common to the cores, and then DRAM. You can have a structure like this: some caches belong to only one core, some caches are shared across cores, and so on. Now across all of these caches you have to maintain information. For example, if one core's cache has a certain memory location x stored in it, and that core has updated x, then when another core wants to access x, it needs to know that the updated value is over there. So this information, which cache holds which item, needs to be tracked. How is it tracked? There are two ways. One is what is called snooping: whenever a cache obtains some data item, all the other caches snoop on the traffic and observe that this core has obtained this memory location from RAM. Everybody watches what the others are doing, so each core knows: I snooped in the past and found that core C0 has memory address x, so the next time I want it, I should check with C0 for the latest copy. The other way is to maintain a directory: the CPU cores keep a directory that records which core has cached which memory locations. These are the two common techniques used at the level of CPU caches to keep track of which cache has which memory location. And in some systems, of course, it is easy to keep track: in CDNs, the server has the master copy of all the content, and you are entering into a contract with a CDN provider, saying, please cache this content in all of these different places. 
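A directory can be sketched as a simple map from each memory address to the set of cores currently holding a cached copy. This is a toy model of the bookkeeping only, not a full coherence protocol; the core IDs and addresses are made up.

```python
# Toy directory: which cores have cached which memory location.
directory = {}   # address -> set of core ids holding a copy

def cache_fill(core, addr):
    """Record that `core` has just cached `addr`."""
    directory.setdefault(addr, set()).add(core)

def sharers(addr):
    """All cores that currently hold a copy of `addr`."""
    return directory.get(addr, set())

cache_fill(0, 0x1000)        # core 0 caches address 0x1000
cache_fill(1, 0x1000)        # core 1 caches the same address
assert sharers(0x1000) == {0, 1}

# Before core 0 writes to 0x1000, it consults the directory to find
# every OTHER core whose copy must be updated or invalidated.
assert sharers(0x1000) - {0} == {1}
```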
So it is easy to keep track of which CDN replica has which web page, because you are the one actively pushing content out to the CDNs. But in some systems, like HTTP caches, it is practically impossible to know: if people are caching web pages in their browsers, the web server cannot look into the caches of billions of internet users and see who has which page. So in systems like HTTP caches, it is impossible to keep track of all the copies of the cached data, whereas in systems like CPU caches, you must keep track of which cache has which memory location so that you can consistently access the latest copy. So the first problem in cache consistency is keeping track of the replicas. The next problem is updating all the replicas: how do you keep the replicas in sync with each other? That is what cache coherence protocols do. Suppose one CPU core has, in its L1 cache, a cache line holding memory location X, and another core wants to access the same memory location X from its own L1 cache. Then some cache coherence traffic runs between these two cores in order to synchronize the value of this memory location. Whenever core C0 updates this location, C1 will either update its value too, or invalidate it, saying, fine, let me delete this from my cache. So when one core updates a value in its private cache, all the other copies of that data in all the other cores have to be synchronized, in one of two ways: either all the other cores also update, writing the latest value pushed by C0, or they invalidate the value, marking it as invalid and saying, if my core requests this memory location later, I will fetch it fresh; I will not worry about it for now. 
So if a data item is cached at multiple locations, you have to keep these multiple replicas in sync: if one copy is updated, all the other copies must either update too, or at least invalidate, removing the item from their cache so that the next time they need it they fetch the latest value instead of using a stale one. Now, you can do this in CPU caches because you have kept track, using either snooping or a directory, of which caches hold each memory location. What about things like HTTP caches, where you do not know where an item is cached? A web page may be in browser caches all around the world, and the web server does not know which. In such cases, when a web server changes a web page, ensuring that all replicas are kept up to date is very hard. Therefore, what we usually do is identify the latest copy of the data in some way, using a sequence number or version number. Whenever the web server updates a web page, it records something like the last-modified time or a version number for the data item. Then, when a client requests a page from an HTTP cache that might hold an older version, the cache can check with the server: I have a page with this version, or with this modified time; is that what you have too? If the server has the same latest version, the cache can simply return its copy; if the server has a newer one, the cache learns that its page is old, since the newer page has a higher version number or a later modified time, and fetches it. 
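The write-invalidate option described for CPU caches can be sketched as follows. This is a toy model, not a real coherence protocol: "memory" and the per-core caches are dictionaries, and the address and values are made up.

```python
# Toy write-invalidate coherence: when one core writes a location,
# every other cached copy is invalidated so stale values are never read.
memory = {0x1000: 5}
caches = {0: {}, 1: {}}          # per-core private caches

def read(core, addr):
    if addr not in caches[core]:
        caches[core][addr] = memory[addr]   # miss: fetch from memory
    return caches[core][addr]

def write(core, addr, value):
    memory[addr] = value
    caches[core][addr] = value
    for other, cache in caches.items():     # invalidate all OTHER copies
        if other != core:
            cache.pop(addr, None)

read(0, 0x1000)                  # both cores cache the value 5
read(1, 0x1000)
write(0, 0x1000, 7)              # core 0 writes: core 1's copy is invalidated
assert 0x1000 not in caches[1]
assert read(1, 0x1000) == 7      # core 1 re-fetches the latest value
```

A write-update variant would instead overwrite the other copies with the new value in the loop, rather than deleting them.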
In this way you need some means of identifying which data is old and which is fresh. The server cannot push its updates to all the caches all over the world, but the caches themselves can check: is my version number the latest? If not, let me update. So you put some kind of identifier, a sequence number, version number, or timestamp, on the data items, so that whenever a cache has to serve the same data item again, it can compare whether its version or timestamp is the latest, and if not, either update its copy or invalidate it. These are some of the techniques available to maintain cache consistency. So that is all I have for this lecture. What I have shown you is many different examples of caching in computer systems, from CPU caches to the disk buffer cache, the TLB, HTTP caches, DNS caches, and application-layer caches. In addition, we have seen some common design principles across all of these caches: where the cache is positioned, inline or on the side; how the cache is filled, on demand or proactively; how you track all the replicas of cached content; and how you maintain consistency across them. These principles are widely used across all of these caches, so the next time you have to design a cache in a computer system, keep them in mind and see which of these patterns fits your requirements. One exercise for you: examine HTTP headers in Wireshark and observe headers like Last-Modified, Cache-Control, and max-age; there are several such HTTP headers that decide which web pages get cached in your system and which do not. 
Observe these to understand how web servers today control caching in HTTP proxy servers and HTTP caches. Thank you all, that is all I have for this lecture; let us continue this discussion in the next lecture. Thank you.