Welcome to the Edge Caching Dynamic Apps talk. We're going to talk about content delivery networks. Who here knows what content delivery networks are, or uses them? OK, awesome. My name is Michael May, and my love affair with content delivery networks started a couple of years ago when I started a company with my friend Richard Schneeman down in Austin, Texas, called CDN Sumo. CDN Sumo is a Heroku add-on, and the idea was to make it super easy and quick to get performance benefits from a content delivery network without actually having to know too much about how they work. At the end of last year, CDN Sumo was acquired by a company called Fastly, and Fastly is a content delivery network built on the open source Varnish Cache. So today we're going to talk about caching and some of the ideas that make caching work. We'll talk about content delivery networks and how they actually work. We'll also look at some interesting and innovative features in some content delivery networks that enable this thing called dynamic edge caching. And finally, we'll talk about how you can integrate dynamic edge caching into your Ruby and Rails apps. The whole idea behind caching is that I have a piece of data. It lives somewhere, whether that's a hard drive or a database, and I access that piece of data pretty often, so I want to make that access quicker. The way you do this is by moving the data closer to where it's being accessed, and also by moving it into some sort of storage that works faster than where it normally lives. A nice side effect of caching is that it can also reduce load on the original storage location where the data lives. So let's start out by defining some terminology that I'm going to use. The first term is cache hit, which is just when requested data is found in the cache. The inverse of that is a cache miss, which is when the data is not in the cache.
There's also this thing called the hit ratio, which is the percentage of accesses that result in a cache hit. The higher your hit ratio, the better the performance you're going to see. There's purge, which is when you remove data from the cache. Your origin is just your HTTP application server. And finally, the edge is an HTTP caching server. Now, caches are generally much, much smaller in storage space than your original storage location, so we need some sort of strategy to determine what should live in the cache. One of these strategies is called least recently used, also known as LRU. If we were to define an interface for how this works, we'd need a set method, which puts things into the cache, and a get method, which takes things out of the cache. Now these get and set methods are also going to do something else, because caches have a limited amount of storage, so we need to keep track of how recently a piece of data in the cache has been used. When the get method retrieves something from the cache, it also moves that item's reference to the head of the cache, the head being the most recently accessed position. The set method, if the cache is full, will call a prune method, and the prune method removes an object from the tail of the cache, the tail being the object that's been in the cache the longest without being accessed. So LRU is just a strategy to determine what data should live in the cache. If we were to implement this, we could use two different data structures: a hash table to store the actual data, which gives us those constant-time get and set functions, and then something like a doubly linked list to track the time since last use. So now I'm going to briefly mention all of the different types of caches.
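Here's a minimal sketch of that LRU interface in Ruby. One shortcut, since Ruby's Hash preserves insertion order: deleting a key and re-inserting it plays the same role as moving a node to the head of a doubly linked list, so a single Hash can serve as both data structures. The class name and capacity are just for illustration.

```ruby
# A tiny LRU cache. Ruby's Hash remembers insertion order, so the
# first key is always the least recently used entry (the "tail").
class LRUCache
  def initialize(max_size)
    @max_size = max_size
    @data = {}
  end

  # Fetch a value and mark it as most recently used.
  def get(key)
    return nil unless @data.key?(key)
    value = @data.delete(key) # remove...
    @data[key] = value        # ...and re-insert at the "head"
    value
  end

  # Store a value, evicting the least recently used entry if full.
  def set(key, value)
    @data.delete(key)
    @data[key] = value
    prune if @data.size > @max_size
  end

  private

  # Drop the entry at the "tail": oldest key with no recent access.
  def prune
    @data.delete(@data.keys.first)
  end
end

cache = LRUCache.new(2)
cache.set(:a, 1)
cache.set(:b, 2)
cache.get(:a)    # :a is now most recently used
cache.set(:c, 3) # cache is full, so this evicts :b

puts cache.get(:b).inspect # nil -- pruned
puts cache.get(:a).inspect # 1
```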
At a very foundational, low level, you have the CPU and hardware caches. On top of that, you have main memory, which acts as a cache for your hard disk. On top of that, you have software caches and application caches. Further up, you have things like memcached and Varnish Cache. And at the very top, we have content delivery networks, which are what we're going to talk about. A content delivery network is a globally distributed network of cache servers. The way this works is that when people request data, those requests are routed to edge caches. If the data lives in the cache, the response is immediately returned to the user. Otherwise, the cache will go out to your origin server, fetch the data, store it in the cache, and then pass it back to the user. Now, that edge cache circle in the middle actually looks like this: we have edge caches all over the world, and we call these POPs, which stands for points of presence. Say you have a user in Sydney, Australia, and your application server is in Ashburn, Virginia. There are many thousands of miles between Ashburn and Sydney, so it takes a long time for data to travel there. The whole idea behind content delivery networks is to offload as much data as possible from the application server and put it somewhere closer to the user. Now, traditionally, when you think of content delivery networks, you think of them as only caching static content, static content being things like images, JavaScript, and stylesheets, things that don't change very often. The good thing about static content is that it's super easy to edge cache, right? Because we have things like the asset pipeline, which is super easy to use. You define an asset host config and set it to your CDN URL.
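That asset host configuration is a one-liner. As a sketch, the hostname here is a placeholder for your real CDN hostname, and which environment file you put it in depends on your app:

```ruby
# config/environments/production.rb
# Point compiled asset URLs at the CDN instead of your origin.
# "cdn.example.com" is a placeholder for your CDN's hostname.
Rails.application.configure do
  config.action_controller.asset_host = "https://cdn.example.com"
end
```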
And then when you compile your assets, the asset pipeline does its thing and automatically builds your views with links out to the CDN. So that's super great and super easy to do. My thinking on why content delivery networks typically only cache static content is that caching data is an inherently hard problem, and when you distribute your caches all over the world, it just multiplies that complexity. So let's dig into content delivery networks. Let's talk about how data and requests actually get to the edges. Then we'll talk about controlling this content using HTTP headers, and then we'll cover some lesser known, interesting CDN features. For how data gets to the edge, the first thing we need to know is that there are two types of CDNs: push CDNs and pull CDNs. With push CDNs, you manually sync your content out to the edges, and this usually happens when assets change. So if you add some new images or update a piece of JavaScript, you have to manually go sync those out to the edge nodes. One of the downsides of push CDNs is that it can be harder to keep your assets in sync, simply because we're human and we can forget to do stuff. The other thing to note is that if you have a lot of content to push out to the edge and your origin server is under high load, pushing is just going to increase the load there. So that's something to watch out for with push CDNs. The other type of CDN, and the one we're going to talk about for the rest of the talk, is the pull CDN. The way this works is that the CDN actually pulls content from your origin server. This is great for us as developers because we get seamless updates when content changes, so we don't have to worry about syncing data to the edge. However, there is a small latency cost on the first request that you make.
And that is because pull CDNs act as reverse proxies. If you're not familiar with a reverse proxy, it's just an HTTP server that sits between your user and your origin. When requests are made by your user, they first hit the proxy server, and the proxy forwards those requests on to your origin. If we look at how this works for pull CDNs, on the first request, the user asks for a piece of data, and that request is routed to the edge. The data is not going to live there, because it's the first request, so the cache then makes a request to the origin. The origin passes the data back to the cache, the cache stores it, and then forwards the response back to the user. That's kind of a long process. But your second, third, and any subsequent requests will be super fast, because when your user makes a request, it goes to the edge node, and the edge node can respond directly with the cached content. And you'll note that the larger the distance between your user and your origin server, the more a content delivery network helps. So now let's talk about how requests get to the edge. Requests are usually routed to the edge closest to where you're located. The way this happens is that when the client makes a request, it triggers a DNS lookup, and DNS resolves the request to a specific geographical region and then forwards it on to the CDN edge. This is a pretty high-level view, and there are a lot of people I work with who know way more about this than I do, but it should give you a good idea of how those requests are routed. So now we know how requests and content get to the edge. Let's talk about how to control that content, and the way we do that is with HTTP headers. There's this thing called the Cache-Control HTTP header, and I'm sure many of you have used it and are familiar with it.
The way the Cache-Control header works is that it defines which responses can be cached and for how long. The whole idea behind Cache-Control is to prevent caches from interfering with the normal request and response cycle in ways you don't want. And if you remember the LRU strategy we talked about, Cache-Control headers will override whatever default caching mechanism is built in. There are a number of directives we can specify with Cache-Control. The most common is max-age, which takes a time in seconds. Cache-Control: max-age defines how long the response should be kept in the cache, and this directive is respected by both shared caches and private caches. When we talk about shared caches here, we're talking about a content delivery network; when we talk about private caches, we're talking about the browser. Generally, you use max-age as kind of an umbrella strategy to say, hey, cache this. There's another directive called private. The way private works is that it indicates to shared caches that the response cannot be cached; however, private caches, like the browser, can cache it. You generally use private when your responses contain some kind of sensitive, user-specific information that you don't want living in a public shared cache. There's also another directive called s-maxage, which is a little less known. s-maxage indicates how long to keep a response fresh, i.e., cached, and it's only respected by shared caches, so browsers won't honor it. This is really useful if you want to set different cache lifetimes for your CDN and your browser. You can also specify that things not be cached: you can use no-cache, or you can set max-age to 0. There's also another HTTP header called Surrogate-Control, where surrogate refers to reverse proxies. And in a similar fashion, you can specify a max-age.
What the Surrogate-Control max-age directive does is indicate how long to keep the response cached; however, it's only respected by reverse proxies, like a pull content delivery network. So here's an example where we're setting both Cache-Control and Surrogate-Control. We're setting Cache-Control to 3,600 seconds, which is one hour, and we're setting Surrogate-Control to a long time, which happens to be a year. What happens here is that the Surrogate-Control header takes priority over Cache-Control, so the surrogate will cache this content for one year, and the browser will cache it for an hour. This is useful if, for instance, you want your users to periodically re-fetch your content, just in case something changes, so they always have a reasonably fresh version, while the content lives on the edge for as long as you specify, up until you update it. Also interesting to note: the Surrogate-Control header is sometimes stripped out by the surrogate cache. If you think about it, this intuitively makes sense, since browsers and private caches won't honor it. We might as well strip those bits out, save the bandwidth, and not send them to the browser. Now we're going to talk about some interesting features that you might not know some content delivery networks have. Before we talk about the first one, we have to know a little something about good old Transmission Control Protocol, also known as TCP. This is the protocol web browsers use to connect to servers, so it's important for us to know something about it, and specifically what we need to know is how TCP establishes connections. That happens with this thing called the TCP handshake, also known as the three-way handshake. What happens here is that when the client wants to connect to the server, the client first sends a SYN, which carries some synchronization data.
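The example above can be written out concretely as header values. This is a sketch using a plain Ruby hash standing in for a response's headers; the specific durations are just the ones from the example:

```ruby
# The browser (a private cache) honors Cache-Control and will
# re-fetch after an hour; the CDN (a shared surrogate cache) honors
# Surrogate-Control and keeps the response for a year.
ONE_HOUR = 3600
ONE_YEAR = 31_536_000 # 365 days in seconds

headers = {
  "Cache-Control"     => "max-age=#{ONE_HOUR}",
  "Surrogate-Control" => "max-age=#{ONE_YEAR}"
}

puts headers["Cache-Control"]     # max-age=3600
puts headers["Surrogate-Control"] # max-age=31536000
```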
The server, when it receives that SYN, sends back a SYN-ACK, which is an acknowledgment that it got the client's request, plus some more synchronization data. Then, when the client receives the SYN-ACK, it sends an acknowledgment back to the server, and data can start to flow through the connection. So that's a pretty long process, and depending on your connection, it can take 100 milliseconds or so. That's a lot of latency just to open a connection. So there's something called HTTP keep-alive, and there's actually an HTTP header for it: Connection: keep-alive. What this does is keep the TCP session open for an extended period of time. So what we're doing here is eliminating the per-connection TCP handshake by keeping the connection open instead. When we're talking about keep-alive and content delivery networks, what this means is that edge nodes will maintain keep-alive connections with your origin server. This means you get faster cache misses, because when the edge has to go back to your origin to request content, the connection is already open, so you don't have to spend the time doing that handshake. Some content delivery networks call this dynamic site acceleration, or DSA. The next thing we're going to talk about is instant purging, and this diagram goes from right to left. When a user, or someone, or whatever, updates a piece of content on your origin, the origin will issue a purge request out to the nearest edge. When the edge receives that purge, it performs the purge and then passes it on to all of its nearest neighbors, and you basically recurse until some point in time when you can determine that all of your edges have purged this data from the cache. This is actually a pretty hard distributed systems problem.
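From the origin's point of view, a purge is typically just an HTTP request. As a sketch, assuming a Varnish-style cache that accepts a PURGE method on the cached URL (the hostname and path below are hypothetical, and real CDNs usually also require authentication), you could build such a request with Ruby's Net::HTTP like this:

```ruby
require "net/http"

# Net::HTTP has no built-in PURGE request class, but we can define
# one the same way its Get/Post classes are defined, since
# Varnish-style caches treat PURGE as just another HTTP method.
class Purge < Net::HTTPRequest
  METHOD = "PURGE"
  REQUEST_HAS_BODY = false
  RESPONSE_HAS_BODY = true
end

# Build (but don't send) a purge for a hypothetical cached URL.
uri = URI("http://www.example.com/products/42")
request = Purge.new(uri.path)

puts request.method # PURGE
puts request.path   # /products/42

# To actually send it, you'd do something like:
#   Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```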
We have an engineer at Fastly who's done some research and given talks on this. It's really interesting, so I put the link up here: brucespang.com/bimodal. And it's /bimodal because the algorithm we use is called bimodal multicast. At Fastly, these purges happen extremely quickly: globally in less than 300 milliseconds. When you think about it, that's an extremely fast time for anything to happen globally. And because these purges happen so fast, this is actually what enables dynamic caching at the edge. We have a blog post about our instant purging feature if you're interested: fastly.com/blog/building-fast-and-reliable-purging-system. So instant purging enables dynamic caching, and we're going to explain what that is. We'll use Rails fragment caching as an example, because that's a type of dynamic caching, and then we'll map how that works onto edge caching; the processes are very similar. First we need to define dynamic content. Dynamic content is data that changes frequently. However, frequently changing doesn't mean continuously changing; there are periods of time when the data is not being changed. Dynamic caching is when we cache the data in between changes, and this can be extremely beneficial when your dynamic data is being requested many times. To give you some examples of things we can dynamically cache: API responses, comment threads, news articles, product inventories, and even things like search results. The way dynamic caching works, there are three steps. First, you need unique cache keys. Second, you need to bind a piece of data to those keys. And third, you need to purge the data when it changes, using the cache key. So as an example, let's take a look at Rails fragment caching.
The way this works is that in your views, you mark the things you want cached with the cache helper and provide a cache key. This prevents regenerating your entire view every time and can speed up how your views are built. Then, when the data changes, in your controller you call the expire_fragment method with your cache key. That issues a purge, removes the data from the cache, and the cache gets updated once the fragment is regenerated. So I want to tell you that if you understand and can do Rails fragment caching, you can also do dynamic edge caching. The processes are very similar. To look at a real-world example real quick, let's take a dynamic page like Pinterest. If we were to cache the dynamic content here, we might break it up like this. The first thing we might want to cache is each individual pin, this entire thing. If we wanted to take it a step further, we could cache the user content here, the avatar and the name of the user. And if we wanted to go even further, we could cache things like the number of repins and the number of likes. And of course, since the images and avatars are static, we can cache those too. This is just a strategy to give you an idea of how you might break up a dynamic page for caching. What happens is that when this data updates, you issue purge requests and the cache is refreshed with the new data. So let's talk about how you dynamically cache things on the edge. Remember, the first step in this process is to create your unique cache keys. With CDNs, we use these things called surrogate keys, which are your cache keys. Surrogate-Key is actually an HTTP header that tells the cache to associate a key with a particular piece of data. Step two is to bind a piece of data to the key, and the way we do this is by setting Surrogate-Key response headers on HTTP GET responses.
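The fragment caching mechanics described above can be sketched without Rails at all. This toy version uses a plain Hash as the cache store; the `fetch` method plays the role of the view-side `cache` helper and `expire` plays the role of `expire_fragment`. The keys and HTML are hypothetical:

```ruby
# A toy stand-in for Rails fragment caching, to show the mechanics.
class FragmentCache
  def initialize
    @store = {}
  end

  # Like the view's cache(key) { ... } helper: render the block only
  # on a miss, and reuse the stored fragment on a hit.
  def fetch(key)
    @store[key] ||= yield
  end

  # Like expire_fragment(key): purge the fragment so the next
  # request regenerates it with fresh data.
  def expire(key)
    @store.delete(key)
  end
end

cache = FragmentCache.new
renders = 0

html = cache.fetch("product/42") { renders += 1; "<li>Widget $9.99</li>" }
html = cache.fetch("product/42") { renders += 1; "<li>Widget $9.99</li>" }
puts renders # 1 -- the second call was a cache hit

cache.expire("product/42") # the product changed, so purge it
html = cache.fetch("product/42") { renders += 1; "<li>Widget $8.99</li>" }
puts renders # 2 -- regenerated after the purge
puts html
```

Dynamic edge caching follows the same three beats, except the store is a CDN edge node and the purge is an HTTP request instead of a method call.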
As an example here, we have a product endpoint, and we're doing a GET for a specific product. On the response headers, there's this Surrogate-Key header, and here we're specifying two keys. The first is products, which maps to the entire set of products, and the second is a singular key, product/id, which maps to just that one product. The third thing we need to do is purge this content when it's updated. So what we do is issue purge requests on HTTP POST, PUT, and DELETE methods, anything that updates data. Following the same example, we have the product endpoint, we're doing a PUT, and we're updating the price. If this were Rails, we'd have a products controller with an update method, and in that update method, your request handler, you'd issue a purge request with your surrogate key. That purges the product from the cache, and then the cache is refreshed with the new data. So let's talk about how we can do this with Ruby on Rails. Heads up: there's a gem called fastly-rails, github.com/fastly/fastly-rails, and it contains a lot of the helpers I'm about to show you, to make this super easy. So the first step, right? The first step was to create unique cache keys. In Rails land, what we do is extend our models with surrogate key instance methods. Here's our product model: we define a thing called a record key, which will be a unique key for a single instance of a product, and a thing called a table key, which will map to the entire set of products. That's step one, creating cache keys. Step two is to bind data to these cache keys, and the way we do that in Rails is by adding helpers to ActionController. So in our products controller, the first thing you'll notice is a before filter called set_cache_control_headers.
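Those model-side keys can be sketched as plain Ruby. The exact key formats here are illustrative, modeled on the products/product-id example from the talk; the fastly-rails gem provides helpers along these lines:

```ruby
# A product model with surrogate key methods, sketched without Rails.
# record_key identifies one product; table_key identifies the set.
class Product
  attr_reader :id

  def initialize(id)
    @id = id
  end

  # Unique key for this single product, e.g. "product/42".
  def record_key
    "product/#{id}"
  end

  # Key shared by the entire collection of products.
  def self.table_key
    "products"
  end
end

product = Product.new(42)
puts product.record_key # product/42
puts Product.table_key  # products

# Step two, binding data to keys: a GET response for this product
# carries both keys in its Surrogate-Key header, space separated.
headers = { "Surrogate-Key" => "#{Product.table_key} #{product.record_key}" }
puts headers["Surrogate-Key"] # products product/42
```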
You'll also notice that we're only doing this on the index and show methods, which are our HTTP GET actions. What this method does is set the Cache-Control response header and also the Surrogate-Control response header, and we can define whatever values we want for how long this lives in the cache. You'll also note, on the last line here, that we strip out the session data, so we're not sending any session data or cookies to the public cache. The reason we do this is that session data is private data, and we generally don't want it living in a public shared cache. Now, back in the products controller, we have our index and show methods. You'll notice that before we respond with our data, we set the Surrogate-Key header. In index, we use the table key, and in show, we use the record key, because the table key maps to the entire set of products, which is what index renders, and the record key maps to a single product, which is what show renders. The method itself looks just like this: you set the Surrogate-Key response header to the key you pass in. So there are the first two steps. The third step is to add purge instance methods to your models. What this looks like: here we have a create action in our products controller. After we create a new product and save it to the database, we issue a purge_all. The purge_all purges the entire set of products from the cache, and the cache then gets regenerated with the new data, so the product we just created will be included. Likewise with the update method: after we save, we call purge. That deletes this product from the cache, and it gets refreshed on the next request. And likewise for delete, which works the same way as create: after we delete it in the database, we do a purge_all, which purges the entire set of products and refreshes it so the deleted product won't be there.
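Step three can be sketched the same way. In a real app, purge and purge_all would issue HTTP purge requests to the CDN (which is what the fastly-rails helpers do); in this illustration they just record which surrogate keys were purged, so you can see which key each write path touches. All names and values are hypothetical:

```ruby
# Step three: purge by surrogate key whenever data changes.
class Product
  PURGES = [] # stand-in for real purge requests sent to the CDN

  attr_accessor :id, :price

  def initialize(id, price)
    @id = id
    @price = price
  end

  def record_key
    "product/#{id}"
  end

  def self.table_key
    "products"
  end

  # Purge just this record, e.g. after an update.
  def purge
    PURGES << record_key
  end

  # Purge the whole collection, e.g. after a create or delete.
  def purge_all
    PURGES << self.class.table_key
  end
end

widget = Product.new(42, 999)
widget.price = 899 # PUT /products/42: this record changed...
widget.purge       # ...so purge its record key

gadget = Product.new(43, 100) # POST /products: the set changed...
gadget.purge_all              # ...so purge the collection's table key

p Product::PURGES # ["product/42", "products"]
```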
So there are the three steps to integrating dynamic edge caching into your Rails app, and all of this stuff I talked about is in the fastly-rails gem. Now, if you decide you want to do this to your Rails app, the best way I've found is to take an iterative approach and go endpoint by endpoint, controller by controller, just as a way to keep your sanity. So that's about all. Just to wrap up and hit some key points: if you're not already offloading the static assets for your web apps onto a content delivery network, you should, because things like the asset pipeline make it super easy, and you can get really great performance gains from that alone. Second, we talked about instant purging and HTTP keep-alives, and purging is what enables dynamic caching to happen on the edge. And if you do decide to add this to your Rails apps, which I hope you do, because your users will love you, take advantage of all of those helpers that exist in fastly-rails. So that's all I have. Thank you so much for listening, and thanks to all the organizers of Garuko.