Hi everyone. By my count, there are like eight other things you could be doing, including not working, I mean not doing anything, or going and doing some work, and you chose to come in here. So thanks for that. My name is Hooman. I'm gonna talk to you about CDNs today. I'm the VP of Technology at a company called Fastly. We have an hour. I don't know if I'm gonna talk for an hour, but I do have a lot of things to tell you.

How many people know what a CDN is? That's great. How many of you are using a CDN? I'm really glad that the first number was bigger than the second number. That's great.

So what I'm gonna do today is talk to you about CDNs. I'm gonna take you through a journey. I'm probably gonna tell you some things that you already know. Hopefully I'll tell you some things that you don't know. My goal: this is a successful hour for us if, at the end of it, you have some new ideas that you maybe didn't have before, some new ways of using the CDN that apparently all of you are using, so that's great news. And that's how I'm sort of gonna do this session.
I have like a hundred and some odd slides, so there's a lot of stuff to tell you, but hopefully at the end of this you'll have some new ideas or some new ways of thinking about what you think is a CDN, but is actually a lot more.

So just to review: a CDN is a globally distributed network of caches spread all over the world. The CDN caches cache content, maybe do some other things, and depending on which vendor you look at, footprints vary. This is what ours looks like. I only have access to our map, but depending on which vendor you look at, yes, this map will vary.

A CDN has a bunch of benefits, but you can probably distill everything down to three things. Things are faster, because you've gotten your content closer to your users. You get a lot of stuff offloaded from your origin, so your origin costs go down and the amount of processing that's taking place at your origin servers goes down. And you can do a lot more with the CDN, and that's sort of going to be the crux of this conversation.

But we're gonna start very basic. We're gonna start with the notion of caching. This is what CDNs were built for, and this is one of the first things that everybody thinks about when they think about CDNs. So we're gonna start with caching. We'll start with the basics.
We'll do some cool things, and we'll get into some concepts that maybe help you look at a CDN in a way that you haven't looked at it before.

Caching was introduced to all of us in this beautiful document called RFC 2616, in 1999. Section 13 of 2616 was dedicated entirely to caching. I am proud to say I've probably read it a hundred times. I am also only gonna say that to this room. You may not know, however, that about three or four years ago 2616 was actually divided into six different RFCs, and there is an entire RFC dedicated to caching. It's RFC 7234, which obsoletes 2616 but has most of the same concepts and adds a few clarifications, if you will.

These documents gave us the beautiful Cache-Control header, which gives us a lot of knobs and levers when it comes to caching our content. When used in responses, it basically gives us directives to instruct upstream caches how to cache different things. You've probably seen this header before. This is me telling an upstream cache to cache a thing that I'm serving for a year.
That number's in seconds. So if you have a user that's coming to a CDN, and the CDN is coming to origin, and you put this header on the response, essentially what you're doing is telling the caches in the world that they can cache this thing that you're serving them for a year.

The problem is you're actually directing every cache, which includes the CDN cache and the cache of your browser. Your browser has an HTTP object cache. You're telling them both the same thing. Sometimes you may not want to do that. You may want to tell them two different things. So the spec also defines a second directive called s-maxage. Now don't ask me why the hyphen is between max and age in the first one and between s and maxage in the second one. Why it's okay to hyphenate one and not the other, I have no idea, but I guess there's a one-hyphen limit. But the s stands for shared, and what this does is essentially send out two different directives to the world. max-age, in this case 600, directs the browser to cache that thing for 10 minutes, and 604800, which I think is like a week in seconds, directs every shared cache, which in this case would be a CDN cache. So that's cool.
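The browser-versus-shared-cache split described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any particular browser or CDN parses headers: the helper names `cache_control` and `ttl_for` are made up for this example, and the parsing ignores every directive except `max-age` and `s-maxage`.

```python
def cache_control(browser_seconds, shared_seconds=None):
    """Build a Cache-Control response header value (hypothetical helper)."""
    directives = [f"max-age={browser_seconds}"]
    if shared_seconds is not None:
        directives.append(f"s-maxage={shared_seconds}")
    return ", ".join(directives)


def ttl_for(header_value, shared_cache):
    """Return the freshness lifetime, in seconds, that a cache would apply.

    A shared cache (such as a CDN) prefers s-maxage when it is present;
    a private cache (such as a browser) only looks at max-age.
    """
    directives = {}
    for part in header_value.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value
    if shared_cache and "s-maxage" in directives:
        return int(directives["s-maxage"])
    return int(directives["max-age"])


# The values from the talk: 10 minutes for browsers, one week for shared caches.
header = cache_control(600, 604800)
print(header)                               # max-age=600, s-maxage=604800
print(ttl_for(header, shared_cache=False))  # 600
print(ttl_for(header, shared_cache=True))   # 604800
```

With only `max-age` present, both kinds of cache fall back to the same lifetime, which is the "telling them both the same thing" situation from the one-year example.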
That's a second directive that lets us sort of granularly cache things in the ether. But sometimes you have another shared cache, a corporate cache, a middlebox cache, whatever, that's sitting upstream still from the CDN, and maybe you want to direct them all to do caching differently. The original spec doesn't give us a mechanism for that, but there is a second header. What s-maxage does here is essentially direct both the CDN node and the shared cache to cache things the same way. If you don't want that, if you want to be granular, there's a second header called Surrogate-Control, which isn't a standard, but a lot of CDNs actually implement it. What this does is let the Cache-Control header direct the way the browser cache and a shared upstream cache cache something, while Surrogate-Control essentially controls caching in the CDN. For a little while, and I think it still happens, CDN caches or reverse proxies were known as surrogates, which is why this header is called Surrogate-Control. So this is a way for you to use Cache-Control headers, or HTTP headers in general, to direct different caches upstream to cache things differently for you. We see a lot of people that want to do things granularly upstream. This is a way to get that control.

Now, the seconds included in these headers are essentially how long something is allowed to be cached for. Better put, what this actually means is that once those seconds run out, that content on the cache becomes stale. When content becomes stale, technically a cache is still allowed to store it, but it can't serve it without doing something called validation, or revalidation. Revalidation happens through something called a validator. If you've ever heard of Last-Modified or ETags, those are official validators in the spec. What happens is a request comes into the CDN node for that piece of stale content, and the CDN cache, any cache, says: I have this content, but I can't
serve it because it's stale. It ran out of seconds, like my max-age ran out. So what it's got to do is send a request to origin, essentially saying: can I keep serving this thing? That's called revalidation, or validation. The origin can respond with a 304, which you've heard of before, which is basically: hey, what you've got is good, keep serving it. That's a Not Modified. Or, worse yet, the origin can actually send an entire new piece of content, which says: what you have is gone, it's over, I have something new for you. This line is bigger because that response actually involves data, whereas a 304 is just headers, a single packet with no body, that just says you can keep using what you have. In this case it's actually giving you new content.

The problem with this is that it takes time, and that's not good, because even though you're serving the thing that you have in your cache, you took a little bit of time to get to origin to revalidate it. And usually with CDNs, your users are hitting a CDN cache that's not necessarily close to your origin, so that takes a lot of network time. From the west coast of the US to the east coast of the US is roughly 70 milliseconds round trip, and that can end up adding up. So this is what you kind of have to go through once you end up with stale content, and this is how you validate it.

However, sometimes you're okay serving stale content for small periods of time while you're still doing this validation, and if you're okay with that, there is help. RFC 5861 showed up not too long ago. I guess it's seven years now, so it's long ago now. Essentially it's an extension that defines a couple of extra Cache-Control directives for handling stale content. The idea behind this RFC is: if you're okay serving stale content, we're going to give you some knobs, and the knobs come in the way of two directives. One is called stale-while-revalidate. What this means is you get to cache that object for that
many seconds, which I think is a week, and then when it goes stale, while you're revalidating it, for that round trip time to origin, you can serve the stale piece for up to 60 seconds. That's what this means. So this essentially allows you to not lose performance while you're validating, if you're okay serving something stale for a short period of time. And that's a great piece of control, because a lot of times we are actually okay with serving stale content. We just don't want to keep doing it. We're okay with doing it for a little period of time. So normally we see this pattern where you cache something for a long time, and for a short period of time you allow the serving of the stale version of that object.

There's a second directive in this RFC called stale-if-error. The idea here is that you can serve that stale object if your revalidation attempt to origin returns with an error, in other words if you can't hit the origin, if there's an error condition. This is great for protecting yourself against downtime, essentially. So we see a lot of people that tag their content with one of these two directives to allow better handling of stale content when something becomes stale. So through these Cache-Control directives, the RFCs essentially give us a lot of knobs and levers to cache things with some level of granularity.

Now there's a question of what we cache and when we cache it, and this brings us to a discussion of the different types of content. To talk about the different types of content, I'm going to take you back in time to the beginning of time, or 1998, where our content was essentially divided into two categories: static content, which was most of it, images, JavaScript, CSS maybe back then, lots of pictures basically, and dynamic content, which is what we basically called everything else. And this was maybe the breakdown. We got better.
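Going back to the two stale-content directives for a moment, the decision a cache makes for a stored object can be sketched roughly as follows. This is a simplified model of the behavior described for `stale-while-revalidate` and `stale-if-error`, not any CDN's actual implementation: the function name, the `origin_up` flag, and the one-day `stale-if-error` value in the usage lines are assumptions made for illustration.

```python
def cache_decision(age, max_age, stale_while_revalidate=0,
                   stale_if_error=0, origin_up=True):
    """Decide how a cache might answer a request for a stored object.

    All numeric arguments are in seconds. origin_up is a stand-in for
    "the revalidation request to origin succeeded".
    """
    if age < max_age:
        return "serve from cache (fresh)"

    seconds_stale = age - max_age
    if seconds_stale <= stale_while_revalidate:
        # No waiting on the round trip to origin.
        return "serve stale now, revalidate in the background"
    if origin_up:
        # The blocking path: pay the round trip before answering.
        return "revalidate with origin, then serve"
    if seconds_stale <= stale_if_error:
        return "serve stale (origin error)"
    return "return an error"


WEEK = 604800  # max-age from the talk; stale-while-revalidate is 60

print(cache_decision(100, WEEK, 60, 86400))
print(cache_decision(WEEK + 30, WEEK, 60, 86400))
print(cache_decision(WEEK + 300, WEEK, 60, 86400))
print(cache_decision(WEEK + 300, WEEK, 60, 86400, origin_up=False))
```

The second call is the case the talk cares about: the object went stale 30 seconds ago, so it is still inside the 60-second window and the user does not wait for origin.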
We got smarter. The web became 2.0, a whole bunch of other buzzwords entered our world, and maybe this breakdown turned into this. So there was more dynamic content. We learned about something called Ajax. It was fascinating. It was a great time to be alive. And we saw the world through this lens: half of our shit was static, half of our shit was dynamic. When it came to caching, we said all the static stuff is cacheable, and all else we're gonna call dynamic, and we're gonna make that not cacheable. This is the conundrum we put ourselves in, and this is actually a huge disservice we're doing to ourselves, because it turns out that this is not actually true. There is a third type of content. It isn't just static and everything else. There's actually a third type of content, and for lack of a better way to describe it, we call it event-driven content.

So the idea is this. You have static content, which you know about. You have dynamic content that is absolutely uncacheable; under no circumstances can you cache it. Those things exist. And then there's this third category called event-driven. Event-driven content is an impostor. It acts like dynamic. It looks like it's dynamic content, but it's actually cacheable. The problem with it is that it's unpredictable how long you can cache it. You don't know how long you can cache it a priori. You have no idea.

Let's say you're hosting a Game of Thrones wiki, and you have a page for your favorite character, and that thing looks exactly like that for a long period of time. Then a horrible death befalls your character. There's dragons. There's zombies. There's molten metal. There's dogs, all sorts of shit, and your favorite character dies, and that page gets edited like 300 times in five minutes. Right? That is a thing that you could not predict beforehand. You had no idea.
There were going to be dogs. So you don't know ahead of time that there is a certain Cache-Control header you can put on this content. This is the problem with event-driven content. Because we don't know ahead of time, because it's unpredictable, we kind of always thought of it as dynamic, and that's a disservice we did to ourselves.

Let's review, and then I'm going to talk to you about how to handle event-driven content. Three types of content. We have static content. These things change infrequently, and we know ahead of time how long we can cache them for. Examples are things like images and JavaScript and CSS, and Cache-Control headers here are enough. We know how to deal with these. These are very well defined. We put a Cache-Control header on there, cache it into the ether, and everything's cool.

Then we have dynamic content, which is totally uncacheable. It absolutely cannot be cached. It has to go to origin all the time, and this is handled with CDNs through a mechanism called DSA, or dynamic site acceleration. That's just one of the names that's used. The way we mark this for CDNs, to let them know that they can't cache it, is also defined in the spec. We use no-cache and no-store directives. We can also put a private directive on there. There's a number of directives that we can put on Cache-Control headers that essentially tell middle caches never to cache it. And you send this through a CDN. CDNs handle dynamic content in different ways. They usually go from the CDN node to origin and do some fancy TCP things to sort of accelerate the delivery, or better yet, they do this symmetric transport where they have big channels between two CDN nodes, and lots of TCP things happen between those nodes, and that's even more optimized. That's the way CDNs generally handle them. This is stuff.
That's totally, 100% non-cacheable, totally dynamic content.

Then we have this third category called event-driven, which is static, but it's unpredictably static. We don't know ahead of time how long we can cache it for. Examples are news stories, wikis like we talked about, sports scores, stocks. A whole bunch of stuff actually falls under this category, and we don't have Cache-Control headers for this. The spec didn't give us a mechanism to cache these things, so we're screwed.

The problem is these things are so cool if you cache them. Here's a waterfall chart. You guys have all seen waterfall charts before, right? Raise your hands if you have. Thank you. If you haven't, please go get yourself familiar with waterfall charts. It turns out that most base HTML falls into this category. In fact, a lot of CMSs by default tag their base HTML with Cache-Control headers that make it uncacheable, because that's the safe way to do it. Now here's a waterfall chart where I've put everything that's not the base HTML on a CDN. You'll see all those little green portions of those bars. That's the time to first byte. It's small for all the static objects, but it's huge for the dynamic object, because that thing has to go home, and that call is incredibly blocking. Look at the benefit
I get from moving that. Look, that's a lot of performance benefit that I can get from caching that one particular object on a CDN. But I can't, because I don't know ahead of time how long I can cache it for. And the reason I'm in a mess and a bind with this is that the spec never gave us a mechanism to do invalidation.

So this is our conversation now about invalidation, which is a roundabout way of saying how we can solve this problem, and I'll get you there in a second. But first, the history lesson. Here's RFC 2616, and the word cache appears in that RFC 616 times. So the RFC gives us lots of mechanisms, lots of talk about caching. How many times do you think the word invalidation shows up, if you had to guess? You can't guess. Seven times. And the seven times are like one in the table of contents, three or four times in one little section that says if you do a PUT you have to invalidate, and then one time in that one section at the end that has to talk about security, which is like the obligatory security section every RFC needs to have. That's all. The RFC didn't give us invalidation mechanisms. And it's not like we didn't try. There actually used to be a draft for invalidation that never became an RFC, because invalidation is hard. This is a difficult problem.

So we've never had a way to handle proper invalidation. For this event-driven content, what we did instead is we either made it dynamic and put this header on there, or we thought we were smart and we would put very short TTLs, or max-age, on these objects, in this case five minutes. Basically, when we did this, that number, that 300, was essentially our threshold for making mistakes. That was how long we were comfortable with making a mistake about the cacheability or the staleness of something.
Let me tell you a story. A very, very well-known news organization, I can't tell you who, but I will tell you that they're like Alexa top 200, did this. This was the way they would do it. They would tag every news story, so the base HTML of every news story, with a 10-minute max-age, because that was what they thought they were comfortable with when it came to caching what they thought was dynamic, but was actually event-driven content. So they published a story about Britney Spears. True story, I'm not making this up. They published a story about Britney Spears, put a 10-minute time to live on it, sent it out into the world, and two minutes later realized that they'd made a mistake and what they published was actually false. So for the next eight minutes they were panicking and terribly, terribly not happy, because they thought they'd be okay with ten minutes of mistakes when they actually weren't.

What they actually needed was this Cache-Control header, which doesn't exist. They wanted a way to get something out into the ether, and then, when that mistake is made, to basically remove it from the ether. Which brings us to a topic that is known as uncaching by no one. Everybody calls it purging. In CDNs, purging is essentially the way that you uncache something from the CDN. Now, purging is actually a really easy problem with a single cache. It's a really difficult problem with a global network of caches, because you have to propagate that purge, and you have to kind of do it quickly.

So okay, if purging is a mechanism, we can solve this problem. We published a story about Britney Spears, we find out two minutes later that it's false.
It's incorrect. So let's purge it. The remainder of the story is that this news agency did that. They're like: hey, we have a purging mechanism with our CDN, let's get the thing off the CDN. Great. They pushed the button, and they realized the purge time for that particular CDN was like 18 minutes. It would take them 18 minutes to get this thing off, and it had eight minutes left in its TTL, so there was no point. So this only works if it's instant.

The way you deal with event-driven content, which is a lot of content, is you cache it for as long as you want. Cache it as if it was a static piece of content. Send it to your CDN, and when it changes, you know that it's changed in your application. Lots of applications know this origin side. What you do is you send a purge to the CDN, and if it gets off the CDN instantly, that gives you an incredible new set of tools to cache things that you couldn't cache before. This is an incredibly powerful mechanism, except that instant bit is very important. It has to be instant, otherwise this won't work.

Let me give you an example. I'm not picking WordPress to pick on WordPress. This just happens to be the first logo I found for a CMS, and I have a site example which was also on WordPress, so I'm not picking on them.
This applies to lots of mechanisms like this. Here's the way we handled this sort of content before. Let's say you have a blog, a thing on WordPress or any CMS. Users come in, they go through the CDN, they come to your origin. You make edits to your piece of content, you may have comments on your piece of content, and because that could happen at any time, you can change things or people can come in and comment at any time, you always made this base object uncacheable to the CDN. So it never got cached on the CDN, and this is what your waterfall looks like: this terrible, very, very green waterfall that has lots of blocking calls, because you could not cache this thing on the CDN, because you're deathly afraid of serving stale content. That's just what our world was like before.

Now things are different. Now your users come in and they go to your origin, and you serve your content back with a cool max-age that says cache it for a long time. This is also not a real max-age directive. Don't use it, or if you do, write me so I can use an actual example the next time I do this presentation. This object goes into the cache and stays in the cache, and from that point on, when the users come in and fetch it, hey, you serve it from your cache. Everything's faster. Everybody's happy. That's awesome.

Now let's say you've got to make an edit at origin, or people come in and comment at origin. Well, cool, I have a purge mechanism. I just send a purge to the CDN node and the object disappears. It actually makes that noise. It disappears, and now you basically start everything from scratch. And this has to be instantaneous. If that boop took 20 minutes, this would not work. Once you have that mechanism, you basically have an ecosystem to cache things that you couldn't cache before. Your users come in, they go back to your origin.
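The cache-then-purge cycle walked through above can be modeled with a toy, single-node cache. This is a sketch under loud assumptions: `Origin`, `EdgeCache`, and `publish` are hypothetical names, nothing here is a real CDN API, and a real purge has to propagate across a global network of caches rather than delete one dictionary entry.

```python
class Origin:
    """Stand-in for the application (for example a CMS) that owns the content."""

    def __init__(self):
        self.pages = {}
        self.fetches = 0  # how many requests actually reached origin

    def read(self, path):
        self.fetches += 1
        return self.pages[path]


class EdgeCache:
    """Stand-in for one CDN node that caches objects until they are purged."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, path):
        if path in self.store:
            return self.store[path], "HIT"
        body = self.origin.read(path)
        self.store[path] = body  # cached "for a long time"
        return body, "MISS"

    def purge(self, path):
        self.store.pop(path, None)  # the "boop"


def publish(origin, edge, path, body):
    """Write the change at origin, then purge, because the app knows it changed."""
    origin.pages[path] = body
    edge.purge(path)


origin = Origin()
edge = EdgeCache(origin)

publish(origin, edge, "/post", "first draft")
print(edge.get("/post"))  # ('first draft', 'MISS')
print(edge.get("/post"))  # ('first draft', 'HIT')

publish(origin, edge, "/post", "edited, with a new comment")
print(edge.get("/post"))  # ('edited, with a new comment', 'MISS')
print(origin.fetches)     # 2
```

Only two of the three requests reach origin, and no reader ever sees the old version after the edit, which is the property that depends on the purge being effectively instant.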
They grab things, you get to cache things on the CDN, and you basically rinse and repeat. So this is an incredibly powerful mechanism if you have it. It lets you cache things you couldn't cache before. If you're using a CDN, the whole point of it is for you to cache things as close to your users as possible, and you want to cache as much as you possibly can with the CDN. This is one of the mechanisms that allows that.

To talk about caching, and caching more, we need to have a conversation about cache hit ratios, because this is the way CDNs generally tell you how well you're caching. You've seen a number like this from a CDN. You look at it and you're like: hey, cool, 98% cache hit ratio, I am just getting amazing performance from the CDN. It's awesome. Everything's amazing. The problem with this number is that it's calculated through this formula. Basically what this means is if you shoot 100 requests into a CDN and only one of them makes it to your origin, the CDN will tell you that you have a 99% cache hit ratio. In your head, you think this means 99% of your responses came from the edge of the network, closest to your users. Except that is not true, because this formula doesn't account for where the object came from. It just says how many didn't make it to origin.

To understand that, we need to have a discussion about long-tail content. I'm gonna go through a very, very advanced cartoon, like Pixar-grade cartoon, to explain long-tail content to you. This is the beginning of our cartoon. Here's a CDN. You have your user sitting here and you have the origin. I know, this is the cartoon. I know, this is the end of it. Don't worry, there's nothing else coming. This is it. Let's focus on one of the caches. Your user makes a request to the cache to get an object. That request starts with a TCP connection, right?
And then comes an HTTP request to say: hey, give me this object. Now, when that request arrives at the edge cache, that object could come from the memory of that edge cache. It can come from the disk of that edge cache, and that disk had better be an SSD. If it's not, that's an even different performance profile. For every one of these steps, depending on where the object comes from, the performance profile of that response changes. So how fast it is when it comes from memory is very different than how fast it is when it comes from SSD, from a spinning disk, excuse me, or from deeper in the network, which is what usually happens: if that edge cache doesn't have it, there's some sort of parent or sister cache that it'll go to to fetch something. And this performance profile changes the deeper it goes into the network. Usually the hottest stuff is cached at the very edge, and the colder things are cached somewhere else in the network, probably where the denser, mid-tier caches are.

Except that entire chain is considered a cache hit in that cache hit ratio calculation that we talked about. And you can see that, for performance, it's a disservice to consider the entire thing as good as serving something from the edge. So this traditional calculation isn't actually good enough for us as a performance indicator. This is something that gives us a better indication, a better idea of what our performance is: total requests here is basically the number of hits plus the number of misses that we got at the edge of the network.

Now this doesn't mean the old calculation is useless. In fact, they're both very useful. The old calculation is a great indicator for offload. It tells us very well how much traffic didn't make it to our origin, which is an important metric. We want to know this. However, it is not a performance metric.
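To make the difference between the two ratios concrete, here is a small sketch. The function names and the traffic mix (60 requests answered at the edge, 39 from a mid-tier cache, 1 from origin) are invented for illustration; only the shape of the two calculations comes from the talk.

```python
def offload_ratio(served_from):
    """Traditional cache hit ratio: share of requests that never reached origin."""
    total = len(served_from)
    to_origin = sum(1 for tier in served_from if tier == "origin")
    return (total - to_origin) / total


def edge_hit_ratio(served_from):
    """Edge cache hit ratio: edge hits / (edge hits + edge misses).

    Anything the edge node could not answer itself counts as an edge miss,
    even if a cache deeper in the network had it.
    """
    edge_hits = sum(1 for tier in served_from if tier == "edge")
    return edge_hits / len(served_from)


# Made-up mix of 100 requests, labeled by where the response came from.
requests = ["edge"] * 60 + ["mid-tier"] * 39 + ["origin"]

print(f"{offload_ratio(requests):.0%}")   # 99%
print(f"{edge_hit_ratio(requests):.0%}")  # 60%
```

Both numbers describe the same traffic: the first says origin was well protected, the second says four out of ten users waited on something deeper than the edge.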
This new thing is a performance metric, and it's important to get both of those things from your CDN. Most CDNs will only give you the one on the left. If they don't give you the one on the right, they should at least give you mechanisms to calculate it, because that is an important metric that we generally have not taken into account when we think about CDNs.

Let me show you how this translates to performance. Here's a test. Can you see the yellow lines? If you can't, just take my word for it that there's a yellow line there, and they're all the same. This is a test that we do to evaluate CDNs, including ours. I won't tell you who this is. I won't tell you who any of the CDNs are that I'm about to show you, because the core of the lesson is the same no matter who they are. I'm testing three different objects. I'm testing a super popular object. I'm testing an object that's like a medium tail. That's not a real word, I made it up. But it basically means it's an object that's fetched somewhat infrequently, in this case once an hour, from the CDN. And then a long-tail object that's fetched once every six hours from the CDN. So these are infrequently fetched things. But the first one, on the left, is something that's actually fetched frequently. That's a popular object.
Now, if you notice, the yellow line is TCP connect time, and if you look, they're roughly the same. Of course they're the same, because you're establishing a TCP connection with an edge cache. It doesn't matter if the content is there or not, your TCP times are generally the same. But look at what happens to time to first byte as we go from left to right. That's the time to first byte for a super popular object. This is something that's fetched once every hour, and this is something that's fetched once every six hours. Doesn't that give you a new picture that the yellow lines didn't give you? This shows you how long-tail content affects the performance of the objects that are served, and this is why that second formula, the cache hit ratio from the edge, is important.

Isn't this better? When you look at how your CDN is serving content, you kind of want this: regardless of how frequently or infrequently your objects are fetched, you kind of want the same performance profile.

Let me widen it for you. Here's the same test. What I just showed you was testing over a one-month period. This is a different period of time. It's three weeks' worth of tests in the US, seven CDNs, and this blue bar is the median, the 50th percentile, of time to first byte for a very, very popular object. You'll see that there are some differences, but they're kind of roughly the same. The problem is many, many important business decisions are made based on this graph, where everything is one or two milliseconds apart, because this is the way most people test CDNs. They take a piece of content, like an image from their home page, they shove it into a testing network, and they fetch it a billion times. And that thing is super hot. It was hot to start with, and the testing made it hotter. So it's the hottest of the hot.
So it's the hottest of the hot And they get this graph And this graph is misleading because when you add when you factor in long-tail content Look what happens to these bars. Here's something that's fetched once an hour Here's something that's fetched once sorry the first one was once every six hours Here's something that was fetched once every 12 hours and here's something that was fetched once every 24 hours Isn't this much more telling about how these CDNs perform for the content that blue bar was just the tip of the iceberg Well, how they really are serving your content is based on how frequently it's fetched And when we look at the 95th percentile, even more gets exposed and even more so at 99th And this is a different way of looking at CDN performance And this is why it's important to take into account long-tail content and your cash-it ratios at the edge The cash-it ratio the one number that gives you global cash-it ratio is not necessarily an indicator of performance And this is why And when we think about which CDN is best for us, it's a very different picture that if you just consider those blue bars This is basically The the an artifact of how a CDN caches objects and how what their storage model is and also how what their eviction model is How they evict content because CDNs are shared caches at some point They're going to evict something you may cash something you may say I want something cash for a year There's no way any cdn is going to cash it for a year But so because there's always an eviction model eviction mechanism But how they cash it and where they cash it very Widely across cds as you just saw and this is where as you look at cdn performance That's a thing you want to take into account So at the end of our caching discussion our takeaway is we want to cash as much as possible And that includes what we previously considered uncashable what we called event-driven here or long-tail content Which appears to be cached maybe at the edge of the 
network, but isn't always, and we have ways of exposing that. So we want to cache as much as we can, we want to cache as much as we can at the edge, and we want to maintain control over that cacheability, which is essentially what our purge mechanism gave us. Cache-Control is one way to do it, but purging, uncaching instantly, is another way.

Control is a great thing to talk about. Maybe we should have an entire discussion on it. So now I want to talk about control. We're going to move away from caching, and we're going to talk about some things that maybe you don't necessarily think about when you think about CDNs, but hopefully you do, and if you don't, by the end of this hopefully you will. Control is the idea of having full programmability and full say in how a CDN is dealing with your content.

For this discussion, I'm going to tell you a story about control, and that is the story of the Guardian, the newspaper. Patrick Hamann from the Guardian did a talk a little while ago, and in that talk he had this slide. This slide shows a screenshot from a tool called Riff-Raff, which was a tool that the Guardian open sourced. Actually, the Guardian open sources everything. They're amazing with this. Every tool they build, they open source. Riff-Raff was their deployment tool, the way that they deployed code, and this was a screenshot that said, you know, something was being deployed. What he was showing here was basically their CDN config. What was really interesting is, I don't know if you can see it, let's see.
It says build number 1374. What that means is that this was the 1,374th version of config they had pushed to the CDN. That's amazing. This is exactly the same way that you interact with any cloudy service. This is what you expect: you want to iterate quickly, you want to be able to push code, fall back, or roll things back. And in his example, this was version one thousand three hundred and seventy-four. This is powerful. The reason this is interesting is that if we had had this conversation 10 years ago, this was impossible, because pushing config to a CDN took three hours, four hours. There was no way we could operate, develop, and deploy things the way we do today with the CDNs of the past.

And this is essentially a lesson in programmability. We already talked about having a granular API for purging. We work in the world of APIs. We want to have a config API. We want to be able to push configs, remove things, roll back configs, all this stuff, instantly. We want all of this to be quick. And we want to be able to run logic at the edge to essentially control the traffic, and it's our traffic, as it comes into the CDN and flows to our origin. Or it doesn't; maybe we serve things from the edge. And again, all of this works if it's instant. We want this instant interaction with the CDN. We want to programmatically control it. We want to programmatically control our content.
We want to programmatically control caching. We want to programmatically control everything about the CDN, and we want to be able to run code at the edge, which is essentially being programmatic at the edge.

So what does control at the edge mean? It means taking some of the logic that we've traditionally thought of as things we have to run at origin, and running it at the edge of the network. The example that I have for you is a Fastly example, but it's not the only example. Any CDN should be able to give you this. In our case, we are based on Varnish, so we use VCL, which is the Varnish Configuration Language. If you've never seen it, it looks like this. Here's a little snippet. It's essentially a config language that kind of reads like a script, so you have a lot of if-then statements and you basically run through a bunch of logic as you deploy VCL. But again, this is not specific to Varnish. This is just an example. Any CDN that you use, any CDN that you've ever used or will ever use, should give you a mechanism to let you have control and programmability and logic running at the edge. That is a powerful mechanism that a CDN can offer you to let you have control over your traffic. It's a delivery network, and it's your traffic that's being delivered. You want control over it. You don't want the CDN to be a black box. We've looked at CDNs as black boxes for far too long, and mechanisms like this help us not think of them that way.

Let me give you an example. The example I'm going to give you is one of edge-generated content. It's not the only example; there are lots of examples of logic, and I'll give you some lists. But this example is one I like. In this example, the end user comes to a CDN node and makes a request for a GeoIP thing, and it's JSONP.
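As a rough illustration of a request like that being answered at the edge, here is a minimal sketch. It is written in Python rather than VCL purely so it can be run anywhere, and it is not Fastly's implementation: the `GEO_BY_IP` table, the `callback` parameter name, and the `edge_geoip_jsonp` function are all assumptions made up for this example.

```python
import json
import re
from typing import Dict, Tuple
from urllib.parse import urlparse, parse_qs

# Hypothetical stand-in for the geo database a CDN node would consult.
GEO_BY_IP = {
    "203.0.113.7": {"country": "US", "city": "New York"},
}

# Accept only identifier-style callback names, so the URL cannot inject script.
SAFE_CALLBACK = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$.]*$")


def edge_geoip_jsonp(url: str, client_ip: str) -> Tuple[int, Dict[str, str], str]:
    """Build a JSONP response from the client IP alone; no origin is contacted."""
    params = parse_qs(urlparse(url).query)
    # Assumed parameter name: the JSONP function name arrives as ?callback=...
    callback = params.get("callback", [""])[0]
    if not SAFE_CALLBACK.match(callback):
        return 400, {"Content-Type": "text/plain"}, "bad callback"
    geo = GEO_BY_IP.get(client_ip, {"country": "unknown", "city": "unknown"})
    # Wrap the synthetic JSON in the function name taken from the URL.
    body = f"{callback}({json.dumps(geo, sort_keys=True)});"
    return 200, {"Content-Type": "application/javascript"}, body
```

Because the function name is echoed back from the URL into executable content, the sketch checks it against a conservative pattern first; that check is a design choice of this example, not something described in the talk.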
So there's a little JSONP function in the URL, and three cool things happen here. One, the GeoIP information is generated straight from the CDN node. That stuff is based on the client's IP address, and we basically construct the JSON at the edge and serve it to the client. This is control at the edge. The other cool thing is we had the ability to extract the function name from the URL and shove it back into the content. That's kind of cool, too. And the third cool thing is there's no line to origin. The origin is sort of just there because it needs to exist, I guess. Maybe it's a very simple example, but we've done two things: we've generated content at the edge, synthetically if you will, and we've made that content dependent on the URL that came in. So we did some logic at the same time, and we created a thing that doesn't need an origin, which is really cool.

So when we talk about logic at the edge, there are lots of examples: everything from request routing to header manipulation, load balancing, geofencing, tokenization, authentication, anything that falls under logic that we can run close to our users. Device detection, micro-caching, being able to have configurable cache keys. We have an example where somebody cached hotel search results based on latitude and longitude. So for the same URL you get different results if you're making your request from Penn Station in New York versus Central Station in Baltimore, which is really cool, because that's a configurable cache key. That's an example of running logic and having control over your content at the edge. And again, it all has to be instantaneous. If you're going to code things for the edge, you'd better be able to deploy that code quickly and instantly, otherwise none of this will work. And as you
deploy code, you'd better see what's going on, and this brings us to a discussion on visibility. As you're doing all this, as you're iterating quickly and rolling things out, you want to see what's going on, and that's visibility. We've talked about APIs, but APIs don't end with how you purge something or how you configure and roll out config. APIs also have to do with statistics and metrics that you want to see from your CDN. So any CDN that you're using had better have a stats API, had better give you network stats, HTTP stats, anything that has to do with caching, and it has to be in real time. Now, there's value in historic data, but it's a lot cooler when you're seeing what's going on in real time. It's not cool to see a spike five minutes after it happens. If a spike is happening to your traffic, you'd better see it as it's happening.

So you want real-time analytics. You also want real-time logging. There used to be a time when CDNs would maybe give you log files once a day, five times a day, once an hour, whatever, as batch jobs. That was never good enough. You want real-time logs, because there are a lot of things you can do with logs as they're rolling in. Also, it's a security feature: it's much better if the CDN is not storing any of your stuff. What that means is you have traffic coming to the CDN, and that CDN should be able to point to an endpoint of your choosing, be it syslog, S3, GCS, whatever, and stream your logs to that endpoint. Better yet, you should be able to configure what you want logged. So it should be W3C logs, maybe metrics from the CDN that are valuable to you with each request. All that stuff should be coming to you in real time.

I'm going to give you my favorite example of all these examples, because it has lots of elements to it, and that's the example of beacon termination at the edge. Who knows what a beacon is?
Oh my god, I get to teach you what a beacon is. That's great. Okay, beacons. Thanks, you, come on in. I'll teach you too. Wow, didn't know. Unplanned.

So a beacon is essentially a request that a browser makes that carries information in it, usually as query parameters. If you use Google Analytics, if you use RUM, real user monitoring, or any performance tools, if you've ever used ad targeting, all of this stuff uses beacons. The idea is this. This is a Google Analytics example. Here's the Fastly page, and this right here is the Google Analytics beacon. What happens is the Google Analytics script runs on your page, and it collects a bunch of data, like your viewport and who you are and which page you're on and things like this, and it constructs a single request that has all this information as query parameters. Here the dev tools break it down for you. There's a bunch of stuff that Google Analytics collects, and it sends that to the Google Analytics server through what's called the beacon, which is a single request. In this case, they use a one-by-one pixel image, and all the information that's been collected is actually carried in query parameters. This is such a major mechanism for collecting data on the web that there's actually a beacon standard now. There's a Beacon API that you can call to send beacons from the browser; it makes them asynchronous, non-blocking, blah blah blah.

What's interesting about beacons is the request and the response. The request is important because it carries the information. The response, that one-by-one image, is completely immaterial. It matters not at all.
It's actually an invisible pixel. But HTTP says when you make a request, you have to have a response, so a response is sent. That response gets these super uncacheable headers associated with it, because they don't want it cached anywhere in the middle. The next time that request is made, from anywhere, they need it to get to the origin where the data is being collected. So it always gets these super, super uncacheable headers.

If you ever built a beaconing application, the way you did this with CDNs was like this. You put a CDN node in front of your application, you ran your JavaScript on the page, you constructed your beacon, and then you wanted to send it back home. So you construct this URL and it goes to the CDN. The CDN is like, okay, I can't, I'm just going to send this home, and it sends it home to the origin. The origin responds with a little image and puts super uncacheable headers on it. All the origin wants is essentially this URL, because in the URL there's all the data you're trying to collect, and that logging goes to some sort of log analysis engine. So you collect all the logs and shove them into an analysis engine, and then that analysis engine runs over it, and then big data, buzz buzz buzz buzz. That's what happens afterwards. That's how you analyze what's going on. That's the way it worked, and we could never cache that thing. A CDN could provide very little value to this application before.

If you take all the mechanisms we talked about today and put them together, check out how we can use a CDN to build this application now. Today we still run our JavaScript, construct the URL, and send our beacon to the CDN edge. And we're able to cache that image on the CDN, so we serve the image straight from the CDN. Better yet:
We actually respond with a 204. 204 is a special status code in HTTP that says, I have nothing to give you, but I'm going to respond because I have to respond. HTTP says you have to respond. You can construct that 204 straight from the CDN. We talked about generating content at the edge; this is an example of that. Then what we do with things like real-time logging is we log that request straight to an endpoint like syslog, S3, GCS, whatever. That's how we collect our data without having to build a farm of servers that do nothing but log collection.

Better yet, here's a cool example. If I log that thing to GCS, there's a function I can run in GCS that essentially takes that data and imports it straight into BigQuery, and I've streamlined the entire thing. And I have a whole application built. The reason this is my favorite example is because I don't need an origin for this application at all. Beaconing applications are common, and I've built an entirely originless application using a couple of services from a cloud and a CDN node with a couple of features that are important. Now, if that log streaming wasn't instant, we'd lose being able to analyze data and see in real time what's going on. Those are valuable things that we want. This is a very cool example that takes all the things we talked about and puts them into one place. And I love this example because it's an entire application without an origin, and it uses systems from all over the place. It's very cool.
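The pieces of this originless beacon flow can be sketched roughly as follows. This is Python standing in for edge logic, not real CDN configuration: `terminate_beacon` and `log_sink` are hypothetical names, and `log_sink` stands in for whatever real-time log streaming endpoint (syslog, S3, GCS) is configured.

```python
import json
from typing import Callable, Dict, Tuple
from urllib.parse import urlparse, parse_qs


def terminate_beacon(
    url: str, log_sink: Callable[[str], None]
) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a beacon at the edge: log its query parameters, return an empty 204."""
    parsed = urlparse(url)
    # parse_qs returns a list per key; keep the first value of each parameter.
    fields = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    # Stream one structured log line per beacon; this is where the data is collected.
    log_sink(json.dumps({"path": parsed.path, "fields": fields}, sort_keys=True))
    # 204 No Content: a valid HTTP response with no body, built without an origin.
    # no-store is an assumption here so that no cache in between swallows a beacon.
    return 204, {"Cache-Control": "no-store"}, b""


# Usage: a plain list plays the role of the log endpoint.
collected = []
status, headers, body = terminate_beacon(
    "https://example.com/beacon?page=/home&vp=1280x720", collected.append
)
```

In this sketch the response carries nothing and the log line carries everything, which mirrors the point that the beacon's request matters and its response does not.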
And this is a real application. We've built this a couple of times for customers, and we actually have internal applications that use this.

We have a few minutes left, and what I'm going to do now is take a step back and talk to you about what I'm actually talking about with all these mechanisms, which is a bigger picture than features. We're having a feature discussion, but we're actually talking about something bigger. What we're really talking about is the fact that this notion that CDNs are black boxes that are opaque, that we have no visibility into, is a thing of the past. If you're still thinking about your CDNs like this, stop. This is not a way to think about CDNs anymore. They are no longer just a mechanism for content delivery. By adding programmability, control, and visibility, we're doing lots of things with CDNs that maybe we didn't do before. And they need mechanisms to do this: all the mechanisms that we talked about are vital for us to be able to interact with a CDN this way. But once we have those mechanisms in place, we interact with a CDN and use a CDN very similarly to how we use other cloudy services. It's cliche, but it's probably not a good idea to think about a CDN as a CDN anymore. It's kind of a cloudy service. You can actually argue that CDNs were one of the first cloud services, because they were these things that were happening in the ether and they were handling your content. They were essentially content as a service for a long time. This is a cliche workshop.

The difference with a CDN as a cloudy entity in the world is that it's closer to the users. That's the difference between the cloud that a CDN is and the cloud that things like EC2 are. It's a cloud that's close to the end users. It is an edge cloud, if you will. So here are users all over the world, and here's the core, your central cloud. You know how to deploy things on this. You deploy applications on here.
You deploy components. You scale them up and down however you want. Think of a CDN as a cloud that sits at the perimeter. It's the first line of contact between your users and your application, and it's a second layer of cloudiness that you've added to your application. That you've built your application with; that's actually a better way of putting it. So you have the central cloud that does the things you want to do centrally, and then you've distributed everything to the edge through the use of this edge cloud. The example we had before was actually the perfect example of this, right? We used the CDN node as a cloudy service at the edge, and we used a bunch of storage and data services in the core, the central cloud. This is a perfect way of thinking about building applications: different components sitting in different places that together are servicing our application.

And from this edge cloud you expect services, just like you expect services from a central cloud, except these are edge services. What are edge services? Well, there's a bunch, and there are probably others. Everything from content, which is essentially CDNs, to load balancing, request termination, and request manipulation. Geofencing. Everything that falls into the security bucket is a service that you can expect from your edge. Things like content transformation: if you're optimizing images, that's a service you should be expecting from your edge cloud. Building originless applications, maybe even pushing some data to the edge. Small key-value stores probably make a lot of sense to have at the edge. And basically anything that makes sense to run and have close to your users should be a service you expect from this edgy cloud.

So let's think about it in a different way.
Here's our ecosystem. You build applications, and you build them on this central cloud. You already know how to scale them horizontally. You already know how to scale them vertically. The CDN, or the edge cloud, essentially allows you to scale them radially, right? And if you think about this whole thing as one big ecosystem, it's a powerful model to build applications with. You have things happening close to users, where they belong. You have things happening in the central cloud, where they belong. When you think about scale this way, you essentially divide your world into two. You use the edge cloud to run things that make sense to have close to your users: things that are latency sensitive, things that just make sense to run near users. We want them to be distributed. We want to protect ourselves at the edge of the network. And you run things that make sense centrally: things like compute or data processing or data storage, ETL services, whatever. If you think about this as a divided world, where you run things that make sense at the edge and you run things that make sense centrally, you have a powerful ecosystem to build applications with.

So the way to think about CDNs and these edge clouds is essentially as a platform. I have here that it's a platform for extending your application to the edge, because this is what the title of this talk was. But this is actually not the best way to think about it. The best way to think about it is that we're not extending an application to the edge, we're building an application with services that happen to be at the edge. The CDN or the edge cloud is another building block, another component that you build an application with, the same way you've used any other cloudy service. Once your mindset starts to think about these platforms this way, the possibilities open up. Yes, you need all the features we talked about. Those are all essential. They need to be there.
But at the end of the day, if you think about the whole thing as an ecosystem, as a set of platforms that you use the same way you've done with any other cloudy service, there's a power there that lets you build applications that maybe you couldn't build before. Certainly more performant, certainly more available, certainly more distributed.

That was 50 minutes of talking. You know how many slides that was? That was like 166 slides. I'm very proud of you for sitting through that. I have nine minutes for questions. Look.

Okay, so you got rid of our origin. You're at DrupalCon.

Hold on, wasn't that cool?

No. Okay, that origin is my Drupal. You're at DrupalCon. We're building Drupal. What the hell do we do in Drupal to make that work?

So first of all, don't stop using Drupal. That was an example. By no means am I telling you to stop using it; in fact, use it more. I want to make sure I am nice to our hosts. What I gave you was an example of how you could use a CDN to build an originless application. I am not by any means telling you to stop using all origins, because it makes a lot of sense to have things that run at our origin. You're using Drupal to generate content.
You're using Drupal to manage content. All these things are happening, and they need to continue to happen. But maybe when you build a beaconing application, you don't necessarily need to engage every component of a Drupal ecosystem. Maybe when you build some other application, you don't need to use every component. If you think about this, again, as an ecosystem, you can think about it as building pieces that make sense where they need to be built. Certain things are always going to make sense to have with an origin, with a management system, with whatever. And certain things may not, because they're more streamlined or don't need an origin, don't need content, for example. So the question is, how can you find a way to build applications that don't need an origin? But the takeaway here isn't build applications that don't need origins. The takeaway here is have control over the mechanisms that you use. The same way you have control over your Drupal ecosystem, you should have that control over a CDN or an edge cloud that you use closer to users. I don't know if that answers your question. It doesn't, but the preaching here isn't stop using origins. The preaching here is think of edge clouds or CDNs as another building block in your applications. Does that make sense?

Yes, sir.

So I'm just curious. A lot of the use of the edge for things other than just static caching is new to me. Do you have any recommendations for the best way to go about learning more about the uses you were describing, like tutorials and such? Any good resources?
I think we have a bunch of things So I would recommend that you I think we're still around if you find the red people like back there We can probably point you in some directions there What i'm talking about isn't revolutionary and new like there's others that are talking about using cdn's in a slightly different way That is not just static content There's stuff out there. I would say we talk about it a lot So our blog is a good place to start. It is a fastly blog. So it's going to be fastly We're really polite So we don't like we don't badmouth anybody else, but that's probably a good place to start And if you once you start thinking about cdn's as things that can do more You start identifying mechanisms that you're looking for in cds And then you probably can search by that and find those things that makes any sense. Thank you. Sure. Yes Thanks for giving the talk Just a quick thing about uh, sort of programming the logic like thing that I feel like I need from a cdn And maybe I just haven't gotten this space enough But uh is role-based access to content having the logic for Being able to cache something so that I not just one user can access this but everyone with the same roles You can cache once and they can and they can see it without Just obscuring the content right or out or something Is that is that something there's a solution for without heavily modifying an application some way that can be provided to a cdn like fastly or is that a Difficult more difficult to solve I think it's to do it probably at the level that you're used to with your RBAC systems is probably too much for a cdn but we do have mechanisms that basically authenticate and do um, I wouldn't call it role-based access but Or permissions if you will where you have a mechanism. Let's say you have a place that has The permissions database and you have the cdn look up before they let anybody in They they serve any static content or dynamic content and that's got to go through the cdn. 
We have ways. I don't know about other CDNs. We have ways where you can do lookups before you let anybody in, and then maybe remember that for future purposes. It depends on your specific use case. That tall guy in the back has coded this a dozen times, if not more, so he's a great resource for you to talk to.

Thank you.

Sure. Anything else? How did we do? Was that helpful? Was that useful? All right. Thank you for coming.