Hello everybody and welcome back to The State of the Web. My guest is Andrew Betts. He's a technical product manager and developer advocate at Fastly, and today we're talking about a bit of web metadata that is becoming ever more important to web performance and security: HTTP headers. Let's get started.
Andrew, thank you for being here.
Thanks for having me.
So in addition to your work at Fastly, you've also been working at the W3C Technical Architecture Group, or TAG, for a couple of years. How has this experience shaped your interest in headers?
So I was on the TAG for a couple of years until January this year, and the purpose of the TAG really is that it's a review body. Working groups, either at the W3C or at other standards bodies, will come up with some new feature that they want to add to the web platform, and they will take that spec to the TAG for feedback. The TAG has a broad membership. It's just people who have a lot of experience in a variety of different ways related to the web, so we're in a good position to give people feedback on a spec proposal. And I found that a lot of the proposals that were coming to us for review were not just things like new elements or new JavaScript APIs, they were actually new HTTP headers. That was interesting to me because it felt like this wasn't an area where we were seeing a lot of innovation, and actually there is a lot of standards work going on, so that was particularly interesting. Also, I work for Fastly, and we're an edge network. We process a lot of traffic that goes through the edge of the internet, and we let developers manipulate that traffic in various ways. One of the most popular things that people like to do is to add and remove HTTP headers to optimize security and performance and that kind of thing, so there's a professional interest there as well.
I guess the thing that really prompted me to do this research was a conversation I was having with Steve Souders. He used to work for Google, and before that at Yahoo, and he's written various books on web performance. He said to me that he's been using the same technique for about a decade to clear an object out of the browser cache. So you serve a JavaScript or a CSS file or something with a very long cache time, and you suddenly want to clear it from your end users' browser cache. How do you do that? He has this hacky technique for doing it, and he said to me, "You know, Andrew, there must be a better way of doing this now. I haven't done any research into this for several years." And I said, "Well, have you heard of Clear-Site-Data?", which is a new HTTP header. You send it on any response and it clears your entire origin's cache. And he said, "Wow, no, I haven't heard of that." So I thought, that's interesting: if Steve Souders hasn't heard of this, then maybe we need to do a better job of communicating some of these new headers.
So I think most people's familiarity with the cutting edge of HTTP is probably HTTP/2. Is this an incremental improvement over HTTP/1.1, or is there something brand new in here that is totally revolutionary?
So I guess it's both, really. In some ways HTTP/2 is just a better solution to problems that we've progressively been solving better and better in HTTP/1.1. When HTTP was first invented you needed a separate TCP connection for every request and response, and today that would just not work, right? There are hundreds of resources on every web page, and it would be super slow to have to open a new connection every single time. Then in HTTP/1.1 we got keep-alive, so we were able to reuse the same connection for multiple resources, but they still had to be sequential. The solution to that was that browsers started opening multiple TCP connections to the same origin and parallelizing, up to four, maybe six connections. That prompted developers to start using techniques like domain sharding, where you would allocate more than one external address to essentially the same server so that you could pretend that it was multiple servers, and then get eight connections instead of four. In H2 we finally do away with that, and we introduce multiplexed requests and responses on one connection. So it no longer matters how many things you're requesting from the same origin; we can interleave those packets in any order.
So domain sharding would now be an anti-pattern?
Yeah, essentially there should be no reason for anyone to use domain sharding with H2.
At the same time, there are things being introduced in H2 that are not incremental improvements. They're just completely new concepts, like header compression. Until now we've been able to compress the body content of a web page using gzip, for example, but that compression is advertised in the headers, so the headers themselves can't be compressed. That's a shame, because the headers are critical content. They come in front of the content that we need to render the page, so it's important that we get the headers down as quickly as we can, and the headers have also been getting bigger at the same time. So H2 introduces header compression. It also allows for a dictionary to be shared between requests, so on a subsequent request, if your headers are largely the same as the previous one, they'll disappear in compression. That, I think, is one of the great improvements in H2 as well.
You've also been doing the rounds at conferences recently. You have a presentation titled "Headers for Hackers" where you do analysis into the usage of headers on the web. Can you describe your methodology and how you did the analysis?
Yeah, so after I was done with this research that I did with Steve on clearing cache in the browser, I was curious to find out, for headers like Clear-Site-Data and other things, what their relative popularity is. Some of these best practices are probably being adopted quite well, and some of them are maybe not showing any adoption at all, and the first step to figuring out why is to find out what the actual situation is. So HTTP Archive, which Steve actually started and which is sponsored by Google and Fastly, is a big database of the top (is it 1 million? 1.3?) 1.3 million sites. They're crawled via WebPageTest every couple of weeks, and the entire data set goes into BigQuery, and that includes all the response headers. So I can query across all the headers served on all of those responses, not just on the page response but on all the resources that those pages loaded, and I can get a pretty good picture of what headers are used on the web today.
So the web is a pretty wild place. I imagine you found some surprises in there, is that right?
I did. As someone who works with web traffic every day, and who also did two years in the standards community looking at new headers and web standards, I thought it was unlikely that I would see anything in the top 30 that I didn't recognize. And actually there was one called P3P that I didn't know, and it was served on around 10% of the responses in HTTP Archive. So I thought, wow, what is this?
I looked it up, and it turns out this is a privacy-related header. It was designed as a machine-readable statement of privacy policy, and a user agent, a browser, would surface this to the user in some way. It seems to me, when you read the spec, like a really good idea, and it's a bit of a shame that it never actually took hold. But interestingly, it was implemented in some form by Microsoft in Internet Explorer, and it was used to gate access to a small piece of browser functionality. So if developers wanted to use that functionality they needed to have a P3P policy, but it didn't actually matter what the policy was, because it wasn't validated by the browser. So the most popular value for this header in HTTP Archive is the phrase "this is not a P3P policy".
So it's a total waste of space.
It is a total waste of space.
Wow. Was there anything else that you found?
So I guess P3P you'd consider to be a really niche find. One of the more common headers is Expires. Everybody knows what that means and what it does; 80 percent of the responses in the archive include an Expires header. The weird thing is that the vast majority of them are all set to the same date, and it's a very specific date. It's like 4 p.m. GMT sometime in December 1994.
And what is significant about that date?
It's the example in the spec. If you read the HTTP caching spec there is an example of an Expires header, and that is the date that is given. There are a couple of things I find interesting about this. The first is that the HTTP date format includes the day of the week. So if I wanted to think of an arbitrary date in the past, and it didn't matter what that date was because I just want to indicate that my document has expired, then I could think of a date in the past, but I would also need to know what day of the week that was in order to form a valid HTTP date. Because I can't do that mentally, it's easier just to copy the example from the spec. The other thing I found interesting is that if people are serving Expires headers that intentionally contain a date in the past, then a better practice is to use a more modern header like Cache-Control. In fact we quite often see patterns where people include both a Cache-Control header and an Expires header, and in those situations, most commonly, the Expires header is redundant and will be ignored by the browser. So there really is no point in setting that header at all.
I see you've also done research into X-Frame-Options. What did you find there?
Yeah, so X-Frame-Options is an interesting one because it was one that I was using myself. I found that it's a very popular header, and it's used to prevent clickjacking, essentially. You set this header if you want to prevent other sites from putting your site in an iframe on their page, and I thought this was the best practice. I subsequently found that there is a Content Security Policy directive called frame-ancestors which allows you to achieve exactly the same thing, but it has two significant advantages. One is that it's fewer headers, because you already have a content security policy, or at least you certainly should. The second is that the directive, as part of CSP, is much better specified than the X-Frame-Options spec, so you can count on better interoperability between browsers. So that was one where I actually changed my behavior as a result of the research.
There was also a blog post that accompanied your presentation, and a lot of the discussion in the comments had to do with the Via header. Can you describe the gist of that conversation?
So I think this was a result of some bad phrasing on my part in describing this particular one, because Via is interesting in that it is both a request and a response header. As a request header it's actually very important. It performs a very useful function as requests get passed from one proxy server up to the next. There might be a number of hops in the chain to get back to the origin, and the Via header ensures that all of those hops in the chain speak the same version of HTTP, and also, to some extent, that you don't end up in a loop. Say you have a CDN or any other kind of edge network that uses another CDN as an origin, and that CDN uses the first CDN as an origin: you could end up destroying the internet or something. So the Via header, to some extent, is used to prevent those kinds of request loops, and it is important that it does that. The Via response header, in contrast, is informational. It is added to as the response comes back through all of those hops, and in the browser you get to find out all of the proxies that your response transited. That information is not particularly useful to an end user, so I felt that it was not necessary to keep that header. I think those two concepts, the request and the response, got conflated in people's discussion of it, but that was probably down to confusion in the way that I phrased it in the blog post.
And how did CDN-Loop play into that?
Oh, so CDN-Loop is a concept that we have been shopping around at Fastly and amongst other CDN vendors. The idea is that we take the part of Via that is really useful, which is preventing loops within edge networks and proxies, and put it in a dedicated header. That avoids some of the baggage that comes with using Via and provides us with a clean mechanism for preventing these loopbacks between CDNs. So this is a good example of a new HTTP header that not many people know about. It's not yet supported by any browsers; it also needs to be supported by the major edge networks and CDNs. And it's also a good example of cooperation between people in the industry. Browsers have been used to cooperating and making web standards for a long time. I used to sit in a room with people from Google and Microsoft and Apple and Mozilla and Samsung, and we would all talk about web standards, and that exists in that industry in a much more advanced way than it does in the CDN and edge network industry. I think CDN-Loop is a good example of us starting to push that standards collaboration forward.
You had mentioned CSP earlier. Do you have a sense of the success of the adoption of privacy and security headers like that?
So, Content Security Policy, yes. It's disappointing. It's like two and a half percent. Depending on how you look at it, it's somewhere between two percent and ten percent, but it's too low considering how important CSP is to preventing cross-site scripting attacks, which are still one of the most common ways for people to attack vulnerabilities on websites. People just are not using this defense mechanism enough, and I think one of the problems with it is that it is reasonably complex to implement CSP. The average length of a CSP that I found in HTTP Archive was 600 bytes. It's one of the longer headers. I used to think that we were committing terrible crimes by having so many cookies on the sites that I worked on in my previous jobs, and then I realized that if you serve a Content-Security-Policy header that's this big, then that becomes your biggest problematic header.
But now that we have header compression it's not so bad.
Well, that does help, and we might also come on to things like origin policy, which help as well. But nevertheless, just writing those 600 bytes is an overhead. I also found signs in the archive that people are trying to generate these headers automatically, because I found one that was over 10k in size, which was listing hundreds and hundreds of advertising third parties that might be loaded onto the page. It was probably unnecessary, and it was probably something that could have been manually optimized, but I suspect what happens is that it's generated automatically and then no one looks at it, so you get that kind of effect. And that is really terrible, because that header needs to be loaded prior to even the first byte of HTML, so it's stuffing up your critical path.
And it's a whitelist. It should be as small as possible so that you're only allowing a certain number of...
Well, exactly, yes. So CSP is problematic. Even the two and a half percent of websites that implement it often do it in a way that offers them little to no protection, and yet it remains a very powerful tool if used well. Other headers, like HSTS, that's HTTP Strict Transport Security, are much more widely used. This is a header that forces browsers to connect to your website over TLS, SSL, HTTPS. So even if somebody types in the address of your website with no https:// in front, which is obviously, you know, everyone, the browser will even then make that very first connection to your site over TLS. So you won't need to suffer the latency of a redirect, and you also won't expose your site to that tiny window of opportunity for an attacker to man-in-the-middle the connection. So HSTS is a really simple thing. Everyone should be using it, and it's very simple to implement. The only value that you give it is a max-age, so it's incredibly simple to apply.
And I see adoption is around 20%, which is good.
Yeah, exactly. As you would expect, it's much easier to implement, so a lot of people are using it.
How about Referrer-Policy?
Yeah, so Referrer-Policy is interesting. CSP has very low adoption, but it's hard to implement. HSTS has much higher adoption and is easy to implement. Referrer-Policy is also a very important thing to consider, and yet it has very low adoption, about 2%, even though it's extremely easy to implement. That was something that surprised me. I think the reason might be that things like cookies are foremost in the minds of all of us when we're building websites, because we have a lot of legal obligations and compliance that we need to deal with, and that aspect of security and privacy is perhaps distracting us from the data leakage that we might be suffering by not setting a referrer policy. Just to be clear, Referrer-Policy is a policy that prevents browsers from sharing the full URL of the page that the user is on when they click off your site to go to an external domain. If I click a link on your site to, I don't know, a Wikipedia page that you've linked to, then Wikipedia will see the full URL of the page I was on before I ended up on Wikipedia. Now, maybe that URL contains some personal data. Maybe you've included my name or my email address in a query parameter in the URL. That's the kind of thing that a referrer policy will help to prevent.
So what else did you find related to performance headers?
So the stuff we've talked about so far, I guess, is mostly security and privacy related, but headers have also been brought to bear on performance a lot. I think the most important one that we use today is Link rel=preload. Link is an interesting header because it's incredibly generic. The origins of Link are in the semantic web: the idea that, as metadata for a web page, you could attach links to the previous and next chapter of the book, or whatever it is that this web page is, or you could have a link to an index or contents. Now, we don't really use the web like that anymore, and some would say that's a shame, but I think a header that offers us this ability to generically link to another document and say that it is related in some way is quite useful for saying "these are resources that are going to be needed to render this web page, so please load them as early as possible". That is what spawned Link rel=preload, which is used particularly for fonts, because fonts tend to be discovered quite late in the process of parsing a web page. By shipping a Link rel=preload header for all the fonts that you're going to use, you can drastically reduce the chance of a so-called flash of unstyled text, so that gives you a huge benefit.
Some would say it's not as good as it could be, and so this is something we've been working on: creating a new spec for something called Early Hints. The idea of Early Hints is that instead of waiting until the server is able to determine that the response is going to be a 200 before sending headers, we can send the headers even earlier. We receive the request, and before we even start thinking about whether it is a valid request or not, we immediately emit some headers that say "you know what, it doesn't really matter whether this turns out to be a 404, you're probably going to need this font, so please just start loading it anyway". And then if it does turn out that you've given us a garbage URL and this is a 404 page, well, our 404 page is probably branded anyway, so we're probably still going to write it in that font. So it makes sense for a small number of really critical resources to be loaded even when we haven't determined the basic status of the response yet. Early Hints is at a very early stage. It requires implementation not just in browsers but also in web servers and in edge networks and CDNs, so this is a very hard one to ship. But we are excited about it, because it lets us optimize that period of time when we're waiting for the server to think of a response. That time is currently really critical, because the end user is sitting there waiting for one response and nothing else is happening on the network, so it's time we could really make use of.
Looking ahead, are there any headers that aren't necessarily available in the wild yet but that you're excited about?
Absolutely. We spent quite a lot of time on the TAG looking at things like Feature Policy. Feature Policy I'm incredibly excited about. It will enable us, for the first time, to start to reduce the size of the web platform, which is interesting because all these standards that get shipped all the time are constantly increasing the size of the platform: adding new APIs, adding new elements, adding new headers. It's become this fairly unwieldy beast. And if you go back and look at sites that haven't been maintained for a number of years, they might now have security vulnerabilities that they didn't have before, because there are whole APIs that have shipped that didn't exist when that site was made. Feature Policy is a way for you to list features of the web platform and restrict them to certain origins. So rather like Content Security Policy restricts access to network destinations from the page, Feature Policy restricts access to APIs within the page. I can say, for example, that video autoplay is turned off, and that means that even if I write code in my page that attempts to autoplay a video, it just won't work, because I turned it off in my feature policy.
Now, this is really exciting. It means that we can apply some really simple policies to all of the things that would otherwise trigger user permission prompts, things like notifications. Everyone's starting to be familiar with all these pages that, on page load, will immediately say "can we send you notifications?" It's a terrible anti-pattern. Obviously I've no idea whether I want notifications from a site that I've only just opened in my browser. Sites can start to say "we are a good actor, we are not going to do that" by declaring a feature policy and just turning off features of the web that have generally been agreed upon as bad practices. And we can also prevent third parties on the page, or maybe our own development teams, from accidentally using things that we feel we shouldn't use. In the future, I'm hoping that Feature Policy will get expanded to include all the kinds of patterns that have been maintained since the antiquity of the web, things like document.write or synchronous XHR, which still work, and they still need to work, because the web has this philosophy that the GeoCities site that you wrote in 1997 should still work. But if you write a site today you're not going to use those same techniques, and it's reasonable to say "let's just turn them off". If we turn them off, browsers can optimize things better, search engines can potentially give you ranking bonuses for having good practices on your site, and you can rest more assured in the knowledge that you have a smaller attack surface for vulnerabilities. So I am really excited about Feature Policy, and also things like CDN-Loop that we talked about earlier on.
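As a rough illustration of the idea described above, here is a minimal Python sketch that serializes a feature policy into a header value, using the `Feature-Policy` header syntax (a feature name followed by an allowlist, directives separated by semicolons). The function name `build_feature_policy`, the origin `https://maps.example`, and the particular choice of features are assumptions made for this example, not something taken from the conversation. Browser support for individual features varies, and the header was later renamed `Permissions-Policy` with a different syntax.

```python
def build_feature_policy(policy):
    """Serialize {feature: [allowed sources]} into a Feature-Policy header value.

    Hypothetical helper for illustration only.
    """
    directives = []
    for feature, allowlist in policy.items():
        # An empty allowlist is written as 'none': the feature is off everywhere.
        sources = " ".join(allowlist) if allowlist else "'none'"
        directives.append(f"{feature} {sources}")
    return "; ".join(directives)


# Example policy (assumed values): turn off autoplay and synchronous XHR,
# and allow geolocation only for this origin and one named third party.
example_policy = {
    "autoplay": [],
    "sync-xhr": [],
    "geolocation": ["'self'", "https://maps.example"],
}

header_value = build_feature_policy(example_policy)
print("Feature-Policy:", header_value)
```

With a policy like this on the response, code on the page that tries to autoplay a video or issue a synchronous XHR would be blocked by a supporting browser, which is the "turn it off" behavior described above.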
I'd imagine that if you have that data in HTTP Archive, you could see the features that people disable, and that would also, in aggregate, be really interesting to analyze.
Yeah, absolutely, and I think people will be doing that, and understanding how successful Feature Policy is at enabling that uplift in standards across the web.
So if developers want to learn more, where could they go?
Well, you talked earlier about the talk that I did, and I turned that talk into a couple of blog posts. One is about best practices, one is about anti-patterns. They're both on the Fastly blog, so it's all there.
Great. Andrew, thank you so much for being here. If you'd like to find the links to all these resources, we have them in the description. Thank you for watching, and we'll see you next time.
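Several of the recommendations from this conversation can be gathered into one small Python sketch: drop the P3P header, drop an Expires header when Cache-Control already carries a max-age, express X-Frame-Options as a CSP frame-ancestors directive, and set HSTS and Referrer-Policy if they are missing. The function name `apply_header_advice`, the plain-dictionary representation of headers, and the default values chosen for HSTS and Referrer-Policy are all assumptions made for illustration; a real deployment would make these changes in its server or edge configuration and would pick values to suit the site.

```python
def apply_header_advice(headers):
    """Return a copy of response headers adjusted along the lines discussed.

    Hypothetical helper for illustration only; input is a plain dict.
    """
    # Header names are case-insensitive, so work with lowercase keys.
    result = {name.lower(): value for name, value in headers.items()}

    # P3P is described above as a waste of space: remove it.
    result.pop("p3p", None)

    # Expires is redundant when Cache-Control already specifies max-age.
    if "max-age" in result.get("cache-control", ""):
        result.pop("expires", None)

    # Express simple X-Frame-Options values as CSP frame-ancestors instead.
    xfo_to_sources = {"DENY": "'none'", "SAMEORIGIN": "'self'"}
    xfo = result.get("x-frame-options", "").strip().upper()
    if xfo in xfo_to_sources:
        csp = result.get("content-security-policy", "")
        if "frame-ancestors" not in csp:
            directive = f"frame-ancestors {xfo_to_sources[xfo]}"
            result["content-security-policy"] = (
                f"{csp}; {directive}" if csp else directive
            )
        del result["x-frame-options"]

    # Assumed defaults: one year of HSTS and a conservative referrer policy.
    result.setdefault("strict-transport-security", "max-age=31536000")
    result.setdefault("referrer-policy", "strict-origin-when-cross-origin")
    return result


before = {
    "P3P": 'CP="This is not a P3P policy"',
    "Expires": "Thu, 01 Dec 1994 16:00:00 GMT",
    "Cache-Control": "no-cache, max-age=0",
    "X-Frame-Options": "DENY",
}
after = apply_header_advice(before)
for name, value in sorted(after.items()):
    print(f"{name}: {value}")
```

The sketch leaves the Via response header alone, since whether to remove it was the contested point in the blog comments mentioned earlier.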