You're a web developer wondering what a reasonable JavaScript bundle size is. Or you're a PhD student researching HTTP/2 adoption. Perhaps you're on a web standards committee and you're surveying how an API is used in the wild. Where do you go to find answers to big questions about the web? In this episode we're looking at the HTTP Archive and how you can use it to learn more about the state of the web.

You can think of the HTTP Archive as a data center full of machines continuously testing hundreds of thousands of the most popular websites and recording everything there is to know about them: how many bytes of JavaScript were loaded, how long it took to download them, whether any images could have been optimized, and much, much more. As you can imagine, with this much data we can learn some pretty amazing things about the web. So how do we start making sense of it?

httparchive.org is the place to go for web stats and trends at your fingertips. Common questions about the web, like the size of the typical web page or HTTPS adoption, are all answered here. You can even go through seven years of historical data to see how the web has evolved and where the trends are taking us. As a community of web developers, this kind of data is crucial for knowing whether we need any sort of course correction, or for getting confirmation that we are in fact heading in the right direction. And this is actually a really good time to be using the HTTP Archive, because there's a new version being released in early 2018 with an upgraded UI and lots more modern metrics.

But what if the stat you're interested in is so specific that it's not available on the website? This is where BigQuery comes in. The HTTP Archive data is like an iceberg, with a hand-picked set of interesting metrics exposed on the website but so much more to be explored beneath the surface. On BigQuery you can mine terabytes of raw data using simple SQL queries. So let's dive in and see what kind of insights we can extract.
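To give a flavor of what these SQL queries look like, here's a minimal sketch that computes the median JavaScript bytes per page from one crawl's summary table. The table name and the `bytesJs` column follow the 2018-era HTTP Archive BigQuery schema and are assumptions here; check the current dataset for the exact names before running it.

```sql
-- Sketch: median JavaScript payload per page for one desktop crawl.
-- Assumes the 2018-era schema (summary_pages tables, bytesJs column);
-- verify against the live dataset, as names may have changed.
SELECT
  APPROX_QUANTILES(bytesJs, 100)[OFFSET(50)] AS median_js_bytes
FROM
  `httparchive.summary_pages.2018_01_01_desktop`;
```

Swapping in a different date or the `_mobile` suffix lets you track the same metric across crawls, which is how the trend charts on the website are built.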
The summary_pages dataset contains high-level data for each crawl; many of the stats surfaced on the website come directly from here. We can go deeper into the raw Lighthouse results to learn more about the progressiveness of the web. For example, we can query for how many websites pass or fail the audit that checks whether a service worker is installed. According to the data, about 2,400 sites, or 0.6% of the sites tested, actually have one. When we compare this to the available historical data, there's a clear increasing trend.

Let's look into another hot topic on the web: the use of cryptocurrency-mining JavaScript. Since every request is logged by the HTTP Archive, we can query for patterns, like whether a request URL includes a known mining library. Of course, there are ways to conceal it from the URL, and there could be false positives, but this gives us a rough idea. Things really heat up when we start exploring the particular websites that include such code. For example, what will we find if we limit our search to .gov or .edu websites?

So those are just a few examples of the power of the HTTP Archive. It's a super useful tool for learning about how the web is built. In the upcoming episodes, HTTP Archive data will form the basis of many more of our insights. Also, be sure to check out discuss.httparchive.org, where people get SQL help, share interesting analyses, and stay on top of the latest changes. Thanks for watching.
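The service worker count mentioned in this episode can be approximated with a query over the Lighthouse dataset. This is only a sketch: the table name and the JSON path into the report are assumptions based on how the HTTP Archive published raw Lighthouse reports around early 2018, so double-check both against the current schema.

```sql
-- Sketch: how many tested sites pass Lighthouse's service worker audit.
-- Assumptions: a lighthouse table per crawl with a `report` JSON column,
-- and a score of 'true' when the audit passes. Verify before running.
SELECT
  COUNT(*) AS total_sites,
  COUNTIF(
    JSON_EXTRACT(report, "$['audits']['service-worker']['score']") = 'true'
  ) AS sites_with_service_worker
FROM
  `httparchive.lighthouse.2018_01_15_mobile`;
```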
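The mining-script search works the same way, scanning the logged request URLs. In this sketch, Coinhive stands in for "a known mining library" (it was the widely reported example at the time); the table and column names again follow the 2018-era summary_requests schema and should be treated as assumptions.

```sql
-- Sketch: pages on .gov or .edu sites loading a known mining script.
-- URL matching only: false positives and concealed loaders are possible,
-- as noted in the episode. Table/column names assume the 2018-era schema.
SELECT
  page,
  url
FROM
  `httparchive.summary_requests.2018_01_15_desktop`
WHERE
  url LIKE '%coinhive%'
  AND (page LIKE '%.gov/%' OR page LIKE '%.edu/%');
```

Dropping the `page LIKE` filter gives the web-wide count; adding other script names to the `url LIKE` condition broadens the net.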