Good morning, and thanks for joining me. My name's Russ Biggs. I'm the director of technology with OpenAQ. I'm going to talk today a bit about wrangling the world's air quality data and how OpenAQ does it. So just to give some context about why air quality: we've come up with these four main points about air pollution and why you should care about air quality. First of all, air pollution is deadly. According to the WHO, the World Health Organization, 99% of the world's population breathes unhealthy air on a daily basis. And on average, this costs about 2.2 years of life expectancy around the world. In some parts of the world, that's as high as seven years. Also, air pollution is ubiquitous. The whole world is now covered in air pollution. As I said, 99% of the world's population breathes it. It's everywhere. And air pollution harms some more than others. We've seen this: different parts of the world have much worse pollution than others, particularly in the Global South. And then sources of air pollution and climate pollution are often one and the same. They have this type of relationship in that air pollution causes climate change and climate change causes air pollution. So they unfortunately work well together. So at a high level, what we do at OpenAQ is we harmonize data from the world's disparate sources of air quality data. And we bring that into a single data format and a single data platform so that folks can fully maximize the use of air quality data. So following this, just our general plot: we've got this large group of disparate sources. We've created a universal platform to harmonize that data so that many different use cases can utilize it. So just to give a quick primer on what air quality data is: primarily we collect these seven main pollutants. And these are often called criteria pollutants. These are the ones that most often affect human health.
The first two are PM 2.5 and PM 10. PM stands for particulate matter. These are small particulates that are in the air. The number is the size of the particle. So 2.5 is 2.5 microns and smaller, and 10 is 10 microns and smaller. PM 2.5 is a very dangerous form of particulate matter. It's smaller than the blood cells in your blood, so it can pass through your membranes and cause a lot of health problems. It comes from wildfires, cars, cigarettes, these types of things. We also collect some gases: sulfur dioxide, nitrogen dioxide, and also a couple of different flavors of nitrogen oxide. So NO2, NO, and NOx, which is just generic nitrogen oxides. Then carbon monoxide, and black carbon, which is a subspecies of PM 2.5 that comes a lot from diesel burning. And ozone. On our platform, we have a couple of other types of pollutants, different flavors of particulate matter, but these are the criteria ones that you'll see are often most relevant in policy and advocacy. So, how many people have heard of an AQI before? I'm kind of curious if that's, okay. So an AQI, for those who haven't heard of this, is an air quality index. And this is probably the most common form of air quality data that you've seen out in the world. You'll see something with this kind of scale. I like to call it the stop light. Green is good, all the way to purple, which is hazardous. At OpenAQ, we don't ingest air quality indexes, and I'm gonna explain a little bit why. So an air quality index is essentially just an average of air quality data, and it can span multiple pollutants. We see it primarily as a policy and public health tool. And in some ways, it's a very effective tool because it tells you, is it healthy for me to go outside right now or is it unhealthy? That's its job. It doesn't tell you, though, why it's unhealthy. So an AQI of 100 will say, this is unhealthy for sensitive groups, but it won't tell you what kind of unhealthy. Is it particulate matter?
Is it too much ozone? Is it too much sulfur dioxide? That type of thing. Because of that, it wraps up a lot of the nuance in the data, and we choose to pull just what we call raw concentration measurements. This helps us better share the specific pollution sources, and eventually you can roll those into your own AQIs. That's just to give some general context on what we do and don't ingest, and this is a unique attribute of OpenAQ compared to other air quality platforms. So also really quick, looking at how air quality data is collected. These two devices that we see here, I mean one's even bigger than a device, it's kind of a little shed. These are what's often called a reference grade monitor. The one on the left is called a BAM, and these are very expensive tools. Reference grade means that they're of the highest scientific quality. This is the gold standard. These are, as I said, very expensive. They can cost in the range of tens of thousands to hundreds of thousands of dollars just to acquire one, and then often cost in the hundreds of thousands of dollars to maintain. These are also often referred to as regulatory machines. In countries where the government regulates how much air pollution is allowed, these would be the tools that are used for that. Regulatory and reference aren't necessarily the same thing. Generally, regulatory devices are reference devices, but not all reference devices are regulatory. And then a newer kind of device on the scene is what can be called low cost sensors, or we call them air sensors. This is kind of a consumer grade product. You can see they're much smaller. The one on the left is a very popular brand called PurpleAir. The middle one's a brand called Clarity that has an integral solar panel, and then there's one called HabitatMap. These are consumer products, so you or I could buy them if we have the money. These start around $50, and anything under $2,000 is technically considered a low cost sensor.
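To make the earlier point about an AQI hiding the cause concrete, here is a minimal Python sketch. It assumes a simplified index that interpolates each pollutant's concentration into a sub-index and reports only the worst one, which is one common way such indexes are built. The breakpoint numbers and the names `BREAKPOINTS`, `sub_index`, and `overall_index` are hypothetical, chosen for illustration, and are not any agency's official values or OpenAQ code.

```python
from typing import Dict, List, Tuple

# Hypothetical breakpoints: (conc_low, conc_high, index_low, index_high).
# Illustrative numbers only, not any agency's official table.
BREAKPOINTS: Dict[str, List[Tuple[float, float, float, float]]] = {
    "pm25_ugm3": [(0, 10, 0, 50), (10, 35, 50, 100), (35, 55, 100, 150)],
    "o3_ppb": [(0, 50, 0, 50), (50, 70, 50, 100), (70, 90, 100, 150)],
}


def sub_index(pollutant: str, concentration: float) -> float:
    """Linearly interpolate a concentration into its index band."""
    for c_lo, c_hi, i_lo, i_hi in BREAKPOINTS[pollutant]:
        if c_lo <= concentration <= c_hi:
            return (i_hi - i_lo) / (c_hi - c_lo) * (concentration - c_lo) + i_lo
    raise ValueError(f"{concentration} is outside the table for {pollutant}")


def overall_index(concentrations: Dict[str, float]) -> float:
    """Report only the worst sub-index, discarding which pollutant caused it."""
    return max(sub_index(p, c) for p, c in concentrations.items())


# Two very different days: one dominated by particulates, one by ozone.
dusty_day = {"pm25_ugm3": 35, "o3_ppb": 10}
smoggy_day = {"pm25_ugm3": 2, "o3_ppb": 70}
print(overall_index(dusty_day), overall_index(smoggy_day))
```

Under these made-up breakpoints both days score 100, even though the pollutant behind the number is different. The raw concentrations keep that distinction, and an index can still be computed from them later.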
So, a high level overview of how OpenAQ works. We get data from a lot of different data sources: a lot of government data sources, and private sector partners for some of these low cost sensors. We aggregate that through a fetching process and ETL pipeline. We store that data, and then we make it accessible via an HTTP API and a bulk file download system. So the problem that we've identified is that there are lots of different sources across the world, and governments, when they report data, are very inconsistent in how they do it. We're pulling from over 150 different sources right now, in all sorts of different formats. There's JSON. We have FTP servers that we're pulling from. We're scraping tables off of web pages. The much loved CSV is a common format. A lot of these data sources only exist in the moment. They report the real-time measurements, and then those disappear. So we're scraping the web constantly to pull this data in. A lot of these aren't meant to be interoperable, and so our goal is to pull them in and harmonize them so they are interoperable and usable. So looking at the scale that OpenAQ is working at: like I said, we're pulling in around 150 sources. We cover about 150 countries in the world right now. We're right around 50,000 different locations, so that's 50,000 stations monitoring data. And just yesterday we were at about 34 billion measurements in our database. As for growth, we pull in about 15 to 20 million measurements every single day. We have this going constantly to get real-time data across the world. And the map here shows the geographic spread of what we're pulling in. Diagram didn't show up. Okay, so then how do you get data from OpenAQ? I just wanted to show some of the tools. We recently launched an Explorer platform. This allows you to browse the world's air quality on our platform.
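As an aside on the harmonization step described above, here is a minimal Python sketch of mapping two differently shaped feeds into one record type. Everything in it is an assumption made for illustration: the `Measurement` fields, the two parser functions, and the source field names (`station`, `PM2.5`, `epoch`, `site_name`, `pollutant`, `reading`, `time_utc`) are hypothetical and do not describe OpenAQ's actual schema or any real provider's format.

```python
import csv
import io
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Measurement:
    """One harmonized reading: the single shape every source is mapped into."""
    location: str
    parameter: str
    value: float
    unit: str
    timestamp: datetime  # always timezone-aware UTC


def from_json_source(payload: str) -> List[Measurement]:
    """Hypothetical source A: a JSON array with epoch-second timestamps."""
    return [
        Measurement(
            location=rec["station"],
            parameter="pm25",
            value=float(rec["PM2.5"]),
            unit="ug/m3",
            timestamp=datetime.fromtimestamp(rec["epoch"], tz=timezone.utc),
        )
        for rec in json.loads(payload)
    ]


def from_csv_source(text: str) -> List[Measurement]:
    """Hypothetical source B: CSV with free-form pollutant names and UTC strings."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        Measurement(
            location=row["site_name"],
            # Normalize names like "PM 10" or "PM2.5" to "pm10" / "pm25".
            parameter=row["pollutant"].lower().replace(" ", "").replace(".", ""),
            value=float(row["reading"]),
            unit="ug/m3",
            timestamp=datetime.strptime(row["time_utc"], "%Y-%m-%d %H:%M").replace(
                tzinfo=timezone.utc
            ),
        )
        for row in rows
    ]


json_feed = '[{"station": "A", "PM2.5": "12.5", "epoch": 0}]'
csv_feed = "site_name,pollutant,reading,time_utc\nB,PM 10,40,1970-01-01 00:00\n"
harmonized = from_json_source(json_feed) + from_csv_source(csv_feed)
for m in harmonized:
    print(m)
```

The design point is that each source gets its own small adapter, while everything downstream only ever sees `Measurement`, so adding another source does not change the consumers.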
You can sort by different pollutants and then download data as a CSV for a given site and across different pollutants. For the more data savvy, tech savvy, as I said, we also have an API. There's also a file archive which allows for downloading of GZIP CSVs. So this is a great method for bulk downloading of data. And all this is available through our documentation platform, which is on docs.openaq.org. And I wanted to just quickly run through our world map. This platform is just a visual way to be able to inspect the data across our platform. If we come to Buenos Aires, Buenos Aires measures with three reference grade stations. As you can see right here, they only measure PM 10, so we can look at these. These are run by the city government, and we pull them in. As you can see, we last pulled this data in 24 minutes ago. And then we can look and see this data trend over time, how that particulate matter has changed. These are quite high levels, just for some context. And you can see the data is very spotty. This happens. This is the type of problem we're trying to solve, which is that this data is not easy to keep up with. And then within this we also have, like I said, downloadable CSVs. And you can access all this data through our API as well. Also, since it's CSVConf, I wanted to share our data download, which is through the Open Data on AWS program. This has our entire data archive available as GZIP CSVs. And I have an Observable notebook that helps describe how to pull this data down. So you can pull data down location by location. This is a location in my hometown, which is the default here. And then you can also just graph the data as you wish. But this is a great way to download large amounts of data across many locations and across a large time span. So I'm gonna go ahead and stop there. But you can find us all over the web at openaq.org. We're on Twitter, we have a Slack space.
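For the gzipped CSV archive mentioned above, here is a minimal Python sketch of reading one such file using only the standard library. The column names (`location`, `parameter`, `value`) and the helper names are assumptions made for illustration, and the sample bytes are built in memory rather than downloaded, so check the OpenAQ documentation for the real file layout and paths.

```python
import csv
import gzip
import io
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def read_gzip_csv(raw: bytes) -> List[Dict[str, str]]:
    """Decompress gzipped CSV bytes and return the rows as dictionaries."""
    with gzip.open(io.BytesIO(raw), mode="rt", encoding="utf-8", newline="") as fh:
        return list(csv.DictReader(fh))


def mean_by_parameter(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Average the 'value' column per pollutant (hypothetical column names)."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        grouped[row["parameter"]].append(float(row["value"]))
    return {parameter: mean(values) for parameter, values in grouped.items()}


# Stand-in for a downloaded archive file: a small gzipped CSV built in memory.
sample = gzip.compress(
    b"location,parameter,value\nX,pm10,40\nX,pm10,60\nX,pm25,12\n"
)
rows = read_gzip_csv(sample)
print(mean_by_parameter(rows))
```

With a real file on disk, `gzip.open(path, mode="rt")` works the same way and streams the rows, which matters when pulling a long time span across many locations.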
And check out our link tree. That has all of our documentation. Our platform is fully open source on GitHub, so feel free to check out any of our code. And I'll stop there and open it up for any questions.
Thank you very much, Russ. Do we have any questions?
So is there someone working with deep learning to predict air quality in some place where you can't collect the data? Is there something like that? Do you know of some initiatives like that?
To predict where we can collect the data? Is that the question?
Yeah, for example, let's say a location that has no sensors, but you have certain features that you could use to train some machine learning model to predict what the quality of the air would be in that location.
Okay, yeah. So there is a lot of modeling out there that uses things like deep learning. And often out there you'll see interpolated surfaces of air quality. That's a very common thing. It's a very complicated thing to do, and I can't speak to how effective it is. We don't do as much on those types of data products, but I would sometimes take those types of sources with a grain of salt, because it's extremely complex. Air is a very complex phenomenon. What these stations monitor, I mean, is very dependent on the characteristics of the device. They're very sensitive to heat and humidity. So it's actually a really complex interplay. Even among these different pollutants, they interact with each other differently over time and with heat. So as the temperature rises, ozone can actually diminish other pollutants. So it's extremely difficult to model effectively, I think, unfortunately. What we advocate for is better monitoring coverage, because the more you monitor, theoretically, the better you can model. So that's what we're advocating for. But in very unmonitored areas, it becomes much less reliable, I would say.
And if you saw the map, I mean, you can see quite clearly there's a huge disparity in the world in where there is monitoring. So there's a lot of interest in that type of work, but first what we need is more monitoring. Places all across the African continent and large parts of South America have very little monitoring. We would like to see more monitoring to improve that.
Thank you.
You may have mentioned this a bit, but how much of that gap can those really cheap sensors fill in? Like, if there's 5,000 miles where there are no regulatory sensors, but we find people to put up a little consumer sensor on their house, does that help, or is it kind of, yeah.
Yeah, that remains to be seen, I think. These air sensors are kind of new. We're learning what they can do. They can only measure certain pollutants as well. Most of them on the market can measure particulate matter. There's a very common sensor out there that's in a lot of these devices, and it's become more obvious it can really only measure PM 2.5, which is the more critical version. A lot of them reported PM 10, and it's become obvious they can't actually do that very well. We believe at OpenAQ, though, that low cost sensors can be really powerful to help start to fill that gap. Even if they have certain weaknesses, that's a great place to start. As I said, the reference monitors are extremely expensive, so the disparity between the prices is quite high. In some parts of the world, buying a half million dollar device and then maintaining it is just out of the question. So there's a lot of work that's happening to try that, but unfortunately it remains to be seen how effective they are. You can see there's a lot of coverage in the United States particularly. These are mostly low cost sensors from consumers, a citizen science type of thing: install one in your backyard.
Yep, yep, but you see fewer of them in South America and in Africa, because even at their price point, they're still somewhat inaccessible, unfortunately.
And have there been instances where governments have invested in these expensive stations, but then don't want to share the data, in order to not look bad, or transparency problems like that?
Yes, that's a big barrier we've seen. Particularly in areas of the world where air quality is really bad, there's sometimes hesitancy to share that data. There might be internal government reasons why they have that. One nice program that the US government has is that on top of every embassy around the world, there's one of these reference grade air quality stations. So that's helpful to create one proxy for some areas. But yes, air quality is a very political issue. And so some of the gaps that you see on the map are politically related, in that they're monitoring, but they're not sharing the data. And that's something that we advocate to change, which is that we want more monitoring. And we've seen some examples, actually going back to the low cost sensor question, in which low cost sensors have been used wrongly, and they often require some correction. A journalist got ahold of some data in one country that was from a low cost sensor and said, oh, we have the worst air quality in the world. And the government had to pull that back and go, whoa, whoa, whoa, this is wrong. We actually have reference monitors. But they weren't making that data as easy to get, and so the journalist went for the easiest one to get, which was fair enough. So openness, that was good. I think that was a success story for that government, in that they said, we need to make the data from these reference grade machines easier to get, to prevent those kinds of problems.
Any other questions? If not, let's give Russ a big round of applause. Thank you.