Thank you. We're really excited to be here today. Again, my name is Christa Hasenkopf. I'm an atmospheric scientist, and this is Joe Flasher, a developer at Development Seed, and together we're co-founders of OpenAQ. Our mission is to open up the world's air quality data. We'll talk about the why and the how in a moment, but first we're going to go to Ulaanbaatar, Mongolia, where the seed of this idea first began a couple of years ago. About three or four years ago, Joe and I spent a couple of years living in Ulaanbaatar, doing research and working in the tech community, and we saw that the air looked like this. There was very little public access to air quality data, even though it is one of the most polluted places in the world. So, with colleagues at the National University of Mongolia, we built the country's first air quality instrument that automatically posted its data to social media and shared it with the public. And even though it was a side project, not our main work there, it actually had the most impact.
It was something that eventually garnered national-level attention around air pollution mitigation issues and elevated the role of science and data in making policy decisions. For us, it was a turning point in realizing the power that open environmental data can have on a community. And I did want to point out, to give a sense of just how polluted Ulaanbaatar is: in some places there you experience the same kind of air one would experience fighting a wildfire. So it's extremely polluted. But the thing is, pollution is not an issue for one city or one country. It's a global killer, responsible for one out of every eight deaths in the world. To give you a sense of scale, that's more than the number of deaths in a given year from HIV/AIDS and malaria combined, and it predominantly impacts developing countries. It's also not an issue that's going away anytime soon, as far as we can tell: a recent study projects that while the world's population will grow by about a third by 2050, the number of deaths due to air pollution is expected to double. And often, the most polluted places also have the least data, or the fewest studies that have been able to be done. This graph shows that: the x-axis shows the number of air pollution papers published in the scientific literature for a variety of different cities, and the y-axis shows smoke and dust levels, or PM10 values, over the course of a year for those same cities. The cities in red are the top ten most polluted places in the world; you'll see Ulaanbaatar up there, and you'll also see that very few papers have been published, very few studies done, for them. If you sum up all of those papers, you get this dashed red line, and that dashed red line is 20 times less than what you find for London, and London has an order of magnitude cleaner air. So I think this graphic is great for depicting the serious divide between the air quality haves and have-nots, and that's a very serious issue. It brings up a lot of key public health questions, such as the one we'll show here. In the EU and the US, we have a pretty clear understanding of the relationship between air pollution levels and risk of death. That comes from lots of studies, large cohort studies, epidemiological studies, and lots of air quality data. But that's not the case in much more polluted places where billions of people live. There, that understanding is not nearly as robust, and big questions remain. A big factor in this question mark is not having access to air quality data. So that's one example, the public health sector, that could benefit from improved access to air quality data, let alone real-time air quality data. Another is the media. For example, all over China, when real-time air quality data became available, or really was scraped from government websites, third-party apps were developed all over the place, sharing local air quality information in the way that made the most sense for those communities. Another example is India: a couple of years ago at this point, a comparison was made between Beijing air quality and New Delhi air quality in the graphic shown here. This one graphic really galvanized an entire nation around air quality issues, and it elevated the conversation around mitigation policies there.
Another sector that has not yet really been able to tap into real-time air quality data, but would benefit tremendously, is satellite retrievals of air quality measurements. The same goes, on a different scale, for the low-cost sensors that citizen scientists might use. Both would benefit from being able to ground-truth and calibrate their instruments against real-time air quality data. We also see that environmental activism groups would be very interested in this data. There's a group in Houston right now who are just waiting to have access to their air quality data. They have a ton, but they don't have a way to get to that data, and they want to tweet out ozone violations in their region. And of course, in terms of air pollution policies, the public can judge whether those policies are actually working or not if they have access to air quality data. So we've shown this picture with these sectors interlocking, but this isn't actually how the picture looks, and we'll talk about why. What is the state of real-time air quality data across the world? It kind of looks like this: there are over 16,000 publicly sharing, typically government-sponsored, air quality monitors around the world that share their data online. They're all in disparate formats, sharing more or less the same pollutants, with some variety. And there's a key thing: a lot of this data appears for maybe 15 minutes or an hour, and then it disappears. It's gone from the record. There are a few efforts out there that aggregate this information and put it on maps, or make it regionally available through an API for a cost, or put out air quality index values, which are often good for public messaging but not good for many of those sectors, which rely on actual air quality measurements. We think these efforts are awesome, but they're not sufficient. So we can see that there's all this real-time air quality data out there, and Christa showed the need for all this data, but you can see that there's obviously a gap, and that's a problem. We think we know how to fix that: with open, programmatic, and historical access to air quality data. Our three guiding principles are open, historical, and programmatic access. Open means it's freely available for everyone. Historical means you can go back in history and get data, so you're not losing access to it. And programmatic access means that people can build tools on top of it. We also believe in open source. We want it to be open for everyone to see, for transparency, so you know where the data comes from and how the data is being presented, but it's also open source so that we hopefully get contributions from others. And finally, it's community-driven. This means we're reaching out to a lot of partners and asking them what they need in the data and how they would like to use the data, and we've had a lot of conversations along those lines. But we're also looking for partners in building up the platform. We are not looking for this to be controlled by a small group of people, but rather to have it controlled by a global community. So, a couple of highlights. Hopefully a lot of you didn't see the Thomson Reuters numbers, so this will be much more impressive for you. We've only been up for about two months, and we've got over 400 sites. Christa mentioned 16,000, so we have a long way to go, but we've got almost a million measurements, which is really big, because keep in mind that other than the most recent measurements for all these places, close to a million of those data points wouldn't be available anymore. They would just be gone from the record. We capture seven different pollutants, which we think are the key pollutants to look out for. On the map here, the areas in dark blue are areas where we think we've gotten most of the data within the country, and light blue is where we just have some
data in the country. The system currently is pretty simple. We have a mechanism that runs every 10 minutes to go out and fetch data, which we'll talk about a little more in a moment. It saves the data to a database, and we've built a RESTful API on top of that, so you can query and filter the results you're getting. Then we've built a website on top of that. There are no tricks with our website; we just use the same API that's available to everyone else, so what we build out is just one example of what you can do with the available data. Let's look a little more at the data ingest piece. Like I said, this is something that runs every 10 minutes. We're currently pulling in data both from websites, via scraping, and from APIs. Some places have APIs, which is awesome; a lot of places don't, so we're going through the painful process of scraping the data from their sites. We have the concept of sources: a source is an API URL or a website URL, and each source has an adapter associated with it. This is where the magic happens; the adapter actually goes through the process of converting the source data into our data format. Then, before it gets saved to the database, it goes through a bit of validation, and this is really technical validation: making sure that the value field is a number and not a string, or that if a measurement has geographic coordinates, they're stored as latitude and longitude and those are numbers, not strings. The important thing here is that we're not making value judgments about the data we're getting. We are simply getting the data and saving it to the database. For example, there are very large negative numbers in the database, like -999, and that probably points to a problem with the instrument, but we're not making those value judgments; we're merely saving the data that's being presented. So let's take a look at what the data looks like. We store what we think are the most important things. For dates, we store both UTC and local time. UTC is great for programming, but one of the big powers of the system is going to be allowing you to do morning-to-morning comparisons globally, so you could have one map showing the pollution morning to morning everywhere, and you can't get that with UTC alone. We of course store location, city, and country information. We have geographic coordinates; not all of our measurements have coordinates, but about 80% of them do, and it's easy to add them after we get them. And we also store an attribution field, which is a way for us to be very transparent about where the data is coming from. So, just a few example requests. If you're looking at a weekend in Beijing (currently we only have one Beijing instrument, sitting at the embassy), you can do a time-boxed study of the region for just that weekend; this will get you all the PM2.5 measurements for it. In Houston, where we have something like 60 sources coming in, this one is a threshold study: you would get all the PM2.5 values that were above a certain level, in this case one hundred. And this one's interesting; there's a quiz right at the end of it, so I hope some of you are local. This one's for Great Britain. Currently that's just London, where we have about 16 different instruments, and here we're saying, give me the top 20 PM10 measurements. I did this right before I came here, and there is one location that I think had half of the top 20. I don't know, this is probably the worst quiz ever. Does anyone want to take a guess? Okay, it's probably better you don't guess, but it was Ealing Horn Lane. I don't actually know where that is, so I'm not making any judgments here. So, on the horizon, what are we looking to do for the platform? Obviously there are a lot more sources to pull in, and this is sort of the
unfun task of going out, getting these sources, and making the adapters to put their data into our system. We can also provide a lot more functionality via the API, things like aggregating and averaging, so with one API call you can say, I don't just want the measurements, I want you to give me a daily average or a monthly average or a yearly average, I want you to tell me certain numbers associated with this data. We can do a lot more of that for you, so it's easier for you to handle the data. We also want to build up a lot more maps, visualizations, and stories around the data. And finally, as we get more data and more demand, we will also need to build up more infrastructure. So, as Joe emphasized, a key part of our work is really the community that we're building to use, help build, and build off of this platform. One way we're reaching out to our communities is by conducting workshops to engage with the local science, journalism, and tech communities. Our first one will be in Ulaanbaatar, Mongolia, since that's where this story first began, in just a couple of weeks in fact, and we'll also be at a couple of other conferences in the U.S. in the coming months. We firmly believe in the mission of opening up the world's air quality data, and if you too are passionate about enabling previously impossible science, influencing policy, and empowering the public around open environmental data, we invite you to join us. Thank you very much.
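[Editor's note: the ingest flow described in the talk, where each source has an adapter that converts its data into a common format before a purely technical validation step, can be sketched roughly as follows. This is an illustrative sketch in Python, not OpenAQ's actual code or schema; the field names (`pollutant`, `reading`, and so on) and the adapter shape are assumptions made up for the example.]

```python
# Sketch of the described ingest pipeline: source -> adapter -> technical
# validation -> database. Validation checks types only (the value is a
# number, coordinates are numbers); it makes no judgment about whether the
# values are plausible, which is why a -999 sentinel still gets stored.

def adapt(raw):
    """Hypothetical adapter: map one source's fields onto a common format."""
    return {
        "parameter": raw["pollutant"].lower(),
        "value": raw["reading"],
        "unit": raw.get("unit", "µg/m³"),
        "coordinates": {"latitude": raw.get("lat"), "longitude": raw.get("lon")},
    }

def is_valid(measurement):
    """Technical validation only: types, not plausibility."""
    if not isinstance(measurement["value"], (int, float)):
        return False
    coords = measurement.get("coordinates")
    if coords is not None:
        if not all(isinstance(coords.get(k), (int, float))
                   for k in ("latitude", "longitude")):
            return False
    return True

raw_records = [
    {"pollutant": "PM25", "reading": 110.0, "lat": 39.95, "lon": 116.47},
    # Dropped by validation: the value is a string, not a number.
    {"pollutant": "PM25", "reading": "n/a", "lat": 39.95, "lon": 116.47},
    # Kept: technically valid even though the value is clearly suspicious.
    {"pollutant": "O3", "reading": -999.0, "lat": 29.76, "lon": -95.37},
]

measurements = [m for m in (adapt(r) for r in raw_records) if is_valid(m)]
```

Here two of the three records survive: the string-valued reading is rejected on type grounds, while the suspicious `-999.0` passes through untouched, matching the "no value judgments" policy described above.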
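[Editor's note: the three example requests in the talk (a time-boxed weekend of PM2.5 in Beijing, a PM2.5 threshold study in Houston, and the top 20 PM10 measurements in Great Britain) can be expressed as queries against a measurements endpoint. The sketch below only builds the query URLs with Python's standard library; the endpoint path and parameter names (`city`, `parameter`, `date_from`, `value_from`, `limit`, and so on) are assumptions based on the early OpenAQ API as described here, and the dates are placeholders, so check the current API documentation before using them.]

```python
from urllib.parse import urlencode

# Endpoint path assumed for illustration; verify against current API docs.
BASE = "https://api.openaq.org/v1/measurements"

def query(**params):
    """Build a measurements query URL with sorted, URL-encoded parameters."""
    return BASE + "?" + urlencode(sorted(params.items()))

# 1. A weekend of PM2.5 in Beijing (time-boxed study; example dates).
beijing = query(city="Beijing", parameter="pm25",
                date_from="2015-11-21", date_to="2015-11-23")

# 2. Houston threshold study: all PM2.5 values above 100.
houston = query(city="Houston", parameter="pm25", value_from=100)

# 3. Great Britain: the top 20 PM10 measurements.
gb = query(country="GB", parameter="pm10",
           sort="desc", order_by="value", limit=20)
```

Because the API is the same one the website uses, any of these URLs could equally back a map, a tweet bot for ozone violations, or a comparison graphic like the Beijing/New Delhi one mentioned earlier.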