Good morning, everybody, and welcome to yet another CloudFerro webinar. Today's topic is observing air pollution from space: processing Sentinel-5P data on CREODIAS. My name is Cyril van Beijmen. I work as a remote sensing data expert, and I've been working with Sentinel-5P over the past year, so I would like to share my experiences with how I used CREODIAS to process this data. Today's agenda: I'll talk a little bit about the Sentinel-5P satellite mission and the data products available. The next step is how we can access Sentinel-5P data on CREODIAS. Then I will talk about processing this data on CREODIAS, and I will finish with a live demo in which I show a Jupyter notebook that runs through the process of searching for and processing Sentinel-5P data on CREODIAS for a specific case study, one that has been in the news recently. Let me start with the Sentinel-5P mission. Sentinel-5P stands for Sentinel-5 Precursor, so it's actually a test mission before the real Sentinel-5 mission is launched. That doesn't mean the system is experimental or that the data is poorly validated or not applicable: it's a very popular mission, and the data products are used extensively for all kinds of different applications. It's a satellite mission focused on monitoring the atmosphere with its onboard system, the TROPOspheric Monitoring Instrument (TROPOMI), and the main objective is to make atmospheric measurements with high spatial and temporal resolution. Every day there are new recordings, and these are used for monitoring air quality, ozone and UV radiation, as well as for climate monitoring and forecasting. There's more information about the satellite mission and the data products at the link below, so if you want to read further, I recommend you go there. The data from Sentinel-5P is processed into quite a wide range of Level-2 products.
A quick walk through the main products: there are a few more than on this list, but these are the most important ones. The first one is the aerosol absorption index, a measure of aerosol density. Aerosols are tiny particles in the atmosphere and can be related to dust outbreaks, volcanic ash, and biomass burning. Then we have carbon monoxide, which is a result of the combustion of fossil fuels, biomass burning, and the atmospheric oxidation of methane and other hydrocarbons. There are a lot of very complex, ongoing chemical processes in the atmosphere, and these products give an indication of parts of those processes. Then, very briefly, there's cloud fraction, basically just a cloud map. It's very important for meteorological purposes, but it also gives a good indication of where the clouds are, and therefore of where the other components are as well. Then we have formaldehyde, an intermediate gas that forms after the oxidation of, among others, methane, if I'm correct, and eventually leads to CO2, because that's always the big one. As you can see, there's no CO2 in the list; there is work now on new satellite missions that focus more on that, but a lot of the products created from Sentinel-5P data give an indication of CO2 levels as well. Then methane, a very powerful greenhouse gas: it's emitted naturally, but most emissions are now anthropogenic. Nitrogen dioxide, quite well known, is a result of the combustion of fossil fuels and, again, biomass burning. Ozone is known from the ozone layer, but it's also a greenhouse gas in parts of the troposphere. And sulfur dioxide is also a result of the combustion of fossil fuels, as well as of volcanic activity.
So there's quite a range of products, and a lot of them are related to greenhouse gases, a lot of them to general air pollution. It's a mission that attracts a lot of attention nowadays; as you can see, some of the use cases have really been in the news. The data is used for monitoring air quality, which can be related to seasonal variations or to events such as forest fires, the topic of my live demo later (you may guess which forest fires I've been looking at). Also really in the news last spring was the effect of the coronavirus lockdown measures around the world on air quality. On the top right (I hope it comes through well on your screens at home) is an animated image of NO2 emissions over China in springtime, where you can see a big drop when the corona-related lockdown measures took place: air quality suddenly improved a lot. That was all over the news. Then there's the use case of greenhouse gas emissions: as I mentioned, methane, carbon monoxide, formaldehyde, and ozone are all related to greenhouse gases or greenhouse gas effects. On the bottom right there's another quite interesting example of something people didn't know. From Sentinel-5P data over parts of Siberia, they suddenly noticed a lot of emissions in places without much population density, and along a certain line. They figured out it was actually along a pipeline that transports gas from one part of Siberia to another. Along this pipeline are pumping stations, which pump the gas up to a certain pressure to push it further along the pipeline, and which also emit a lot: NO2, in this case.
Nobody really knew these emission sources were there, but from the Sentinel-5P data it was shown quite clearly that these emissions take place there, in very sparsely populated areas. So, just a few use cases. I'd like to continue now with Sentinel-5P as it is on CREODIAS. Sentinel-5P is hosted on the CREODIAS platform as just another satellite data source, alongside Sentinel-1, 2, and 3, Landsat data, and other data sets. It's stored and indexed, which means you can search it through the CREODIAS Finder, in the CREODIAS browser, or through API access. It's stored in NetCDF format, with the extension .nc, which is basically a database format very commonly used for meteorological data and large-scale global data sets. I will show in my live demo how this data is accessed through API querying. As for processing, the data can be processed in a lot of different pieces of software. I've been focusing on software that I learned on a training course last year, the HARP tools, which are developed especially to ingest large amounts of mostly global, lower-resolution data sets in an automated way. The software is capable of ingesting a lot of different images and creating monthly, weekly, or daily averages for specific areas of interest. What I would also like to mention is that if you work on a CREODIAS virtual machine, there's absolutely no need to download the data. You can work right there and access the data as if it were a folder on your desktop or on your hard drive, so there's no need to download the imagery. And the computing capabilities are flexible: you can scale up if you need to process a lot of data, and scale down again when that much computing power is no longer necessary.
More information about this HARP software can be found at the link at the bottom. Then I would like to go to the case study. A recent example, actually still ongoing and in the news over the last two weeks, is the huge number of forest fires in the western United States, mostly in California, Oregon, and Washington state. The news reports said it was one of the biggest fire seasons in decades, and that air quality was really bad on the west coast, as you can imagine close to a forest fire; but they also noticed that air quality decreased a lot on the east coast of the United States, and from satellite observations they could see that this was related to the forest fires on the other side of the country. So I decided to have a look at what I could find in the data and see whether I could trace the pollution related to the forest fires and how it is transported through the atmosphere. I decided to use the aerosol absorption index, a parameter available in the Sentinel-5P data, to make maps and monitor this drift of air pollution to the east. My study area is the entire contiguous USA, basically the USA minus Hawaii and Alaska. For the study period I decided to focus on the last month, say from mid-August to mid-September. And I decided to create weekly averages: Sentinel-5P has a daily revisit time, so every day it passes over the same part of the globe, but the sensor doesn't record usable data where there's cloud cover, so by making weekly averages you minimize the data gaps. The idea is to look for a week's worth of data and collate it using this HARP software. I've developed a little processing script that consists of three main parts.
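As a side note, the weekly splitting of the study period only needs the standard library; this is a sketch for illustration (the name `weekly_starts` is mine, not the notebook's actual function):

```python
from datetime import date, timedelta

def weekly_starts(start, end):
    """Return the Mondays that fall inside [start, end], one per weekly period."""
    # advance to the first Monday on or after the start date
    d = start + timedelta(days=(7 - start.weekday()) % 7)
    mondays = []
    while d <= end:
        mondays.append(d)
        d += timedelta(days=7)
    return mondays

# study period: mid-August to mid-September 2020
weeks = weekly_starts(date(2020, 8, 15), date(2020, 9, 21))
print(weeks[0])  # first weekly start: 2020-08-17, a Monday
```

Each consecutive pair of these dates then delimits one weekly averaging period.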
Or I should say a processing workflow, because it's not all in one script, as I'll explain later. The first step is to query the CREODIAS Finder API for Sentinel-5P aerosol data over the USA in the past month, split per weekly period. The second step is to ingest the Sentinel-5P data into this HARP software and create the weekly average images. These two steps are done in Python, as you can see. The last step I didn't do in Python; it can be done in Python, but I happen to have used R quite a bit, and I've always been more impressed with the mapping and plotting capabilities in R. It's just a matter of personal preference. For the first two steps, which are the most important ones (searching the data and processing the data), I've created a Jupyter notebook, which I will make available after the webinar. The R script I plan to convert to a Jupyter notebook as well, but I still need to simplify it a bit; I will, however, already show in the Jupyter notebook how to make a nice animation in Python. Okay, then it's time for me to switch to my live demo, which I will run on a virtual machine running Ubuntu. For that I've just spun up a virtual machine and installed Jupyter, which is fairly straightforward, plus the necessary Python libraries. That's all you really need to get this running on your own virtual machine or your own computer. So I'll switch off the presentation for a minute and go to my dear trusted X2Go client, which is my preferred mechanism for accessing a virtual machine. Here I'm building a connection to the virtual machine. I hope it works, which it usually does; it always takes a few seconds to get running. A lot of connections being made... and there you go. Here I have my virtual machine.
I have already started a Jupyter notebook, so I'll leave that. I think the size is good enough for you to see at home. Here I have my Jupyter notebook running from my local drive. I would like to quickly walk through the script and show you what steps are taken to create these maps. First of all, some of the libraries need to be installed, if necessary, i.e. if they are not installed yet. In particular, these HARP tools are only available through Conda, which is a different repository than the standard pip one. It can be installed on the command line, but it can also be installed by uncommenting this command: if you run this, it will look for the HARP software and try to install it. That's already done here, so I'm not going to waste time showing the installation again. The other libraries that are needed can also be installed by uncommenting these commands here: geopandas, and a Sentinel-specific one used just to convert GeoJSON data to a format that the API will understand. After installation, we need to import a few necessary libraries in Python so the script is able to run. I just click Run, and they are imported. Then I have defined two major functions to run the process. The first one is a search function, which I called S5P search. If you give it the right input parameters (geoj, the area of interest; date_st, the start date; date_en, the end date; and the product type), it will build an API query from that input, run the query, and get back the response with all the different satellite data that was found.
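To give an idea of the pattern, here is a rough sketch of such a search function: building the query URL and pulling local paths out of the response. The endpoint, parameter names, and the `productIdentifier` field follow resto-style Finder APIs as I recall them, so treat them as assumptions and check a real response; the function names are mine:

```python
from urllib.parse import urlencode

# resto-style Finder endpoint; URL and parameter names are an illustration
FINDER = "https://finder.creodias.eu/resto/api/collections/Sentinel5P/search.json"

def build_query(wkt_geometry, date_start, date_end, product_type, max_records=500):
    """Assemble the search URL for one weekly period and one product type."""
    params = {
        "startDate": date_start,
        "completionDate": date_end,
        "productType": product_type,
        "geometry": wkt_geometry,
        "maxRecords": max_records,
    }
    return FINDER + "?" + urlencode(params)

def to_paths(response_json):
    """Pull the local /eodata path out of every feature in the response.
    The 'productIdentifier' property name is an assumption."""
    return [f["properties"]["productIdentifier"]
            for f in response_json["features"]]

url = build_query("POLYGON((-128 24,-63 24,-63 49,-128 49,-128 24))",
                  "2020-08-17", "2020-08-24", "L2__AER_AI")

# a mock response standing in for the JSON the Finder would return
mock = {"features": [{"properties": {
    "productIdentifier":
        "/eodata/Sentinel-5P/TROPOMI/L2__AER_AI/2020/08/17/S5P_example.nc"}}]}
print(to_paths(mock)[0])
```

In the real notebook the URL would be fetched (e.g. with requests) and the resulting path list handed to the processing step.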
The response is then changed slightly to include the path to the NetCDF data file in the data folder, and the function returns the paths to these specific Sentinel-5P images as a long list. I will run this function later; here it's just being defined. The next one is the processing function, which processes the Sentinel-5P data using this HARP software. This function builds the specific parameters that need to be sent to the HARP software so it understands how to process the data. It needs to know the size of the image in pixels in the x and y directions, and also the minimum x and y, the lower-left corner coordinate; that information is extracted from the area of interest that I define, and I will come back to this later. Then there are some specific parameters that need to be set depending on the product we're using: this is for the SO2 data, here is NO2, and here is the aerosol index, which is the folder name of the aerosol data. So we need product-specific parameters to run the processing: pixels in the x and y directions, and also the resolution, which I will discuss later as well. Then we pass all these parameters into an import function, which imports all the Sentinel-5P data found in the API query step. The next step is to convert this data into a weekly average, and the final resulting image is a GeoTIFF of a weekly average over the area of interest. This is just a helper to give the file a name that makes sense: I decided on a file name that includes the first and last day of the weekly average. Then we need to extract the right data from the NetCDF file, which is basically a little database containing the data we're interested in; for the absorbing aerosol index, this is the variable that needs to be extracted.
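To make this concrete, here is a sketch of how such a HARP parameter set could be assembled into an "operations" string. The `bin_spatial()` argument order and the variable names follow the HARP documentation as I understand it, and the validity threshold is only an example, so verify against your HARP version:

```python
# bounding box of the AOI (lon_min, lat_min, lon_max, lat_max) and grid resolution
xmin, ymin, xmax, ymax = -128.0, 24.0, -63.0, 49.0
res = 0.1                        # degrees per pixel
nx = round((xmax - xmin) / res)  # pixels in the x (longitude) direction
ny = round((ymax - ymin) / res)  # pixels in the y (latitude) direction

# HARP operations: filter on data quality, regrid onto a regular lat/lon grid,
# and keep only the variables needed for the aerosol-index map
operations = ";".join([
    "absorbing_aerosol_index_validity > 50",
    f"bin_spatial({ny + 1}, {ymin}, {res}, {nx + 1}, {xmin}, {res})",
    "keep(latitude_bounds, longitude_bounds, absorbing_aerosol_index)",
])

# this string would then go to the HARP import function, roughly:
#   import harp
#   product = harp.import_product(path_list, operations=operations)
print(nx, ny)  # prints: 650 250
```

Note that `bin_spatial` takes edge counts, hence the `+ 1` on the pixel counts.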
The last step is simply to export this image to a GeoTIFF. So those are the two main functions of interest. Now we can define the search parameters: what area of interest are we looking at, and what period? I shouldn't forget to actually run these two cells, because otherwise my script won't have these functions available in the next steps; the Jupyter notebook stores the functions as they are defined and can then run them. First I set a working directory; I just choose a directory. In the next cell I define a period of interest, the date range, and I use a weekly date range. I start on 2020-08-15, that's the 15th of August 2020, and end now. I run this and get an overview of the different dates. The 17th of August was a Monday, and I have chosen that my weeks start on a Monday: the 24th and 31st of August, then the 7th, 14th, and 21st of September. These are the first days of all the weeks that fall inside my period of interest. Okay, which is nice. Next, I want to define an area of interest, an AOI. In this case we are interested in the western part... no, that's not completely right: the whole United States. Let me change that. This script needs a GeoJSON to be present in the working directory, and it will look for it. This GeoJSON can very simply be made on the website geojson.io; let me look at it quickly. Here is the website, and there's already a nice GeoJSON of the area of interest, as you can see. Going back to my script, I need to define the name here, and then I need to extract the bounding box coordinates of this GeoJSON. So I open the GeoJSON and extract the extreme coordinates, lower left and upper right, and run that. Here you can see the bounding box coordinates: between minus 128 and minus 63 degrees, which should be longitude.
And the other axis is between 24 and 49 degrees latitude. That seems about right, but I can make a plot of it and check whether the GeoJSON was loaded correctly. So I plot the GeoJSON... wow, fantastic: a square, or a rectangle I should say. This is the GeoJSON plotted in a field of nothing, so you don't have any clue where it is. Another way to display the GeoJSON is to load it back into geojson.io. So here you go: I just click this link, and you can see it's loaded on the website with a map in the background, and it is the right GeoJSON; it covers the entire area of interest. The next step is not so much tricky as important: HARP needs a certain resolution defined for the data, expressed in degrees. If you have a very large area of interest (and the United States is a very big country), it's probably not advisable to use a very fine resolution for your maps, because the processing will take very long; but if you take too coarse a resolution, it will not show all the detail that you want. So I've made a little calculation here of what resolution can be advised: some simple, approximate calculations of how big the area of interest actually is. You can extract the latitude and longitude information from the bounding box again, get an approximation of how big the area is, and then get advice on what resolution to use. For a mid-size European country like Poland, I advise using a resolution of 0.05 degrees, which is roughly five and a half kilometers. You should know that the spatial resolution of Sentinel-5P is at best three and a half by seven kilometers.
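The bounding-box extraction and the size estimate are both straightforward; here is a sketch using an inline polygon standing in for the GeoJSON file (the real notebook reads the file from the working directory instead), with the usual rule of thumb of about 111 km per degree; the function names are mine:

```python
import json
import math

# inline stand-in for the GeoJSON drawn on geojson.io: a rectangle over the contiguous USA
GEOJSON = """{"type": "FeatureCollection", "features": [{"type": "Feature",
 "properties": {}, "geometry": {"type": "Polygon", "coordinates":
 [[[-128, 24], [-63, 24], [-63, 49], [-128, 49], [-128, 24]]]}}]}"""

def bounding_box(geojson_str):
    """(lon_min, lat_min, lon_max, lat_max) of the first polygon feature."""
    ring = json.loads(geojson_str)["features"][0]["geometry"]["coordinates"][0]
    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    return min(lons), min(lats), max(lons), max(lats)

def aoi_size_km(lon_min, lat_min, lon_max, lat_max):
    """Approximate extent in km: ~111 km per degree of latitude, with the
    longitude direction shrunk by cos(latitude) at the AOI's mid-latitude."""
    lat_km = (lat_max - lat_min) * 111.0
    mean_lat = math.radians((lat_min + lat_max) / 2)
    lon_km = (lon_max - lon_min) * 111.0 * math.cos(mean_lat)
    return lon_km, lat_km

bbox = bounding_box(GEOJSON)
lon_km, lat_km = aoi_size_km(*bbox)
print(bbox)                          # (-128, 24, -63, 49)
print(round(lon_km), round(lat_km))  # roughly 5800 by 2775 km
```

Those extents are what drive the advice of a coarser grid (about 0.1 degrees) for an AOI this large.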
So it's a much lower-resolution satellite than, for example, Sentinel-2 or Sentinel-1; it's not as high-resolution as what you'd see for land imaging. There are all kinds of reasons for that, but Sentinel-5P is already a big step up from previous atmospheric satellite missions. If I run this little function for my bounding box over America, I get advice on what resolution to use for my maps. The approximate extent in the latitude direction is more than 2,000 kilometers, and in the longitude direction, from east to west, almost 6,000 kilometers. That's a lot; America is a big country. So the recommended resolution for processing is about 0.1 degrees. That's still quite a high resolution, which means the processing of each image will take some time, a few minutes; if you used an even finer spatial resolution, it would take a lot more time. Okay, now we still need to define which Sentinel-5P product we're interested in. I define here L2__AER_AI, which is the official name for the absorbing aerosol index (people often write it the wrong way around, as "aerosol absorption index"). So that's also defined. Now I create an output directory. This is optional, you can put it wherever you want; I just put it here. Okay, fair enough. So now it's time to run the workflow. I've defined all my functions to run the API query and the HARP processing, as well as the area of interest, the period of interest, the output directory, and the product I'd like to look into. So now I can run a search query. Here's the S5P search function being used: I pass the AOI as the area of interest; as the start date I take the first date from the date range, the 17th of August; as the end date I take the second date, seven days after my first date; and my product is the aerosol index.
So I run this. It sends an API query to the CREODIAS Finder, which returns the number of images found, and then it extracts the paths to the actual products; these paths are sent to the HARP processing step next. Okay, it's running... and there you go, a result. I also print the number of images found in this query; if we go back to the top: number of Sentinel-5P images found, 108. So there are 108 Sentinel-5P aerosol images covering the United States between the 17th and the 24th of August 2020, and these are all the paths you can see. They are local paths under /eodata, Sentinel-5P; that's where this data is stored. This variable with the paths can be passed to the HARP processing tool, which brings me to the next step, the processing. As you can imagine, 108 images have to be ingested into the software, and for every pixel we need to calculate weekly averages. This takes a bit of time; it's not something that runs in two seconds. In my experience, one weekly average image runs for about three minutes, so I'm not going to run this step here. I've already done it, but you can of course run it yourself later. I already have the list of resulting images for the five, no, six weeks that we're looking at, so I make a list of the files that are found. As you can see, I have for example a file exported for the period between the 7th and the 13th of September. The ordering is a bit the other way around, going back in time: the last one in the list is the image for the period between the 17th and the 23rd of August. Now I would like to display one image to see what it looks like.
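Putting that file list back into chronological order is just a matter of parsing the dates out of the file names again; in this sketch the file-name pattern (first and last day of the week, as YYYYMMDD) is my assumption about the naming scheme, not the notebook's exact pattern:

```python
import re

# hypothetical weekly-average GeoTIFF names: <product>_<firstday>_<lastday>.tif
files = [
    "AER_AI_20200907_20200913.tif",
    "AER_AI_20200817_20200823.tif",
    "AER_AI_20200831_20200906.tif",
]

def week_key(name):
    """Extract the first YYYYMMDD date from the file name, for sorting."""
    m = re.search(r"(\d{8})_(\d{8})", name)
    return m.group(1)

files_sorted = sorted(files, key=week_key)
print(files_sorted[0])  # the earliest week comes first
```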
For that I used the Rasterio library. The projection of the images is not set quite correctly, so I first apply the projection to the images so they are properly georeferenced, and then I open the image and see what comes out. There you go: it looks like an image. On the x and y axes you see longitude and latitude, and for some reason in this plot the latitude runs from low values at the top to high values at the bottom; I think it has something to do with how this GeoTIFF is defined. If you load it in a GIS, it will be the right way up. You can see this image covers most of America, and these bright spots on the left of the image correspond to high aerosol content in the west of the United States. So, hey, there's something going on there. This can now be looped: in theory, if I had just run the workflow, I would have one image for one week, but I can simply loop it, so that for every date in the date range I send this command; first it extracts the paths through the API query, and then it processes all the paths that come in on the path list. This entire step runs in about 20 to 30 minutes, so I'm not going to do that again either, but I will show you results from the images that I've already created. I want to make an animated GIF, because that was my goal in the end. I can actually do that more nicely in R myself, but it can be done quickly in Python too; it can also be done more nicely in Python, this is just an example of how it can be done. So here I search for all the files in the output folder with the extension .tif, add those to a list, and combine them into an animated GIF. I run this, and it gives some warning messages about lossy conversion.
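Those lossy-conversion warnings come from writing floating-point rasters straight into an 8-bit GIF. A minimal pure-Python sketch of the min-max rescale that would avoid them (real code would do this on the raster array, e.g. with NumPy, before handing frames to the GIF writer):

```python
def to_uint8(values):
    """Linearly rescale a flat list of floats to the 0-255 integer range."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0  # guard against a constant image
    return [round(255 * (v - vmin) / span) for v in values]

print(to_uint8([-1.0, 0.0, 3.0]))  # [0, 64, 255]
```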
It would probably be better to convert the data from float to integer beforehand, but okay, it still works. My last step here is to visualize this GIF, so I just press this one, and there you go: something is happening. I'll zoom in a little... and here you can see something going on. Keep in mind it's still upside down, so north is at the bottom and the south is at the top. But you can see white dots here on the left side, the west of the United States, and you can see plumes of ash, or smoke (emissions at least, from the forest fires) drifting towards the east. So there you go: you get what you want, an animated GIF of the data over America. Now I'd like to move back to my presentation and show you the final map I made in R. So I go back to my presentation and go to the next slide. There you can see it looks a bit nicer, with the map of America underneath, and it's also animated. It's not the clearest, because the frames flow into each other; it's maybe not the best way to do it, and you can play with this for ages if you want. You can see there are red dots in the west initially, and then at some point there's a huge cloud, which corresponds to the week of the 7th to the 14th of September, and in the last image you can see it moving east as well. If I go back, close this one, and go to just the viewer, I have an animated version. Starting on the 17th of August, you see there's dust around San Francisco and in the Central Valley in California, and I can zoom in like this. There's also this bright red dot, which is probably already related to a forest fire here. This parameter not only shows the aerosols related to forest fires, but also naturally occurring processes like desert dust.
That's why you see something like a permanent dust cloud in the west of the United States: there are a lot of deserts in the west, in Nevada and Arizona, so there's always more dust flying around. If we go to the next week, not that much has changed, but you see the bright red dot north of San Francisco, which is quite likely forest-fire related. If I look closely, I also see a small red dot here, somewhere not far from Texas, which could also be a fire. Let's go back. Then one week later, the first week of September, more red dots appear in the west, and still the red dot north of San Francisco. And then the week after: oh, something is going wrong there, there are a lot of aerosols. I think this was the week when they said it was really getting out of hand; you can see a lot of aerosols building up over the west, and you can already see them starting to drift east. In the last week I have, you can see that this cloud has slowly drifted east, and that also over Washington and New York the aerosol values are higher than normal. You can even see a red dot over the Atlantic Ocean at the far right of the image, probably related to the aerosols drifting further east. I looked at the data from today, or yesterday, and this plume is now somewhere in the middle of the Atlantic, so I expect air quality to drop in Europe soon as well. It's a very nice way to see what's going on, and a nice way to visualize data over time. And with that, I would like to conclude my presentation. I would like to thank you all for joining this webinar. My name, my email address, and all the links to CloudFerro's Twitter, LinkedIn, and Facebook are at the bottom. If there's anything, please send a message; I will try to help you out as much as I can.