So, hey also, Siddharth is going to talk about the view from above: getting started with satellite images. Looking forward to the interesting talk, you can go on.
Thank you. I'll start now. Yeah. Okay, great. Hi, I'm Siddharth Srivastav and on behalf of our group at Alba Asylum, I'll be presenting a talk on getting started with satellite images. So if you've ever wondered why you would want to go for satellite images, they are a very interesting bunch of data points and information. So if you consider something like how the Amazon forest has changed over the years, or studying climate change in minute detail, satellite images are the way to go. And they're the inexpensive, less-effort way to go.
So I'll set the agenda here. What I want you to take away from this is that after this talk is done, you should be able to quickly set up your machine and get started with satellite images immediately, right? So our aim is to basically reduce the barrier to entry here, and hopefully as we keep walking through, everything will be very lucid. So I'll start out by introducing why you would want to go for satellite data. And then we'll move on to a small primer on remote sensing. In part three, I'll be talking about what the Python environment for satellite images looks like, what some of the tools and methods are, and I'll be introducing Earth Engine as the API. In part four, I'll be giving a short case study to talk about the end-to-end pipeline, so that you have a concrete example of how you can use these things and not just abstract things in your head. And finally, in part five, I will be concluding.
So depending on where you're coming from and what your background is, your motivations for using satellite images would be different. If you're a researcher, you'd be looking to see what kinds of hypotheses you can test and what kinds of research products you can come up with.
And if you're a web developer or an app developer, you would be more interested in what kinds of applications are possible, like Google Maps or something.
So the number one reason why you'd want to consider satellite images is that they give you amazing spatio-temporal data, and they give you accurate spatio-temporal data. These are an absolute goldmine of information. And if you're a researcher, these things are very useful for social sciences, because not only can you see how things have changed spatially, but you can track how these things have been changing on a temporal scale. And recently, there has been a flurry of papers in AI for social good, which have been using a lot of satellite images and consequently making models out of them.
Point number two, and a very important point here, is that you can use satellite images as an exogenous source. So what that means is, say you have collected some data from sensors, and you want to validate that data. So say you're estimating something like the population in a region. Now, usually you conduct censuses and you conduct surveys, but another clever, easy way to do this is to see how many houses there are in a place from satellite images, maybe see what the average number of people in one house is, and try to approximate what the population might be. So of course, you can approximate scientific inquiries with this, and this validation of data applies to ideas too.
And the last point I want to make here is: why now? Why is this the perfect time to get started with satellite images? And the answer to that is, as is usual in Python, you have amazing API support today. Not only do you have a lot of libraries, there is a very vibrant Python community behind this, and they've been coming up with great, great things.
So to talk about some applications here, number one would be a great, great paper by Neal Jean and his team at Stanford.
And what they basically did was they tried to estimate poverty, right? So this is what I was talking about. You can transcend things like censuses and surveys, and you can be clever about these things. So what they did was they looked at daytime satellite images. They had a convolutional neural network to extract features out of them. And they used nightlight satellite images as a proxy for what kind of economic activity is going on. They were able to correlate all of these things and see what the estimated poverty in a region would be. And it worked amazingly well.
Another example here would be our work in Africa and Nigeria regions. If you're not aware, in Africa, even to date, there are about 100 million people who have no access to electricity. So this work basically focused on finding the optimum places to install solar panels. So not only does that promote renewable energy sources, but it actually did end up increasing access to electricity in a lot of African regions and villages.
And these aren't isolated, cherry-picked examples. A lot of research has been done with satellite images that has turned out to be very useful. And these are the kinds of applications that really matter, right? Because you can inform policy making with this, you can inform decisions with this. And a team at Harvard has also done a study where, from satellite images, you can estimate micronutrient deficiency. So that's some really clever work, and I'll be happy to share all the resources at the end.
So, to give you a small primer on what the terminology is and what remote sensing is all about. This field has usually been shrouded in esoteric terminology and some things that computer scientists should not really care about. And I'll just give you the bare fundamentals of what you need to know to get started. So number one is a demarcation between what kinds of orbits and satellites there are, and I'll tell you why this is important to know.
So number one is a geostationary satellite. So these satellites are fixed directly above a single point on the equator of the Earth's surface. So such a satellite observes the same area of the Earth's surface at all times. And the physics behind this dictates that these satellites should be around 36,000 kilometers above the surface. Now what that usually means is you'll have very low resolution images. And of course, depending on your application, you'll come to know when you require low resolution images and when you'll be fine with sort of grabbing more temporal data.
Another thing here is sun-synchronous satellites, which are a type of polar orbit satellite. So polar orbits, if you remember, are the orbits that lie in the planes which contain the North and the South Pole. And India is sort of famous for ISRO's PSLV program, which is the Polar Satellite Launch Vehicle. And if India wants to launch a geostationary satellite, they actually have to contact other agencies, not ISRO, maybe some foreign agencies. So here, you can observe the entire Earth at approximately the same time each day, right? And the physics behind this is a bit relaxed, so you can go up to just 6,000 kilometers above the surface.
So a bit of terminology here is something known as a swath. So a swath, you can think of as the field of vision of whatever sensor is on the satellite. So this basically gives you the area that the satellite sees wherever it is passing above the Earth's surface, right? And this is important, and so is knowing about geostationary and sun-synchronous satellites, because of resolution. When, as a computer scientist, as a programmer in Python, you're working with satellite images, you need to know what resolution works for you. And when you've decided that, you need to go for the satellite that gives you images at that resolution, right? So if you consider something like the frequency of repeated coverage, a swath is important because the more coverage you have, the less resolution you get.
So if you want high resolution over an area, say one meter or something, it might take weeks of satellite coverage. But if you're fine with one kilometer resolution or something, even daily coverage would be possible.
Another thing that I want to mention here is that once you've moved on from deciding satellites and resolutions, you need to decide what kind of data you're looking for. So you need to think of the sensors and bands. So any satellite can have one or more sensors, which allows you to have multiple independent data streams. These sensors themselves can observe energy in multiple bands of the electromagnetic spectrum. To put that in context, the human eye can, of course, see three bands: R, G, and B. And the last distinction here is between passive and active sensors. So sensors that observe the energy emitted from the Earth are passive. And something like radar, which deliberately sends radiation out and then measures the reflection from the Earth's surface, is an active sensor. So like I said, you can have multiple bands there. And if you have bands in the vicinity of dozens, these images are called multispectral. And if you have hundreds of bands that you're looking at, these are called hyperspectral images.
An example would be the best way to demonstrate why bands are important and how you can combine these bands to fit your context. So the example here is NDVI, which is a vegetation index. So you can basically measure the vegetation in an area, right? The principle here is that plants reflect at different frequencies as they progress through the stages of their life. So a dead leaf, as you can see here, has one radiation spectrum. A healthy leaf has a different radiation spectrum. And this is the principle that satellite images use to tell what kind of vegetation an area has.
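To make the band-combination idea concrete: NDVI is conventionally computed per pixel from the near-infrared and red bands as (NIR - Red) / (NIR + Red), so healthy vegetation, which reflects strongly in near-infrared, scores close to 1. The snippet below is only a minimal sketch of that formula with NumPy. The function name and the reflectance values are made up for illustration and do not come from the talk.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index per pixel, in the range [-1, 1]."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    # Pixels where both bands are zero have no defined NDVI; leave them as NaN.
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Made-up reflectances: healthy leaf, stressed leaf, bare soil, no signal.
nir = np.array([0.50, 0.40, 0.30, 0.0])
red = np.array([0.08, 0.30, 0.25, 0.0])
print(ndvi(nir, red))
```

In practice the two inputs would be the red and near-infrared bands of the same scene, and which band numbers those are depends on the satellite.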
Also, for things like temperature, pollution, and in general any atmospheric monitoring that you do, you need to take care of the bands and the sensors that these satellites are using. And these are things that you have to handle manually, because there are a lot of libraries, but you need to look through the documentation to know what exactly you want.
So the last point here, as a primer for remote sensing, would be intermediate processing. So this satellite data is not immediately amenable to whatever solution you're planning. It needs to be processed in an intermediate step first. So I wish to describe two main intermediate processing methods here. And again, you don't need to know the details of how the algorithm is implemented, because we'll be describing some APIs for Earth Engine, where of course it is a single-line API call if you want to do any of these things. But of course, if you need to do these things, you need to first know what they are, even if you don't want to know the bare details.
So the number one method that I want to describe here is orthorectification. So if you imagine the swath of the satellite, you realize that not all of the land is evenly captured, right? Because the Earth is curved and you get all the land at different angles. So accounting for this angle and correcting the distortion that comes from elevation differences, that is orthorectification. So that gives you an even image of whatever land has been covered.
The second method here is something known as mosaicing. So mosaicing is another intuitive thing that you can grasp very quickly. So if you consider the swath again, and this is of course a continuous swath, you'll be clicking multiple pictures of whatever you're interested in. So stitching these images up and trying to create a coherent single image, that is mosaicing.
So whatever your region of interest is, you grab overlapping swath images, you stitch them together, and that's it. And this helps in reducing noise. Because as you can imagine, say you take one image on one day and the next image on another day, and say on day one the image that you obtained was very cloudy. So on day two, it should probably be better. And once you stitch these together, you can reduce the noise that you got on day one, right?
So now I'll be talking about the Python environment and how you can get started super quickly in Python. Before I jump into what is very minimal code, I wish to describe what the data sources are and what has been going on. So like I said before, this is the absolute best time to get started. Because a huge amount of data is available today, and it has been made public by space agencies like ISRO, ESA, and NASA. And these things weren't possible a few years ago. So once you couple this availability of data with the abstraction layers created by Python libraries and APIs, you quickly realize how powerful a time this is to get involved in the geospatial environment.
So if I start describing all of the satellites, it'll take too much time. So I picked a couple of satellites that I'll highlight over here, which are among the more famous ones that you will probably end up working with if you're working with satellite images. So the first data source is Landsat. Landsat is a family of eight satellites. The first of these was launched in 1972. They have eight bands and they cover spectrums like visible, infrared, short wave, etc. And the resolution is pretty good. They go down to 30 meter resolution, and of course the data is absolutely free. So Landsat is pretty famous for things like measuring how urbanization has changed over the years or how land cover has been changing over the years. And similar things like forest cover change and surface reflectance. So Landsat is the perfect data source for all of those applications.
The next satellite here is Sentinel. And with Sentinel, again, there's Sentinel-2 and Sentinel-5P. And these are relatively recent launches. So these were launched from June 2015. And they have more bands and better resolution. And Sentinel has sort of gained this reputation of being the best in atmospheric monitoring. So say you're looking at something like precipitation or the NO2 particles in the air. Sentinel is sort of the de facto thing to go for. And you can also measure things like pollution with Sentinel.
Some other data sources that deserve mentioning here: there is MODIS, which is again very good, like Sentinel, at measuring pollution. There's DMSP and VIIRS, which are very good for measuring night light. And there's SRTM for elevation, if you want to make digital elevation maps and see what the terrain looks like in an area. And of course, these aren't the only satellites out there. If you quickly Google it, you will find so many more data sources. And depending on your application, you can of course use any of them.
So just a bit of a recap of what we've looked at till now and what our pipeline looks like for handling satellite images. We've seen satellites, and that helps us in deciding what kind of resolution we're looking for and which satellite we might find it in. We've looked at some sensors, and we have seen that combining different bands can give you different applications, right? So we've looked at sensors, we've looked at bands. The Python part is more concerned with the data, and the algorithms and applications are of course dependent on where you're coming from and what your background is.
So if we talk a bit more about the data part of it, here's what the picture looked like a few years ago. You went to, say, a NASA website, you grabbed some satellite data in HDF5 format, you did manual pre-processing on it, and that's when you started using it for your application, right? So what's wrong with this?
This is basically unnecessary stuff that you don't need to be doing. You need to be using satellite images. That's all you need to do. So there's no need for you to understand what the HDF5 format is. There's no need for you to go manually processing each bit of data that you get, right? And this is where APIs like Earth Engine enter, and they're immensely useful, right? Because they allow for rapid prototyping and testing. And another wonderful part of this is that all the data is on the cloud. So you can use something like Google Colab and just quickly prototype your ideas.
And a quote by Bret Victor fits best here, where he said you shouldn't separate the creator from the immediate effects of what they're creating. So a painter is immediately able to see what they're painting. A musician is able to hear what they're making. But programmers are the only ones who sit and code, and after some hours they get to see what they've made, right? So APIs like Earth Engine, and this power of abstraction, allow you to quickly see what you want. And depending on that, you can progress ahead.
So when you're using Earth Engine, you just need to do four simple steps and you get an image almost immediately, right? So the first thing you need to do is specify what satellite you're looking for. So in this case, we're looking for Landsat, right? And of course, you've just imported the Earth Engine API and you do some function calls. Once you specify the satellite, you specify what dates you're looking at, right? So you can put in a range of dates or you can put in one specific date. Once you've done that, you can specify the bands. And if you don't quite know what bands you're looking for and what combinations of bands you're looking for, of course, the programmer's way to do this is to look through the documentation properly and try to make sense of it yourself. But in most famous applications, like I'll just demonstrate, you can just pass the data you want, right?
So you don't need to specify bands. You can just type in what kind of data you're looking for, and I'll give an example of that here. But the last thing, of course, that you need to specify is what area you're interested in looking at. And once you've put in these four pieces of information, that's basically it. You set the visibility, you center the map, and you display it, and you have an output immediately, right? So this is what you want. When you're working with satellite images, you want to see what's going on. You don't want to deal with HDF5 and you don't want to pre-process your data for hours and then get an image out of it.
So talking about restricting your area of interest, the easiest way to do this is to supply coordinates, right? So whatever your area of interest is, you get the coordinates for that and you supply them in a GeoJSON file, and you can make a geometry object and pass this as a parameter where you're specifying your area of interest. So if we go back here, you can see on the fourth line there's a .filterBounds, which specifies your area of interest, and you've passed the geometry object there.
So like I said, to give you an example with NDVI, it's the same four steps, and in this case you don't even need to specify the bands. You can just type in NDVI and Earth Engine understands what you're talking about. So you specify the satellite, which is MODIS, you specify the dates, you specify what your area of interest is, in this case it's Maharashtra, and of course this has been passed as a sort of key-value thing where you're working with GeoJSON files. And you select NDVI, and depending on your application and your context, you can put in the palette appropriately, and you get an image immediately, right? So this is NDVI. You can get things like NO2 similarly, and in this case we're using the Sentinel-5 satellite, and Copernicus is the program. And this works for everything, right?
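The four steps described above (satellite, dates, band, area of interest) can be sketched with the Earth Engine Python API roughly as follows. This is a minimal sketch and not the code from the slides: the dataset ID `MODIS/061/MOD13A2`, the bounding-box coordinates, the date range, and both function names are assumptions chosen for illustration. Running `ndvi_composite` also assumes the `earthengine-api` package is installed and an Earth Engine account has been authenticated.

```python
def bbox_geojson(west, south, east, north):
    """Build a GeoJSON polygon (closed ring, lon/lat order) for a bounding box."""
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {"type": "Polygon", "coordinates": [ring]}

def ndvi_composite(region_geojson, start, end):
    """Return a mean-NDVI ee.Image for the given region and date range."""
    import ee  # imported lazily so bbox_geojson works without Earth Engine
    # Assumes credentials are already set up; some accounts also need project=...
    ee.Initialize()

    region = ee.Geometry(region_geojson)
    collection = (
        ee.ImageCollection("MODIS/061/MOD13A2")  # 1. satellite product (assumed ID)
        .filterDate(start, end)                  # 2. date range
        .select("NDVI")                          # 3. band
        .filterBounds(region)                    # 4. area of interest
    )
    return collection.mean().clip(region)

# Rough, hypothetical bounding box; a real run would load a proper state boundary.
region = bbox_geojson(72.6, 15.6, 80.9, 22.1)
# image = ndvi_composite(region, "2019-01-01", "2019-12-31")  # needs credentials
```

Displaying the result (palette, centering the map) depends on the environment, for example the Earth Engine Code Editor or a notebook mapping library, and is left out here.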
This works for night light, this works for surface reflectance, land cover, forest cover, you name it. Whatever it is possible to get from satellite images, you can get it quite easily.
So now that we've talked about the fundamentals of how you can just grab satellite images quickly, I want to demonstrate a case study of our own work, where I'll try to display the entire end-to-end pipeline, so that you don't only have an abstract idea of how you can grab satellite images, you can see how you can go about using them, right? So the case study I'll be talking about is a simple hypothesis, and the hypothesis that we had was that the spatio-temporal change that you see in night light over the years should be indicative of economic growth. And if you think about it intuitively, it makes sense, right? And that's exactly what we wanted to quantify. So night light, and luminosity in general, should tell you something about economic activity. And as a measure of economic activity, we took GDP of course, which doesn't provide a fine granularity, but it works for our purpose. So in the images, of course, you can see night light has been increasing over the years, and here's the difference between 2012 and 2016.
So let's see if we can test this hypothesis as quickly as I just talked about. So the number one thing that you need to do here is of course restrict your area of interest. So say we start with just one state, in this case we're starting with Maharashtra. So you make a GeoJSON file out of it, you supply the coordinates, and you've restricted your area of interest, right? You take out the data, and on the left-hand side is the image that you actually see. So this is the data that we got. Now the problem here is that this data is not completely useful, because we need values out of it.
And if you see the image, we'll be doing some pixel math, and if the entire image is black and only some parts of it are white, which can't be properly demarcated, that doesn't work too well. And it's not even properly discernible to the human eye. So we do a bit of pre-processing here. We convert the entire palette to green and we put the night lights in red. And now what we are able to do is write a simple function, a simple equation, to get luminosity out of this image. So you sum across the values of the red pixels and you normalize it by the total pixels. And you basically have values. You can extract this as a CSV file, and we've done this for all the states. So of course, at the risk of belaboring the point, if you need to change states, you don't need to change much, you just need to change that one parameter of your area of interest.
So we did this for all the states, and now we have CSV data, and now we can actually test our hypothesis. We have the luminosity values, we have GDP values separately, and now we can see if there's any correlation between them. And what we found was that it was actually true, right? So if you look at states like Tamil Nadu or Bihar, there is indeed a positive correlation between luminosity and GDP. So you can see there are admittedly few data points here and the trend line isn't exact, but there is still some amount of positive correlation that is clearly visible. But in some other states like Maharashtra and Madhya Pradesh, there was either a bit of a negative correlation or absolutely no correlation. So this is how science progresses. We found that there are some outliers in our hypothesis, and now we need to refine our hypothesis, right? So that's what it is. But the key point that I want to demonstrate here is that this took us a lot of time to do, which should not have been the case, right?
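The luminosity calculation and the correlation check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the image is taken to be an RGB array in which night lights were drawn in red, luminosity is taken as the sum of red-channel values divided by the total number of pixels, and the yearly series are placeholder numbers, not the values from our study. The function names are hypothetical.

```python
import numpy as np

def luminosity(rgb):
    """Sum of red-channel values divided by the total number of pixels."""
    rgb = np.asarray(rgb, dtype=float)
    red = rgb[..., 0]
    return red.sum() / red.size

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(x, y)[0, 1])

# Toy 2x2 RGB image: two lit (red) pixels on a green background.
toy = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 255, 0], [128, 0, 0]]])
print(luminosity(toy))  # (255 + 128) / 4 = 95.75

# Placeholder yearly series for one state, NOT real measurements.
lum_by_year = [10.0, 12.0, 15.0, 19.0]
gdp_by_year = [1.0, 1.2, 1.5, 1.9]
print(pearson(lum_by_year, gdp_by_year))
```

With only a handful of yearly points per state, a coefficient like this is a rough indicator at best, which matches the caveat about few data points.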
And this is exactly what I hope you will take away from this talk: basically, this is a very simple concept that you should be able to test within a few hours at maximum, right? And instead, it took us a few days, because we did not quite know how to get started and go about extracting images and values and handling satellite data in general. So once you have your entire toolkit ready, you can test your hypotheses and build your applications at an expedited rate. So that's the point here: the point is not specifically that you can test luminosity against GDP, it is how quick you can be with your work.
So in conclusion, I basically have just two points to tell you. Like I said before, today you have more data, more platforms, and a lot more library support than you would have had even three or four years ago, right? So it's almost trivial to get started, and once you've collected the bare resources, it's literally just as easy as setting up a library on your machine, and you can quickly get started with satellite data. And we've talked about some APIs like Earth Engine. There are other APIs depending on your context. So if you're working with geospatial data in isolation, you can use something like GeoPandas. And if you've been using pandas for quite a while, that won't be that steep a learning curve. GeoPandas is pretty simple. And this applies to GDAL too, which basically handles raster and vector spatial data. So there are a lot of libraries, there are a lot of methods, there are a lot of data sources. And I'm hoping that after this talk, you can quickly set it up and, like I showed you, with just four lines you can get images out of it very quickly. So basically there's nothing to stop you from testing out your ideas, right? And if you're more interested in making applications, there's still nothing stopping you, right?
So the bottleneck here used to be that it was very difficult to grab satellite images and extract data out of them. But recently it has become so easy that there's absolutely nothing to stop you now. And that's the main takeaway from this talk. That'll be it from my side. If you're interested in more of the programs that I showed you, you can visit our project repository, where we've put the experiments on economic activity of course, but we've also been tracking urbanization, and we've been studying precipitation through Fourier and wavelet analysis, if you're interested. And you can visit us and you can reach me offline at any time. And I'm also very happy to hear if you come up with any great applications. So I think, any questions, yeah.
Yeah, yeah, the chat is literally filling up with questions. Unfortunately, we are kind of out of time, but I'll pick a few. So first off, I'd like to say, really interesting inference that you made about luminosity versus GDP. Really interesting one. So let me get on to the questions, at least a few of them. So one of them is, could you post the links to the papers, both the poverty estimation one and the one on increasing electricity access?
Right. I think I'll post the link to the slides and I'll add all the resources in the slides themselves. So that will make it much easier to go through all of the things. I think I'll send it to you, Ritesh. You can share it with everyone.
So you could go ahead and put it up on Zulip, actually.
Yeah. Okay, great, yeah. Yeah, that works. I'll put it up on Zulip for everyone.
Cool. And also there was another question: are the outputs that you showed on the PPT real time?
It depends. So if you're working with, say, Landsat or something, they update it quite regularly. It is not absolutely real time if you're considering something like Landsat. But I know of some startups and some organizations that have been working on making all of this stuff open source.
So you can actually get real-time images, and you can see what everything looks like from above within three seconds. So right now there's not much support for absolute real-time imaging. But even the normal data sources are being updated pretty regularly, almost daily.
Okay, cool. So I think we are on time, and maybe the audience will be really happy to reach out to you on Zulip. So thanks a lot, Siddharth. That was a great talk.