Thank you very much. Good morning to colleagues on site, and good afternoon or evening to everyone joining on Zoom. I'm sorry I can't make it in person. I was really planning on and looking forward to coming to meet many interesting colleagues and to get to know you better, but unfortunately, due to COVID and some other issues, I can't make it.

My topic today covers more or less four pieces of work, and they are related. The first is the construction of a seamless data cube of satellite data. From this we can do all sorts of information extraction; in this case I will present land cover mapping and change extraction. This is a paper published last year in Remote Sensing of Environment; we call the product iMap World 1.0.

We all know that hundreds of satellites are flying in space collecting images, so millions of satellite scenes are being collected, but they are acquired at different times, on different days, under different weather conditions. When such data are collected, the best you can do is correct the geometry so that, spatially, they are right, and then stack them together piece by piece, each image covering a small area of the world. We call this a piecewise raw data cube, because for any one position you can pull all the information vertically along time, or you can work in space. But this kind of data, as I mentioned, is discontinuous: the observation frequency varies between locations, and the quality is not uniform; if you have fog or cloud, you don't have good-quality data.

To overcome this problem, in recent years people have proposed the concept of a seamless data cube, in the hope that we can produce data that are spatially continuous and of the same quality, with nothing wasted on, for example, cloud-covered areas: everywhere we have data. And in time, we can ask: can we produce daily datasets?

In this paper we produced a seamless data cube for Landsat data. We all know the first Landsat was launched in 1972, so it has a long history, but the richest volume of archived data has been accumulated since roughly 1984 or 1985. There are lots of archived data we can take advantage of to produce something like what we have on the right: a seamless data cube.

Before I go further: having this kind of data opens a door for us to do a lot of things. One example is image classification, so I'd like to give colleagues a rough review of what is involved in image classification. The goal is to make maps; that is our final destination, but first we need to prepare the data. The seamless data cube is one way of solving the problems of geometric quality and radiometric quality. We can restore the data; we can even assume that topographic effects can be flattened out. We can remove cloud, and we need to fill in the missing data; there are various ways for people to do that.
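To make the gap-filling step concrete, here is a minimal sketch in Python, assuming a per-pixel reflectance time series in which cloud-masked observations are marked as NaN. The simple linear temporal interpolation shown is only a stand-in for the more sophisticated spatio-temporal reconstruction used in the actual product; the function name and arrays are illustrative.

```python
import numpy as np

def fill_gaps_linear(ts, days):
    """Fill cloud/gap NaNs in one pixel's reflectance time series by
    linear interpolation along time. `ts` and `days` are 1-D arrays of
    equal length; NaN marks a missing (e.g. cloud-masked) observation.
    A stand-in for the actual spatio-temporal reconstruction."""
    ts = ts.astype(float)
    valid = ~np.isnan(ts)
    if valid.sum() < 2:  # not enough clear observations to interpolate
        return ts
    return np.interp(days, days[valid], ts[valid])

# Hypothetical example: 8 acquisitions, two masked by cloud.
days = np.array([1, 17, 33, 49, 65, 81, 97, 113])
refl = np.array([0.12, np.nan, 0.15, 0.18, np.nan, 0.22, 0.21, 0.19])
print(fill_gaps_linear(refl, days))
```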
After you have the data, in traditional image classification or mapping we start from the imagery and go to image classification. That means we apply some algorithm to the image to tell us what type of surface cover or land use it is, or whatever else we want to extract from the raw or processed imagery, as categories. From those categories we are eventually able to prepare maps.

To do image classification, many people try to derive additional information from the original imagery; we call this producing features. Features are an important thing to extract, and now that we have a rich stack of information in time series, we can do a lot of time series analysis. Time series analysis allows us to detect change. You can detect change directly from the time series, or by comparing maps derived from different dates of data, giving land cover or land use changes. Of course, we need algorithms to do this, and nowadays most people use machine learning algorithms. One type of classifier needs to be trained, which is supervised classification: you need to collect samples. A sample means that from the imagery you label an individual pixel, or a patch of the imagery, with one of the classes we eventually want to map. So we can do pixel-based sample collection or image-chip-based collection; I will come back to why we collect image chips for mapping.

That is the basic review. The three critical elements of mapping are the data, the samples, and the classifier (see the sketch after this passage). We need to prepare the data as well and as consistently as possible, restoring it as closely as possible to its original physical condition, so that samples collected at different times can still be applied to it. The classifier takes the samples for training and is applied to the data to derive the maps. Samples can also be used to validate the maps; of course, those must be independent sample sets.

With millions of Landsat images from 1985 to 2020, we developed a seamless data cube. This involves data cube preparation and spatio-temporal cube reconstruction: where you have missing data or gaps, you have to fill them, following the concepts I already mentioned. Then we pull all the data into a remote sensing data lake. We also need to prepare the samples and extract features, and eventually we can apply the algorithm to the data lake to derive classification results, which allows us to do post-classification processing. We did all of this on Amazon, using more than 10,000 cores of cloud computing; it cost us about two million US dollars to prepare the whole world on a daily basis, from 1985 to 2020, as a seamless data cube.

With that, we are able to map land cover for every day, because we have a daily dataset. Of course, we normally don't do that, because we do not need that much detail. We can do it for every season; seasonal data are useful for applications like climate-related studies. We can also do it on an annual basis, so we produced 36 annual layers of global land cover from 1985 to 2020.
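Pulling together the three elements above (data, samples, classifier), here is a minimal sketch of the supervised pipeline, assuming per-pixel feature vectors and class labels have already been extracted. The random forest mirrors the classifier family used later in the talk, but the feature construction, array shapes, and class count are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 1000 labeled pixels, 12 time-series
# features (e.g. seasonal band composites), 8 land cover classes
# loosely tied to feature 0 so the demo is learnable.
X = rng.normal(size=(1000, 12))
y = np.digitize(X[:, 0], np.linspace(-1.5, 1.5, 7))

# Independent training and validation sets, as the talk emphasizes.
X_train, y_train = X[:700], y[:700]
X_val, y_val = X[700:], y[700:]

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)                  # train on the samples
print("overall accuracy:",
      accuracy_score(y_val, clf.predict(X_val)))
```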
As I said, from the seamless data cube you can produce all sorts of data layers: albedo, leaf area index, population, urban extent, urban land use, wetlands and lakes, whatever feature you try to extract. What I presented in that particular Remote Sensing of Environment paper is that we further produced iMap World, a global land cover product. If we summarize all the pixels and compute statistics from 1985 onward on an annual basis, we can look at how, for example, forest areas are changing. This is the curve we get: annual trends of forest, grassland, bareland, water, cropland and so on, each representing a different land cover class (a minimal sketch of this per-class statistic appears after this passage). This is the annual land cover trend for the whole world, in units of millions of square kilometers.

We can also look at seasonal trends. Some forests are deciduous, so the leaves come and go between summer and winter; grassland behaves similarly, and many lakes rise and fall with the flood season. Many of the categories have seasonal patterns, though they are normally not so obvious for urban and human settlement land, and for permafrost and ice- and snow-covered areas the changes are much smaller. We can also compare all the annual values against 1985 to see the general trend of global land cover: some classes are increasing, for example water surfaces and cropland, but forest worldwide actually shows a declining trend over the 35 years we mapped. This can be done for every country, city, province, or state, anywhere, because we have produced this data.

This is basically an introduction to the kind of technology, using cloud computing, that we can achieve nowadays and that was not possible before: people previously could not make global maps so easily on an annual or seasonal basis over such a long time period. This kind of data, the seamless data cube, is the raw satellite imagery standardized so that we can derive all sorts of other information, and the global land cover trend and change dataset can also be used for other applications.

Before I continue, I want to give a little theoretical introduction to how we were able to do this. It seems relatively easy: we have data, we prepare it, satellites continuously collect it for us, and the AI and machine learning scholars in this community have developed all sorts of classifiers. What we really did for global land cover was to build a complete and temporally invariable sample set that can be applied to satellite imagery of any year. Say we collect samples in 2015: would they be applicable to satellite data from 1985? The answer is yes, under some conditions. These are the three elements I mentioned earlier. In 2019 we published a paper in Science Bulletin in China called "Stable classification with limited samples." The reasoning goes back to this question: is the data more important, the classifier more important, or the samples? They are all important; we need all of them. But data can be refreshed all the time, and collecting samples afresh every time would be very labor intensive.
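On the annual trend curves mentioned above: here is a minimal sketch of the per-class area statistics behind them, assuming one classified map array per year. The class codes, pixel area, and synthetic maps are all illustrative.

```python
import numpy as np

# Hypothetical class codes used for illustration only.
CLASSES = {1: "forest", 2: "grassland", 3: "bareland",
           4: "water", 5: "cropland"}
PIXEL_AREA_KM2 = 0.0009  # one 30 m pixel is 900 m^2

def class_areas(land_cover_map):
    """Sum the area of each land cover class in one annual map."""
    codes, counts = np.unique(land_cover_map, return_counts=True)
    return {CLASSES[c]: n * PIXEL_AREA_KM2
            for c, n in zip(codes, counts) if c in CLASSES}

# Illustrative trend: per-class areas for each year's map.
rng = np.random.default_rng(1)
maps = {year: rng.integers(1, 6, size=(1000, 1000))
        for year in (1985, 2000, 2020)}
trend = {year: class_areas(m) for year, m in maps.items()}
print(trend[1985]["forest"], "km^2 of forest in 1985 (synthetic)")
```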
Classifiers are relatively ready-made, but can you blindly select one and not worry about classifier selection for global land cover classification? A certain classifier may be good for a specific classification theme but not for certain areas. What we found is that at the global scale, the most important thing is actually having a good set of samples, and you don't need a lot of them: as long as the samples are sufficient, you should be able to apply them to the whole world, and at the global scale the classification can be stable.

Our proposition came from a sample requirement analysis in local areas. We selected 13 classifiers: some are ensemble classifiers, some are traditional statistical classifiers such as maximum likelihood classification and linear discriminant analysis, and some are more complicated ones, including neural networks; some are single classifiers and some are ensembles. We tested them on a local area with eight classes in Guangzhou, a subtropical city. Urbanized areas are usually relatively complex, so we chose a city like that for the experiment. Our idea was to vary the sample size from 20 to 240 per class, using very carefully field-verified class samples, apply them to the satellite images, and compare the classifiers. Basically, when your sample size is very small, the samples are not sufficient, so classification accuracy is low; but gradually the classifiers converge into a range where all of them reach a high level of accuracy. The variation among classifiers is only about five percent, while the variation in sample size causes a 10 to 20 percent change in accuracy. So the performance differences between classifiers are the smaller effect, only about five percent.

This made us very interested in the samples: instead of the full set of 240 per class, can we use a small subset and still achieve relatively good accuracy? We found that when we reduced the sample size, classification accuracy could still remain relatively high, and that is what motivated us to attempt the same for global land cover mapping.

I'm sorry for the Chinese characters on this slide for some colleagues; you don't need to pay attention to them, and I can explain clearly how we did this. We have a global set of samples; take the full set as 100 percent. We use samples to train a classifier, in this case a random forest, then use another set of samples to test it and estimate the classification accuracy, the overall accuracy in percent. You can see that as the sample size increases, the accuracy stabilizes to within one percent: starting from about 40 percent of the samples, you achieve very stable classification accuracy. This was done over thousands of random rounds; that is, when you take a smaller subset, say 40 percent drawn from the total sample, we repeat the random draw many times, and you can see that the variability, the standard deviation, is very small. That gave us a lot of encouragement that in the future we could use a smaller set of samples to do global land cover mapping.
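Here is a minimal sketch of that subsampling experiment: draw random subsets of the training pool at increasing fractions, retrain, and watch the overall accuracy stabilize. The data, fractions, and the number of rounds are synthetic stand-ins, not the actual global sample set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(2000, 10))
y_pool = (X_pool[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)
X_test = rng.normal(size=(500, 10))
y_test = (X_test[:, 0] > 0).astype(int)

for frac in (0.1, 0.2, 0.4, 0.7, 1.0):
    accs = []
    for _ in range(20):  # the actual experiment used far more rounds
        idx = rng.choice(len(X_pool), size=int(frac * len(X_pool)),
                         replace=False)
        clf = RandomForestClassifier(n_estimators=100)
        clf.fit(X_pool[idx], y_pool[idx])
        accs.append(accuracy_score(y_test, clf.predict(X_test)))
    print(f"{frac:.0%} of samples: acc {np.mean(accs):.3f} "
          f"+/- {np.std(accs):.3f}")
```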
We also did something else. We know that samples contain errors: even when we try to do a very good job, the samples may not be totally correct. So we deliberately altered the samples, injecting artificial error. Starting from the pure sample set, we added five percent label errors, then gradually more, and we found that with up to 20 percent errors injected into the training samples, we still get relatively stable classification accuracy. With a single classifier and a global land cover sample set, this gave us an insight: we don't need to worry so much if up to 20 percent of our samples are wrong; we can still get reasonably good classification accuracy. (A small sketch of this noise-injection test follows at the end of this passage.)

What is the implication? It concerns applying a mismatched sample set: you collect the samples in 2015, but your data are actually from 1985. Is it still okay to apply samples collected at one time to classify data from a different year? The whole world's land cover did not change by 20 percent over those decades; if it had changed that much, we probably would not be able to hold on to the signal at all. That is why we found we could use samples collected at one time and apply them across multiple years, basically across the whole 36-year period. Our result has since been repeated by others working on urban land use mapping, who found similar rules, and we additionally experimented with other classifiers and still found similar results. I will not spend more time on that.
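Returning to the noise-injection experiment above, here is a minimal sketch: flip a fraction of the training labels to a different random class, retrain, and evaluate on a clean validation set. The data and helper are synthetic and hypothetical; only the procedure mirrors the test described.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def corrupt_labels(y, frac, n_classes, rng):
    """Flip `frac` of the training labels to a different random class,
    mimicking the deliberate error-injection test."""
    y = y.copy()
    idx = rng.choice(len(y), size=int(frac * len(y)), replace=False)
    y[idx] = (y[idx] + rng.integers(1, n_classes, size=len(idx))) % n_classes
    return y

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] > 0).astype(int)        # synthetic two-class stand-in
X_train, y_train = X[:1500], y[:1500]
X_test, y_test = X[1500:], y[1500:]  # clean, independent validation set

for frac in (0.0, 0.05, 0.10, 0.20):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, corrupt_labels(y_train, frac, 2, rng))
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"{frac:.0%} label noise -> accuracy {acc:.3f}")
```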
The third piece of work I want to introduce to colleagues is three-meter resolution global land cover mapping. We collaborated with SenseTime, a company in China best known for face recognition, which also has access to data from Planet. We all know Planet has hundreds of satellites in orbit collecting three-meter global datasets; on average they can image the whole world every two days. We collected one set of their three-meter resolution data and were able to map the whole world. Our first example was 30-meter mapping; the second was transferring 30-meter samples to classify 10-meter European Sentinel data and produce global land cover at 10-meter resolution; and now I am showing you a completed three-meter dataset. Here is three-meter resolution imagery from Google that you can use as a reference: at three meters, all the villages can be mapped. This is our 10-meter product, this is our 30-meter product, and there are other global land cover products at 30-meter, 300-meter and 500-meter resolution. But none of them offers annual data for the whole world at 30-meter resolution, and ours is the only three-meter data layer. You can see the spatial detail you get, including urbanized areas and agricultural land; even golf courses are mapped in detail, and in some desert areas you get a much better mapping of all the circular features.

The reason I showed colleagues this: all the maps at resolutions down to 10 meters are produced on a single-pixel basis, but for the three-meter product we used image chips and deep learning algorithms for the classification. Your samples need to be expanded, or augmented, into image chips; each sample is one image chip (a sketch of chip extraction follows at the end of this passage). That is why I said at the beginning that we need to do image chip collection to form the training samples. Again, at the global scale, looking at the product at three-meter resolution does not really help much; it looks similar to a 30-meter product at that level of detail. These are the producer's and user's accuracies for the different types, basically agricultural and forest land; what I want to show is that at three-meter resolution the overall accuracy can be better than 80 percent.
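On the image-chip preparation mentioned above, here is a minimal sketch of cutting chips around labeled sample points and augmenting them for deep learning training. The chip size, augmentation choices, and synthetic scene are assumptions for illustration, not the production pipeline.

```python
import numpy as np

def extract_chip(image, row, col, size=64):
    """Cut one size x size training chip centered on a labeled sample
    point. `image` is a (bands, H, W) array; returns None if the chip
    would run off the image edge."""
    half = size // 2
    if (row - half < 0 or col - half < 0 or
            row + half > image.shape[1] or col + half > image.shape[2]):
        return None
    return image[:, row - half:row + half, col - half:col + half]

def augment(chip):
    """Simple augmentation: the four 90-degree rotations plus a flip."""
    out = [np.rot90(chip, k, axes=(1, 2)) for k in range(4)]
    out.append(chip[:, :, ::-1])
    return out

# Illustrative use on a synthetic 4-band scene.
rng = np.random.default_rng(0)
scene = rng.normal(size=(4, 512, 512))
chip = extract_chip(scene, 200, 300)
print(len(augment(chip)), "augmented chips of shape", chip.shape)
```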
My last set of slides shows work studying the impact of urbanization on climate. Last year we published a paper on China's urbanization and how it affects the global climate; we found that winter warming in North America has actually been influenced by China's urbanization. We all know that China's urbanization has expanded the most in the world over the past 30 years, which is why we started that particular study: instead of studying how climate change impacts urban areas, as most people do, we used urban land expansion to study its effect on the climate.

More recently, building on the three-meter resolution mapping, we have done a global study; this set of slides is not yet published. We find that observations show urban-weighted warming is greater than the background climate warming, meaning warming over rural areas, everywhere except South Asia, and there are reasons for that. For North America, the urban-weighted warming from 1985 to 2015 is the red line and the background is the blue line; in North America in the 1990s urbanization actually had a cooling effect, but not anymore after 1998. Just as there is an urban heat island, there can also be an urban cool island, and we have published some work on that before. In East Asia the effect is very obvious: urban areas have a much stronger heating effect. In Europe it is not so obvious. And in South Asia, over the past 10 to 20 years, urbanized areas have actually been contributing less to the heat. We also find that the intensity of extreme heat in urban areas has increased more seriously than in the background climate of rural areas; there are examples for North America, East Asia and Europe, and again South Asia shows somewhat different effects. Different parts of the world at different times can behave differently; they are not all consistent. We all know that in California now you can't irrigate your lawn because of the water shortage, so families are no longer planting grass in their gardens but are using desert, water-conserving plants instead. Obviously, if we water our gardens, that has a cooling effect on the urban area; that is more or less the reason.

We used urban land area and land cover from 1985 and 2015, a 30-year period, to compare the differences; in this case we compared 1985 to 2018, so there is an urban land expansion signal. We could then run models using NCAR's Community Earth System Model to simulate the contribution of global urban land cover change to climate. We find that in North America and Europe, urban land did not expand much over those roughly 30 years, while East Asia and South Asia expanded far more than the developed regions of the world. We can attribute the temperature increase or decrease to particular factors: albedo change, emissivity change, aerodynamic resistance change, evaporation resistance change, and so on. I don't have all of them clearly listed here for you, but we see consistently that the evaporation resistance change contributes a lot in East Asia, and also in North America and Europe, though not in South Asia. Basically, using the models, you can attribute which factor contributes the most: urbanization alters some of the physical properties of the surface, and each of those physical properties contributes to the temperature change (a toy sketch of this factor summation appears after this passage).

We also did some work focusing on regional effects, and we find different degrees of warming for the same level of urban expansion. In Europe, a one percent urban expansion has the greatest impact on temperature increase; North America is also very strong, China less, and South Asia even less. That is basically the message from the studies I have shown you.

We have some ongoing parallel work: for example, how future urbanization in different parts of the world would affect climate; we are working on a paper on that. We are also building urban systems models: can we model the urban system the way we model the climate system? We have a group of people working on this. We are also assessing urban development and urban health; we have a City Health Outlook covering cities of the world, and we recently completed a paper on population exposure to greenness in cities, where we find differences between the global south and the global north; it is going to appear in Nature Communications very soon.

With all of that, colleagues, I will stop here. I want to acknowledge some of the contributors to this set of work, mostly my colleagues in China. Thank you very much.
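On the attribution step described above, here is a toy sketch of summing per-factor biophysical contributions to an urban-induced surface temperature change, for example from paired model runs that perturb one surface property at a time. The factor names follow the talk, but the numbers and the simple linear-sum form are purely illustrative assumptions.

```python
# Hypothetical per-factor contributions (in K) to urban-induced
# surface temperature change; values are made up for illustration.
contributions = {
    "albedo change": -0.08,
    "emissivity change": 0.01,
    "aerodynamic resistance change": 0.12,
    "evaporation resistance change": 0.25,
}

total = sum(contributions.values())
for factor, dT in sorted(contributions.items(),
                         key=lambda kv: -abs(kv[1])):
    print(f"{factor:32s} {dT:+.2f} K ({dT / total:+.0%} of total)")
print(f"{'total attributed change':32s} {total:+.2f} K")
```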
Thank you very much. So, do we have questions? No questions online. One question I have: this is really a fantastic set of data you have shown us. I wonder what information you can take from this data on the status and the characteristics of the ocean. Is there any work you have done on that already, or not?

Let me be very frank with colleagues about why I couldn't give you a link for this particular set of work. We used Amazon; they originally gave us a computing grant (about US$200,000), and I overran it. I was able to test all the results, algorithms and methodology, but the data are now held by Amazon: I need to pay them in order to retrieve the data and do further work. Because I can't really assemble enough money, I may have to redo it in a cheaper place. So, frankly, I haven't been able to do further analysis of the dataset we produced, but we are hoping that by the end of the year we will fully reproduce the data at the Peng Cheng Laboratory. Remember I mentioned a group of contributors: the first group is at the Peng Cheng Laboratory in Shenzhen, and they have the cloud computing that allows us to do continuous work. This is only one part of it; there is a lot of data we are able to share, for example the global urban extent from 1985 to 2018, and also some of the urban land use data and the 10-meter and 30-meter global land cover, shareable not on an annual basis but for selected years.

Hello Peng, Sandra Diaz here. I have a question related to the ongoing parallel work you mentioned in your last point, the inequality of urban population exposure to green environments. I wonder if you can share just a little bit of what you are finding there, because I find it extremely important.

How about this: I will share the manuscript with you through Massili. Basically, we find that global south cities have substantially lower exposure to green space. Thank you.

Hello, thank you for the presentation. Maybe a somewhat technical question: early on, you showed the several different classifiers you were using in the urbanization experiment, and you mentioned that the sample size made a big difference. But what I actually took from that graph was that the method used was a bigger determinant: in some cases, even with a sample size of 20, you got results as good as the larger sample sizes. Yes, this one, exactly. So LR and LMT are just as good with a small sample size as some of the other classifiers are with a larger sample size, and I'm wondering whether your conclusion was really more about which method you use rather than sample size, because I assume there is also a trade-off in effort with the larger sample sizes.

Yes, this particular experiment was only in one city, not a very large area, and it was one of our initial exercises; I would have to say classifiers do play a role. We actually did a survey of more than 6,000 papers and published a meta-analysis of land cover mapping, essentially localized land cover studies across the whole world, to look at different people's findings: lots of people claim they developed a new algorithm and thereby improved classification accuracy. Our work here tried to set a benchmark: take the same set of data and see which algorithms do better. You pointed out correctly that in this particular case the logistic-regression-based algorithms performed relatively better than the other classifiers with a small set of samples for this dataset. But when you go to much larger areas, one classifier does not consistently outperform another. That is why, on the other hand, you always come back to feeling that representative samples are important. So when we generalize our observations and try to be wise about the analysis, we put more emphasis on how to build a good, representative and concise sample set, because the algorithms can be obtained relatively easily nowadays. Thank you for your comment.

Any other questions? Thank you very much for the interesting talk. Maybe another slightly technical question: I didn't quite understand how you did your noise-corruption experiments. You said you feed the algorithm 20 percent wrong data and the accuracy doesn't change, but how is that possible? Presumably the algorithm becomes a random predictor on the wrong data, I guess?
Well, what I did, the meaning here, is basically this: I have a single set of roughly 100,000 sample locations for the whole world. When I use all the samples, I don't have any other choice; I only have this one set. I can use the samples to train a random forest classifier, or any other classifier. In this case I deliberately change a certain percentage of the category labels: if a sample is water, I randomly change it to, for example, grassland or whatever, up to a certain percentage. Then I use that altered set, say with five percent of the samples changed, to train the random forest classifier, and use a standard set of 38,000 samples for validation to get the accuracy. That is the exercise I showed: when I change 20 percent of the samples, I am still able to get relatively robust classification accuracy, which means most of the algorithms have a certain level of robustness to sample error. I use this to argue: my samples are correct for 2015, the year I collected them, but the land cover at those locations may have changed by, say, 1985. The whole world did not change by 20 percent; it may have changed by 5 or 10 percent, but not 20. So I can still use this setup; it is basically a test of the validity of the algorithm and the samples when extending them back to a historical time.

Okay, thank you very much. So I think we thank Peng again for this very interesting talk, and