Thank you. Well, my name is Hermann. I'm a researcher at CONICET here in Buenos Aires, and I lead the Computational Social Sciences Lab at the National University of San Martín, so the research I'm going to talk about is based on data from that lab.
The main objective of this presentation is a first exercise on what we could call the different forms, or modes, of agrarian expansion in Argentina. The question is: how is the agricultural frontier expanding as it moves forward on Argentinian soil? Is it displacing other activities, such as other crops or livestock breeding? Is it displacing local peoples? Is it displacing forest, so that we're talking about deforestation or processes like that?
What I would like to show you here is a first exercise in building a map, with the highest possible resolution, that allows us to identify three types of situations in Argentinian territory. First, recent agricultural frontier expansion zones: zones that in period zero, let's call it that, were forest or shrubland and by the end of the period had been transformed to agricultural use. We'll try to divide these into more recent expansion and more consolidated expansion. And, although I'm not going to show it because of time, we would also like to map recent urbanization, or urban settlement, zones.
We use a data set from ESA, the European Space Agency, which produces this pixel-based land cover classification. I'll talk about it in a little detail. I'm pretty sure you all know a bit about pixel data, but the idea is that you create a square grid over the surface of the Earth and, for different time periods, each pixel is classified with one land cover class. Here you can see the spatial coverage.
It's pretty damn good. Of course, there are some zones in Africa for which we have no data. But in the case of Argentina, we have a very good data set to work with. The time coverage is also pretty good, at least by the standards here: it runs from 1992 to 2020, so we have almost 30 years of data about land cover and land use change. Each pixel covers approximately nine hectares.
The original data set has 22 land cover classes; we grouped them into nine more aggregated categories. So we have something like this: for each pixel, and for those five periods of time, we have a land cover class. What we'd like to do is analyze the trajectory, that is, how the land cover of the same pixel has changed over time.
There are several methods to do that. A lot of them involve aggregating the pixels into higher units. We could take all pixels within one class, say forest, make a polygon, build a vector data set for that class, and do the same for each class. The problem is that each pixel can change over time, so the polygons are not consistent from one time frame to the next. We could also aggregate the pixels into higher units, perhaps census tracts, counties, or provinces, then calculate the frequency of each trajectory type and perhaps keep the most frequent one. But what we try to do is keep the original unit of the data set, which is the pixel.
We had about 2,500 different sequences in Argentina, so we had to perform some kind of complexity reduction, because there were too many to classify each one by hand. What we did was think of each pixel sequence as if it were a word, and then we applied what are called edit distances.
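As a rough illustration of the "sequence as a word" idea just described, the sketch below turns each pixel's land cover codes into a string and counts the distinct sequences. The numeric codes, the grouping into letters, and the toy pixels are all assumptions made for illustration; the talk does not give the actual aggregation from 22 classes to nine.

```python
from collections import Counter

# Hypothetical aggregation of detailed land cover codes into broad letters.
# Codes and grouping are placeholders, not the mapping used in the talk.
AGGREGATE = {
    10: "A", 20: "A",   # cropland variants -> Agriculture
    50: "F", 60: "F",   # tree cover variants -> Forest
    120: "S",           # Shrubland
    190: "U",           # Urban
}

def to_sequence(codes):
    """Turn one pixel's codes (one per time period) into a 'word'."""
    return "".join(AGGREGATE[code] for code in codes)

# Toy data: each row is one pixel observed in five periods.
pixels = [
    [50, 50, 60, 10, 10],
    [10, 10, 20, 10, 10],
    [60, 50, 50, 20, 10],
    [120, 120, 120, 120, 190],
]

sequences = [to_sequence(p) for p in pixels]

# Many pixels share the same trajectory, so only the distinct sequences
# need to be compared with each other later on.
sequence_counts = Counter(sequences)
```

Working with distinct sequences rather than individual pixels is what keeps the later pairwise comparison small: the distance matrix is built over sequence types, not over every pixel.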
I'm not going to go into many details, but I'll try to give you some intuition. If you think of this sequence as a word, and each class as a letter, we can count the number of operations we need to transform this sequence into this one, or into this one. With words it may be a little clearer. If I have casa and cama (sorry, this works in Spanish; it would be like house and bed in English), I have to change one letter to transform one into the other. But I have to do at least two operations to transform casa into gozo, which would be house and thing, something like that. So the first two words are more similar to each other than the second two. If we count these operations to transform one sequence into another, we get a dissimilarity, or distance, measurement.
We calculated that for the 2,500 by 2,500 pairs of sequence types. Then, on top of that, we did a good old-fashioned hierarchical clustering, that thing that always works, and we found 20 groups.
OK, so... oh, the cursor is killing me. Sorry. Here we have the most important trajectories in terms of pixel count. All the yellows are what we call stable agricultural: these pixels were agricultural over the whole period. The greens are stable forest: forest or shrubland all the time, depending on the color. Perhaps the more interesting ones are the red ones, which are the agrarian frontier expansion: pixels that were forest or shrubland and were transformed at some point in the trajectory into agricultural use. And the purple ones are the urban settlement zones, recent and consolidated.
OK, this is pretty nice, but we have to do some kind of validation of this data. The problem is that we don't have many data sets with which to do a quantitative validation of these results.
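The edit-distance and clustering steps described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the pipeline used in the talk: it assumes plain Levenshtein distance with unit costs and average-linkage agglomerative clustering, and the letter-coded sequences are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    return prev[-1]

def cluster(sequences, n_clusters):
    """Agglomerative clustering (average linkage) on edit distances."""
    n = len(sequences)
    dist = {(i, j): edit_distance(sequences[i], sequences[j])
            for i in range(n) for j in range(n)}
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i, j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters; ties go to the earliest pair.
        _, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [[sequences[i] for i in c] for c in clusters]

# Toy trajectories: F = forest, S = shrubland, A = agriculture.
toy = ["FFFFF", "FFSFF", "AAAAA", "FFAAA", "FSAAA"]
groups = cluster(toy, n_clusters=3)
```

On this toy input the stable-forest sequences, the stable-agriculture sequence, and the forest-to-agriculture sequences end up in separate groups, which mirrors the kind of trajectory types the talk describes.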
So we did a little bit (well, not a lot) of bibliographical research and tried to find some qualitative information about some local areas. This study is from 2010. For that time period it detects, I insist, qualitatively, fronts of expansion of the agricultural frontier, which pretty much coincide with some of the ones that we found. But since our data is more recent, ending in 2020, we can see that there are some new agricultural expansion fronts in San Luis and in this area here in the Mesopotamia.
Then we did some localized validation. We selected some areas where we knew, from fieldwork or from bibliographical research, what kind of processes were happening. I'll show you one of them; we did this with several areas. This area over there, on the border between Santiago del Estero and Chaco in the northeastern region of Argentina, is called Monte Quemado. Here you have a picture of Monte Quemado in 1992, which is the start of our analysis period, and a satellite image of Monte Quemado in 2020. You can see that there is some good matching, because all the red parts are parts that were forest in 1992. (This is not good; this is a forest fire. Sorry.) But these are all now agricultural surface, agricultural pixels. I'm not sure if you can see it, but these over there are what we call puestos ganaderos, which are small farm households that breed livestock. That's why they appear in yellow, because these are probably pastures or forage for feeding the livestock.
All right. So that would be that area. We do have some drawbacks in this approach. There are many more, but the two I'd like to point out are these. First, we found several classification errors in the land cover data. These errors are, first of all, documented in ESA's documentation of the data.
We also did some validation of our own in specific zones. The classification tends to confuse forest and shrubland; since it is mainly based on vegetation density, it tends to confuse these two classes. Besides, for a pixel classified as agricultural use, we are not able to discriminate what kind of crop is there. We don't know if it's soybean, wheat, cotton, maize, or whatever. It would be interesting to try to discriminate that.
But on the other hand, there are some advantages I would like to point out. The first is that it lets us map long-run land use trajectories at high resolution. It covers a large part of Argentinian territory, and it could be rather easy to expand the frame of analysis to Latin America, or even further. And it has a relatively low computational cost: I ran all the processing on this computer right here. So it does not require very complex infrastructure or cloud computing. Of course, it would be nice to use that, but at this point it's not necessary.
But there are two advantages that I think are the most important, at least for my work, in which we do a lot of case studies and fieldwork in local areas. First, using this information, and other information that we're building right now, we can frame those case studies within more general processes. Second, this information could be used to select new cases, based not only on what I think is happening on the ground but on some kind of evidence.
And just to sum up, we're trying to expand this work in mainly two directions. The main one (I'm pretty sure this is getting recorded, so let's say I'm kind of sure) is that we could use some more advanced text mining techniques to improve the clustering. What we're thinking of is something like word embeddings.
I'm pretty sure they could do a better job of clustering these land cover sequences than edit distances. We're also exploring new data sources with higher spatial and temporal resolution. I'm not sure if you know it, but there is this amazing and crazy data set called Dynamic World, which has 10-by-10-meter pixels, so a pixel of 100 square meters, which is crazy. And it has, I don't remember whether it's daily or weekly, estimates of land cover. Of course, Google is involved in that great experiment. So we're trying to explore this new information. I think I'm done. Thank you very much.
Thanks very much. Do we have questions? Yes, I see one in the front.
Yes, thank you for this very interesting talk. I want to ask about the size of the data set you handled, and which tools you used on your local computer to handle all of this work.
I think there were approximately 200,000 pixels for Argentina, excluding Patagonia. The second question was about the framework, the stack. The image processing we did with some Python libraries, GDAL and things like that. And the sequence clustering specifically we did with R and a package called TraMineR, which was developed for life-course sequence trajectories. But the idea was transferable.
Other questions?
First of all, thank you for such a good job. And two questions. Could you connect this information to the planning for each area, to distinguish which expansions were legal and which were illegal? And I think that information could also be useful to reach, maybe, journalists, to let people know about this.
Yes, it is possible. In Argentina we have what's called the semáforo, the traffic light: some areas where cutting the forest is allowed (I'm not sure how to say it in English) and some where it is completely forbidden. So it should be quite possible to cross that information, because there is vector data for those protected areas.
Laia over there has worked with things like that.
What I wonder is whether there is maybe some government API so that, for example, given a latitude and longitude, you get what kind of legislation applies to that point, and can cross it with this information.
Yeah, I'm pretty sure. Yeah, I think so. It shouldn't be too hard to do.
Thank you.
Other questions? Yes.
Perhaps you said it and I didn't get it. But if you want to divide pixels between places with stable agriculture and places where you see deforestation, why can't you just look at the first pixel and the final pixel? If you have forest first and then agriculture, then you have a deforestation area, and so on.
You mean, why didn't we do that?
Right.
Starting from just two points.
Yeah, or perhaps a linear regression or something like that, and instead you did this whole thing with the words.
Because when we analyzed the 2,500 trajectories that emerge from the data, there were some, not a lot, that go back in the middle: forest, then shrubland, then agriculture, then shrubland again. We didn't know how to work with those. So we said, OK, let's try to develop a method that clusters everything based on similarity. I think that was the main issue.
Another one: with your method, you take each class as a letter, basically. But changing from one letter to another is always the same distance. So changing from forest to urban area is the same distance as changing from forest to shrubland, which is not really what you want. Is that what you would perhaps solve by using embeddings?
Yeah. I don't know if everyone heard what he said, but each change costs the same. With a real word that makes sense, but when you go from forest to shrubland to agriculture, you could think that those operations should not all be worth the same.
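The concern raised in this exchange, that every substitution costs the same, can be illustrated with a weighted variant of the edit distance. This is only a sketch of one standard alternative (class-dependent substitution costs), not something the speaker reports having implemented; the letters and every cost value below are assumptions chosen for illustration.

```python
# Illustrative substitution costs (assumed values, not from the talk).
# Pairs not listed fall back to DEFAULT_SUB_COST.
SUB_COST = {
    frozenset("FS"): 0.5,  # forest <-> shrubland: similar covers
    frozenset("FA"): 1.0,  # forest <-> agriculture
    frozenset("SA"): 1.0,  # shrubland <-> agriculture
    frozenset("FU"): 2.0,  # forest <-> urban: very different covers
}
DEFAULT_SUB_COST = 1.0
INDEL_COST = 1.5  # cost of inserting or deleting one letter

def substitution_cost(a, b):
    """Cost of replacing class a with class b (symmetric, 0 if equal)."""
    if a == b:
        return 0.0
    return SUB_COST.get(frozenset((a, b)), DEFAULT_SUB_COST)

def weighted_edit_distance(s, t):
    """Edit distance where substitution cost depends on the two classes."""
    prev = [j * INDEL_COST for j in range(len(t) + 1)]
    for i, cs in enumerate(s, start=1):
        curr = [i * INDEL_COST]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + INDEL_COST,                     # delete cs
                curr[j - 1] + INDEL_COST,                 # insert ct
                prev[j - 1] + substitution_cost(cs, ct),  # substitute
            ))
        prev = curr
    return prev[-1]

to_shrubland = weighted_edit_distance("FFFFF", "FFFFS")
to_urban = weighted_edit_distance("FFFFF", "FFFFU")
```

With these assumed costs, a trajectory that ends in shrubland sits closer to stable forest than one that ends in urban use, which is the distinction the question asks for. Choosing the costs is the hard part; they could be set by hand or estimated from the data.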
But yes, that's why we're thinking of perhaps trying something like word embeddings.
All right, thank you. Other questions? If anyone wants to ask a question in Spanish, that's also possible. Yes, please.
(Asked in Spanish.) Did you do or explore any kind of predictability analysis? Say, given that someone set up a livestock operation here, this consequence followed in the surroundings, and so that could also indicate that if someone does something there in the future, it will have this impact around it. Did you do any analysis like that with this data?
Yes, OK. Have you done any kind of predictability analysis where you say, OK, given that in this place they put a farm, the surroundings had this consequence, so if someone else in another place makes the same change, we can predict that the surroundings will have a similar effect?
Yeah, I can answer in Spanish and English. No. At least, not yet. I insist that this is a first exercise. I think part of the point is that it opens up a series of interesting possibilities. Strictly speaking, we did not do any spatial analysis. We just looked at the trajectories and clustered them without yet including spatial information, although it emerges anyway, let's say. A geographer once told me that space is completely overrated. So, yes, it is something that can be done, and you can see that there is spatial autocorrelation; it jumps out at you, let's say.
I think I ended too early, because there was a lot of time for questions.
Hello. Maybe I missed it in your presentation, but did you find the other way around, places that were farmland and were lost, that became natural again or something like that?
Yeah, we call that retraction of the agrarian frontier.
There are few, but there are some. It's not the main situation, but there are a few areas where it happens. We checked the images, and at least when you look at the satellite photos, it seems that way.
Ask it in Spanish, and then say it in English.
Yes, because there are English speakers too. Do you think this could replace the agricultural census?
No way. No, because this only gives information about land use. The agricultural census gives more information: about the farm holding, about some economic characteristics of the people living on the farm, the people working on the farm, the owners of the farm, et cetera. So this is just a small part of that information.
Unfortunately, there's no more time for questions, so now it's time for a coffee break. But before that, a big round of applause.
Thank you. Thank you.