Good. Okay. So finally we have arrived at our last webinar, our fifth webinar. We've been through all of this and we have arrived at digital soil mapping in R. It's been a pleasure; I hope for you too. Today we will have three parts. We'll start with a brief introduction to soil mapping and conventional soil mapping. Then we will focus more specifically on digital soil mapping. And then we will have a practical session in R, looking at the code and mapping SOC, soil organic carbon, in Ayaş basin with the profile data that we already have.
First of all, I would like to say that I am not a soil expert. Mustafa and others probably know much more than me about soils; I am more of an expert in statistical analysis. But we all know how important soils are. This has been recognized by FAO, and soils deliver many ecosystem services. This is a nice infographic made for the International Year of Soils about all the soil functions and their importance. This is why it is so important to have soil information in the context of land use planning and integrated land use planning. Having maps of soil characteristics and a good soil assessment is key to assessing land suitability for, for example, different crops or other land uses, to finally determining an optimal land use scenario and, as we saw in our last webinar, to monitoring degradation.
This is the methodological framework for land use planning; this figure was developed by Haki and Vastidas. As you can see, soil assessment is part of the second stage of the framework, in which we assess the current situation. But of course soil information is also important for prioritizing what to do, where and by whom, and it is used all along the framework. So let's first see what conventional soil mapping is and how digital soil mapping differs from it.
So in this map of Ayaş basin, what we see are different land units, which are discrete. Soil information in conventional soil mapping is represented in discrete units. Look at the boundaries: when we go from yellow to violet, for example, these are two different types of soil, and the boundary is very abrupt. Changes do not happen gradually in these types of maps, but along sharp lines. This is one difference between conventional soil maps and digital soil maps. Also, in these types of maps there is no quantitative information about the variability within each land unit. It's a discrete model of spatial variation.
There are many examples of conventional soil maps, for example the FAO world soil map. I think it's a nice example because it was the first world soil map, it took 20 years to produce, and it was made in collaboration with soil scientists from around the world. These types of maps took a lot of time, a lot of effort and a lot of field sampling to make, and they include a lot of knowledge from the soil surveyors, the people who make the maps. These maps date from many years ago. They were the first step towards soil mapping.
Then, approximately 30 years ago, geostatistics was introduced into soil science. The concepts of geostatistics allow the autocorrelation of the data to be considered in mapping. What is autocorrelation? It's how similar samples are, and how that similarity changes as the samples get further apart. So this approach considers the soil as a continuous body that changes gradually in space. I like this figure because it was made by one of the fathers of geostatistics [name unclear in the recording], and when he did his thesis he drew all the figures by hand. Fortunately, when I did my PhD, I didn't have to draw the figures by hand. What this figure represents is a semi-variogram.
These models, these functions, allow us to model the spatial autocorrelation. Here we see the semi-variance, which is a measure of how different two sample points are. What we expect when there is a lot of autocorrelation is that, as the distance between the two sample points increases, the semi-variance also increases. So when we apply the geostatistical framework, the idea is first to build an empirical variogram, which is the set of points we see here. For each distance, we take all the pairs of observations that are separated by that distance and we look at the semi-variance of all those pairs. Then we fit a theoretical model. There are different models. These are the basic parameters in geostatistics. With this model (I'm repeating "model" a lot) we describe the autocorrelation, and we can take it into account to predict values.
Because the point, when we make soil maps or any maps, is that we cannot measure everything everywhere. So we need to interpolate: we need to predict values in areas where we do not have measurements. How do we make that prediction? That is where the differences among all these methods lie. In conventional soil mapping, you delimit the land units and, based on some observations and the knowledge of the soil scientist, you paint each polygon all in the same color. Somehow, you are interpolating. With geostatistics, the prediction also considers the autocorrelation, the dependency of the data in space.
All GIS software, and many libraries in R, allow you to use geostatistics. What is great about geostatistics is not only that it allows us to map variables on a continuous surface, but also that it allows us to represent the uncertainty of the predictions. And this is very important.
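The empirical variogram described above can be sketched in a few lines. This is only an illustration in plain Python, not the code used in the webinar (which works in R); the function name, the lag binning and the four sample points are assumptions made up for the example. For each pair of points it computes the separation distance, groups pairs into distance bins, and takes half the mean squared difference in each bin as the semi-variance.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, lag_width, max_dist):
    """Return (mean distance, semi-variance, pair count) for each distance bin.

    points: list of (x, y, value) tuples (hypothetical sample data).
    """
    # bin index -> [sum of distances, sum of squared differences, pair count]
    bins = defaultdict(lambda: [0.0, 0.0, 0])
    for i in range(len(points)):
        x1, y1, z1 = points[i]
        for j in range(i + 1, len(points)):
            x2, y2, z2 = points[j]
            h = math.hypot(x2 - x1, y2 - y1)
            if h > max_dist:
                continue  # ignore pairs beyond the maximum distance
            acc = bins[int(h // lag_width)]
            acc[0] += h
            acc[1] += (z1 - z2) ** 2
            acc[2] += 1
    # semi-variance = half the mean squared difference of the pairs in a bin
    return [(d / n, sq / (2 * n), n)
            for _, (d, sq, n) in sorted(bins.items())]


# Made-up points on a line whose value grows steadily with distance
samples = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 3.0), (3, 0, 4.0)]
variogram = empirical_semivariogram(samples, lag_width=1.0, max_dist=3.0)
for mean_dist, gamma, n_pairs in variogram:
    print(mean_dist, gamma, n_pairs)
```

In this toy data the semi-variance grows with distance, which is the behavior expected when the data are spatially autocorrelated. Fitting a theoretical model to these points is a separate step that is not shown here.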
So geostatistical methods are also part of digital soil mapping. But there are other methods that can be used in digital soil mapping as well. So what is digital soil mapping? There are many definitions, but basically the idea is that you couple the observation data, the field and lab data, with environmental data, and you use a model to couple them and then to predict the soil characteristic that you are interested in for other areas where you do not have field measurements. That is digital soil mapping in a nutshell. You have training data, for example soil organic carbon measured at different sample points. You have a lot of covariates, which can be derived from satellite data, for example, and which are related to the soil forming factors. You couple the two, you train your model, and you use the predictive model to predict, for example, soil organic carbon across your study area. This is the basic idea of digital soil mapping. In ordinary kriging, for example, you don't use these covariates. There are different kriging models; some use covariates and some do not.
Okay, so let's take a look at the training data. This is what we were just discussing: field sampling is essential for digital soil mapping. There is sometimes a misconception that you do not need field data for digital soil mapping and that everything is done by computer. That is absolutely not true. Digital soil mapping is based on field data; it is not based solely on remote sensing, and field validation and local knowledge are both necessary and an asset. Computing, as I was saying, does not replace profile description and lab analysis.
Computing is a core tool for digital soil mapping, which takes advantage of the computing power we have nowadays and of all the information that is available. That makes a difference and it's really good, but you are always relying on the data you have from the field and the lab, and on knowledge. For Ayaş basin, after this webinar, we are planning a training in which we will use data from the soil analyses that are being conducted in the basin. I think we have seen these maps before: there are soil profiles at 40 locations, and many probe samples, almost 1,000 I think, in which different variables were measured. Unfortunately, the data for the soil probes is still not finished. The idea is to have it ready for the training, so that we can use Ayaş basin data there. But the soil profiles are finished, so today we will see an example using that data.
When you want to map a particular soil characteristic, for example soil carbon, data preparation is the core of the work. You spend a lot of time preparing your data. Sometimes you also use legacy data, data from other surveys carried out many years ago, and harmonizing the data in terms of georeferencing, units, classification, depth and so on is a big part of the work. It's the first step that we need to do. In our data set, for example, these are the 40 sample sites, the different profiles, and they have different numbers of layers: some have one, others two, three, four, five or six. In this map we can see the number of samples at each sampling point. The ones in red have six samples; the ones in orange have five. So how do we harmonize this data?
For SOC, for example, there is a target depth, the one used for the Global Soil Organic Carbon Map of FAO, which is the first 30 centimeters. As you can see, if for this profile I only used the SOC that was measured in this one sample, I would not be representing the whole 30 centimeters. A great tool for estimating any soil characteristic at a target depth is the spline: a model that is fitted to the data and allows us to estimate the soil characteristic for any standard depth, and also to fill gaps. This way we harmonize the whole data set. In this figure we see how a fitted spline models the way a characteristic changes with depth. With it we can obtain, for example, SOC for the first 30 centimeters at all the sampling points.
Another thing we will see now in R is that, to model and map the SOC stock, you first need to estimate it. For that you need not only the carbon concentration but also the bulk density and the stone content, which are sometimes not measured, so you may need to estimate them too. Here are the results for Ayaş basin, for organic carbon stock in the first 30 centimeters. As you can see, we have 40 samples and the mean value is 4.1. This is organic carbon stock in kilograms per square meter.
And here is a very important issue to consider. In this exercise we will be using only 40 samples. This is very low for digital soil mapping, so it is more of an exercise and an example. At least 100 samples are needed to obtain reliable results. But this is what we have for organic carbon, because organic carbon was not measured in all the probes. It was only measured in the profiles. Here each point reflects the amount of organic carbon stock: in red we have the lower values and in blue the higher values.
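To make the idea of harmonizing horizons to a target depth concrete, here is a deliberately simplified sketch. It is not the spline used in the webinar: it replaces the spline with a plain thickness-weighted average of the horizons that overlap 0 to 30 cm, which does not smooth the profile and cannot fill gaps the way a spline can. The function name and the example horizon values are assumptions for illustration only.

```python
def weighted_mean_to_depth(horizons, target_cm=30.0):
    """Thickness-weighted mean of a property over 0..target_cm.

    horizons: list of (top_cm, bottom_cm, value) for one profile.
    Returns None when no horizon overlaps the target interval.
    Simplified stand-in for a depth spline: gaps are not filled, the
    mean is taken over the part of the interval that was sampled.
    """
    weighted_sum = 0.0
    covered_cm = 0.0
    for top, bottom, value in horizons:
        overlap = min(bottom, target_cm) - max(top, 0.0)
        if overlap > 0:
            weighted_sum += value * overlap
            covered_cm += overlap
    if covered_cm == 0:
        return None
    return weighted_sum / covered_cm


# Hypothetical profile: SOC in percent for three horizons
profile = [(0, 12, 1.2), (12, 35, 0.8), (35, 60, 0.4)]
soc_0_30 = weighted_mean_to_depth(profile)
print(round(soc_0_30, 2))  # 12 cm at 1.2 % and 18 cm at 0.8 % give 0.96
```

Using only the top horizon here would report 1.2 %, which overstates the 0 to 30 cm value; that is the problem the harmonization step is meant to avoid.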
What we want is a map in which we do not see points, but in which we predict soil organic carbon in other areas from these points. A first step: if we are going to use, for example, the land cover map, the soil map and the geological map as covariates, we need to see how our 40 points are distributed across the classes of those maps. This is a very important first exploratory analysis of the data. You probably already know that, even though Ayaş basin has many land covers, most of the profiles were sampled in arable land: 37 of the 40 profiles, that is 93%. One was in permanent tree crop and two were in permanent shrub crop. These are the only land covers represented in our data set. I also checked the soil map: most soil classes were represented in the 40 profiles. There is also a geology map available for Ayaş basin, and most of its classes were represented too. Some were not, but the most important issue here is land cover. If we use land cover to train our model and to predict, we have to make a decision, and the decision was: we will only predict in arable land, permanent shrub crop and permanent tree crop. In grasslands and forests we do not have samples, and we already know that many soil characteristics change a lot with land cover or land use, so it is not appropriate to interpolate or predict in those areas. We will come back to this later.
So we have seen the first big step, which is to harmonize the data and prepare the data set. Now we will see a little about the covariates: why we use them and how. Digital soil mapping is based on the notion that any soil attribute can be predicted from these factors, the soil forming factors.
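The exploratory check on land cover described above amounts to a simple tally. The sketch below reproduces the counts mentioned in the talk (37, 1 and 2 profiles) in plain Python; the list of map classes is an assumption, limited to the classes named in the talk, and the variable names are made up.

```python
from collections import Counter

# Land cover at each of the 40 profiles, using the counts given in the talk
profile_land_cover = (["arable land"] * 37
                      + ["permanent tree crop"] * 1
                      + ["permanent shrub crop"] * 2)

counts = Counter(profile_land_cover)
total = sum(counts.values())
shares = {cls: 100.0 * n / total for cls, n in counts.items()}

# Classes present in the land cover map (assumed subset, for illustration)
map_classes = {"arable land", "permanent tree crop",
               "permanent shrub crop", "grassland", "forest"}
unrepresented = sorted(map_classes - set(counts))

for cls, n in counts.most_common():
    print(f"{cls}: {n} profiles ({shares[cls]:.1f} %)")
print("no profiles in:", unrepresented)
```

Arable land comes out at 92.5 %, the figure rounded to 93 % in the talk, and the classes with no profiles are the ones left out of the prediction.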
This is a mnemonic, the scorpan function. What it says is that soil at any unvisited site can be predicted from the following factors. First, the soil information that we have, for example soil maps or legacy soil data, and nowadays also information from remote sensing. Climate, which has a big influence on soil properties. Organisms, which includes land cover and vegetation. Relief, where topographic variables are key. Parent material, of course, which influences the chemical composition and other characteristics of the soil. Age, or time. And location. And of course there is an error, a random component related to the residuals, which are usually also autocorrelated.
This is where all the previous webinars make sense. I don't know if you've seen the videos from our first webinar; there were three. The last one was an introduction to all the tools we were going to see, and in that webinar we said that digital soil mapping integrates all these tools. To obtain the covariates related to climate, relief, land cover and so on, we can use GIS and Google Earth Engine; in that webinar we saw all the data that is available in the cloud and that you can download and use. And also the monitoring indexes that we saw in the last webinar.
Within the project we've been building a GIS database with many layers of information for Ayaş basin. One of them is a soil map, which of course we did not make ourselves, and another is a geological map. These will be used for mapping SOC, as we will see. For climate, there is a global database of 19 bioclimatic variables from WorldClim, which has a one kilometer resolution.
These 19 variables are related to temperature (mean temperature, seasonality, the maximum and minimum of different quarters or months) and the same for precipitation. The data is available and very easy to download and use. We also have many digital elevation models available, at 30 meters and 90 meters, and for Ayaş there is a 10 meter DEM. From this elevation information we can derive other variables such as aspect and slope. We also saw in the Google Earth Engine webinar that other topographic indexes are available, such as landforms and CHILI, which describe the relief in different ways and can also be used in this context. Regarding organisms, there is a land cover map for Ayaş basin, and we can also use, for example, mean annual NDVI as another predictor or covariate.
Okay, so once we have this GIS database we need to decide where we will predict, in our case, soil organic carbon, and for that we need a prediction grid. For this exercise we made a prediction grid of hexagons of one hectare each. There are approximately 115,000 hexagons, because that is the area of the basin in hectares. For each of these hexagons we obtained the values of all the covariates from the GIS. In the case of vector maps, for example the land cover map of Ayaş, the soil map and the geological map...
We can't hear you at all, Ingrid. Hello? Ingrid will probably connect again.
Okay, I'm back. Sorry. Did you see this slide? Yes? Good. We've been having a lot of fires near my house and a lot of problems with the electricity. Okay, so I will go over it briefly.
We made this grid to predict soil organic carbon. Each hexagon is one hectare, and for each hexagon we extracted the value of each covariate from the GIS database. So what do we do with all this information? You can imagine it as a map or as a data table in which, for each sample point, we have the value of SOC and the values of all the covariates. That is our training data. Then we have another table in which each row is one of the hexagons, with all the covariate information but no SOC. So we need to predict SOC for each hexagon using the covariate data, and for that we need a predictive model.
Many predictive models are used in the DSM context. As I was saying before, we can use geostatistical models such as co-kriging and regression kriging. Regression kriging is very nice because it uses multiple regression to model the variable, and then the residuals of that regression are kriged, so the autocorrelation of the data is also used in the prediction. And now we can also make use of all the data mining techniques and machine learning methods that are available. These methods are attractive because they make very few assumptions about the data, which is important because in geostatistics you do have some assumptions. Random forest, support vector machines and many other algorithms are used for data mining, random forest being one of the most used. Some of them also give the uncertainty of the estimations, so we can map that uncertainty too. Data mining algorithms are really good for scenarios with high-dimensional spaces, meaning a lot of covariates: they extract the relevant information from all the available data. So here is the result of the prediction using random forest and the SOC data.
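The two tables just described, together with the earlier decision to predict only where land cover is represented in the profiles, can be sketched as follows. This is plain Python with made-up rows and column names, not the webinar's R code, and it does not fit any model: it only shows how grid cells with categorical levels that never occur in the training table could be set aside before prediction.

```python
def split_grid_by_levels(training_rows, grid_rows, categorical):
    """Separate grid cells whose categorical levels all occur in training.

    Returns (predictable, excluded). All names here are hypothetical.
    """
    seen = {name: {row[name] for row in training_rows}
            for name in categorical}
    predictable, excluded = [], []
    for cell in grid_rows:
        if all(cell[name] in seen[name] for name in categorical):
            predictable.append(cell)
        else:
            excluded.append(cell)
    return predictable, excluded


# Training table: one row per profile, with the target and the covariates
training = [
    {"ocs_kg_m2": 4.3, "land_cover": "arable land", "elevation_m": 710.0},
    {"ocs_kg_m2": 3.8, "land_cover": "permanent tree crop", "elevation_m": 820.0},
]
# Prediction grid: one row per hexagon, covariates only, no target
grid = [
    {"hex_id": 1, "land_cover": "arable land", "elevation_m": 700.0},
    {"hex_id": 2, "land_cover": "grassland", "elevation_m": 950.0},
    {"hex_id": 3, "land_cover": "permanent tree crop", "elevation_m": 805.0},
]

predictable, excluded = split_grid_by_levels(training, grid, ["land_cover"])
print([cell["hex_id"] for cell in predictable])  # [1, 3]
print([cell["hex_id"] for cell in excluded])     # [2]
```

A predictive model would then be trained on the first table and applied only to the predictable cells; the excluded cells are left without a prediction.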
As you can see in this map, some parts are in white, and this is where there is no prediction of soil organic carbon. These are the hexagons in, for example, grasslands or other land uses that are not represented in our sample. So the prediction was done only on arable land, permanent shrub crops and permanent tree crops. Blue corresponds to higher values of soil organic carbon. Later we can look at it in QGIS, where we can zoom in and out. In these areas, for example, in the irrigated valleys, we have higher levels of soil organic carbon, and here we have lower levels. Using this prediction, two million tons of soil organic carbon were estimated for these types of land in Ayaş basin.
Then the prediction error. This is also a very important step: after you model and predict, you need to validate your model. With 40 samples this is really just an exercise, and we should have many more, but you can do a cross-validation, for example, and estimate the error of your predictions relative to the mean. Here that would be 27%. And I think there is something in the chat. Mustafa, if this is a question, go ahead.
So is this map only for the depth of 0 to 30 centimeters or not?
Yes, indeed. With all the profile data, we harmonized it and extracted only the value for the first 0 to 30 centimeters. And I didn't map the uncertainties here; I should. For that you need to fit, for example, the quantile regression forest algorithm. This here is a common random forest, and what you can see is the importance of the different variables that were used. These are the variables that were used. "Bio" are the bioclimatic variables.
This is a topographic index, then the mean NDVI, the soil map, the elevation, slope, EV, the geological map, another topographic index, and the land cover. Is there another question in the chat? My connection is unstable, probably because I'm connecting from the phone, but can you hear me? Yes, thank you. You're good. Thank you. I'm very sorry about the connection. We are still without power. Fortunately, I have battery in my computer.
Okay, so this is the map we were just looking at. And here is another map, which I made using only kriging, ordinary kriging. This is a geostatistical estimation: I just fitted a semi-variogram to the data and predicted the values using ordinary kriging. I will not show the semi-variogram because it's really not nice; with only 40 data points it is very difficult to fit. The general pattern, as you can see, is similar, although here we see more variability, and that reflects the nature of the covariates we were using. Here I did not use covariates; here I did.
I also wanted to compare with maps from global products, such as the GSOC map. Here you can see that it's very different: in this map there are higher values in this area than here, and we do not necessarily obtain that pattern. This other one is more similar to the map that we obtained. Here we can see higher values of SOC in yellow. This is the map from SoilGrids. As you can see, here we can also see the irrigated valleys, probably because they used NDVI or other data that forces this pattern to show. But they also obtain higher values of soil organic carbon in this area, which we did not.
Okay, so now let's do the demo section in R. Let's go to the hard part. As an introduction to the training, I also wanted to mention that there are technical manuals made by FAO, which are available at these links.
Here you have all the code and a lot of information that you can build on, and they are very easy to follow. There is this one for soil organic carbon, and there is also this one for salt-affected soils. We will be using soil organic carbon today, but of course you can use this technology to map any soil attribute.
Okay, RStudio. I would like to know if you have experience with R; you can put it in the chat. I would like to know how much you work with R. Can you hear me? Yes, it's all good. Okay. So how many of you have worked with R? Mustafa says no. Okay. I don't see any other answers yet, but R is really a great statistical software and it is free. Okay, many have, good. There are libraries for almost everything you would like to do. You need to learn to program a little. Okay, many of you have experience, great.
So this is RStudio. When I started using R, RStudio didn't exist; it has made my life much easier. I will open the code I prepared; it is this one, as you can see. Okay, let's start. These are the libraries I will be using, and this is where I have all the data. Remember we saw that the first big step is data preparation. I prepared the data; it was not that hard because it's only 40 samples. But you need to specify the upper limit and the lower limit of each horizon in the data, so I prepared that. The data set is called SOC. So I'm going to go ahead and read the data, and I will specify which are the spatial coordinates. Okay. And this step here is there because you are going to estimate the carbon stock from bulk density and stones, and to estimate bulk density you need to have SOC in percent.
When you calculate the stock, though, you need to have it in per mille, so I made a new column in my data set. Here, with this function, depths, I'm turning the data into a soil profile collection object. It's a type of object made especially for soil analysis. If you don't have experience with R, don't worry. This is only a webinar; it's just to show quickly how things work in R. In the training we will go deeper into each of these things, with a little introduction to how we code in R. When I create an object or read an object, I can see it here in the environment. So now my object called data is a soil profile collection, and it has three variables and 40 observations. And I see I haven't had any errors so far.
So here is the spline fitting, with this function. In R, these are functions, and we give the arguments to the function between parentheses: we say which data set, which variable from the data set, and here is where I specify that I want to estimate SOC from 0 to 30 centimeters. The spline function will not do it if we have only one sample, only one depth, for a profile; it cannot estimate anything. That is what it's telling me here. Now, here, in this table in which I will put the coordinates and the name of the site, is where I'm preparing my training data set. This is to eliminate empty cases.
Okay. So I need to estimate bulk density because I do not have it. This is what I was trying to explain before. There are different methods here, but we will use this one. And bulk density is something that I need in order to calculate the stock of soil organic carbon.
I will estimate it using the SOC data as well, but in percent. So here are the commands to estimate bulk density. As you can see, bulk density is BLD; I named it BLD. First I say that there is no bulk density data, so I create this column here. Then I estimate it with this method. Sorry, not the other one, this one. And now I have a BLD column in my data set. You always need to check that the result is okay. If you ask for a summary of it (can you see, or should I make it bigger? No one is complaining, so maybe it's okay) we have the minimum, the maximum and the mean of bulk density. It shouldn't be lower than zero or higher than two. I see it's fine. Good. Let's see a histogram of the bulk density data. This is the way to plot a histogram, with hist. You can specify how many breaks you want; if you want fewer breaks in your histogram, you just change it like that, and you have your histogram.
So we have SOC data, and we now have bulk density data that we estimated from the SOC data using this method. Then there is the coarse fragments data, which I will name like this. It is also not available for our data, so we will set it to zero everywhere. But we need to run this. From our coarse fragments data, our bulk density data and our SOC data, we will now estimate the carbon stock. This is where SOC needs to be in per mille, and that is why I use this column here. I use this function to estimate it, and everything is going smoothly. Now let's see what we have. As you can see, the mean is 4.1 for our data set. And as I said before, this is carbon stock in kilograms per square meter.
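The stock calculation described above is a mass balance: carbon concentration times bulk density times layer thickness, reduced by the share of coarse fragments. The sketch below writes that out in plain Python. It is not the R function used in the webinar; the function names, the units chosen for the arguments and the example values are assumptions for illustration.

```python
def percent_to_per_mille(soc_percent):
    """Convert a concentration in percent to per mille (g/kg)."""
    return soc_percent * 10.0


def carbon_stock_kg_m2(soc_g_per_kg, bulk_density_kg_m3, thickness_m,
                       coarse_fragments_pct=0.0):
    """Organic carbon stock of a soil layer in kg per square meter.

    soc_g_per_kg: carbon concentration in per mille (g of C per kg of soil)
    bulk_density_kg_m3: bulk density in kg/m3 (1 g/cm3 = 1000 kg/m3)
    thickness_m: layer thickness in meters (0.30 for the 0 to 30 cm layer)
    coarse_fragments_pct: volume percent of stones, 0 when not available
    """
    fine_earth_fraction = 1.0 - coarse_fragments_pct / 100.0
    soil_mass_kg_m2 = bulk_density_kg_m3 * thickness_m * fine_earth_fraction
    return soil_mass_kg_m2 * soc_g_per_kg / 1000.0


def kg_m2_to_t_ha(stock_kg_m2):
    """1 kg/m2 equals 10 t/ha (10,000 m2 per hectare, 1,000 kg per tonne)."""
    return stock_kg_m2 * 10.0


# Illustrative values only: 0.8 % SOC, 1.25 g/cm3, 0 to 30 cm, no stones
stock = carbon_stock_kg_m2(percent_to_per_mille(0.8), 1250.0, 0.30)
print(round(stock, 2), "kg/m2 =", round(kg_m2_to_t_ha(stock), 1), "t/ha")
```

With these illustrative inputs the layer holds 375 kg of soil per square meter and about 3 kg of carbon per square meter, which shows how a concentration below one percent still translates into a stock of a few kilograms per square meter.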
So if you want to change it into tons per hectare, you need to multiply it by 10. So it's actually about 40 tons per hectare, or 4 kilograms per square meter. Okay. So here we are working with our training data set, our 40 samples. I can write this data set out and have it in my folder. And here, for example, we can see the density, the distribution of the values. Sometimes this is right-skewed, and then you need to apply a logarithmic transformation, but that is not our case, so we will leave it like that. We can see here the distribution of organic carbon stocks in our data set. As you can see, n is 40: our 40 profiles.
Okay. Here we will load this library. As I said before, in R you have different libraries with different functions. The raster library allows us to work with raster data, and we will use it for the covariates. I will just do a couple of things, because I actually did this part in QGIS. I complement R with QGIS, or any GIS, but you can do everything in R if you like. This is the shapefile with the borders of Ayaş basin. These warning messages are because of the versions in which the packages were built and the version of R that I have; they are not important at all. And here, for example, you can see it's very easy to load a raster image. I have the DEM at 30 meters in a TIFF. First you plot it and then you add the limits of Ayaş, and you can see here that it's loaded. We can do the same with the other covariates, but I will not do it all here. Here, for example, we create a stack for the climate covariates, because in WorldClim you have different bands, so you create a stack in R to see the different bands, and these are the
names and what each bioclimatic variable is about. For example, number one, BIO1, is the annual mean temperature and BIO12 is the annual precipitation. Can you hear me? I think the light is back; now we have power again, so it should be okay. Okay. So here I loaded all the climatic data and I will plot these four variables, which are annual temperature, seasonality in temperature, annual precipitation and seasonality in precipitation, for Ayaş basin.
Okay, so now I need to link the table in which I have the soil organic carbon content to the covariates. I already prepared the table with the covariates in the GIS, as I said; it is called profile data, and here I can see a summary. So now I have a lot of variables. This one is the land cover, and as you can see it is read as a numeric variable, so I will need to change it to a categorical variable. The same goes for the geological map and the soil map. All the others are also numeric variables. I need to merge the SOC data with the covariates, and here I change those variables that should not be numeric into factors. They are numeric because I rasterized them first. Okay, so now, if we go back here, you see this is telling me that land cover (it is called "majority" because it is the majority class) is categorical.
And this is the prediction grid. I will load it from an Excel file in which I have an ID for each hexagon and all the covariates for each hexagon. As I said before, I did this in QGIS. I will also change each of these variables into factors, categorical variables, and I will check that the names in the grid are the same as the names in the data set. It is telling me here that the first ones are not, which
is okay, because the first ones are the ID and the location data of each hexagon: the left, right, top, and bottom of each hexagon. I will not have these variables in my data set, but all the rest, which are the covariates, have the same name in both data sets. That is very important, because if we are going to train the model on our data set and then predict on our grid, the covariates need to have the same names. So this is the main check. Also, what am I doing here? I'm checking that the levels of land cover, for example, are the same as the levels in the grid, because the algorithm will not work otherwise. This is what we saw with the pie charts before: if in my profile data I do not have samples in grasslands, and then in the grid I have hexagons that are in grasslands, it will not work, because that level of the variable is not represented in my training data set. So I need to check that beforehand. The other way around can also happen, so I also need to check that there are no levels in the data set that are not present in the grid. So I check it for each of these categorical variables.
Let me look at the time. I don't want to take too much time, so that we have time for questions. Is there a question in the chat? Mustafa, do you have a question? You can interrupt me, please don't worry. Okay, so now we can...
Can I ask something?
Sure, of course.
The values from 40 points in this huge area are represented nicely. Can you provide the data in percentage terms, not in kilograms or something, but in percentage?
Yes, we could model SOC in percentage. Yes, we can do it.
He was talking over you, I'm sorry. Can you come over? He wants to say something.
For one square meter, at 1.25 grams per cubic centimeter, when we do it
this way, there will be 37.5 kilograms of soil weight. Looking at your figures, I think it was 4 kilograms of organic carbon or something, so this is too big, I believe. On average, the area might be small, but I think this average is very high. You know, especially for nitrogen fertilizer, this is how we do the calculations, and 80% of Turkish soils have very little organic matter, and the same is true for organic carbon. That is why the average figures that you show seem so high to me. What do you think?
Well, it's what is in the data, but if you see it in terms of...
In terms of hectares you're right, but when we convert it to decares, I still believe it is high.
Well, this is only arable land, right? It's not the rest of the land cover, so that could also be one of the reasons. Here the other land types are not represented; it's only arable land, tree crops, and shrub crops. But I specifically checked that it was consistent with what is in the global soil organic carbon map. Here it is in kilograms per square meter, so if you multiply it by 10 you get it in tonnes per hectare. So it would go from 2 to 4, or 3 to 5, yes, on this scale.
I agree with you when it comes to the scale; there is no problem with the scale. But I still believe the values found are high. In the Ayash region, and in Turkey in general, especially in arable lands, I believe the average organic carbon in percentage is around 0.8 percent. That is what I believe, and I know those values in percentage.
Indeed, yes. You can take a look at the data if you'd like, but yes, the values in percentage were 0.4, 0.5, 0.8. But when you transform them to stock, the scale changes a lot, and it's consistent with the other organic carbon maps. So maybe there is a... So this is carbon stock; it's not in percentage. It's different, of course. Also, when you model carbon
stock, you use bulk density and you use the coarse fragments. Since we didn't have that data, I had to estimate it with what we have. So maybe if we had that data we could better adjust the stock from the soil organic carbon in percentage, and maybe that is a source of...
We have the stone values in Ayash. Once you have data from the probes, I think you're going to have them.
Okay, good, good. Then we can maybe match the stone value data to the stock data in the profiles and use it for a better estimate. But yes, the values in percentage were, as you said, around 0.8, 0.7, but when you change them to stock, this is the scale. Okay, I will share again; we'll go back to R.
Okay, so let's go now to the modeling itself. We have the stock data and we have the covariate data; now we need to train our model. These are the covariates I chose. I didn't put in all the bioclimatic variables; I chose the ones that I consider less redundant. For that you can, for example, do a principal component analysis and see which are the less correlated variables, or you can use your knowledge. I'm sure this could be much improved and other variables could be included. For this we use the library caret, which has a lot of machine learning functions; it uses other libraries, and it's really useful for digital soil mapping. So here I'm specifying which is the dependent variable, which is the carbon stock, and which are the independent variables, which are the covariates. I will run this part of the script. Here it is loading the libraries that are required. Okay, all is in order. And this is one of the parameters used for random forest, which is the number of variables that are used in each of the trees. I will not go deep into what random forest does; we can also take a look at this later. But it is a forest, so you use many decision trees in a random forest, and you use a specific number of variables for each tree. Here I'm saying that
you can try it with these different numbers of variables. It is one of the parameters that you need, so I'm specifying these parameters here. And this is the function in which I fit, or train, and create the random forest model. It worked well; everything is under control. Here I will create this graph. varImpPlot is the plot in which you see the importance of the different variables. It's not importance per se; it's actually a measure called node purity. In these decision trees, at each node you separate your data by one variable, so if the two groups that are created have internal purity, that means that they are more homogeneous within each group, and you have more node purity. So this is what this graph is showing us; this is the one that was in the presentation. Okay, and this graph is also important because we see whether the error stabilizes with different numbers of trees. We ran this algorithm with different numbers of trees and, as you can see, if you do not use many trees there is a lot of error, so you need to check. But with 200 trees it is enough. We use 500 because it's the default; I didn't specify this parameter. It is another important parameter in random forests, but the default is 500, and with this graph we see that it is enough. If we have a lot of error, we can specify that we need more trees in our random forest.
Okay, and here we predict. We take our grid and predict. You see how fast this is? It's just a second. When you take out all the hexagons that have other land covers, the grid has, I think, 54,000 hexagons, so it's 54,000 hectares approximately. So you make the prediction, and it takes one second using all these variables. Okay, I will create a table, and I can check if the head is okay. So I have the ID of each hexagon and the prediction; here I have the soil organic carbon content in kilograms per square meter. Okay, let me see.
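The unit discussion in this session can be checked with a little arithmetic. The following is a minimal Python sketch, not the R script used in the webinar: the function names are hypothetical, and the 30 cm layer and the absence of coarse fragments are assumptions made only for illustration. It applies the usual stock relation (concentration times bulk density times depth times fine-earth fraction) and the factor of 10 between kilograms per square meter and tonnes per hectare.

```python
def soc_stock_kg_m2(soc_percent, bulk_density_g_cm3, depth_cm, coarse_fraction=0.0):
    """SOC stock in kg/m2 from SOC concentration (% by mass).

    Bulk density is converted from g/cm3 to kg/m3 (x 1000) and depth from cm to m (/ 100).
    coarse_fraction is the share of coarse fragments (0 to 1) removed from the soil mass.
    """
    soil_mass_kg_m2 = bulk_density_g_cm3 * 1000.0 * (depth_cm / 100.0)
    return (soc_percent / 100.0) * soil_mass_kg_m2 * (1.0 - coarse_fraction)


def kg_m2_to_t_ha(stock_kg_m2):
    """1 kg/m2 = 10,000 kg/ha = 10 t/ha."""
    return stock_kg_m2 * 10.0


if __name__ == "__main__":
    # 0.8 % SOC and 1.25 g/cm3 are the values quoted in the discussion;
    # the 30 cm depth and zero coarse fragments are assumptions.
    stock = soc_stock_kg_m2(0.8, 1.25, 30)
    print(round(stock, 2), "kg/m2 =", round(kg_m2_to_t_ha(stock), 1), "t/ha")
```

Under these assumed inputs, a concentration of 0.8 % corresponds to about 3 kg/m², or 30 t/ha, which illustrates how percentages below 1 and stocks of a few kilograms per square meter can describe the same soil. Measured bulk density and coarse fragments would replace the assumed values.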
It's already 9:15. Here I have the code for creating the other map that I showed, and I will go fast. I did not want to show you the variogram... okay, I will show it, because it's really, really horrible, because we have very few points. You see here, this is the semi-variogram. What we expect to see is that at larger distances we have more semi-variance; here there is not much autocorrelation. And as you can see, this is the fitted semi-variogram. Here we make the prediction with kriging, with the krige function. Here we have the code for regression kriging, which we will not do today.
And this is the cross-validation. We can do a cross-validation by dividing our data set in two. For example, with this parameter we decide that we will use 70% of the data to train the model and 30% to validate it. We can change that to 0.8, but since we have only 40 data points, it's really bad; it changes a lot. So what we will do is a leave-one-out cross-validation. Here we calculate the prediction error, and you see it's 33, but if I do it again it will be very different: we see 26%, and 23%. Every time I run it, it will give me a very different prediction error, because we have a very small data set. So when we train the model, it's very dependent on the subset of data that is used for training the algorithm. So here I created a function to use leave-one-out, both for the kriging model and for the random forest model. Here it is running. What it does is take 39 samples, train the model, predict the one that was left out to calculate the error, and do the same many times, with all the samples, 40 times, with both models, the random forest and the kriging model. Then it calculates the mean relative error. It will take a few more seconds. When you have this here, it's because it is running; it is working. I can click there and make it stop, but it will not take much longer.
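The contrast between a single random split and leave-one-out can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the function written in the session: the data are synthetic, a one-covariate least-squares line stands in for the random forest and kriging models, and all function names are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (a stand-in for the real model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def mean_relative_error(pairs):
    """Mean of |observed - predicted| / observed, in percent."""
    return 100.0 * mean(abs(obs - pred) / obs for obs, pred in pairs)


def leave_one_out(xs, ys):
    """Train on n - 1 samples, predict the held-out one, repeat for every sample."""
    pairs = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        pairs.append((ys[i], a + b * xs[i]))
    return mean_relative_error(pairs)


def random_split(xs, ys, train_share, rng):
    """One random train/validation split, for example 70/30."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    cut = int(round(train_share * len(idx)))
    train, test = idx[:cut], idx[cut:]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return mean_relative_error([(ys[i], a + b * xs[i]) for i in test])


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-in for 40 profiles: one covariate and a noisy SOC stock.
    xs = [rng.uniform(0, 10) for _ in range(40)]
    ys = [3.0 + 0.2 * x + rng.gauss(0, 0.5) for x in xs]
    print("hold-out errors:", [round(random_split(xs, ys, 0.7, rng), 1) for _ in range(5)])
    print("leave-one-out error:", round(leave_one_out(xs, ys), 1))
```

Each call to `random_split` draws a different validation subset, so its error moves from run to run, whereas `leave_one_out` visits every sample exactly once and returns the same value every time for a deterministic model such as this one.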
See, it should give us... if we do this many times, it will be very stable, not like the cross-validation I was doing before, in which I divided the data set into 80% and 20%. Here it will be more stable. I will not do it many times because, as you can see, it takes some time. I will go to QGIS so that we can better explore our map and visualize our results. It shouldn't be much longer.
Well, there is a question in the chat box. Ingrid, they're asking whether they can do taxonomic classification in digital soil mapping or not. They say: as far as we see, we can map only the soil characteristics in the digital soil map; or can we really do taxonomic classification in digital soil mapping?
I think, if I understood the question correctly, it is whether you can, for example, use conventional soil mapping with the data set. If you have different polygons, land areas that you delimited previously, and they are consistent regarding different variables, then using this data you can also characterize, for example, soil organic carbon and average those values for that type of land. There are different methods to do this: one is class matching and the other is geomatching. These are two different methods, for example, to do conventional mapping using these data and the covariates. I don't know if that is what you were asking.
Ingrid, I have another question, actually, if I may.
Yes, please.
Well, the thing is, I wonder... Any parameter in soil is variable, but those variables are related to stable parameters, so we see how much change there is when you try to relate a variable thing to a stable thing. This is what kriging and all the others are about. But soil mapping is something different, I believe. Soil mapping is about taking samples from a certain depth and, with a variety of parameters, trying to get soil series data, and then you try to do taxonomic
classification of the soil by doing digital soil mapping. This is what we wanted to see, actually: by using digital soil mapping, can we do taxonomic classification of soil or not?
So you mean that your dependent variable will not be a continuous variable; it would be a class. That is what you want to do?
Do you know about taxonomic classification in soils? Because what you are showing has nothing to do with taxonomic classification.
Let's say, in general, you try to model variables that are... This is why we had that first introduction, in which our goal here is to map different soil characteristics, not to classify soils with a taxonomical classification, but to map different soil characteristics that might be very useful, for example, in the context of land use planning, to assess the land suitability for different purposes, and with the notion that these soil attributes are continuous in space. It's a different paradigm from conventional soil mapping: you try to map variables that are continuous in a continuous way, so you can map gradual changes. However, I really do not know if you can use DSM to... I haven't seen examples of what you are saying. But I guess that, basically, this taxonomical classification of soils is also a categorical variable; the different taxonomic types of soils are a categorical variable. So if you want to train a model to classify your soils into different classes, you have a dependent variable that is not continuous, that is categorical, and there are models to do that.
But the families of soil, for example, can we create families of soil? Taxonomy is about that, as you know.
Yes. The basic concept is that, in that case, instead of SOC, because here our dependent variable is SOC, you will have the taxonomic classification of soils, right? That is what you want to model.
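To make the idea of a categorical dependent variable concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the session does not name a classifier, so k-nearest neighbours is used as a simple stand-in, and the covariates, class labels, and function name are all hypothetical. Covariate scaling is left out for brevity, although it would matter with real data.

```python
from collections import Counter
from math import dist


def knn_predict(train_covariates, train_classes, cell, k=3):
    """Predict a class for one grid cell by majority vote of its k nearest profiles."""
    order = sorted(range(len(train_covariates)),
                   key=lambda i: dist(train_covariates[i], cell))
    votes = Counter(train_classes[i] for i in order[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical profiles: (elevation in m, annual precipitation in mm) plus a soil class.
    covariates = [(850, 400), (870, 410), (900, 420), (1200, 520), (1250, 540), (1300, 560)]
    classes = ["class_A", "class_A", "class_A", "class_B", "class_B", "class_B"]
    grid = [(880, 405), (1270, 550)]
    print([knn_predict(covariates, classes, cell) for cell in grid])
```

The structure mirrors the SOC workflow: the covariates are the same kind of input, but every training profile carries a class label instead of a number, and the model returns a label for each grid cell.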
You want to predict, in an area where you do not know what type of soil you have, or the family; you don't know it and you want to predict it. So you want to use covariates to be able to predict that, and you have to train your model for that. I guess you can do it, but first of all, for all your samples, you need to know which type of soil they are, or which taxonomic unit they belong to, if you know what I mean. The importance is of...
Okay, I think I don't understand, because in each horizon of the soil... well, forget about the 30 centimeters. There is the A horizon, the living horizon; the living layer of the soil can be five centimeters, or zero centimeters, I'm sorry, or 30 centimeters. You can't just depend on 30 centimeters. There might be an A horizon, a B horizon, an AC horizon, etc., but the living soil is different in each soil, so I think this depth should be different for each soil type. And I think in digital soil mapping we can't do it: there will be thousands of variograms if you try to do it, and there will be thousands of stable variables if you try to do it this way. I haven't seen an example, actually, around the world, so I was curious whether you have seen an example or not. But you moved into organic carbon; I thought maybe you wanted to start simple, and then you would go into the details of taxonomy. But as far as I see, taxonomy is not relevant in digital soil mapping, and digital soil mapping will not help taxonomic classification, if I understand correctly.
Can I say something? What I have actually seen is the other way around: you use the taxonomy to predict any soil property, your target soil attributes. That is what is mostly done. I haven't seen an example of that, but I think it's very interesting and could be explored. But it's different from what we are doing now. I will stop sharing so we can see...
Yes, very briefly. I
understand your concern; as far as I understand your question, the thing is that, because of the law, soil classification is done with conventional methods by the ministry, so you are curious how you can do soil classification using DSM instead of conventional methods. Well, you know, DSM can be used for soil classification as well, I believe, and there are examples of that in certain countries. The next step would be the employment of machine learning techniques and certain computational techniques to do soil classification. You can do it, because when you do the classification you take soil parameters and train the model with them. This can be done using the digital data in hand. This is the simplest explanation I can give. So you're asking whether it is doable: it can be done, and I can share with you certain documents, examples in English. So it can be done, but can we do it here, right now? I don't know, because the depth that we employed was different here. But digital mapping today, combined with additional techniques, gives you an automated system which will help you do any classification, I think. So just bear in mind, I think you can do soil classification with this. I can share with you some example documents later.
Thank you, Emrah and colleagues. Okay, I understand, but we are not producing anything... well, let me put it this way. Maybe producing maps for each parameter separately, including SOC or clay or sand, and then combining them in a program may be possible. But this cannot be done in digital soil mapping; this is what I understand. I don't know which country is doing it, Emrah, but I don't think this is possible in DSM. Ingrid, I interrupted you, I really beg your pardon. I just wanted to underline our concern and our expectation. In the first place, we were expecting digital soil mapping to be used
in soil classification, but it was not the case. I'm sorry, thank you.
Yes. I mean, for soil classification you probably use many variables, and you can use digital soil mapping to obtain maps of all those input values that you use for soil classification. You can definitely use digital soil mapping to arrive at a soil classification. First, you can use digital soil mapping for all the input variables that you need for that classification, so you can predict, for example, the different texture variables and the chemical variables, and create maps of all these variables. Then you can definitely use a classification algorithm to classify every cell in the grid using those variables. But what you also need, in this model to classify soils based on different data sets, is training data. So for your samples, you also need to classify them previously to train your model, for example; or you can do an unsupervised classification and then decide which group corresponds to which type of soil. But yes, you can do it. It's not the idea of today's webinar or of the training, because the idea is to really understand and be able to create these layers, these continuous maps of different soil characteristics. Then you can use them as input for any classification. That is okay; you can do it.
Okay, so before we finish, because we are already late today, I wanted to show you in more detail the map that we created for SOC. You see that you have a color, a value, for each hexagon, and the white hexagons are areas in which the land cover is not arable land, tree crops, or shrub crops. So this is the grid that we created, you see, and here is our predicted SOC, and you can use it as a new layer; and the same with many other characteristics. So when we have the data
from the probes, with which we will have more data points and other characteristics, we will create maps for each of these characteristics. That is what we have planned and what we intend to do, and it is what digital soil mapping is mostly used for, because these soil attributes are very important for making decisions and achieving sustainable soil management. And if you also want to use them to obtain this classification of the soils, you can do it, and I will look into it and also try to find examples of that. Good. Are there any more questions?
Okay, so I have a question. So far we've been working with FAO on the Ayash basin. As a soil expert, I'd like to say that it's all about bringing different parameters together and creating maps. Whatever you are explaining now can be done in different ways, with different methods, and we are actually applying these methods. Either you misunderstood our institution or we misunderstood you. The words promised in the beginning were different, and now I am expecting a response to our initial expectations.
Okay. Yes, I mean, I don't know, maybe Sara or Haki, if you can help with this, because the idea of these webinars was to give tools. We are in a project that is about integrated land use planning, and in this part of the methodological framework we cannot do the whole process within the context of this project. We are basically focusing on these first steps, at this stage at least, which are assessing different...
Can I say something? So far, whatever has been done is not a training. It's about raising awareness and giving information about different tools. This is not called a training.
Can I take the floor? I guess Ingrid has been mentioning since the very beginning, as Mustafa mentioned also, that it's about raising awareness about tools, and this is a plan of the training. We wanted to
show what kind of tools we will be using during the trainings and what kind of data we can use during the training, and we were planning to raise and increase capacity. If you think that we should be discussing the results based on all the different variables, this is something we still have to think about, and we have to shape it together. But the trainings...
I do understand you, actually. It's good that you are raising your concerns, and I see your excitement and your expectations. That is good, and we can deepen our work depending on the need. But the purpose of what Ingrid has been explaining is not about showing the maps, because we have to do a lot of preliminary work, and unfortunately we cannot go to the field. This was not the purpose of this webinar. As Swat mentioned, it's about raising awareness, yes, and about increasing capacity on various tools and different methodologies, to provide the basic logic on different parameters. What is important in digital soil mapping is to ensure the quality of each thematic tool, and then, once you approve and verify the quality, you can create an algorithm to do the classification. You can even create a classification system of your own. So the classification issue is, let's say, one fifth of the entire task. Actually, that's how you do it in the field as well: you have to collect data first, various data and parameters, then you harmonize them, and then you create an interpretation map. It could be a different map; it could be a different classification for Turkey, Turkey's own classification. You can do it, and there are such examples. There is a simple way of doing it: for this series, let's say, pH above this or below this. With those parameters you just select the criteria, and then within that series you just
create and classify. This is not a very complicated thing, so that part can be done; technically it's possible, and there are countries that do it. Maybe it will be interesting to you: India is one of the countries that is doing a lot of work on it, and China as well, because these are huge countries in terms of geographical size, and these are the countries that need such systems. The Russian Federation is another example. So I can share the documents with you. But the purpose here was not interpreting the results; it was showing the tools, and Ingrid has been doing her best to show them to you. You should not misunderstand us: we are not here to do the interpretation of the results. But we can still discuss it, and of course you have to raise your expectations and concerns; that's important for us too.
Esteemed, you are the person who would understand us best. I don't know how good Ingrid is with soils, but in the very beginning, when we were discussing, we were wondering what digital soil mapping is, and we wanted to know to what extent we can benefit from it. The thematic maps we can create ourselves; we do create them. Maybe they lack something, maybe they are not good, but we can create those thematic maps. Of course we appreciate everything that has been discussed and presented here; of course I increased my capacity. I just wanted to voice what our expectations actually were from the very beginning.
Excuse me, may I intervene? I believe you hear me, right? And thank you very much, Mr.
Trokhtutin, for your question. I believe the agenda of this webinar was already shared with you. I understand that digital soil mapping, or more technical training, is something you will be looking for, but I believe that from our side we followed the agenda and what we promised, because this series of webinars was agreed with the ministry and with our colleagues, especially with Nurjan Hanım and Ibrahim Bey. I understand that there might be needs for more technical interventions and technical capacity building, but the idea of this series of webinars was basically to explain what we have done so far with the soil analysis and what type of tools there are. As you saw, Ingrid is one of our best experts for statistical analysis as well, so you know that this was not only about digital soil mapping but about the whole process. As you know, the project is about integrated land use planning. We have a relatively small project and there are a lot of components, so within our capacity, considering also the online trainings and not being able to visit in person, this was what we proposed, and it was agreed by your side. So I think the webinar that was conducted was within our promise; what we promised, we have basically fulfilled. Now, if there are any requests for the future, we are happy to hear them, if they are possible within the context of our project and we are able to fulfill them within the remaining time, because a TCP facility is basically for capacity building and awareness raising. For sure you already have a lot of capacity, so by no means are we here to tell you what you have to do. This is like a brainstorming, to show you our process within the context of integrated land use planning. Now we are here to listen to any suggestions, as I mentioned, if possible within the context of the project for the remaining few months, and if it is not within the context of
the project, it's for sure something to be considered in the future for other projects. Thank you.
Let me say one thing. It's about integrated land use planning, I do understand, but mapping and soil studies are the basis for this project. Without creating these bases, you will not be able to do a lot in terms of integrated land use planning. That's my understanding of the situation.
I'm sorry, I think I'm not very clear about what exactly is missing from our side.
Okay. Mustafa has already mentioned what is missing here and what our expectations of digital soil mapping were. It's about combined mapping, taxonomic classification by FAO; that was our expectation. But what you are explaining now is about creating thematic maps.
Sorry, but that is what digital soil mapping is about: creating maps of different soil characteristics, different soil attributes, which can definitely be used in the context of land use planning, not just soil taxonomy. I understand what Mustafa said, and I think I can look for examples to achieve that. And for land use planning, if you can map different variables and, for example, use them for suitability analysis, that is great; that is very important. You don't only need a soil classification; that is not the only useful information. Having these layers, these maps, using digital soil mapping is...
I understand, but let me put it this way. The maps I talk about, the maps Mustafa is talking about, are the maps which matter, actually. When you have the map that I talk about, you can draw any parameter from that map, including stones, including carbon. So I'm talking about general maps, general combined maps, soil classification maps, which can be used for any purpose, actually. You can draw any parameter from one of such maps.
Okay, we can discuss this further. I mean, if you already have the parameters, having each parameter map is having the raw
data instead of having the classification. But I think we can continue this discussion. I think the important point of what Sara and Haki are saying is that we are in a TCP project which is about land use planning. The soil assessment is key and is very important, but it is part of this process of land use planning; it is not the only part. And even within this part of assessing the current situation, this series of webinars was a way to...
I think you understand me, because you are the most competent person in this topic here, so I think you can talk to Emrah later, and this question can be solved among yourselves. Thank you very much. I don't want to be taking much of your time.
Well, there is no need to develop this discussion any further, I think, because, as discussed, different capacity building programs can be suggested, and we can talk about it internally. We can try to find a solution to meet your expectations. But as Sara already said, this was the agreed agenda anyway; we were in coordination with the ministry when identifying this agenda in the first place. Maybe expectations changed over time, I don't know, because it's been some time since we identified the agenda together. But we can talk it out later. If a more specific, detailed training activity is needed, we can try and do our best; if we can do it, we will do it. This is it.
I don't want to take much of your time. I thank you; I won't be taking any more of your time.
I think Mustafa already left. He has business to catch up on, he said to me, so he left.
Okay, we can talk about it later, then. Thank you. Thank you very much.