Hello, everyone. I'm Olivier Norvez, the animation coordinator of the French Biodiversity Data Hub, aka Pôle National de Données de Biodiversité (PNDB), which is a national e-infrastructure for research that provides FAIR tools and services around biodiversity data and metadata. This e-infrastructure is led by the French National Museum of Natural History. As you can hear from my accent, I'm French, so sorry guys. I just hope you will understand everything I show you over the next hour. Now, I'm going to share my screen, so please wait a few seconds. So: an analysis of biodiversity metrics with the Biodiversity Data Exploration tools. Indeed, in this tutorial, we will extract and explore a dataset from the famous Reef Life Survey project. We'll be using data extracted from the Australian Ocean Data Network Portal. We have decided to use only data on the phylum Mollusca from the east coast of Australia between 2008 and 2021. We will explore this dataset with a view to making statistical analyses. So, we'll check the homoscedasticity and normality of the variables, see whether some variables are correlated or not, how the data are distributed in space and time, etc. And finally, we'll explore beta diversity through the computation of the species contribution to beta diversity and the local contribution to beta diversity. So, before starting the training, I will present the Galaxy Ecology interface. The first thing to do is simply to go to Galaxy Ecology through the URL ecology.usegalaxy.eu, like this. Okay, here we go: ecology.usegalaxy.eu. The interface is composed of four main parts: the header, where you can log in and find data and workflows; the tools, here on the left; your history, on the right; and the main part, where you can check your results and visualize your data. You should log in to Galaxy, because if you are logged in, you can save your history and things like that.
So, it's better to register: it's simple, free, and better for you in the future. Okay. So, now we will look at the Galaxy training on biodiversity data exploration. One way to get there is to go to Help, then Tutorials, which leads to the Galaxy Training Network. There you can see that there are a lot of trainings on many topics like assembly, climate, epigenetics, genomics, metabolomics, proteomics, etc. And, for us, Ecology. In total, there are eight tutorials, and for us, especially Biodiversity Data Exploration. So, we will follow this tutorial. Okay. The page looks like this: the title, Biodiversity Data Exploration, the authors, and you can make a contribution if you want. That's because the Galaxy project is a common and collaborative project. You can contribute; it helps you understand what you are doing, and it's better for the community in general. This training is composed of four parts, plus the introduction, of course. The first part is data preparation: get data, upload data, customize your dataset. This is just to prepare your dataset so that it is ready for the analyses that follow. The second part is data checking. Here we will test the normality and the structure of your data, the correlation, the collinearity, etc. The third part is data exploration, where we will visualize the abundance and distribution of the species. And the final part is beta diversity. As I said, we will compute the local contribution to beta diversity, LCBD, and the species contribution to beta diversity, SCBD. Okay. So, in the data preparation part, I will show you how to get your data and customize your dataset. We'll use a classical biodiversity dataset containing taxonomic, spatial, and temporal information to illustrate our training. The first step is to get data. But just before that, we'll come back to the Galaxy page, because we have to create a history.
So here, you click on the plus button to create a new history. Then we name it. Because we will use a dataset from the Reef Life Survey, maybe we can use the acronym RLS, for Reef Life Survey, and "Biodiversity Data Exploration tutorial", like this. Okay, we have this name: RLS Biodiversity Data Exploration tutorial. And now we'll upload our data. You can upload your data from your local computer or from an external source. So, under Get Data on your left, you click on the Upload Data button here. A window appears. We will upload the data from Zenodo; you can find the link in the training. Click on Paste/Fetch data, copy-paste the link, and click Start. Okay, it works. We can close this window. As you can see, the dataset is grey, not green. That means it's not ready yet for formatting or customizing. So, you have to wait. You can drink a coffee, or just wait a few seconds. Maybe it will be long, because yesterday there was a big maintenance on the European Galaxy Ecology platform. If it takes too long, we will use the national instance. Here we are on the European Galaxy Ecology instance, and maybe we can use the French instance instead, so .fr. Okay, we will use it. Just change the address and type fr for France. Okay, exactly the same interface: the header, the tools on your left, the history on your right, and the main part to visualize your data and results after you have done some analyses. So, in a new history, we have to do exactly the same steps: RLS for Reef Life Survey, Biodiversity Data Exploration; get data from an external source, upload data. Now you can see it's running: the color has changed. Sorry for my English. Okay, now it's green. Perfect. With the eye icon, you can click and view your dataset. Okay, it's not possible to do anything with it as is. As you can see, the first lines are not data; they are just metadata. So, we will use a tool named Remove beginning.
With it, we remove the first 72 lines of this dataset. Just verify it's the right dataset, but because we have only one dataset in our history, there is little risk of a mistake. So, okay, execute. The color is changing, you can see it's running. Yes, it's green. Okay, here is your data: the first 72 lines have disappeared. That's perfect. Now, because it's not in a good format to work with, we have to change the format. Just before that, we can change the name of this dataset, because the name is a little bit long. So, something shorter. That's good. Save it. Okay, the name has changed. And now we have to specify the format. So, we'll change the datatype to CSV. Save it. Okay, it's CSV. But we want to convert it to tabular. So, Convert, CSV to tabular. Okay, Create dataset. A new step appears in your history: Convert CSV to tabular. So, as you can see in the history, the first step is the upload of the mollusc data. After that, we removed the first 72 lines. And now we have converted this dataset from CSV to tabular. Now you can check it. Okay, it looks good. Because we don't want to use all the columns here (there is too much information for the tutorial), we will use another tool, Advanced Cut columns from a table. Same thing as before: search for Advanced Cut. Okay, it appears here, as the first suggestion. That's perfect. And now you have to verify: this is your dataset, delimited by tab because we are in tabular, cut by fields. And we have to select the columns listed in the tutorial, so that we keep the main information: who, where, when, how many. Not everything, but the main information needed for biodiversity data exploration. Okay, let's see. It's cool, it works. The second step to customize your dataset is to use the tool Column Regex Find And Replace with the following parameters. Select from: we'll use this dataset.
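For readers who prefer scripting, the three preparation steps just described (Remove beginning, Convert CSV to tabular, Advanced Cut) can be sketched in plain Python. This is a minimal sketch, not what Galaxy runs internally: the function name `prepare_rls_table` is hypothetical, the 72-line metadata header comes from the tutorial, and the columns to keep are a parameter because the exact list is given in the training, not in this transcript. The example runs on a tiny made-up file.

```python
import csv
import io


def prepare_rls_table(raw_text, n_header_lines=72, keep_columns=None):
    """Hypothetical helper mirroring three Galaxy steps on CSV text.

    1. Remove beginning: drop the first n_header_lines (metadata).
    2. Convert CSV to tabular: parse CSV, write tab-separated values.
    3. Advanced Cut: keep only keep_columns (1-based, like Galaxy's c1, c2...).
    Note: fields containing embedded newlines are not handled here.
    """
    lines = raw_text.splitlines()[n_header_lines:]
    rows = list(csv.reader(lines))  # handles quoted CSV fields
    if keep_columns is not None:
        rows = [[row[i - 1] for i in keep_columns] for row in rows]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Toy input: two metadata lines instead of the 72 of the real file.
    raw = '# metadata 1\n# metadata 2\nsite,lat,lon\nA,"-33.1",151.2\n'
    print(prepare_rls_table(raw, n_header_lines=2, keep_columns=[1, 3]))
```

Parsing with the `csv` module rather than splitting on commas keeps quoted fields (for example species names containing commas) intact.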
And we will select column four (one, two, three, four: survey date), just to keep the year and remove the rest of the information, like months and days. So, I search for the tool Column Regex. Okay, this is the one. This is the right dataset for the right step of the history. Using column four, as I said, because it's the date, click on Insert Check and enter the regex. You can just copy-paste the formula from the tutorial. And no worries: if you have suggestions about this tutorial, you can send me an email at olivier.norvez@mnhn.fr. Anything that makes the tutorials better is welcome, like contributions or just suggestions. Okay, it's green, it works. Just check the data, and it's perfect: all the information about months and days has been removed, and we just keep the year. So, we have six columns: the site code, the latitude, the longitude, the survey date with only the year, the species name of the mollusc, and the total for the species, that is, its abundance. Okay, that's cool. So, now we go to the second main part, data checking. In this part, we will check our dataset, I mean, how these data are structured, and especially we'll examine the homoscedasticity (it's a difficult word to say), the normality, the autocorrelation, and the collinearity. Here, we will check the homogeneity of the variance with the Levene test for every species, represented by multiple box plots, and the normal distribution, thanks to the Kolmogorov-Smirnov test, represented by a distribution histogram and a QQ plot. So, use the tool Homoscedasticity and normality. Okay. Because the word is difficult to say and to type, maybe you can just copy and paste the tool name from the tutorial, and that's perfect. You've got it. So, click on it. Okay, the interface is the same each time: first the input (from which dataset?), and then you set your parameters.
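The Column Regex Find And Replace step can be sketched the same way in Python. The pattern below is an assumption, not the tutorial's regex: it supposes the survey date starts with a four-digit year (for example `2008-09-23` or `2008-09-23T10:30:00Z`) and keeps only that year. The function name `keep_year` is hypothetical.

```python
import re

# Assumed date format: a 4-digit year followed by a non-digit,
# e.g. "2008-09-23" or "2008-09-23T10:30:00Z". Not the tutorial's own regex.
YEAR_ONLY = re.compile(r"^(\d{4})\D.*$")


def keep_year(tsv_text, column=4):
    """Replace a 1-based column by its year, like Column Regex Find And Replace."""
    out_lines = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        # Values that do not match (e.g. the header) are left unchanged.
        fields[column - 1] = YEAR_ONLY.sub(r"\1", fields[column - 1])
        out_lines.append("\t".join(fields))
    return "\n".join(out_lines) + "\n"


if __name__ == "__main__":
    table = "site\tlat\tlon\tsurvey_date\tspecies\ttotal\nA\t-33.1\t151.2\t2008-09-23\tsp1\t3\n"
    print(keep_year(table))
```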
Like "select column containing species", et cetera, et cetera. And after that, you can execute. So, the interface may look complicated, but it's simple, because it's always the same thing. We have to check the following parameters. Input table: use the Column Regex Find And Replace output. Okay, this is the last one; we use it. Select column containing temporal data: this is c4, for column four. Select column containing species: this is the species name. And numerical values: Total, for abundance. So, execute. And wait a second. Take a coffee, as I said, check your email, your social networks or whatever. You will have three outputs. These three outputs are running, as you can see on the right part, in the history. They are: the Levene test for homoscedasticity, the Kolmogorov-Smirnov test for normality, and nine PNG files in a data collection. Okay, cool. You can check them. If the Levene test is significant (p-value in column Pr(>F) inferior to 0.05, and at least one star at the end of the first line), variances aren't homogeneous: the hypothesis of homoscedasticity is rejected. If the Kolmogorov-Smirnov test is significant (p-value inferior to 0.05), your numerical variable is not normally distributed: the hypothesis of normality is rejected. Here, both tests are significant. So variances aren't homogeneous, and the data are not normally distributed. And the nine PNG files appear here; you can view them with the eye icon, one for each species in your dataset. As you can see, Galaxy Ecology is perfect for a first look at your data, to prepare your data, to customize your data. And it's very practical to discover and explore your data. So, come back to the RLS Biodiversity Data Exploration tutorial. Okay, now we'll use the tool Variables exploration with the following parameters, which I will tell you. So, click on the tools panel: Variables exploration.
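To make the two tests less of a black box, here is a minimal Python sketch of their statistics. Assumptions: the Levene statistic is centred on the group medians (the Brown-Forsythe variant, the default of R's `car::leveneTest`; whether the Galaxy tool uses that centring is not stated in this transcript), and the Kolmogorov-Smirnov distance is computed against a normal distribution whose mean and standard deviation are estimated from the sample. The sketch returns statistics only: the p-value for W would come from an F distribution with k - 1 and N - k degrees of freedom (for example `scipy.stats.f.sf(W, k - 1, N - k)`), and the decision rule stays the one described above, p < 0.05 rejects the hypothesis.

```python
import math
from statistics import mean, median, stdev


def levene_w(groups):
    """Levene statistic W centred on the median (Brown-Forsythe variant).

    groups: one list of values per species. Raises ZeroDivisionError if
    every value equals its group median (no within-group spread at all).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations from each group's median
    z = [[abs(y - median(g)) for y in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


def ks_normal_d(sample):
    """KS distance D between the sample and a fitted normal distribution."""
    x = sorted(sample)
    n = len(x)
    mu, sd = mean(x), stdev(x)
    # Normal CDF evaluated at each sorted observation
    cdf = [0.5 * (1 + math.erf((v - mu) / (sd * math.sqrt(2)))) for v in x]
    # Largest gap between the empirical step function and the fitted CDF
    return max(max((i + 1) / n - f, f - i / n) for i, f in enumerate(cdf))
```

Note that estimating the mean and standard deviation from the same sample makes classic Kolmogorov-Smirnov p-values conservative; the Lilliefors correction addresses that case.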
That's the right spelling. Okay, here it appears in the first suggestions. And we have to check some parameters. So, as the tutorial says: input table, the last one, this is it. Okay, verify it. The first line is a header line: yes, you have to click on yes. Variables exploration: here, select Autocorrelation of one selected numerical variable. If you remember, it's column six, Total, for abundance. And now you can execute. You can see it's running. You will have two outputs: one text file containing the autocorrelation values and one PNG file in a data collection, showing the autocorrelation of the variable. If the bars of the histogram are strictly contained between the dashed lines representing the confidence interval, there is no autocorrelation. So, we wait, and then we check the autocorrelation. We can check it. Okay: a table with the autocorrelation per species, and the Variables exploration autocorrelation PNG. That's cool. Here we can see there is no autocorrelation. Okay. And for the last step of this data checking part, we will test the collinearity between numerical variables. So, use the same tool as before, Variables exploration. Okay. Sometimes I think I need another coffee. Okay, we have to check some parameters. Input table: this is the right one. Verify the header is yes. Okay. Yes, yes. And now you have to select Collinearity between selected numerical variables for each species. That's perfect; it's the first option. Select column containing species: species name, column five. And now you have to select the columns containing numerical values. Execute, and you can sing, you can dance, close your eyes. So, it works. We have two outputs: one listing the species we couldn't evaluate, and PNG files, each containing multiple correlation plots and the correlation values between each pair of variables.
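The autocorrelation and collinearity checks can be sketched as follows. This is a hedged illustration, not the Galaxy tool's code: `acf` uses the usual sample autocorrelation estimator (the one R's `acf` plots), the dashed lines are assumed to be the approximate 95% band of plus or minus 1.96/sqrt(n), and collinearity is illustrated with a Pearson correlation between two numerical variables, such as Total and survey year for one species. The tool may use a different correlation measure; all function names here are hypothetical.

```python
import math
from statistics import mean


def acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (c_k / c_0, both divided by n)."""
    n = len(x)
    m = mean(x)
    c0 = sum((v - m) ** 2 for v in x) / n
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


def acf_bound(n, z=1.96):
    """Approximate 95% confidence band drawn as dashed lines: +/- z / sqrt(n)."""
    return z / math.sqrt(n)


def lags_outside_band(x, max_lag):
    """Lags k >= 1 whose bar leaves the band, i.e. signs of autocorrelation."""
    bound = acf_bound(len(x))
    return [k for k, r in enumerate(acf(x, max_lag)) if k > 0 and abs(r) > bound]


def pearson_r(x, y):
    """Pearson correlation between two numerical variables (collinearity check)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

An empty list from `lags_outside_band` matches the reading rule above: every bar stays within the dashed lines, so there is no autocorrelation.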
So, we can check it. For some species there is not enough data, as I said, so we cannot evaluate them. And there are nine outputs in the data collection, so we can see the autocorrelation for each species. No, no, the collinearity for each species: here, between Total and survey date. You can download all of these images. I didn't say it before, but it's very practical for illustrating your articles, data papers, technical papers, scientific papers, or just a tutorial. And you can see some of these images in the Biodiversity Data Exploration training. So, okay, back to our main history. And now we will enter the hard part. When I say hard, I don't mean difficult, just the important part for exploring the biodiversity in your dataset. In this part, I will show you how to explore the biodiversity of your dataset through the abundance, in space and time, of the mollusc species in your dataset. Here, we will visualize the abundance distribution through space. So, I can present to you a very cool tool: Presence-absence and abundance. Okay. I go to the tools panel and copy-paste the tool name you can find in your training, or just type the name, presence-absence and abundance. Okay, it's the second suggestion. Same thing: you have to select the right history step, Column Regex Find And Replace on data 4. Yes, that's it. And now, choose what you want to do. For us, it's the presence-abundance map. Yes, that's it. Okay, select column containing latitude. Okay, perfect. Same for the longitude, column three. Then the title of what you study, for example "Molluscs of the Australian East Coast, from 2008 to 2021", I remember. Select column containing taxon: species name; and abundance: column six. Now you can execute.
We have two outputs: one with the map of the abundance through space, with your coordinates, and one text file informing you about the geographical extent of your map. So, when it works, it just appears. That's perfect: here are the longitude and latitude ranges used. Now, we can click on the presence-abundance map, this one. Oh, that's cool. It's a really cool tool, because with it you can see the distribution of your species from the latitude and longitude. And for the abundance, as you can see on the right of this graph, the bigger the circle, the higher the abundance. This is interesting because you could see exactly the same thing on a map background, but here you can detect a pattern, or maybe an interesting distribution, without the map behind distracting from this information. So, yeah, I think it's a really nice tool. Okay, so now we will visualize the number of locations where each taxon is present. Same thing: we will use the Presence-absence and abundance tool, with the Column Regex Find And Replace output as input. Oh, this is the one. Okay, click on yes, yes. And now, for the variable, we will use Presence count, for the bar plot. Okay, select column containing your separation variable: this is the site code; containing taxon: species name; containing abundance: c6, for column six. And now you can execute. We have two outputs: one with 120 PNG files, one for each site, representing the number of locations where each taxon is present, and one text file informing you about the locations used. So, the locations used, this one, yeah, we can check it. These are the locations, and the 120 representations.
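The two uses of the Presence-absence and abundance tool shown so far boil down to simple aggregations, sketched here in Python under stated assumptions: rows are hypothetical tuples of (site, latitude, longitude, species, total), the map's circle size is taken to be the summed abundance per coordinate pair, the extent matches the ranges reported in the text output, and the presence count is the number of distinct sites where a taxon has a non-zero abundance. The Galaxy tool's own implementation may differ in details.

```python
from collections import defaultdict


def abundance_by_location(rows):
    """Summed abundance per (latitude, longitude): the circle size on the map."""
    totals = defaultdict(float)
    for _site, lat, lon, _species, total in rows:
        totals[(lat, lon)] += total
    return dict(totals)


def extent(rows):
    """Geographical extent as (min_lon, max_lon, min_lat, max_lat)."""
    lats = [r[1] for r in rows]
    lons = [r[2] for r in rows]
    return min(lons), max(lons), min(lats), max(lats)


def presence_count(rows):
    """Number of distinct sites where each taxon has an abundance above zero."""
    sites = defaultdict(set)
    for site, _lat, _lon, species, total in rows:
        if total > 0:
            sites[species].add(site)
    return {sp: len(s) for sp, s in sites.items()}
```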
So, for example, this one. Okay, there is not much information in these images because, as I said at the beginning of the tutorial, it's a subset: for this particular tutorial we didn't select a big dataset, but with a normal dataset you will get more information from this kind of representation. Okay, back to the exploration tutorial and the main history. Now we'll compute and visualize the rarefaction curves of your species. So, we'll use the tool Presence-absence and abundance again, with other parameters. Okay, this one; this is the right input table. Yes. Rarefaction curves of species; size of subsamples: set it as indicated in the tutorial; select column containing species: species name, column five; abundance: column six, Total. And now we can execute. Turn on the subtitles; they will help you understand my English. Okay, it works, everything is green. So you have two outputs: one data collection with the graph, this one, and one tabular file. Okay, you can check them. Now we enter the last part of this tutorial, beta diversity. Here we will explore beta diversity, which is the ratio between regional and local species diversity. To assess this beta diversity, we will use the LCBD, for local contribution to beta diversity, which are comparative indicators of the ecological uniqueness of the sites, and the SCBD, for species contribution to beta diversity, which is the degree of variation of individual species across the study area. So, we have to select the tool Local contribution to beta diversity, in the same tools panel here. This is the one, and you will see it's the same tool for the SCBD. So, I select it. Okay, we have to check some parameters. Input formatted biodiversity file: not this one, this one. Select column with abundances: Total; with location: site code; with taxon: species name; with date: survey date. Click on yes. Latitude, in decimal degrees... Just a few seconds, I'll restart the recording in a second.
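A rarefaction curve gives the expected number of species found in random subsamples of increasing size. A minimal sketch, assuming Hurlbert's (1971) formula, E[S_n] = sum over species i of 1 - C(N - N_i, n) / C(N, n), which is the formula behind `rarefy` in the R package vegan; whether the Galaxy tool relies on vegan is an assumption here. `counts` is a hypothetical list of total abundances per species, and the function names are illustrative.

```python
from math import comb


def rarefy(counts, n):
    """Expected number of species in a random subsample of n individuals (Hurlbert 1971)."""
    big_n = sum(counts)
    if not 0 < n <= big_n:
        raise ValueError("subsample size must be between 1 and the total abundance")
    total = comb(big_n, n)
    # comb(a, b) is 0 when b > a, so a species that must appear contributes 1
    return sum(1 - comb(big_n - ni, n) / total for ni in counts if ni > 0)


def rarefaction_curve(counts, step=1):
    """Points (subsample size, expected richness) from 1 up to the total abundance."""
    big_n = sum(counts)
    return [(n, rarefy(counts, n)) for n in range(1, big_n + 1, step)]
```

A curve that flattens before reaching the total abundance suggests that most species present have already been sampled.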
Sorry. Okay, I had to check something. So, we were at the Local contribution to beta diversity tool. Select the right table, click on yes. Select column with abundances: this is column six; with location: site code; column containing taxon: species name; containing date: survey date. And here, the type of plot: spatialized representation or x-y plot. You have to select spatialized representation, then select column containing latitude in decimal degrees: latitude, okay; containing longitude: longitude, okay. And now you can execute it. And I just remembered: we forgot the box plot in the data checking part. So we'll do it just after this beta diversity step. Sorry for that. It's running, it's perfect. Okay, you have four outputs: here, here, here, and here. This one is an R object, for your own R analyses, so really you have only three outputs: two text files, one containing a table with information on the beta diversity and one with the list of species that have an SCBD larger than the mean SCBD, and one data collection with PNG files showing multiple plots according to one type of variable, to visualize beta diversity. Okay, we can view it: the species that have an SCBD larger than the mean SCBD. You can see the SCBD, the beta diversity per species, site, and time, the mean SCBD, and you can check it all on a beautiful graph. If you want, you can zoom or change the resolution of your screen to read this graph better: the mean SCBD here, with the SCBD. That's cool, no? Me, I think this is the best tool ever. So, as I said just before, we have forgotten one tool. We'll come back to the data checking part, and I will present to you the presence-absence boxplot, for visualizing the number of locations where each taxon is present. Okay, so come back to the tools panel and type presence-absence and abundance.
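The LCBD and SCBD values in these outputs follow from a simple variance partition (Legendre and De Cáceres, 2013). The sketch below assumes a site-by-species abundance table and a Hellinger transformation, the default of `beta.div` in the R package adespatial; whether the Galaxy tool uses exactly this transformation is an assumption. Total beta diversity is the total sum of squares divided by n - 1, LCBD is each site's share of that sum of squares, SCBD each species' share, and `species_above_mean_scbd` (a hypothetical helper) mirrors the text output listing species with an SCBD larger than the mean SCBD.

```python
import math


def beta_div(table):
    """Total beta diversity, LCBD and SCBD on Hellinger-transformed abundances.

    table: one row per site, one abundance per species (same species order).
    Sites with a total abundance of 0 would cause a division by zero.
    """
    # Hellinger transformation: square root of relative abundances per site
    h = [[math.sqrt(v / sum(row)) for v in row] for row in table]
    n, p = len(h), len(h[0])
    col_means = [sum(h[i][j] for i in range(n)) / n for j in range(p)]
    # Squared deviations from the species means
    s = [[(h[i][j] - col_means[j]) ** 2 for j in range(p)] for i in range(n)]
    ss_total = sum(map(sum, s))
    bd_total = ss_total / (n - 1)
    lcbd = [sum(s[i]) / ss_total for i in range(n)]
    scbd = [sum(s[i][j] for i in range(n)) / ss_total for j in range(p)]
    return bd_total, lcbd, scbd


def species_above_mean_scbd(names, scbd):
    """Species whose SCBD is larger than the mean SCBD."""
    threshold = sum(scbd) / len(scbd)
    return [name for name, value in zip(names, scbd) if value > threshold]
```

By construction, the LCBD values sum to 1 across sites and the SCBD values sum to 1 across species, which is why they read as relative contributions.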
Okay, take this one, and we have to check the same parameters as always: Column Regex Find And Replace on data 4, click on yes, it's yes, okay; separation by site code; containing taxon: species name; abundance: abundance. Okay, that's it. And you will have two outputs... we have already done that. Okay, sorry guys, we already did it. But we can do it again, just for fun. Definitely, I need a big coffee this afternoon. Sorry, but thanks to me, you can see exactly the same situation twice, like in the movie Inception or The Matrix. As you can see, there are a lot of tools for biological analysis, for biodiversity, climate, etc. These are really interesting tools. And as you can see, I'm not a coder. I used to code five or six years ago, and now I don't, because my job doesn't need it. But now, if I want to do some ecological analysis, I can use Galaxy Ecology without a big knowledge of programming. And this is the main advantage of Galaxy Ecology. It works with a strong background of statistical analysis: everything is checked, everything is peer-reviewed. So no worries: it's a robust instance and a robust tool for ecological analysis. And as I said, it's the best tool for people without big knowledge of programming. And we have exactly the same result as before. Yes. So we did it again. So sorry. Anyway, this is the end of this tutorial. Thank you for watching. You just did an ecological analysis, exploring your biodiversity dataset. So now you know how to prepare your biodiversity dataset with Galaxy, check the data inside, and explore them, especially beta diversity. I hope I didn't bore you too much with technicalities and that you enjoyed this training with me. Don't hesitate to ask questions on the chat or by email at this address: olivier.norvez@mnhn.fr. If you have any suggestions on how to make this training better, don't hesitate to contact me as well.
And thank you very much for watching. Ciao. Bye.