In practice, how did we make it? How did we make the data open, usable, useful, and reused?
Open first. By open, I mean freely available and accessible without any restriction. Putting a zip file somewhere on the internet is not enough to declare your data open. With GBIF, we rely on a rapidly growing distributed network of data publishers. They are guided and supported by national nodes, such as the Belgian Biodiversity Platform in Belgium, and by an efficient secretariat based in Copenhagen. These large communities are really important because this is where we exchange our best practices, and this is how we have made things more and more open. This small graph shows you the GBIF secretariat in the center and then the different countries. On the right side, you see Sweden, Belgium, and France. For Belgium, around the Belgian Biodiversity Platform, you see all the different institutes. The national node collaborates with about 20 institutions, such as universities, museums, regional nature agencies, and NGOs. Other countries have a similar structure that helps the local data publishers.
Usable. By usable, I mean able or fit to be used. How does that happen in our case? We describe the biodiversity entities and their attributes, and we do that by mapping the proprietary database of the data owner to Darwin Core concepts. These Darwin Core terms are defined. They use controlled vocabularies that are standardized by the community, and they are also evolving, of course. The data sets are then packed into an archive file. One of these Darwin Core terms, which is a little bit small on your screen, is the organism quantity. You can see that it has a definition, some examples, and notes, so people know how to use and refer to this organism quantity. But of course, one size does not fit all, and there are many ways to express the quantity of organisms. Therefore, Darwin Core has another term, the organism quantity type.
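To make the mapping step concrete, here is a minimal sketch in Python, not the actual tooling of any data publisher. The left-hand column names (species_name, amount, and so on) are invented for illustration; only the right-hand names are Darwin Core terms. It shows how organismQuantity and organismQuantityType travel together, so the same pair of terms can carry a count of individuals or a share of biomass.

```python
# Hypothetical column names from a data owner's private database (keys)
# mapped to Darwin Core terms (values). The keys are invented for this sketch.
COLUMN_TO_DWC = {
    "species_name": "scientificName",
    "obs_date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "amount": "organismQuantity",
    "amount_unit": "organismQuantityType",
}


def to_darwin_core(row: dict) -> dict:
    """Rename the mapped columns to Darwin Core terms and drop everything else."""
    return {dwc: row[col] for col, dwc in COLUMN_TO_DWC.items() if col in row}


# A counted observation: the quantity is a number of individuals.
counted = to_darwin_core({
    "species_name": "Vanessa atalanta",
    "obs_date": "2020-06-14",
    "lat": 50.85,
    "lon": 4.35,
    "amount": 12,
    "amount_unit": "individuals",
    "internal_id": 9876,  # private column with no mapping, so it is dropped
})

# A vegetation record: same two terms, but the quantity is a biomass share.
weighed = to_darwin_core({
    "species_name": "Phragmites australis",
    "amount": 12.5,
    "amount_unit": "% biomass",
})

print(counted)
print(weighed)
```

The design choice worth noticing is that the number alone is never published without its type, which is what lets a later user combine records that were measured in different ways.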
The organism quantity type allows the data owner to express different measures of organism quantity. It can be different things: a number of individuals, a biomass, or any other possible term.
What is useful? I would say it means able to be used for a practical purpose in several ways, so not necessarily in the same way as described by the data owner. For this, we use not only the Darwin Core star schema, which I will show you in a moment, but also metadata. The GBIF community has decided to use the Ecological Metadata Language (EML), which gives a very nice description of the taxonomic, geographical, and temporal coverage. After a while, we also decided to use CC licenses: three CC licenses, and mostly two of them, CC0 and CC BY. But we also let the authors give their preferred citation, and we keep, of course, the emails of the authors of the data for user feedback. So a Darwin Core Archive is nothing very special. It is a bunch of data files, like you see on the left: the event core plus occurrence extensions or measurement extensions, plus the metadata and the EML metadata. We zip all that together into a Darwin Core Archive.
How can it be reused? Discovering the data can be done in multiple ways: globally in the global portal, locally, or even directly from the data owner. It can be done in various formats and through the most important programming languages. The data is interoperable, so users can make their own queries to refine what they really want, and they get the result through JSON web services. This is one example of reusing or downloading the data with the Python library of GBIF. If you prefer and you are using R, you can do basically the same thing with R. But reuse also means that you need some kind of citation mechanism. For a couple of years now, we have assigned a digital object identifier (DOI) to every data set.
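As a rough illustration of the JSON web services, the sketch below builds a query against the public GBIF occurrence search endpoint using only the Python standard library. The filter values (Belgium, one butterfly species, the year 2020) are arbitrary examples, and this is not necessarily the code shown on the slide, which used GBIF's own Python library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the public GBIF occurrence search web service.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_occurrence_query(**filters) -> str:
    """Turn keyword filters into a search URL, skipping filters set to None."""
    params = {key: value for key, value in filters.items() if value is not None}
    # Sorting the parameters keeps the URL stable from one run to the next.
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(sorted(params.items()))}"


def fetch_occurrences(url: str) -> dict:
    """Call the web service and decode the JSON answer (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


# Example filters, chosen only for illustration.
url = build_occurrence_query(
    country="BE",
    scientificName="Vanessa atalanta",
    year=2020,
    limit=5,
)
print(url)
```

Calling fetch_occurrences(url) should return a dictionary whose "count" field is the total number of matching records and whose "results" field holds the first page of occurrences. The call is left out of the sketch so that it runs without a network connection.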
And when you download data from GBIF, you also get your own DOI for the download, which gives a nice way to trace your usage back to the original data sets that you have downloaded. With this, GBIF can do some literature tracking. On this slide, you see the number of peer-reviewed publications that explicitly mention using GBIF-mediated data. So these are scientists writing peer-reviewed articles with GBIF-mediated data. As you can see, it is growing rapidly. In 2020, we are close to 1,000 peer-reviewed articles of this kind. It is an excellent measure of the scientific reuse, at least, of GBIF data. Reuse in decision-making is much harder to track, but we are thinking about that and we will talk about it.
So the present: where are we today? On these two maps, you see the distribution of published data in 2013 and today, in 2020. You see at the bottom that there is much more coverage today than seven years ago, and the number of publishers has roughly tripled. So that's also very good. The numbers are also quite impressive: 1.6 billion occurrence records coming from more than 50,000 data sets. The GBIF initiative is managed by 62 country participants and 39 organizations. And there is something like 61 billion records downloaded per month on average. I have exactly the same statistics for Belgium, to see how it compares. It's really not too bad for such a little country. We have 40 million occurrence records at the moment, coming from more than 300 data sets from 20 publishers. And Belgian scientists have written something like 400 papers that make use of GBIF data. That's really good. Here you see the spread of the Belgian data by taxonomic group. It's mostly animals and plants and a little bit of everything else, but I would say it's a good distribution already. We have a Belgian biodiversity data portal, which is second based.
It presents all the Belgian data sets and all the peer-reviewed articles from Belgian authors. If you visit it and try to make a fine-grained query, you will be redirected to the global portal.
The global data portal gives you a very nice view of the data. It can be a tabular view, a gallery, a map, a taxonomy, or metrics. We'll see that in a moment. On the left, you see the kind of table where you can make your own query filters based on any of the Darwin Core terms. You can save your query and come back later to run the same query again and get more data for it. That's the kind of nice tool that the global portal offers you. You can also discover the occurrences through the gallery, because a lot of occurrences have an image, sound, or video attached. You can also build maps, and here you see data published by Belgian authors. You also have a taxonomy tab where you see the spread of the data across the different taxonomic groups. And the metrics show various things. On the bottom right, I would like to show you the pie chart, which shows that the vast majority of the records published by Belgium are under a CC0 license. So it's public domain: anybody can do whatever they want with it. You also have a country page where you can discover the data published through GBIF and also the articles of Belgian authors.
So that was the present. Now, what's next and what's coming in the future? We all know this pyramid diagram with data at the bottom, then on top of that information, knowledge, and finally wisdom. GBIF traditionally tackles the data and information layers. But I will give you what, in my opinion, are the next steps for these different layers. First, on the data layer, we know that we have to fill the data gaps, for example with non-Linnaean taxonomy, which is the kind of occurrence where you don't necessarily have species names.
But you have something like, for example, a DNA sequence attached to something that was found somewhere, and that's it. The GBIF portal is starting to deal with that, and it's getting better and better. I also think that in the coming years we will see more and more automated harvesting of data from academic sources or from journals. And we have to prepare the infrastructure, because for the moment we have 1.6 billion records. It's not that big, but the growth is really significant. With new techniques, we will see a data deluge, and we should be ready for that. We should also modernize the packaging of the data using linked data or Frictionless Data, because Darwin Core is still OK, but we all know that it has some limitations. In this change of standards, we should find something that is stricter than Darwin Core but, on the other hand, also more flexible, allowing the publication of data other than what the standard currently covers.
On the information level, I think we will see more and more artificial intelligence and interpreted data, or even interpreted metadata, because when you run an algorithm on the data, you can find a lot more than what a person can easily describe. So that's a very interesting future, I think. On user annotation, there are a lot of trials. I think it's more and more important that people who use the data can give their feedback, not only to the authors but also visible to everyone. And finally, a big challenge is to link the occurrence data with other related entities, such as people, institutions, literature, land use, ecology, legislation, and so on.
The knowledge level is, of course, probably the most important one. We know that we have to get out of the silo, the biodiversity silo, where people know each other and we are still working only on biodiversity, because this data is worth a lot to many more people than just the biologists.
And a solution like EOSC will probably bring us this next level of interoperability across different domains. Of course, the big societal challenges, like health, pandemics, food security, and climate change, all need to be addressed with biodiversity data, or at least biodiversity data has some part to play in resolving these challenges. For example, the Sustainable Development Goals are also something that we can tackle with the data in GBIF.
So what's next, if we look at the progress over the past seven years? You can imagine that what we have to accomplish is truly global coverage, with no white countries on the map; to be more inclusive with, as I said, non-Linnaean taxonomy, but also things like indigenous and traditional knowledge linked to nature; and to be part of the bigger puzzle, because GBIF is not the only player in that domain. I think the alliance for biodiversity knowledge is really the right place to address the bigger puzzle. GBIF has to adapt its funding scheme and governance, because it will become truly global. It will definitely change.
At the Belgian level, we need better data coverage. We need to reduce the north-south divide. I had a discussion about that in a previous session: at least in the biodiversity domain, we clearly see that the north has a lot more open data than the south, and we are working every day to try to reduce this difference. We also need to be more inclusive with non-traditional data sources and, at our level, to be part of the bigger puzzle with European initiatives like DiSSCo, EOSC, or LifeWatch.
If you want to stay tuned and read more about that, please follow these links: GBIF, TDWG, the alliance for biodiversity knowledge, and the Belgian Biodiversity Platform, of course. I also bundled four interesting readings for you. The first one is a recent article on data integration that enables global biodiversity synthesis.
Then you have the GBIO, the Global Biodiversity Informatics Outlook, the 20-year review of GBIF, and last but not least, the GBIF Science Review of 2020. You have all that in the PDF with the references, so if you want to read them, you have these four things. And this is my last slide. I thank you for your attention, and I'm open to any of your questions.
From Astrid, I see: what is the current funding scheme and governance of GBIF? At the moment, as I said, there are 62 countries that are funding GBIF. So they are, in fact, contributing financially to the budget of the infrastructure. With this financial contribution, they have a voting right on the governing board, which meets every year, and that's how the decisions are made. So it's only the countries that have something to say. Now, of course, there are organizations that have observer status, but it's only the paying countries, or the voting countries, as we say, that make the decisions. And this might change in the future.
I'm trying to see questions in the chat, but yes, the big puzzle and the different initiatives. For the moment, I think we are quite good at integrating all the biological aspects. But as I said, it touches on different things. It touches the data that we have. It touches the people who collect the data, clean up the data, or do anything else with the data, and the institutions. And not only that, but also the legislation that refers to species or to the status of some species. All this, for the moment, is not really correctly linked. In the ideal linked world, you should be able to go from one entity to another without having to take care of any technical difficulties. So it must be very easy to go from an occurrence to a species, to the Red List status of the species at the IUCN, for example, and to legislation in your country that refers to the species.
Everything should be linked to make it even easier for scientists, but also for citizens, to deal with this data. Other questions?
I don't know if people can take the floor and speak, or are they blocked? I can unlock. OK, yes, if you can, please. OK, it's unlocked. Thank you. So, any questions? Questions are probably even more interesting from people who are not directly working in the biodiversity domain, so don't hesitate to ask any question that you have.
I see a question from Inahi: are the data available on GBIF auto-generated? What do you mean by auto-generated? It depends on what you mean by generated. They are always converted to Darwin Core, but I don't know if this conversion is what the question is about. They are always documented with metadata, of course. And they are interoperable, which means that you can query data from different sources, coming for example from a museum, a university, a camera trap, or whatever. You can combine all this data in one very simple query, either on the global portal or through your preferred language. Yes, you can elaborate a bit more.
It seems that people have problems activating their mic. That's why we don't have anybody. Can anybody try to activate it?
Hello, hi, I'm Vanessa. I was able to activate it by clicking the headset icon, and then it told me, oh, you're leaving the meeting, and then I rejoined to activate my mic. So maybe that will help others. I also had a question. I'm a math biologist. I don't necessarily work directly with data. I'm more in the AI and data science part. And for me, I acknowledge that the challenge is getting consistency right across organizations and making everything linked appropriately.
But I was wondering if you had some thoughts on how we integrate people who are outside of this domain, maybe legislators or people who could support this initiative but from a different perspective, I guess. Because where I'm from, in Puerto Rico, I know we have very few open data initiatives, and our struggle is: how do we get this to the big players that can actually enact policy? And I don't have many ideas about that.
I think the first step is to be very clear about the terms that you use for the data and the controlled vocabulary behind them. If you use a term, what does it mean exactly for you as a community, so that other people with other backgrounds can understand what you're talking about? As I showed with the organism quantity: what does it mean? A lawyer who goes to Darwin Core can at least find the explanation of what scientists mean by organism quantity. And this effort is not negligible, of course. It's very important, because in your institution, your department, or your project, you might use different terms. But the first thing is to have a kind of common vocabulary for the terms that you are using. Otherwise, it's just a bunch of data and you don't know what you're talking about.
Thank you. Thank you.
Hello. Hello, Inari. Hey, what I meant by auto-generated was that, in one of your slides, you mentioned that previously people had to input data manually, or something like that. So I thought of that as opposed to data that automatically goes to the platform.
So basically, it really depends on how you are publishing your data. You can do that manually, but you can also do it automatically. We have something called the IPT.
It's the Integrated Publishing Toolkit, which can, for example, connect to your database and automatically, every week or every day if you prefer, take a snapshot of your local database and publish this data to the GBIF network. That kind of automation can be done, but none of the data is generated by some kind of super data generator. It's always data coming either from a machine or from a human observation.
Thank you so much.
I have another question. I'm an artist based in Rotterdam, in the Netherlands, and I'm currently working on a project involving an invasive alien plant. I'm working specifically with a certain plant that comes from water catchment areas of the Netherlands, because they're unpolluted. Sorry, I mean I need access to unpolluted patches of this plant. So I was wondering if GBIF can be useful for obtaining information about where it is distributed?
Yes, that's one of the first basic questions that people often ask: what is the distribution? What GBIF can give you is not the exact distribution area of a species, but all the occurrences of that species that were observed in the past. That is close to the distribution area, but not equal to it, because there may be some areas where people did not investigate or report any observations. Therefore, the distribution of a plant, or more specifically of an invasive plant, which can change very rapidly, is not something you will discover just like that on GBIF. But you will see all the occurrences, even the most recent ones, published through the network. For example, in Belgium, we have the TrIAS project, which set up a data flow to get all the observations of invasive species onto the GBIF network as fast as possible.
It all depends on which species you're looking at, how much importance the Netherlands gives to the occurrences of this species, and whether this is coming to the GBIF network. But I would say give it a try and see what it gives you for the Netherlands.
There is also a question from Leonard in the chat. In the chat, yes: does GBIF already include, or are there specific plans to include, vocabularies in the GBIF metadata? The GBIF metadata currently follows EML, which, as I said, contains different fields or sections of fields. For example, you can of course put keywords on your data sets. You can give a geographic coverage, where you can say that this data is coming from the Netherlands, Belgium, and Germany. And in some parts of the metadata, I'm sure you can add terms coming from specific vocabularies that you use, for example for the taxonomy. In other parts of the metadata, you can specify your own vocabularies. So yes. Does that answer your question?
More questions? Otherwise, I hope you enjoyed the session. I will stay here for another 10 minutes or a quarter of an hour if you want. I think Astrid can give you access to the PDF version of my slides, which contains a lot of links. I don't know if you have to put a link in the chat for that, or whether it can be done some other way.
Yeah, okay, it's there. Thank you, now it should be possible for participants to download. Yes. Okay. But if not, I think for all the sessions you have a kind of repository of the slides, so don't worry. And the recording of the session. And the recording, but in the recording you don't have the links. No, that's true. Can someone maybe confirm that you can download the slides? That would be helpful. I don't see how. I can't seem to do it. Okay, because I clicked to allow it, but it's good to know that's something I have to check. Where would the button be?
Can you click just on the presentation itself? Cloud68 is helping us with the infrastructure, and they are now typing, so they will have the answer. Okay, great. Okay. It has to be somewhere else on the... Are you planning to do that for all the sessions on the website? We were planning to add the recordings for all the sessions, but then I will add your slides as well. Yeah.
This is Leonard from Bliss. I have a quite specific follow-up to my previous question, and since this session is more or less finished, maybe I can ask it. So GBIF is using EML as its metadata standard, right? Yes. I think in EML 2.1 you can include vocabularies, but they are not recognized as specific vocabularies, and only in EML 2.2 can you link to external vocabularies that are used as such. But as far as I know, GBIF has not upgraded to EML 2.2 yet. Is that correct? Yes, you have to wait a little bit here. It's still in 2.1, if I remember well. And is there a plan or a timeline? I don't have that in mind for the moment, sorry for that. Okay, no problem.
That's basically one of the most severe limitations that I see for the moment in the data sets: you cannot easily link to other data that lives elsewhere, in another data set or in another initiative outside of GBIF. And that's something you can expect to change in the near future. Okay, perfect. We are working on this and checking if we can upgrade our own systems to EML 2.2, so that will also be useful for our data flow to GBIF. So another question: if we have EML 2.2 in our IPT, is that possible, and will that pose problems for exchanging data with GBIF? If you have 2.2 for the moment? Imagine that we upgrade to EML 2.2 next week or so, or is that just not possible in the IPT? In the IPT, as far as I know, it's not possible, but you should probably wait for the next version of the IPT for that. Okay, thanks.
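For anyone wanting to check which EML version a metadata file declares, here is a small sketch that reads the namespace of the root element. The two sample documents are stripped-down stand-ins, not complete valid EML, and the rule that external vocabulary links need 2.2 or later simply encodes what was said in the discussion above rather than anything verified here.

```python
import re
import xml.etree.ElementTree as ET

# Minimal stand-in documents: only the root namespace matters for this sketch.
SAMPLE_EML_21 = (
    '<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" '
    'packageId="example-1" system="example">'
    "<dataset><title>Example data set</title></dataset></eml:eml>"
)
SAMPLE_EML_22 = (
    '<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" '
    'packageId="example-2" system="example">'
    "<dataset><title>Example data set</title></dataset></eml:eml>"
)


def eml_version(eml_xml: str):
    """Return (major, minor) from the root namespace, or None if not EML."""
    root = ET.fromstring(eml_xml)
    # ElementTree renders a namespaced tag as "{namespace}localname".
    match = re.match(r"\{[^}]*eml-(\d+)\.(\d+)(?:\.\d+)?\}eml$", root.tag)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def can_link_external_vocabularies(eml_xml: str) -> bool:
    """Assumption taken from the discussion: this needs EML 2.2 or later."""
    version = eml_version(eml_xml)
    return version is not None and version >= (2, 2)


print(eml_version(SAMPLE_EML_21), can_link_external_vocabularies(SAMPLE_EML_21))
print(eml_version(SAMPLE_EML_22), can_link_external_vocabularies(SAMPLE_EML_22))
```

A check like this could run on the eml.xml file inside an archive before publishing, to see early whether a metadata upgrade would be needed.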
Is it okay, André, if I stop the recording of the session? Sure, for me, it's fine, yeah. No problem.