Hi, everyone. Thank you so much for joining our webinar. My name is Tian Gan. I am a postdoc working on the CSDMS team. And this is Mark Piper. Do you want to introduce yourself?

Sure. Hey, I'm Mark. I'm a research associate at CU, and I work as a research software engineer at CSDMS.

Cool. Today, Mark and I are going to talk about data components. Can you go to the next slide, please? In our presentation, we will first introduce the CSDMS Workbench and one of its major elements, the Basic Model Interface. BMI is the foundation of the data components. Then we will introduce some existing data components created by our team. After that, we will show the general steps for creating your own data components. We also want to talk about the CSDMS@HydroShare capability. This capability helps you share the tutorial Jupyter notebooks for your data components with others, so they can easily run them and learn how to use your data component. Finally, we will give a live demo using an example SoilGrids data component, and Mark is going to give a summary of today's webinar. So now let's get started.

All right. Tian, you're very brave for doing a live demo, but it's going to be great.

Thank you.

All right. So before we get into the idea of data components, let's give some context. I want to start with the idea of the CSDMS Workbench. This is a set of tools and libraries produced by the CSDMS Integration Facility, with help from the CSDMS community as well. The idea is tools, libraries, and software for building models, for running models, and for coupling models. Three elements of the Workbench are listed along the bottom here. Landlab is a model coupling toolkit. It's also a toolkit that you can use for building models in Python. It's really cool. It's the most popular element of the CSDMS Workbench. On the other side is PyMT.
This is also a model coupling toolkit. It's a little different in that it's used more for working with models written in other languages. Through PyMT, you can couple models written in C, C++, and Fortran, as well as Python, because PyMT is a Python package. Just as an aside, we're actually looking at incorporating PyMT into Landlab, so we would have just Landlab. Landlab would be our model coupling toolkit. But that's a little bit in the future and not quite the topic here. I'm excited, though, because I think it's a neat idea.

So anyway, there are two coupling toolkits listed. In the middle is BMI, the Basic Model Interface. This is the technology that underpins both of these toolkits. So what is BMI? All that a basic model interface is, is a set of functions. These functions are a little special in that they have prescribed names, arguments, and return types. But the neat thing is that these functions have the same names, arguments, and return types across languages. So the idea is that you could write a model and it could have its own interface, whatever interface you design for accessing information in the model and for running the model. But you could also put a BMI on it, and then you'd have a standardized interface for accessing and running your model. So that's kind of what a BMI is.

When people start learning about BMI, I like to point them to our documentation. We tried to be really careful about writing up a nice description of what BMI is, why it's important, and how you can make one. So let me risk the whole webinar by clicking on this link. Okay, good. And maybe, Tian, if you could put the link in the chat, that'd be really cool. Okay. Thank you.

So this is the documentation page for BMI. And it's long. We've got a lot of information here. What I wanted to show you was this table of BMI functions. So this is what the BMI is.
It's about 30 functions. Now, you don't have to add all of them. Some of them depend upon what sort of grid you're using. But there are 30 functions. It's a bit of work. You can see names like initialize. The idea of initialize is that it starts a model. You can see update. That advances the model by one time step. There's finalize, which is used to stop a model. There are also getters and setters: you can get values or set values in the model. So you can see they have pretty readable names as well. All right. Let's go back. Let me get a quick drink. I'm a little too far. There we go.

So I've given you kind of a high-level overview of what BMI is. It's a set of functions. But my next bullet here is: what is the benefit of a BMI? Why would you want to include a BMI with your model? To do this, I've put a picture of my wife's Subaru on the side here. It's going to make me think. So Tian, I have a question for you. We practiced this, so she's smiling. When you come to the office, how do you usually get to the office?

I usually drive a car.

Oh, okay. So where did you learn how to drive a car?

I learned that back in China, and I got my first driver's license in China.

Okay. Cool. So the thing is that you're in the States right now and you're able to drive a car just fine, right? That's kind of cool. The neat thing is that Tian has benefited from a standard, in the sense that cars have steering wheels, and cars have accelerator pedals and brake pedals. So the fact that Tian learned how to drive a car in China and was able to come to the States and drive a car just fine here in Boulder, Colorado, is really because of standards. And BMI is a standard. The idea is that if you have the same set of functions for any model, it makes it easier. Once you've seen one BMI, you've seen them all, because it's the same set of functions. So this is the benefit of a BMI.
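To make the benefit of a standard concrete, here is a minimal Python sketch. It is an illustration only, not code from the webinar or from any CSDMS package: the two toy classes, `LinearGrowth` and `HalfLifeDecay`, and the `run` helper are hypothetical, and they implement just four of the BMI function names (`initialize`, `update`, `get_value`, `finalize`) using plain lists instead of the arrays a real BMI would use. Because both toys expose the same function names, one piece of driver code can run either of them.

```python
class LinearGrowth:
    """Hypothetical toy model: a value that grows by 2 each time step."""

    def initialize(self, config_file=None):
        self.value = 1.0

    def update(self):
        self.value += 2.0

    def get_value(self, name, dest):
        # Fill the caller's buffer, following the BMI getter pattern.
        dest[0] = self.value
        return dest

    def finalize(self):
        pass


class HalfLifeDecay:
    """Hypothetical toy model: a value that halves each time step."""

    def initialize(self, config_file=None):
        self.value = 100.0

    def update(self):
        self.value *= 0.5

    def get_value(self, name, dest):
        dest[0] = self.value
        return dest

    def finalize(self):
        pass


def run(model, n_steps):
    """Drive any object that offers these four BMI-named functions."""
    model.initialize()
    for _ in range(n_steps):
        model.update()
    result = model.get_value("value", [0.0])
    model.finalize()
    return result[0]


# The driver does not need to know which model it was given.
print(run(LinearGrowth(), 3))
print(run(HalfLifeDecay(), 3))
```

The `run` function plays the role of the driver in the car analogy: it only needs the "steering wheel and pedals" that every model shares.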
And I think I garbled that a little bit, but I promise we can talk about that more. And again, please look at that documentation. I think we did a nicer job writing it up there.

All right. So there's been so much prelude to the content of our webinar. My last bullet is here: how can a BMI apply to data? This is the Basic Model Interface, not the basic data interface. Well, the thing is, if you think about it, a data set has many analogs to a model. For example, think of the initialize function. Initialize is used to start a model. It could be used analogously to open a data set, to open a file. Finalize could be used to close a file. We could use the update function to move to the next time slice in a file, for example. We could use the get_value function to pull out the data from a given time slice. So there are many direct analogs between a model and a data set, and BMI, we think, can apply here.

So this is the topic of our webinar, then: the idea of a data component. The way that we've imagined and implemented this idea of a data component at CSDMS is just a Python package that provides access to a data set through a BMI. Now, the data set will have its own API. It'll have its own details, and every one is probably different. But because each data component has a BMI, the interface will be the same across different data sets.

So we've done some work on this. Let's see what we've done so far. I'm going to click a link again. All right. And Tian, if you could please put that in the chat. Awesome. You're way ahead of me. Thank you for doing that. So you can see on the CSDMS site, we have a page of data components, written mostly by Tian. Tian's the expert. I wrote a couple because I looked at what Tian did. We've got a contribution from Rich in the community, which is awesome. All right.
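The analogy between a model and a data set described a moment ago can be sketched in Python. This is a hypothetical toy, not one of the CSDMS data components: the `TimeSliceData` class, the `demo` helper, and the file layout (one comma-separated line per time slice) are all assumptions made for illustration, and plain lists stand in for arrays. Only the four function names come from BMI.

```python
import os
import tempfile


class TimeSliceData:
    """Hypothetical data component: each line of the file is one time slice."""

    def initialize(self, filename):
        # Analogous to starting a model: open the data set.
        self._file = open(filename)
        self._values = self._read_slice()

    def _read_slice(self):
        line = self._file.readline()
        return [float(x) for x in line.split(",")] if line.strip() else []

    def update(self):
        # Analogous to a model time step: move to the next time slice.
        self._values = self._read_slice()

    def get_value(self, name, dest):
        # Copy the current time slice into the caller's buffer.
        dest[:] = self._values
        return dest

    def finalize(self):
        # Analogous to stopping a model: close the file.
        self._file.close()


def demo():
    """Write a two-slice file, then read it back through the BMI-style calls."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "slices.csv")
        with open(path, "w") as f:
            f.write("1.0,2.0\n3.0,4.0\n")

        data = TimeSliceData()
        data.initialize(path)
        first = data.get_value("values", [])
        data.update()
        second = data.get_value("values", [])
        data.finalize()
    return first, second


print(demo())
```

The caller never touches the file format directly. Swapping the CSV file for another storage format would change only the inside of the class, not the calling code.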
So just to give a higher-level look at this, you can see that there are a number of particular data sets, as well as some broader data sets. Each of these would have its own API. If I had to go in, and, well, Tian will show this, but if I had to go get some ERA5 data from ECMWF, for example, it has its own interface for access. I don't know what that interface is right now. I imagine it's probably NetCDF. There's probably something wrapped around NetCDF. But I don't have to know: because there's a data component, all I need to know are the BMI functions. I know initialize, finalize, update, and get_value, for example. So I don't even have to know whether it's a NetCDF file or a GRIB file. All I need is the data component with its BMI. So these are some of the ones that we've written so far. Let me go back. Oops. There we go.

So data components, then, are also a part of the CSDMS Workbench. We didn't mention this in the beginning, but just to circle back to that first slide of ours, data components are going to be a part of the whole ecosystem that we have at CSDMS for building, running, and coupling models. So now I'm going to turn it back to Tian, who, again, is the expert and can talk to us a little bit about how to create a data component.

Thank you, Mark, and thanks for talking about the Basic Model Interface and showing some data components. We hope the community can also contribute data components. Especially when you have a large data set and you want more people to use it, or you want to couple your data with other models under the PyMT modeling framework, you can write your own data component. Today I'm going to use the SoilGrids data component as an example. This data component fetches global gridded soil information from the SoilGrids system.
This system provides the spatial distribution of some soil properties, such as bulk density and clay, sand, and silt content, and it has other soil properties as well. Can you go to the next slide?

So when I created the soilgrids Python package, it included the soilgrids.py file. This file has a class that downloads soil data sets from the SoilGrids system using its web coverage service. There is also a bmi.py file, which includes another class that wraps the class from the soilgrids.py file with the Basic Model Interface. After I finished the soilgrids package, I ran the babelizer over this package to generate the pymt_soilgrids package. Mark, would you want to add more and talk about what the babelizer is?

Oh, yeah, that's good. So we talked about PyMT already. The babelizer is another tool that's in the CSDMS Workbench. It's used to wrap a BMI-wrapped model or data set in any of our supported languages, like Python, C, C++, or Fortran, and it makes it into a Python package that you can import in PyMT. We actually have a paper on the babelizer in JOSS. It's under review. It's almost done, so it should be out pretty soon.

Okay, cool. Oh, can I just add one thing? You can directly use the soilgrids Python package to download the data if you don't need to couple your data sets with model components under the PyMT modeling framework. Okay, next slide.

So after you create your data component, you may want to create some tutorial notebooks and share them with others, to help them easily run the notebooks and learn your data component. The CSDMS@HydroShare capability can help you achieve this goal. I want to provide a little bit of background about what HydroShare is. HydroShare is a web-based hydrologic information system for people to share their data, models, or tools and to collaborate and solve research problems. Next slide.
In HydroShare, you can put any kind of file for sharing. There are some additional functions for several data types, such as time series, geographic feature, geographic raster, and multidimensional space-time data. HydroShare also provides data publication functionality. When you publish your data or models, or even your Jupyter notebooks, you can get a DOI and cite it in your research paper. Next slide.

HydroShare also has some social functions to encourage collaboration. One of them is resource access control. With it, you can choose to let only trusted HydroShare users get your data. But you can also make your data sets public so anyone can discover and access them. HydroShare also supports several web applications for data analysis, visualization, and modeling. One example is the CUAHSI JupyterHub, which is shown in the figure at the bottom right. In the figure on the left, you can see that the CUAHSI JupyterHub has several server options. One of them is the CSDMS Workbench. In this server option, we installed PyMT and Landlab, which are from the CSDMS Workbench. We also have some scientific Python packages installed to support analysis and visualization, such as Matplotlib, xarray, and NumPy, along with some other Python packages. We also put many tutorial notebooks for PyMT and Landlab on HydroShare, so people can discover them and run them using the CUAHSI JupyterHub. That way, they don't need to install anything and can directly learn how to use PyMT or Landlab.

So now I'm going to give a live demo. Fingers crossed, and hope for the best.

It'll be perfect.

I will stop sharing. I will start sharing. Can you guys see my screen?

Yep, it's all good.

Great, cool. So here's the HydroShare homepage. If you go to Discover and type "soilgrids", you will find the Jupyter notebooks for this data component.
And this is the resource landing page for the Jupyter notebooks of the SoilGrids data component. You can see the abstract information, the keywords, and also the resource files. One is for the soilgrids Python package and one is for the pymt_soilgrids Python package. Those two are the tutorial notebooks for them. Click on Open with and select CUAHSI JupyterHub, and remember to select the CSDMS Workbench, because only this server option has the correct Python packages installed for running those notebooks. Then click on Start. This process usually takes two to three minutes to load, so you may need to be a little patient.

I'm just going to sit here and stare awkwardly at the screen, waiting for it to load. Tian, we were supposed to rehearse some small talk. I forgot to do that.

Yeah. So, oh, okay. There's one thing maybe I can mention later. Okay. So I will open the SoilGrids Jupyter notebook. This tutorial includes three sections. The first is a brief introduction and package installation. The second section shows two examples of how to use the SoilGrids data component to download data sets and visualize them. The third section guides you to write your own code and download different soil property data sets.

The first step is to install the soilgrids Python package. In the CUAHSI JupyterHub, you are allowed to install your own Python packages. This is very helpful because when there is a new component, whether it's a model component or a data component, we don't need to update the CSDMS Workbench server option. You can just add a command for package installation in your Jupyter notebook.

The second section shows two examples. As I showed, the soilgrids package has two Python files. The SoilGrids class is in the soilgrids.py file, and the BmiSoilGrids class is in the bmi.py file. The first class is designed for users to download data sets.
And the BmiSoilGrids class wraps the SoilGrids class with a Basic Model Interface. So, the first example uses the SoilGrids class to download the data. This class includes the get_coverage_data method to access soil property data from the SoilGrids system. The first cell downloads the soil pH data for a study area in Senegal. If you want to learn more about the details of the different parameters, you can click on the parameter settings link, which shows more details. So, let's run the first cell and download the data set. Okay. The second cell shows the metadata. You can see it shows the variable name, units, and the corresponding service URL. And the third cell makes a plot of this data set. Okay.

The second example uses the BmiSoilGrids class to download the same data set. Actually, BMI methods are not designed for people to use directly. If you want to learn how to use the data component under the PyMT modeling framework, you're welcome to try out the pymt_soilgrids Jupyter notebook. But here, I want to show you how to use some of the BMI methods to access the data set as well as the metadata. When you use the BMI, the first step is to use the configuration file to initialize a data component. This step actually downloads the data set from the SoilGrids system. Okay. If you are interested in what is in this configuration file, you can display it. It includes the parameter information, which is exactly the same as what we saw in the first example. Okay. This cell uses the variable-related methods from the BmiSoilGrids class to check the variable information of this soil data set. It retrieves the metadata of the soil data set, like the variable name, unit, location, type, and its associated grid.
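The first BMI steps in this example, initializing from a configuration file and then querying the variable metadata, follow a pattern that can be sketched in Python. This is a hypothetical stand-in, not the BmiSoilGrids class: `ToyBmiData`, `describe`, `demo`, the JSON configuration format, and the variable name and units are all assumptions chosen so the sketch runs offline with only the standard library. The method names (`initialize`, `get_output_var_names`, `get_var_units`, `get_var_location`, `get_var_type`, `get_var_grid`, `finalize`) are the BMI ones.

```python
import json
import os
import tempfile


class ToyBmiData:
    """Hypothetical stand-in for a BMI-wrapped data component."""

    def initialize(self, config_file):
        # Read the parameters. A real data component would use them
        # to download or open its data set at this point.
        with open(config_file) as f:
            self._config = json.load(f)

    def get_output_var_names(self):
        return (self._config["var_name"],)

    def get_var_units(self, name):
        return self._config["var_units"]

    def get_var_location(self, name):
        # Where on the grid the values are defined.
        return "face"

    def get_var_type(self, name):
        return "float64"

    def get_var_grid(self, name):
        # Identifier of the grid that this variable lives on.
        return 0

    def finalize(self):
        self._config = None


def describe(component):
    """Collect variable metadata using only BMI-named methods."""
    return {
        name: {
            "units": component.get_var_units(name),
            "location": component.get_var_location(name),
            "type": component.get_var_type(name),
            "grid": component.get_var_grid(name),
        }
        for name in component.get_output_var_names()
    }


def demo():
    """Write a toy configuration file, initialize, and gather the metadata."""
    with tempfile.TemporaryDirectory() as tmp:
        config_file = os.path.join(tmp, "config.json")
        with open(config_file, "w") as f:
            json.dump({"var_name": "toy_soil_property", "var_units": "g/kg"}, f)

        component = ToyBmiData()
        component.initialize(config_file)
        info = describe(component)
        component.finalize()
    return info


print(demo())
```

The grid identifier returned by `get_var_grid` is what the grid-related BMI methods take as their argument, which is why the variable metadata is usually queried first.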
And this cell uses the grid-related methods of the class to check the grid metadata, like the grid rank, size, shape, spacing, and its lower-left origin coordinates. This step uses the get_value method from the BMI to retrieve the soil data set. The last cell visualizes the data set. It is exactly the same as the plot we saw in example one.

The third section guides you to write your own code, but I'm not going to show the details. The only thing to mention is that after you write your code, you can double-click the section to check whether it is the same as the answer we provided.

Okay. So, here is the pymt_soilgrids notebook. I won't go into the details, but if you look into the code, you will find it's very similar to what I've shown for the BmiSoilGrids class for accessing the metadata as well as the data sets. And you will see the plots are the same, too. I think that's all for my demo. I will stop sharing.

Okay. And I can start sharing again. While I'm doing this, one little note is that you saw in Tian's notebooks that when she used the API from soilgrids, it was only a couple of lines, but when she used the BMI, it was several lines of code. That's kind of the trade-off. Because the BMI has to be more general, it may take more lines of code to do the same task. Another interesting thing is that because the BMI is written in Python, the BMI and the PyMT versions of the SoilGrids component are very similar. If you inspect Tian's BMI notebook and the PyMT notebook, you'll see code that is rather similar. Again, that's because they're both Python. If we had written the data component in C, for example, they'd be quite a bit different.

Yeah. I also put in the HydroShare links for the Landlab, PyMT, and SoilGrids Jupyter notebooks.
That was the information I wanted to talk about while the container was loading, but since it loaded faster, I just put it here.

Yeah. So fast.

Yeah. I did a little test before the webinar. That is why.

All right. So let's take a look. Here's a summary of what we've seen today in the webinar. One, the concept of BMI, the Basic Model Interface, can be extended to data sets, and it works pretty well. It's really kind of a neat idea. Two, a data component is a Python package. I should mention that we chose Python because it's basically our hub language at CSDMS. We could have done this in other languages, but we decided to standardize on Python. So a data component is a Python package that provides access to a data set through a BMI. The data set will have its own API, but as users we don't need to care about that. We can just use the BMI instead, for simplicity. Three, data components can be coupled with models in a framework such as PyMT or Landlab. We didn't show much of that today, but it is possible. And four, we have developed a set of data components. Mostly Tian has done that. We've also had some contributions from the community. Yeah, Rich. But you can make these as well. It would be really cool to have a big collection of data components. It would make it easier for other people to access data and use it within a coupling framework. So those are the takeaways from today's webinar. Tian, is there anything else you can think of?

Yeah. I put some information in the chat, especially for people who are interested in creating their own data component. After you finish that, you can create a wiki lab on our CSDMS website with the lab information, to help students and other researchers learn about your data component. And we also have the model repository.
When your new data component is available, you can register it there, and it will be shown on the list in the future, like the list we showed earlier. Currently, we have seven. Hopefully, we'll see more from the community. Yeah, that's my note.

Cool. All right. Thanks, Tian. Is that it, then?

Yeah, I think so.

All right. Thanks, everybody. Thanks for watching. We'll stick around and try to answer questions. If you'd like, you can ask questions in the chat, or you can just unmute your mic and ask us directly.

Yeah. Can we ask questions, please?

Yeah.

Okay. Thank you, Tian and Mark, for this nice presentation. So my first question to you: is your code limited to a specific data source for downloading the soil properties, or can you download soil properties from any website? And what is the type of the data? Is it gridded data, or is it borehole data, and do you handle interpolation in your code in order to produce a soil map? So this is my main question for you.

Okay. Thanks a lot for your questions. I will try to answer the ones I remember. If I miss one, you can just remind me. First, for the soil data sets, this data component works only with the data sets provided by the SoilGrids system. I can send you the system's link. But if you want to create another data component that downloads soil data sets from another system, that would be totally fine. Or if you have your own soil data sets and you want to create a data component, that would be fine, too. So there's no limitation on what kind of data you can wrap as a data component. This soil property data set specifically is raster data, and it covers the whole globe. They have more soil property data sets, not just the ones I have mentioned.
I can send you the link, and you can check the details and see if there are properties you are interested in. Are there any other questions that I missed? Or did I answer your question?

Yeah, you answered my question. Thank you, Tian. To be honest, I'm asking this question because we are using different types of hydrological models, particularly at continental scale, like WRF-Hydro, or whatever. And from one model to another, the soil properties, the shape of the data, and the type of the data could be different. So the question is, for example, if I want to use or create my own soil property data in order to feed the WRF-Hydro model, is that enough? Or do I need to look at other data sources to obtain all the data that I need to feed the WRF-Hydro model? WRF-Hydro is just one example of a hydrological model. Or, with the tools that you created, can we obtain all the soil properties that the hydrological model needs?

So it depends on what kind of soil properties the model requires. You need to look into the SoilGrids system and see if it provides the corresponding soil property data. Am I answering your question?

Yes, yes. Thank you, I appreciate that. Thank you.

Okay, I will put the SoilGrids system information in the chat, so you can check whether the data you need as input for your model is available. And then, if they have the corresponding properties, you can use the soilgrids Python package to download the data set.

I see there's a question from Hunter in the chat as well. I think I can take this one. Tian, you can back me up on this if you'd like. So Hunter asks: would you suggest model development in Landlab in another repository such as GitHub instead of HydroShare for version control? Or does HydroShare work similarly?
So I think the answer to the question is, yeah, if you're going to do some development in Landlab, it should be in a GitHub repository. HydroShare would be more of a place for you to host the result, to host running the model. So, does that sound good? Use a GitHub repository for your source code, but then use HydroShare to distribute your model and to let others use it.

Yeah. Or, after you finish a Landlab component using GitHub, maybe you want to create some Jupyter notebooks to show how to use your Landlab component. Then you can share your Jupyter notebooks in HydroShare, and others can easily discover them, access them, and run them on the CUAHSI JupyterHub.

Okay, thank you, Tian.

Yeah.

Okay, anyone else? Any other questions? Okay. Well, thank you very much for attending.

Yeah, thank you so much.

All right, bye-bye.