Hey, hi, welcome. I'm Mark. I'm a research software engineer at CSDMS, and today I'd like to talk about some of the key software technologies that go into building the cyberinfrastructure for CSDMS. Before I begin, I want to thank the organizers of this conference. It's awesome. Thanks, Boris, for introducing me. I also want to thank my colleagues at CSDMS: Eric, Albert, Lena, and Greg.

So what is CSDMS? It's the Community Surface Dynamics Modeling System. It's an international organization. In the past 10 or so years, it's grown to over 1700 members. CSDMS was founded for the purpose of supporting the science, and in particular the modeling, of Earth surface processes. On that foundation three pillars have arisen, the three pillars of CSDMS: community, computing, and education. I'm going to focus today on the computing pillar. And by the way, CSDMS is a fierce advocate of open source: open source software, open licensing, open development. So thank you, Greg, for letting me steal your slide.

When CSDMS was in its design phase, Jai Syvitski and others decided on a bottom-up, community-driven approach for this organization. The idea was, step one, that researchers in the community would write models of surface processes. These models would be written in whatever language they felt most comfortable with. So that's great. You can see in my diagram that there are models that are little, models that are big, models written in Fortran, models in C, maybe Python, etc.

The next step, this is the tricky one: a modeling framework would be developed that would allow these models to communicate with each other and exchange information. This is the concept of coupling that you're all familiar with. Please indulge me for a second.
You know, maybe a more concrete example is that you have a river model and a landscape evolution model. Perhaps the landscape evolution model would have an uplift event that would change the course of the river, but the river would erode the uplifted landscape and transport sediment downstream. Couple that with a delta model, for example, etc. You can see how this works. So that's the second step. The third step, then, is that hopefully, with this novel approach, we could advance the science of Earth surface processes.

Well, that second step, those question marks, that's kind of the key. This eventually became the CSDMS modeling framework. What I'd like to do in this talk is tell a story, the story of how we got from question mark, question mark to the CSDMS modeling framework. I'm going to tell this story by presenting problems that were in the way, and then the software technologies that were used to overcome them.

So the first problem: how could we standardize access to models? You can imagine, looking at this diagram, that these models written in different languages probably have different calling syntaxes, different configuration files, different boundary conditions. It's a mess. If we can find a way to standardize the way that we call a model, it'll make things a lot easier, both for us as people, if we're trying to write code in order to couple models, and for machines, if we want to automate that process of model coupling.

We solve this through the Basic Model Interface, or BMI. Let me grab my notes here. So BMI provides a standard interface for a model. Let me step off for a second and talk about the idea of an interface. In computer science it has a set meaning. So you can imagine an interface.
It's like a template. We have a bunch of functions. Each function has a name, and it has defined inputs and outputs. However, the interior of the function is empty. There's nothing there yet. When we implement a BMI, that's where we actually fill in the code inside that function. So the interface is kind of empty, but it has set rules for what the function names are and what the inputs and outputs are.

Just to give a more concrete example, BMI has a function called get_time_units. That's a really easy one. I think get_time_units doesn't have any inputs, and the output is a string with whatever unit the model uses: days, seconds, years, whatever. So you can imagine that a model developer could just, you know, say return "days", or maybe they query their model and ask for whatever units it's using. That's kind of the idea behind BMI.

So BMI has functions for setting up and tearing down a model. These are initialize and finalize. We'll see these a couple of times in this talk. It has functions for advancing the state of the model through time, for example. It also has functions for getting and setting variables from the inside of a model, as well as other helper functions.

So when a researcher writes a model, they can implement a BMI for it. You can think of the BMI as sitting on top of the model. The model doesn't care about the BMI. You can still run the model without the BMI, but adding a BMI allows you to add your model to the CSDMS modeling framework. Maybe just to go a little further: who cares? Why would you want to do that?
Well, you know, if you have your model in the CSDMS modeling framework, it's part of the community. Your model could be picked up by someone else, someone you don't know, and they could use it for a purpose that you had not imagined. So it's kind of advantageous.

Just a couple of last words. BMI is a design pattern; it goes together with this idea of a wrapper. And the neat thing about BMI is that once you've seen one, you've seen them all. It's the same functions, like initialize, whether it's in Fortran, C, Python, or whatever.

Okay, so I have a little sketch that tries to show how BMI would work inside the CSDMS modeling framework. Imagine I have a pair of models. For simplicity, for this example, I'm going to assume they're written in the same language. We'll come back and check that later. Even though they're written in the same language, they could have very different calling syntaxes and configurations. Maybe one's procedural. They won't necessarily make it easy to pass information to one another.

If we add a BMI... let me get this right here. Hang on. I have words. Okay, so you can see that each model now has a standard interface, and it's represented by the suggestively congruent puzzle pieces that I've used. For simplicity, for this example, let's also assume that the puzzle pieces, the BMIs, are written in the same language. They don't have to be in the same language as the model, actually. Maybe the models could be in Fortran and the BMIs could be in C. It's actually not very hard to do. But at least let's assume they're in the same language.

Now imagine these models could actually be coupled through their BMIs. The puzzle pieces fit together with a very satisfying click. This does work, actually. I've done this before in my work. It's only for a narrow set of cases, though. So in theory we could exchange information between these two models just through their BMIs. There's more to it, with more detail.
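To make the interface idea concrete, here is a minimal sketch in Python of an interface with empty function bodies and two toy models coupled purely through it. It is not the official BMI specification: the class names (BmiLike, ToyModel, ToyLandscape, ToyRiver), the variable names, and the numbers are all hypothetical, and the function signatures are simplified (the real BMI defines more functions, and its get_value takes additional arguments). Only the function names initialize, update, finalize, get_time_units, get_value and set_value follow what the talk describes.

```python
from abc import ABC, abstractmethod


class BmiLike(ABC):
    """Interface only: function names with defined inputs and outputs, no bodies."""

    @abstractmethod
    def initialize(self, config): ...

    @abstractmethod
    def update(self): ...

    @abstractmethod
    def finalize(self): ...

    @abstractmethod
    def get_time_units(self): ...

    @abstractmethod
    def get_value(self, name): ...

    @abstractmethod
    def set_value(self, name, value): ...


class ToyModel(BmiLike):
    """Shared plumbing for the two toy models: variables live in a dict."""

    def initialize(self, config):
        self._vars = dict(config)

    def finalize(self):
        self._vars = {}

    def get_time_units(self):
        return "years"

    def get_value(self, name):
        return self._vars[name]

    def set_value(self, name, value):
        self._vars[name] = value


class ToyLandscape(ToyModel):
    def update(self):
        # Uplift raises the land surface by a fixed amount each step.
        self._vars["elevation"] += self._vars["uplift_per_step"]


class ToyRiver(ToyModel):
    def update(self):
        # The river removes a fixed fraction of the elevation it was given.
        self._vars["eroded_depth"] = (
            self._vars["erodibility"] * self._vars["elevation"]
        )


def run_coupled(steps):
    """Couple the two toy models using nothing but the shared interface."""
    land, river = ToyLandscape(), ToyRiver()
    land.initialize({"elevation": 100.0, "uplift_per_step": 2.0})
    river.initialize({"erodibility": 0.5, "elevation": 0.0, "eroded_depth": 0.0})
    for _ in range(steps):
        land.update()
        river.set_value("elevation", land.get_value("elevation"))
        river.update()
        eroded = river.get_value("eroded_depth")
        land.set_value("elevation", land.get_value("elevation") - eroded)
    final_elevation = land.get_value("elevation")
    land.finalize()
    river.finalize()
    return final_elevation
```

The driver loop in run_coupled never touches a model's internals, which is the point of the puzzle pieces: any model that fills in the same functions could be swapped in without changing the loop.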
So this gets to our next problem: how can we ensure that the information that we exchange between coupled models is the same? The solution to this is CSDMS Standard Names.

Let me try to motivate this a little better. Let's say, for example, that I have a model, and it models something in the atmosphere, and one of the output variables is temperature. Great, right? So is that surface temperature? Is that dry-bulb temperature? Is it virtual temperature? Maybe it's potential temperature. Is it at three meters? No, it's at the surface. Maybe it's at 10 meters. Is it instantaneous? Is there some averaging happening? So you can see that what I call temperature in my model may not be the same variable as what other people call temperature in their models. This is where CSDMS Standard Names come in handy.

Let me check my notes really quick. I have some good words to use here. CSDMS Standard Names use a template for creating unambiguous and easily understood standard variable names according to a set of rules, a grammar. The grammar of CSDMS Standard Names is this: we have an object, which in this case is the bottom of the atmosphere; it has a quantity, the temperature I'm using; and optionally there are operations that could be performed on that quantity. That doesn't quite fit my example, but I wanted to show an example of an operation. So this creates an unambiguous name for the variable that I want to use.

Now, model developers don't need to use this inside their models. That would be awful. I could still use capital T underscore DT to represent this variable in my model. But in the BMI that I write, I would use these names to unambiguously communicate which variable I mean. The idea is that in the BMI you'd have some dictionary mapping between the internal variables in your model and the variables that are exposed to the wider public. So that's the idea of CSDMS Standard Names. Oh, and Scott Peckham, one of the architects of CSDMS, also has several papers on Standard Names. He goes into more detail.
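The dictionary mapping described above might look roughly like the following sketch. The helper make_name, the mapping PUBLIC_TO_INTERNAL, and the sample value are hypothetical. The separators used here (a double underscore between object and quantity, and "_of_" after an operation) are an assumption about the naming grammar rather than something stated in the talk, and the slide's actual example name is not in the transcript.

```python
def make_name(obj, quantity, operation=None):
    """Assemble an object + [operation] + quantity name (assumed separators)."""
    quantity_part = f"{operation}_of_{quantity}" if operation else quantity
    return f"{obj}__{quantity_part}"


# Public, unambiguous name exposed through the model's interface.
AIR_TEMPERATURE = make_name("atmosphere_bottom_air", "temperature")

# The model keeps its own short internal name; only the interface layer
# knows how the two correspond.
PUBLIC_TO_INTERNAL = {AIR_TEMPERATURE: "T_DT"}


def get_value(model_state, public_name):
    """Look up a variable by its public name in the model's internal state."""
    return model_state[PUBLIC_TO_INTERNAL[public_name]]


# Hypothetical internal state of a model, keyed by its internal names.
state = {"T_DT": 288.15}
surface_temperature = get_value(state, AIR_TEMPERATURE)
```

A caller asks for the long public name and never needs to know that the model calls the variable T_DT internally.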
Okay, so we solved that problem. Now we've standardized interfaces for models, and we've standardized the way we name the information that's exchanged. The next problem is: how can we couple models that are in different languages? This is a pretty big one. This goes back to one of my early slides, where I had all those models of different shapes and sizes and different languages. So far we haven't addressed that.

The way we do this in the CSDMS modeling framework is through Babel. I have an asterisk next to the name Babel because we currently use Babel in the CSDMS modeling framework, but we're looking at moving away from it. I'll explain why a little bit later.

Babel, though, is a very cool piece of software. It was developed as a DOE project at Lawrence Livermore National Lab, with a couple of other collaborators elsewhere. Babel supports five languages: C, C++, Fortran (so Fortran 77, 90/95, 2003), Java, and Python. The idea is that if you give it code in one language, it will spit back out wrappers for all those other languages. In theory I could give it Java code, it would give me back all these wrappers, and I could use a Fortran 77 wrapper around Java. It's possible with Babel.

We only use a fraction of the functionality of Babel. We use the Python wrappers that it gives back for the CSDMS modeling framework. So we accept code written in those five languages and get back Python. We'll come back to that in a second as well.

So I have another little diagram showing how we would use Babel. Imagine that we have models with BMIs. In this case, you can see model A and model B. They're in different languages because they're different colors. Further, A and B both have BMIs. The BMIs are in different languages as well; you can see they're in different colors. I tried to make them different sizes.
So you couldn't fit these puzzle pieces together very well. If we run them through Babel, they become Babelized components. I love the word "Babelized." We can use that word with pride. And you can see that the orange pieces are Python wrappers around these. They look like they fit together very nicely, with a satisfying click. So again, we could use Babel to try to couple models written in different languages. So that's cool. And again, this works, but in a narrow set of cases, maybe a little wider than with just BMIs, but still a narrow set of cases.

So what we need to think about next is how we can put all this together. We've got BMI. We've got Standard Names. We've got Babel. How can we get all this together and also solve some other problems?

Even though I've coupled A and B, there are some kind of gnarly issues that are still left. Things like time interpolation: what if model A had a time step of four and model B had a time step of five? How are we going to pass information between them in the update? Things like grids: model A has a nice rectangular grid, but model B has an unstructured mesh. Things like units: maybe the time step in A is in days and in B it's in decades or something. So there are still some knotty issues that we need to work on, and these are handled inside our solution, the Python Modeling Toolkit, or PyMT.

PyMT allows a user to access and interact with Babelized components through Python. This is really cool. I'm using Python now.
It doesn't matter what the source language is. PyMT handles details like time interpolation, which I mentioned, through scipy.interpolate as well as some custom code that Eric has primarily written; grid mapping through ESMF, the ESMF grid mapper; units through UDUNITS, which is from Unidata at NCAR; and also output to NetCDF through xarray and netCDF4.

So you can see in my little sketch that I have my Babelized components, but I've broken them apart again, and I've introduced yet another layer of middleware, which is kind of funny. We need all these layers in order to get these models wrapped correctly. But we can actually do this. We can run Python and access these models.

On the right side of the slide I have some sample code for running a model in PyMT. I'm kind of proud of myself that I didn't put any code in until now. This is a talk about software, you know, and I didn't want your eyes to glaze over. Even if you're not familiar with Python, it's not hard to understand. Oh, should I try this? You can kind of see that green dot.

You can see that I'm importing a model called Sedflux3D from PyMT, actually from the set of components that PyMT encapsulates. Sedflux3D, by the way, is written in C, so we're actually running a C model through Python. You can see we create an instance of Sedflux3D. It's called model. We then call setup on that model. Setup is actually a method that belongs to PyMT, not to BMI, and what it does is provision the configuration for a model.
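The code on the slide is not captured in this transcript. As a rough, self-contained stand-in for the calling sequence being walked through here (create an instance, call setup to provision a configuration, then initialize, update ten times, and finalize), the sketch below uses a toy class. ToyComponent, its JSON configuration file, and the get_current_time helper are hypothetical; this is not the PyMT API and it does not run Sedflux3D.

```python
import json
import os
import tempfile


class ToyComponent:
    """Toy stand-in for a wrapped model; all names here are illustrative."""

    def setup(self, run_dir=None, **params):
        # Provision a run directory and write a default configuration file,
        # with any keyword arguments overriding the defaults.
        run_dir = run_dir or tempfile.mkdtemp()
        config = {"start_time": 0.0, "time_step": 1.0}
        config.update(params)
        config_file = os.path.join(run_dir, "config.json")
        with open(config_file, "w") as stream:
            json.dump(config, stream)
        return config_file

    def initialize(self, config_file):
        # Read the configuration and put the model in its starting state.
        with open(config_file) as stream:
            config = json.load(stream)
        self._time = config["start_time"]
        self._time_step = config["time_step"]

    def update(self):
        # Advance the model by one of its own time steps.
        self._time += self._time_step

    def finalize(self):
        # Release whatever the model was holding on to.
        self._time_step = None

    def get_current_time(self):
        return self._time


def run_model(n_steps, **params):
    """Same calling order as described in the talk, on the toy component."""
    model = ToyComponent()
    config_file = model.setup(**params)
    model.initialize(config_file)
    for _ in range(n_steps):
        model.update()
    end_time = model.get_current_time()
    model.finalize()
    return end_time


end_time = run_model(10, time_step=2.0)
```

The split between setup and initialize mirrors the distinction made in the talk: setup prepares input files, while initialize, update and finalize are the interface functions every wrapped model shares.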
So Sedflux3D, for example, has a bunch of configuration files and sample data files that it works with. Those are set up through the setup method. The last three statements, though, are just pure BMI methods. There's initialize, which starts the model and puts it in its basic state. Then, in a for loop, we update the model ten times, by whatever its time step is. Finally, we finalize the model at the end.

So this is exactly the code that someone would write. You can put yourself in the driver's seat. If you had a model and you BMI'd it, this would be the kind of code that you would write inside of the CSDMS modeling framework to run your model.

Coupling models requires a little more code than I can easily put on a slide, so I actually have a demo that I can show. Chad gave me a not-so-good look when I asked him if I could do this here in my talk, so I'm not going to do it here in my talk. But there's a break, I think, after our session, so I'll set up in here somewhere with my laptop for anyone who wants to see an example. I have a couple of notebooks that we'll actually run.

Okay, let me check my notes really quickly here. There's still lots of work to do, and I've identified five of a zillion things that we could still do in CSDMS. Let me first start with Babel. As I mentioned earlier, Babel is what we currently use, and we're looking to move away from it. The reason why is that Babel is a heavyweight piece of software.
It does so much more than we actually need for our purposes. It's also a little bit limiting in that it's kind of bound to Python 2, which is going away soon in favor of Python 3. It also doesn't support other languages that would be really helpful for us in CSDMS, languages like R, Julia, and NetLogo, for example. So it would be nice to not use Babel anymore, and instead we're working on a homegrown solution.

Babel also has relevance for the fourth bullet. We want to decrease the time it takes to go from having a model with a BMI to converting it into a component. The way the process works currently is that a researcher in the community would write a model, they'd add a BMI to it, and then they'd pass it over to Eric and me here at CSDMS, and we would Babelize it and make it into a component. That's kind of a lengthy process. We want to be able to democratize this idea of making a component and also speed up the process. If we move off Babel, then we'll have a faster way to do this.

Next, coupling with data. There's no reason why a BMI couldn't work on a data file. You can imagine a NetCDF file. I could call initialize, and that would open the file. I could call get_value and pull out a slice. I could call finalize and close the file. So we're looking at that as well.

Also, modeling in a geospatial context. Right now there's an explosion of geospatial data, and we need to be able to allow modelers in our community to access it and use it with their models in the CSDMS modeling framework.

Finally, I think I've heard others say this as well.
We need better documentation and examples. Frankly, we need better marketing. I mean, we have these cool things that have been developed, and we haven't let our community know well enough how to use them, or even that they're there, so we need to work on that.

Before I finish, just quickly, links. If you download this presentation, you can click on the links and go to the different software technologies that I talked about in the talk. Basically, csdms.colorado.edu is a good starting point. So, in summary, CSDMS provides a cyberinfrastructure that allows community members to run and couple models. Okay, thank you very much.

I have a brief question. Can you give us an idea of how many models there are?

That's a great question. Let me back up just a half a step. CSDMS also has a model repository. Anyone can submit a model, and basically it gets a help page, so it's FAIR. I can't read it from back here, but it's findable, accessible, interoperable, things like that. A subset of those models is currently in the CSDMS modeling framework. I'm going to guess, I don't have the exact number, that 30 or so of those are included. And it would be nice if we could reduce the friction between the models in the repository, of which we have over 220, and the ones in the framework.

Questions?

Thanks, Mark. It makes me proud to be part of CSDMS. So a lot of the talk in this meeting has been about coupling of surface processes. Is there any reason why a tectonic model couldn't be wrapped with a BMI and then Babelized in the same way that a surface process model is?

There's no reason why not. I tried to make my talk as generic as possible: a model A, a model B. It's flexible enough, and the nice thing is that Eric and I are working here, and we can help you. Models don't always fit in a nice little box.
Hi, Eric Mittelstaedt, University of Idaho. In that vein, can your modeling toolkit, can the Babelized components, handle parallel processing?

That's a good question. I think the direct answer is no. The BMIs are not parallelized. The models themselves can be; we don't have any problem with that. Model A could be parallel, model B could be parallel. But we don't have tight coupling of parallelized models.

So if model A is running in parallel and model B is running in parallel and they happen to be on the same processors, I mean, you would need all the information to come back to a master node?

Yes. That's a serious problem, and others have pointed out that it's a serious problem.

In my experience... so, (a), what is currently the balance between surface process components and other components? And (b), I think that some of these components are not really different from a geodynamic model, so what can we learn from that for geodynamics?

For (a), I'd say most of the models that we have are surface process models. However, we also have other wrapped models. Deltares has wrapped Delft3D with a BMI, and I know that there is work on wrapping WRF at NCAR as well. So most models are surface process models, but there are others as well. For the second question, for part (b), I'm not sure I have a good answer right now, but maybe we could talk offline.

An additional question: how much overhead does this actually add, given that you have all these layers of code? Does it slow things down?

That's a great question. I'm trying to think of the right way to say this. I'm going to say it coarsely: keep in mind that we don't care too much about performance. We're interested in coupling, so we don't think about performance a whole lot. However, it still works pretty well.
Although the wrapping looks pretty thick on the slide, it's actually a pretty thin wrapper around the model. I can't give you exact metrics. That would be a neat thing to do, to actually find out just how different they are. But my intuition is that it doesn't add much.

Thanks, Mark. Is it on? This is Katy Barnhart from the University of Colorado. A thing that's come up a lot is that there's potentially a need for parallelized surface process models, and, as you just said, PyMT was not built with that intention. But my understanding is that you guys have worked quite hard to make these things very generic. What do you think the pathway towards making it parallelized would look like? Is that a long, tortuous road, or is that something that you think is feasible to do in CSDMS 4.0?

Thanks, Katy. You're not supposed to ask me hard questions, Katy. Yes, you do; you're good at that. So I want to be careful. I think it's not trivial, and so, the way you said this, in CSDMS 4.0, that may be the way it works. Let me be frank and say that I don't have a good answer to that. I think that Eric probably would have a better answer, or at least would be better able to explain our position. So, to try to be fair: it's hard the way things are architected now, and I don't have a good answer.

Oh, thank you very much. Thank you.