Good morning. This is a project I've been working on for a number of years with a number of collaborators on the CUAHSI HIS team. It's funded by NSF. David Maidment is the project PI, and David Tarboton, who's right over here and whom many of you know, is one of the co-PIs. I'm also one of the co-PIs, and a number of students and researchers have participated in the project as well. I'm going to give an overview of the HIS, because I think it may be something you're not familiar with, then talk a little bit about my part of the project, which is creating the HydroModeler application that has some analogies to what's happening here with CSDMS, and conclude with some thoughts about how we have a number of different modeling frameworks coming up in different communities and what we can do to facilitate interoperability across those frameworks.

First, you might be asking yourself: what is CUAHSI? It stands for the Consortium of Universities for the Advancement of Hydrologic Science, Inc. CUAHSI has about 125 member universities, and among its priorities are, first, to develop and operate research infrastructure — infrastructure to help do science — and second, to improve and promote access to data, information, and models.

Under that umbrella we've had the CUAHSI Hydrologic Information System (HIS) project. The problem statement — and I almost put "problem" in quotes — is that more and more data are being made available to us through sensor networks, remote sensing, and model output, and we need to advance our techniques for handling and processing those data. We have this wealth of information, if you will; how do we leverage it effectively to do science? Now, you might say to yourself, "I don't think we have that much data," so let me rephrase the problem a little: how do I know what data are actually available and out there?
That's another way to twist the problem if I'm doing a study in a particular area. Some of the studies we heard about over the last couple of days — especially the global-scale studies that depend on data sets to parameterize more statistical models — show that being able to access and integrate these data is a major challenge. If we can facilitate ways to do that more effectively, we can hopefully advance science. So the HIS goal is really to serve science. I'm not going to be presenting a lot of science in this talk — you might call it information science — but these are tools and techniques to help facilitate hydrologic science.

The first thing you need to understand about the design of this HIS system is that we made a decision early on to keep the data distributed. Some of you might be familiar with things like OPeNDAP, for example; this follows a similar paradigm. We keep the data where the data live — with the data publisher — but we create mechanisms to query those data through web services. We can look in those databases and see what information is in there, create what we call a metadata catalog that facilitates search, and write little scripts or tools that go off, fetch data on the fly, and assemble it. The key here is standardization, which is something I'm going to come back to a number of times; there are a lot of analogies you can make between data integration and model integration, and I'll try to emphasize them through this talk. If we can create standards for data access, for the interfaces to these databases, and for the data communication protocols, then it becomes much easier to integrate this information together. The software has to be less sophisticated, which means less code to maintain and a more robust application.

This is our triangle diagram to describe the HIS in simple terms. We really think about the HIS as three components. First there is the HydroServer component: this is where the data reside. If you want to set up your own HydroServer, we have software for doing that — I have a slide coming up to describe it in more detail. You load your observational data into HydroServer, and it puts them out onto the internet. Second, we have HIS Central, which is maintained at the San Diego Supercomputer Center. It looks at these HydroServers and their metadata and builds a metadata catalog. And third, we have client applications; our primary client application is called HydroDesktop, and it's the portal into the system. If you want to query and find out where there is information, you use a client application like HydroDesktop to do that.

The important point is that the data sit on those servers. We harvest metadata and centralize that, but all the data stay distributed, so when you do a query you might look in HIS Central to find out what's available, but when you actually want the data you go and get them from one of the servers. We like to draw an analogy between this setup and the way the internet works in general: there are a number of different servers out there being indexed by search engines, and when you do a search against Google, Google looks at its cached version of a website — but when you actually go to the website, you're not getting it from Google; Google directs you to the actual site. It follows the same paradigm.
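To make that search-then-fetch pattern concrete, here is a minimal sketch in Python. The endpoint URLs, parameter names, and XML tags below are illustrative assumptions standing in for the real HIS Central and data services, but the shape of the interaction is the point: ask the central catalog what exists, then go back to the publishing server for the values.

```python
# A minimal sketch of the HIS search-then-fetch pattern. All URLs,
# parameter names, and XML tags are illustrative assumptions -- consult
# the actual HIS Central / WaterOneFlow service descriptions (WSDLs)
# for the real interface.
import requests
import xml.etree.ElementTree as ET

# Step 1: query the central metadata catalog for series in a bounding box.
catalog_url = "http://example.org/hiscentral/GetSeriesCatalog"  # hypothetical
resp = requests.get(catalog_url, params={
    "xmin": -84.0, "ymin": 35.0, "xmax": -83.0, "ymax": 36.0,
    "conceptKeyword": "streamflow",
}, timeout=60)
catalog = ET.fromstring(resp.content)

# Step 2: for each catalog record, go back to the *publishing* server
# (not the catalog) to retrieve the actual data values.
for record in catalog.iter("SeriesRecord"):          # hypothetical tag
    server = record.findtext("ServURL")              # where the data live
    site = record.findtext("SiteCode")
    variable = record.findtext("VarCode")
    data = requests.get(server, params={
        "site": site, "variable": variable,
        "startDate": "2010-01-01", "endDate": "2010-12-31",
    }, timeout=60)
    print(site, variable, len(data.content), "bytes from", server)
```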
Now I want to go through each of those parts in a little more detail. HydroServer is the work of David Tarboton and Jeff Horsburgh at Utah State University. One of the main contributions here is the Observations Data Model (ODM), described in this publication. It is essentially metadata for describing observational data within hydrologic science. When you take an observation, you need to be very explicit about what that observation is: what are you measuring, where are you measuring it, how was it measured, who measured it, and when did they measure it? All of this metadata is critical for understanding what a data value represents, especially when you want to share it with someone who didn't directly collect the data. That metadata was thought through very carefully by David and his team and resulted in the Observations Data Model. There is also a bunch of software that facilitates the automatic harvesting of information off of sensors and loading it into the database, and there are tools for visualizing the information within the database. These are all open-source software packages available from our website; you can download, install, and use them, and a number of people are doing so.
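As a rough illustration of the kind of metadata the ODM attaches to every value, here is a sketch of a single observation record carrying the what, where, how, who, and when alongside the number itself. The field names are my own shorthand, not the actual ODM schema, and the site details are only approximate.

```python
# A rough sketch of the metadata an ODM-style record carries with each
# value. Field names are illustrative shorthand, not the actual ODM schema.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Observation:
    value: float            # the number itself
    variable: str           # WHAT was measured
    site_code: str          # WHERE it was measured
    latitude: float
    longitude: float
    method: str             # HOW it was measured
    source: str             # WHO measured it
    timestamp: datetime     # WHEN it was measured
    qualifier: str = ""     # optional quality flag

obs = Observation(
    value=3210.0,
    variable="Discharge, cubic feet per second",
    site_code="NWIS:08158000",       # e.g., Colorado River at Austin, TX
    latitude=30.24, longitude=-97.69,  # approximate, for illustration
    method="Stage-discharge rating",
    source="USGS National Water Information System",
    timestamp=datetime(2010, 6, 1, 12, 0),
)
```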
We have our particular interface here for Exposing information that is stored in these observational databases The data that so we called the actual interface itself water one flow It's basically you could think about it as a set of functions Get sites so get sites will clear the database and return to you all the sites that are stored within that database This is kind of a metadata type of search or you could say get values And you can say I would like to know that this particular time series and it will deliver that time series back to you We designed this particular data transmission language called water ML and this is a XML based file format that fully describes the time series so that client in applications can Access that information understand it and properly interpret it a little more detail here on these two critical pieces of the infrastructure Because these interfaces and these data communication Languages are so important for creating these types of systems The water one-flow web service is listed here and all the different Functions that are available and This isn't a complete list But it gives you an idea and this gives you an idea of what the water ML file would look like so you can see that self-describing this is a Particular site the Colorado River at Austin, Texas It's a USGS national water information system site that we are harvesting and providing through our system It's got a particular lat long and you can see the data values as well So these are XML files you can open up up with a text editor but more importantly their software that knows how to parse that information automatically and and And operate on that information with kind of an offshoring to programming languages Here's one more example here of zooming in on the actual in-wis data for a response for Just describing what the variables are and then the actual values you could have all kinds of attributes for qualifiers Daytime, etc. Now once we have all this information One of the main challenges is everyone's got their own way of describing information Okay, so even across federal agencies, of course, there's not completely consistent ways of describing information So we handle that with something called an anthology, which is essentially a way of mapping concepts So you might have a concept such as stream flow and that concept of stream flow might be expressed using different vocabularies across different Databases, but they all mean the same basic concept All right So in order to search on this catalog if we don't require people to use particular name We can't tell the USGS exactly how to name their variables then we use this idea of an anthology to be able to map between different concepts And to facilitate search within this distributed database And so just to give you an idea the amount of information that we have indexed right now in HIS central there's 66 different services. So you might think of these as 66 different servers out there with information 18,000 variables 1.9 million sites 29 million different series time series and 5.1 billion values and it's growing every day people set up new servers It's international to we have servers that are there's nothing that restricts it to the United States And it's free anyone can you can if you wanted to set up your own server Tell San Diego about it. 
Once we have all this information, one of the main challenges is that everyone has their own way of describing it. Even across federal agencies, of course, there are not completely consistent ways of describing information. We handle that with something called an ontology, which is essentially a way of mapping concepts. You might have a concept such as streamflow, and that concept might be expressed using different vocabularies across different databases, but they all mean the same basic thing. In order to search the catalog without requiring people to use a particular name — we can't tell the USGS exactly how to name their variables — we use this ontology to map between the different concepts and to facilitate search within this distributed database.

Just to give you an idea of the amount of information indexed right now in HIS Central: there are 66 different services — you might think of these as 66 different servers out there with information — 18,000 variables, 1.9 million sites, 29 million time series, and 5.1 billion values, and it's growing every day as people set up new servers. It's international, too; nothing restricts it to the United States. And it's free: anyone can set up their own server, tell San Diego about it, and HIS Central will index it so it becomes another node within the system.

To give you a little taste of what information is in the system, we took some of the data and plotted it in Google Earth. This is USGS instantaneous data — the USGS gauges that measure data in real time and report about every 15 minutes. The USGS provides 80 different variables this way at just over 11,000 different sites. The way this works in our system, by the way, is that the USGS obviously maintains their own database; they don't adopt our database standard. So we create wrappers around their database: when you make a request for USGS data, that request is translated at San Diego and then passed on to the USGS. What's the advantage of this? We've had instances before where the USGS slightly changes the way they do things, and if you depend directly on the USGS for your information and they tweak it just a little, it breaks your application. But if everyone depends on San Diego, San Diego quickly fixes it, and all the client applications keep working. So it's acting as middleware within the system — James talked about middleware before.

We have also indexed National Climatic Data Center weather data, which is North American-scale — probably global-scale as well — so that gives you an example of a global-scale data set. The URLs here are where the web services live. If you opened one up with a browser, you would see a bunch of XML, which doesn't really help you, but from a software perspective there are tools that know how to work with those documents in order to call the functions. So the URL is your unique identifier for where a data set lives.

I also want to make the point that you don't have to publish large-scale data sets. This is a USDA experimental watershed in Idaho, I believe. Again, it's an open system: anyone can set up their own HydroServer, put their data into it, and have it indexed so it becomes searchable within the system. One of the main focuses, because this is NSF-funded, is experimental watersheds set up by hydrologists. Here is one example — the Dry Creek Experimental Watershed — and there are many such experimental watersheds in the system. The observational data collected at that watershed are put into the system as well and are accessible to anyone for analysis.

Our primary application for viewing and accessing the information is called HydroDesktop. It's built on an open-source GIS system called MapWindow, and the emphasis of our software is really on being user-friendly, so there are tutorials and wizards that help you through the process of querying for information. You can zoom into a particular area, set start and end dates, and use keywords. You can also limit the search to a particular set of services — if you only trust one particular data source, for example. One of the questions we get a lot is: if you let all the people play in the system, how do you know the data are any good?
Well, one way is that you can always limit your searches and say, "I just want data from a federal database," and make sure you get only that information out of the system, if that's what you're interested in. The data are delivered back to your desktop and stored locally: you search the cloud, if you will, for the information you want, find it, and it downloads into your workspace.

This brings up the part of the project I've been working on: an application that is in many ways similar to the component-based modeling happening here at CSDMS. We call the system HydroModeler, and it was basically built by a graduate student in my group. We have leveraged the Open Modeling Interface (OpenMI) very heavily and built a plug-in for HydroDesktop. Essentially, we're giving researchers tools and techniques to take individual processes, or large models, and put them into this modeling environment. OpenMI provides a lot of functionality for passing and linking information between different components.

We developed a database reader component: the data you download through HydroDesktop sit in your database, and this component provides access to that data archive to any model within the system. Likewise, we built a data writer — very simple — so when a model calculates something, say Penman-Monteith calculates potential evapotranspiration, you can write that back to the database. There are other tabs within HydroModeler for viewing the data geographically and temporally, and you can view it as tables, so it's a very interactive environment we're trying to build.

My primary motivation in building this is really educational — for teaching the basics of hydrology. Breaking a model into individual components allows students to swap pieces out, see inside the big box of what the hydrology model is doing, and isolate individual pieces. I think this gets at the component-based idea. There is a lot of merit from a science perspective as well: going in and testing individual pieces of a model, and testing how the model would work if you changed out certain pieces. If you had a new way of doing evapotranspiration and wanted to see what difference it would make within the larger framework of a model, it's relatively easy to swap out a component, put in your new one, run it again, and see the result. This is also freely available through the HydroDesktop website, and we have tutorials that walk you through a few basic examples.

Just a quick word about OpenMI, the technology we're building off of — CSDMS is close to OpenMI-compliant. Essentially, OpenMI is an EU-funded initiative to think about how you couple models together within component-based modeling. This might seem trivial to you, I don't know, but there is a lot of software and thought behind how you do this exchange of information: how do you describe information when it's being passed between different models — the what, where, and when of individual values — so that two models can properly interpret it? There is all kinds of regridding that might have to happen, as has been mentioned a number of times at this conference. One of OpenMI's differences is that it's more object-oriented, or vector-oriented, in how it views space: you have things like polygons and polylines representing elements within a model — you could have a series of polygons representing a groundwater model — and you have to do some remapping to exchange information between the two. A lot of that is handled within the OpenMI environment for doing those types of data exchanges.
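As a toy illustration of the kind of linking OpenMI enables, here is a sketch of the pull pattern, where a downstream component requests values from the component it is linked to. The class and method names are simplified stand-ins, not the actual OpenMI API.

```python
# A toy sketch of the OpenMI-style "pull" pattern: a downstream component
# requests values from an upstream component for a specific time. Class
# and method names are simplified illustrations, not the OpenMI API.
class RainfallComponent:
    """Upstream component providing rainfall depth per time step."""
    def __init__(self, series):
        self.series = series                 # precomputed rainfall [mm]

    def get_values(self, time_index):
        # A real OpenMI component would run itself forward to the
        # requested time and remap in space/time as needed.
        return self.series[time_index]

class RunoffComponent:
    """Downstream component pulling rainfall from its linked provider."""
    def __init__(self, provider, coefficient=0.4):
        self.provider = provider
        self.coefficient = coefficient       # simple runoff ratio

    def update(self, time_index):
        rain = self.provider.get_values(time_index)   # the pull
        return self.coefficient * rain                # runoff [mm]

rain = RainfallComponent(series=[0.0, 5.0, 12.0, 3.0])
runoff = RunoffComponent(provider=rain)
for t in range(4):
    print(t, runoff.update(t))
```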
One of the things we've been working on — this is the work of Tony Castronova, a PhD student in my group — is lowering the barrier to entry for these component-based environments. OpenMI is really a software engineering tool, and it's intimidating for civil engineering grad students to get in there and start to do things. So we've tried to simplify it. We published a paper about our approach last year in Environmental Modelling & Software; we call it the Simple Model Wrapper, and in a lot of ways it's analogous to what's happening in CSDMS. We have initialize, perform time step, and finalize, which is very similar to the IRF paradigm within CSDMS. You have a procedural model — the actual code, the data, and any supporting libraries — and we put a lot of the metadata into what we call an XML configuration file that describes the inputs, the outputs, and the data exchanges available for this component. Then we take that piece and wrap it with an OpenMI interface.

One of the things I want to suggest later is that one of the core ideas we could think about, with all these different modeling frameworks coming online, is: what does that core piece look like that can be wrapped and exposed with multiple different types of interfaces? Maybe you have an OpenMI interface for your component; maybe you have a CSDMS interface for it. The core library — the core code — can then be interoperable between the different modeling frameworks, so we're not all writing the same Penman-Monteith equation in each individual environment.
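Here is a sketch of what that wrapped core piece might look like: an initialize / perform-time-step / finalize component whose configuration comes from an XML file. The class, the config tags, and the toy linear-reservoir model inside are illustrative assumptions, not the published Simple Model Wrapper API.

```python
# A sketch of the initialize / perform-time-step / finalize pattern the
# Simple Model Wrapper is built around. The class, config tags, and the
# toy model inside are illustrative assumptions, not the published API.
import xml.etree.ElementTree as ET

class SimpleComponent:
    """Wraps a procedural model behind an IRF-style interface."""

    def initialize(self, config_path):
        # Read parameters from an XML configuration file, in the spirit
        # of the wrapper's config file describing inputs and outputs.
        cfg = ET.parse(config_path).getroot()
        self.dt = float(cfg.findtext("timestep", default="1.0"))
        self.state = float(cfg.findtext("initial_state", default="0.0"))

    def perform_time_step(self, inflow):
        # One step of a trivial linear-reservoir model, standing in for
        # the real procedural code being wrapped.
        outflow = 0.1 * self.state
        self.state += (inflow - outflow) * self.dt
        return outflow

    def finalize(self):
        # Release resources, close files, write final state, etc.
        pass
```

An OpenMI wrapper — or a CSDMS one — then only has to translate its own lifecycle calls onto these three methods, which is exactly the point of keeping the core piece framework-neutral.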
Let me show you an example of how this works end to end. This is the Coweeta Watershed, just over the border in North Carolina — an experimental watershed that has been running for a number of years, decades actually. We wanted to show how this might work from beginning to end with the HIS and modeling. We set up a HydroServer for Coweeta and loaded in some of the data available there: precipitation, air temperature, and stream discharge. This work was done by two grad students in my group: Mustafa built a lot of the components I'll show, and Tony did a lot of the software engineering necessary to pull it off.

Viewing the information in HydroDesktop — again, a GIS-based viewer — you can see the actual watershed. We're going to simulate this particular catchment here, which is Watershed 18. The dots are information we retrieved from the HydroServer: we went through the series of searches available within HydroDesktop to find information relevant to the study area and downloaded it, and the whole process is seamless — it automatically loads the data into the database and data structures sitting behind HydroDesktop.

On the way I think about modeling — and of course there are many different ways of doing this — I like the explanation from Keith Beven in his book Rainfall-Runoff Modelling. You start from the very beginning: how do I perceive this system to be working? How do I want to express that perception in terms of equations? Then I need to code up those equations — there are different options for doing that — and I need parameter values that allow me to reproduce some historical data. Did it work? If yes, good, you're done. If no, you need to go back to any one of these steps. Right now, I think, we go back to the calibration step a lot; maybe we need to make it easier to go back to the earlier steps as well, so we can really think about the system all the way back to how we perceive it to be working. (Five minutes? Okay.) Do I have the right processes? Do I have the right representation? Are there any bugs in my numerical code? All of this needs to be exposed so the modeler can see it and play around interactively. We envision this application — again with an emphasis on education, but also for research — being used to test new ideas about individual processes within a larger system.

HIS isn't just for HydroModeler, either. There's an API, as I said before, and our vision is that HIS will be a piece within multiple different frameworks. We've been working with Scott on this: in CMT, Scott wrote a component called HIS Data. Essentially all it does is use the web services I talked about before: you can query for information within the HIS, it will download the data, and there they sit, ready to be input to other components within CMT. This is a good analogy for how we want HIS to serve multiple different modeling environments.

So here is the path forward for what I think we need to be doing — my last point. What are the core ideas that are common across the different modeling frameworks? How do we get to a core concept so that model components can be shared across different environments? This is the OpenMI interface here, and within CSDMS we have a direct mapping between the different methods, which makes it fairly easy to take a CSDMS component and bring it into OpenMI. Robert, who is actually an undergrad student, came in, and I suggested that maybe we could use CSDMS components within OpenMI; he has mapped out an approach for doing that and is starting to implement it now. I'll skip this next slide — I just wanted to make the point about scaling up to even larger models and some work we're doing on ESMF and OpenMI interoperability — but since I'm running a little short on time, I'll skip over that and finish here.
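As a rough sketch of what that direct mapping might look like in code, here is an adapter that exposes an IRF-style component through OpenMI-flavored calls. The method names follow the general shape of the OpenMI 1.x linkable component (Initialize, GetValues, Finish), but the adapter itself is my simplification, not the student's actual implementation.

```python
# Sketch of adapting an IRF-style (initialize/run/finalize) component to
# OpenMI-flavored lifecycle calls. A simplified illustration of the idea,
# not the actual implementation.
class OpenMIAdapter:
    def __init__(self, irf_component):
        self.model = irf_component
        self.current_step = -1

    def Initialize(self, config_path):
        self.model.initialize(config_path)        # IRF: initialize

    def GetValues(self, time_index, inflow=0.0):
        # OpenMI is pull-based: advance the wrapped model up to the
        # requested time before answering the request.
        result = None
        while self.current_step < time_index:
            self.current_step += 1
            result = self.model.perform_time_step(inflow)  # IRF: run
        return result

    def Finish(self):
        self.model.finalize()                     # IRF: finalize
```

Driven against something like the SimpleComponent sketched earlier, the adapter lets an OpenMI-style composition pull values without knowing anything about the IRF code underneath — which is the core-piece-with-multiple-interfaces idea in miniature.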
If you think about where things are heading, at least on the HIS team we definitely think OGC — the Open Geospatial Consortium, a standards organization — is where things are going. OpenMI, CUAHSI through WaterML, and the THREDDS Data Server are all becoming compliant to some degree with OGC, or working toward standards through OGC. If we want to do earth system modeling and integrate data across different earth science disciplines, we need standards across those disciplines, and what we're seeing, at least in our community, is that OGC may be the right place to go for that.

So in summary, I really wanted you to take away two things. First, some background on HIS: it's really about standards for exposing information, plus the software we have built that helps you use that information. And second, you might think standards and protocols are pretty boring, but if you want to do this kind of cross-disciplinary earth science analysis, getting those standards and interface specifications exactly agreed upon is the key to making all these systems interoperable. So I will stop there. Thank you very much.