We're going to get started. I think Andrea is going to introduce the speakers, but I'd just like to welcome them and thank them. I talked to them in the hall a little while ago, and I welcome them and thank them for coming to our meeting and giving us some information that we need. I just want to remind people, both the speakers and the members, to use your microphone when you're speaking or asking a question, and then turn it off. Anyway, I'll turn it over to Andrea and she'll introduce the panel. Great. My name is Andrea Dutton. I'm an associate professor at the University of Florida and a member of the CORES Committee. So I'll be moderating the session today. I'd like to thank all of our speakers for being here. I know some of you are quite close by and some of you are really far away. So we appreciate all of you for coming and being here in person, and we also have one person participating remotely as a speaker as well. So as you probably know, one part of the statement of task for this committee involves infrastructure, and that task in our statement of task is to identify the existing infrastructure, to provide a discussion of the current inventory of it, and to analyze capability gaps that exist within that infrastructure. So as part of that, obviously cyberinfrastructure falls within this very clearly. And there is quite a landscape when it comes to cyberinfrastructure. So part of our reason for asking you here today is to help us understand better what that landscape is. And you are representatives of that at various different levels, whether within EAR, at the GEO level, or across broader directorates. That's what we're grappling with: trying to understand what is out there and maybe what gaps there are as well. And so we've given you a list of guiding questions to help us out with that.
And my understanding is that you've put together one presentation where you're going to try to walk through and address those questions. Before we get to that, I just want to provide some very brief introductions of the speakers, and then we'll get started with that part. So we have Steve Whitmeyer, who's representing geoinformatics. He is a professor at James Madison University, but currently serving as a rotating program director for geoinformatics and tectonics. Kevin Johnson, is he with us remotely? Yes, I'm here. Ready? Great. Hi, Kevin. Thank you for being here, the voice from the sky. Kevin is a professor at the University of Hawaii, in the Department of Earth Sciences, so I think you're in good company in this room today. And he is a rotating program director for geoinformatics, as well as IS. Representing EarthCube, we have Eva Zanzerkia, who's a program director, also in geophysics, but has been involved with several cross-foundational activities related to cyberinfrastructure and is currently leading the EarthCube program. And we also have Ken Rubin, who's a professor in Earth Sciences at the University of Hawaii. He is currently serving a two-year term as the chair of the NSF-funded EarthCube Leadership Council and was previously chair of the EarthCube Science Committee. Both of those are elected positions within the governance of EarthCube. And we also have Amy Walton, who is a program director in the Office of Advanced Cyberinfrastructure and is also involved with the Cyberinfrastructure for Sustained Scientific Innovation (CSSI) program. She will speak to us about her expertise in those areas as well. So that gives you a little bit of background. There are more detailed bios in the agenda that you can look at. And I would also ask our speakers, if you want to add more about your background or your bio or things that you can speak to, please jump in and do that as we go along.
So without further ado, I'm going to turn it over to you for the presentation. And then I'll follow up with some questions and open things to the committee after that. So go ahead. Okay, so this is Steve Whitmeyer. I'll start off talking about the geoinformatics program in the Division of Earth Sciences. Our hope is that as we work our way through the different programs that we were asked to discuss, we go from the division level to the directorate level for geosciences, and then to cross-directorate programs at the end. We have a number of guiding questions that you all asked us to address. We're not going to address them individually for each program, but there are a lot of cross-cutting themes, and hopefully by the end of what we talk about here, we'll cover most of them. The rest, I assume, will come out during our discussion and question period at the end. So Kevin is out there as well, Kevin Johnson, who's my co-program director for geoinformatics. I will start off and go through the basics of the program, but Kevin, I assume and I'm sure, will chime in as needed. So geoinformatics originally was a component of the IF program, Instrumentation and Facilities, several years ago. Then it was separated out as a separate program. And the previous solicitation was competed up through, I think, 2015. That's under IDVA, I think. Then it took a hiatus for a couple of years as we re-envisioned the program and retooled it to make it more consistent with funding cyberinfrastructure through the Division of Earth Sciences. The new solicitation is being competed this year; it's the first competition for this 2019 solicitation. The deadline is August 15th. The funding amount, which you can see there at the bottom of the slide, is about the same as the old program: $4.8 million, give or take, per year. It's competed every two years, however.
So there's a competition this year, and then the next one will be two years from now, in 2021. And this new solicitation, the new vision for the program, breaks geoinformatics down into three components. The goal is development of cyberinfrastructure specifically for the Earth Sciences. But we recognize that there are projects at different levels in their evolution, you might say. So from just getting started, which is what we envision as the catalytic track, sort of pilot programs, through established cyberinfrastructure resources, which would be competed through the facilities track. And then resources that have been around long enough that they're probably at the stage of asking, how do we sustain this resource, perhaps post-NSF funding or into the future? That comes into the sustainability track. I will point out that one of the main themes of the new geoinformatics solicitation is sustainability. It's a recognition across the board that we as a foundation have resources and various data components that, given a probably flat funding scenario, cannot be supported ad infinitum into the future by the National Science Foundation. And so we, and the community at large, I think, need to think long and hard about how these can be sustained, if the community recognizes them as being valuable into the future. There we go. Okay, so this is just a quick summary of the three tracks. The catalytic track will support pilot projects, with awards of up to three years. The facilities track supports implementation and operation projects, with three- to five-year awards. The idea here is that you can get about 10 years of funding total, whether it's through three three-year awards or two five-year awards, after which we would expect you to move into the final track here, which is sustainability. And that is to support the development and implementation of sustainable approaches to preserve data, models, resources, et cetera, into the future.
And again, we don't envision sustainability awards as being more than one or two awards, after which we hope that the resource will move forward. Again, the proposal deadline is coming up quickly for this year, and then it's every second year after that. Just to be clear, the funding for the program is yearly funding. So there is a second year of funding that will not be competed, but most awards, we envision, are going to be continuing awards that will need funding as we move forward. And so this provides sustainable funding in the off year. Let's see. All right, it worked. I never know which computer I'm actually targeting. Okay, so this is sort of a teaser slide, because this is the theme of this whole panel, right? It's the spectrum of possible geoinformatics, cyberinformatics, and data-infrastructure-supported activities. Obviously there are some activities that are directed towards an individual project, and we suspect those will be supported by the disciplinary programs. Once you get into the geoinformatics program itself, again, there's the catalytic track, the facilities track, and the sustainability track. And then later on, as I pass the mic and so forth, we'll talk about more cross-disciplinary types of programs. Never gonna get this right. There we go. One of the things we were asked to touch on is: what are some examples of current projects supported by the programs? Keep in mind that all of these are currently supported by geoinformatics, but these were from the previous solicitation, okay? So these are not necessarily representative of what's gonna come in this year, which will be, I think, a broad spectrum of projects. However, most of these are probably familiar. Several of these are rather large facilities that are relatively well-established, things like CSDMS, EarthChem, and SESAR (the latter two being components of the IEDA facility at Lamont).
CIG and OpenTopography have been established and functioning for a while. A slightly smaller project is Flyover Country, which has been funded through geoinformatics. And I should say that almost all of these are co-funded by other programs, so these are not solely geoinformatics projects. Anyway, that gives you sort of an overview of geoinformatics. I think the next thing to do is to progress to the next level up from here, which is talking about a Geoscience Directorate-level program, which is EarthCube. And so I'm going to pass this along to Ken and Eva, and they can follow up. Thanks, Steve. Before we get into EarthCube, I'll just note, as we've already been introduced, that I'm the Program Director for EarthCube, and that's on the NSF side. But Ken's really the leader from the community side, and we'll talk a little about how EarthCube is set up and what the different roles are there. So I'm going to start with a few slides but then turn it over to Ken. Before I get into the background of what EarthCube is about, I will point out this website that we have on the Geoscience Directorate webpage called Cyberinfrastructure for the Geosciences Opportunities. This is where we attempt to lay out all of the different activities, whether they're about data policies, things at the division level, directorate level, or elsewhere in the foundation, that might be of interest to the geosciences. So this one is for the past fiscal year. There are some things that are still active and upcoming here, but you'll see that there were many things put up here, like town halls, from the past year. So I would recommend that you go to this website to get an idea of what kinds of opportunities are available from the foundation for Earth scientists as well as geoscientists. Okay, so I'm going to give you just two slides, a quick background on what EarthCube is and what the premise was. So EarthCube started in around 2011, 2012.
And the idea was that we were responding to drivers in the data sciences. At that time there was a five-year NSF initiative called CIF21, the Cyberinfrastructure Framework for 21st Century Science and Engineering. The main mission of that initiative was to create a national cyberinfrastructure to support all domains of science that NSF is supporting. And then there was also this memo from the Office of Science and Technology Policy which stated that any products of federally funded research needed to be publicly available, and any agency that produced that type of data had to come up with a plan for how it would make that data publicly available. NSF's response to that is available for people to take a look at. What NSF's response acknowledges is that this is really hard, and there's a lot of different practice across all the different areas of science. So our approach would be to work with different communities and begin to make their data available as was appropriate for each scientific domain. One thing I will note is that in the interim, from that time period to now, making data broadly available has become more of a driver for the scientific community for other reasons. Reproducibility has been a very important one. There are efforts, like those at AGU, that have led to publications requiring that you state where your data is going to be held. So these are things that are impacting the community, and again, they're part of the driver for EarthCube. On the scientific side, there was a strategic priorities report that came out of our advisory committee for the geosciences, called Dynamic Earth. That laid out a set of scientific imperatives across the directorate. And one of the things that came out of it is that we're trying to do system-level science across several scales, whether it's interdisciplinary across the different domains of the geosciences or within a specific domain itself.
There are interdisciplinary problems that require integration of data. Then the challenge is that we have huge time scales of data and a diversity of different data types, from streaming, continuous data all the way to physical samples. The challenge of trying to integrate those is one of the difficulties that we were trying to respond to. We also have a wide variety of sources and researchers, from our large facilities all the way down to individual PIs, as well as coordination with academic partners and agency partners. So EarthCube is a program to try to enable data discovery, access, and interoperability across all of these different sources so that we can advance our science. EarthCube is a program at NSF; the funding began around 2012, 2013. Since the beginning, it's been a partnership between the Office of Advanced Cyberinfrastructure and the Geosciences Directorate. Amy Walton is our partner in that, and she works very closely with us as we put our awards together. So on this slide, you'll see this little timeline here. On the top are the things that NSF does: we send out funds, we put together solicitations. This kind of lays them out. Another important thing, besides funding infrastructure that would help with our mission of enabling data access, was that we wanted to build a community of geoscientists, cyberinfrastructure researchers, and computer scientists to really inform us about which challenges are the ones that are ripe for advancement and to give us feedback that we use in our solicitations. So we have many different mechanisms that we've used. We funded over 24 workshops. We started first with visioning workshops in different geoscience communities to tell us about their data challenges. That led to competitions like research coordination networks, which built on those workshops, and then different activities to build up the cyberinfrastructure that was needed to address those problems in an interdisciplinary way.
And then on the bottom side now, I'll turn this over to Ken to talk more about what's come out of the community process. We've got a community governance structure that has a leadership council, different committees, and an office that facilitates this community coming together to produce visioning documents. There's been a roadmap, and then there are these pretty significant activities related to data registries. So with that, there's continued stakeholder alignment that the EarthCube community continues to engage in, and I'm gonna turn it over to Ken to talk about that. Thanks, Eva. So yeah, I think to some extent, EarthCube is a bit of a black box to a lot of researchers who are supported by NSF but aren't a part of what we do. And this is something that we hope to change quickly, using examples of very useful things for our communities. Just to give you an idea of what I think EarthCube is doing: it's building this sort of discovery and use framework for all forms of geoscience data. So not limited to the Earth Sciences, but across the entire geosciences directorate. That means data that comes from sensors or from what we call long-tail researchers, and data-related resources, which include everything from models to software for visualization, analysis, and use, and building a community that plans for this and builds it. And then this last part is a very key part of what I consider my own personal mission with respect to EarthCube, and it has to do with data stewardship: educating geoscientists about best practices and ways to make their workflows better for themselves and for the rest of us who would like to utilize their data. So this really is a community-driven effort by and for geoscientists, including those of you in the room who collect data or use them in any way.
And for me, the vision is to develop and enable this sort of interconnected, interoperable (interoperable is key), high-functioning ecosystem of data repositories, search and discovery capabilities, and analysis, modeling, and visualization applications for the geosciences. And this is meant to improve the time to science. This is workflow support, in the same way that your smartphone does life workflow for you, whether it's e-banking or e-commerce or streaming video, in ways that it couldn't a decade ago. We need this for the average working geoscientist, as well as enhanced data discovery and knowledge creation to enable cutting-edge research. We don't wanna leave out the cutting-edge research; it will always happen, and we wanna facilitate that. But we also wanna think about the everyday average workflow of working Earth scientists. So, something that EarthCube is not, okay? We're not focused on subdomain, limited-use, or very specific applications. Those have their homes in other programs. But we're especially not interested in things that aren't sustainable or that don't integrate into an interoperability framework. And I'll explain what this means in coming slides, but this is very important: we need to move to an environment where all of the tools that are created by and for us work together and exchange information. Okay, so this is kind of a wordy slide, I apologize. It lists some of the accomplishments and the impacts. Perhaps the biggest accomplishment is the community itself. It was really difficult to develop a community of working geoscientists, informaticists, data scientists, and software engineers, and to develop a governance structure that was able to speak the same language and exchange information. It's taken us a while to get there, but I think we're there now. We've identified needs and solutions across a broad spectrum of the geoscience community.
One of our probably most impactful activities is the development of something called the Council of Data Facilities. This is, I believe, the first platform at NSF for cross-data-facility discussions. These are NSF-supported data facilities, things like UNAVCO and IRIS and IEDA, talking together, identifying common needs, common pain points, and where the technology is going, asking how they can behave sustainably, and coming up with this first data interoperability effort. So that, from the perspective of an average user, you shouldn't have to care where your data is stored. You want to be able to find it, you want to be able to access it, you want to be able to use it. And we want all these data centers to play in an interoperable way, which requires putting some standards in place. So we've developed various kinds of platforms for data search and discovery. These include, but are not limited to, something called GeoCODES, which is a cloud implementation of data and data-resource registries that compile information about data. I have a specific slide on this, so I'll come to it in a moment, but the key is that this is built on the industry-standard search capabilities that drive the web. We also have a large portfolio of applications for the geosciences, and I've only listed a few of them there. There's a huge number of them, including all sorts of middleware that promote information exchange and structuring for geosciences web services. There are things like StraboSpot, which you might be familiar with. This started as an RCN; it's a group that supports the field geosciences: tools that help scientists who work in the field to collect and analyze their data in real time, to access things in the literature from the field, et cetera. There are things like the EarthLife Consortium and Pangeo, also important cross-domain efforts that are integrating lots of data from different science domains. And CHORDS is a really interesting program.
This is real-time data streaming. It started in the climate and atmospheric sciences domain, but it's been expanded out to all sorts of areas, such as volcano geodesy. Any place where real-time data is available on the web can be accessed using the structures of that program. We were asked to talk about our relationships with industry. We're very engaged with industry, at the level that we can be, given the way the foundation operates. So for things like search, for the platforms that we support, for utilizing cloud storage, and for utilizing other NSF infrastructure that supports those things in partner organizations, we've got interactions all over the place, and many of them are quite positive. So, just a moment about what GeoCODES is. GeoCODES was an effort whose idea came out of this Council of Data Facilities: gee, wouldn't it be nice if we could all, using the same framework, describe what data we provide, what data services we provide, what other data services are out there, and software packages, for instance modeling packages that work with our data, across a seamless framework that a science user or another data facility person or an informaticist could access from one point? And so the first project in this effort was called Project 418. What it did was integrate a whole series of pilot data centers, which you can see over here. Maybe you see your data center there, maybe you don't. These were ones that volunteered to help encode information about their data holdings, the metadata, using a standardized nomenclature and structure. We're using a standardized serialization format called JSON-LD and a standardized data structure that was developed by the big search providers, Google and Bing. So now, with the Dataset component of schema.org, to which we've added an extension for geosciences data, you can find data holdings through an organic search, just through your search bar, or you can ask Alexa.
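To make that markup concrete, here is a minimal sketch of the kind of JSON-LD Dataset record a data center might publish for harvesting. The dataset name, description, and URL are hypothetical placeholders; the field names (@context, @type, name, keywords, url) are standard schema.org Dataset vocabulary, though a real record would carry many more properties.

```python
import json

# A minimal, illustrative schema.org Dataset record; the dataset name,
# description, and URL below are hypothetical placeholders.
record = {
    "@context": {"@vocab": "https://schema.org/"},
    "@type": "Dataset",
    "name": "Example seafloor bathymetry grid",
    "description": "Gridded multibeam bathymetry (hypothetical example).",
    "keywords": ["bathymetry", "marine geophysics"],
    "url": "https://example.org/datasets/bathymetry-grid",
}

# Publishers embed this as a <script type="application/ld+json"> block in the
# dataset's landing page, where web crawlers and registry harvesters can
# index it alongside every other center using the same vocabulary.
jsonld = json.dumps(record, indent=2)
print(jsonld)
```

Because every center encodes its holdings with the same vocabulary, one crawler and one query interface can serve them all; that is the interoperability point being made here.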
But more importantly for us is the stuff on the right-hand side of the screen: applications built to pull that data and use it. We wanna support all of those platforms, plus any new platforms that come online, through a tool integration network, so that along with this data facility registry, which I've shown here, we have a registry of resources, which are tools to use the data, plus, to be developed, an integration platform. Now, one of the challenges of NSF funding through solicitations and peer review is that the kinds of projects that bubble up to the top are the ones that some combination of program managers, reviewers, and panelists deem to be most important. It's not always the case that all of those people have the overarching goal in mind when they fund a project, especially one that needs to be a building block, a puzzle piece, in a landscape of interoperable things. And so, after not being able to develop this through the standard solicitation process, we advocated strongly to fund these developments through sub-awards made through the office. They're still competed; they're RFPs. We convinced Eva that this was something that we needed to do, and so this has been going on now for a couple of years. These are focused, targeted developments, and they are community-governed, so we have advisory groups drawn from our user communities helping with the development of these things. And they're moving forward relatively rapidly, in a way that's using sort of undeniably logical approaches, which is to base everything on the industry standard of today, right? Rather than on what one or another PI thinks is the right way to do stuff. So what I want to leave you with is this concept that we're building for the future, right?
And the future literally means getting us to a place where we have this interoperable framework, where we have these tools that serve a broad sector of the geoscientist community, and where it is flexible and adaptable to emerging CI trends. This is why it's so important not to be focused on a fixed, hard infrastructure that sits at some facility somewhere, but to try to implement as much as possible in the cloud, through the cloud. And along with this, it's very important to me to develop this concept of data stewardship methodology and solutions for scalable data access. I'm assuming most of you know what FAIR stands for with that acronym; as Eva mentioned, AGU has promoted it for their journals. What it means is to make data and data resources findable, accessible, interoperable, and reusable. And I will tell you that the National Science Foundation is so far from FAIR that they don't even realize how far they are from it. Right, for them, open data, open access is what they're interested in, but the data landscape is extremely heterogeneous. It's very difficult to pull information across programs, especially if you're a climate scientist and you want to understand the relationship between climate and a geophysical data set or a volcanological data set. It's very difficult for you to do that without talking to some other expert who can hold your hand. And we want to change that. The way to do that is by getting everyone to play along with the same standards. So that means individual scientists need to understand why this is important, and the people who are building our infrastructure need to understand why it's important. And we need to get behind a common framework, a common language, for exchanging information.
And once we have that, and you can see I've got this triangle here that puts EarthCube at the center of an effort for FAIR, for open data, and for sustainability of those efforts, it's not going to be easy, but at some point in time we should have a user-transparent search and data retrieval capability for individual scientists from any NSF data repository. We should already have it now, but we don't. We need to have scalable and implementable resource utilization by GeoCODES or some similar effort, and a tool integration platform, so that every piece of software that is built from this point forward to solve any individual domain research problem, or something broader, has components that are reusable, components that are accessible by other scientists. There's no reason why we can't start to conform in the same way that the ecosystem of applications on your phone conforms to a set of standards. That's what makes them work together. So this is a really difficult job. There's a lot of cultural change. There's a lot of social change. The next generation of scientists, our students, our postdocs, they are going to expect this, right? They want this to be native in their workflows. For some of us, it's long overdue. The data policies that NSF now has come out of the late 2000s, and they're very, very far behind the industry standards that exist outside of nationally funded research in the United States. So we need to help them. We're the community that needs to educate and bring NSF along to the level of where we want to be. So this is a difficult job and there's a lot more work to be done, but I do think that we're starting to develop the kernel of a thought process and a framework to bring this to the geosciences. And key is having scientists support it, right? NSF will do what we ask, but they're not going to do it for us, right? It needs to be promoted from the research communities. So, yeah. Okay, well, thank you for inviting me today.
My name is Amy Walton, and I'm from the National Science Foundation's Office of Advanced Cyberinfrastructure. It's not a division; it's an office that used to be in the Director's organization. It's now hooked into the computer science directorate, but basically our job all the time, and you'll see this throughout the presentation, is to do those cross-cutting, inter-organizational, interdisciplinary activities. And it's not just technology. I guess I was asked to do three things today. The first is to give a two-minute overview of machine learning, so let's see how that goes today, everyone. The other two things are to talk about two of the key activities or initiatives that are going on across the foundation, and that are certainly led by teams of people throughout the foundation. So I'll be doing those in a minute as well. But I would also mention that the activities that go on in the Office of Advanced Cyberinfrastructure are not just, though they have been in the past, the large computing activities: for example, the Blue Waters computer, the Texas Advanced Computing Center activity, the cloud-based computing at Indiana, basically the Jetstream activity, and networking. So there's a lot of hardware, but it's also software, and obviously data, which is the piece that I was brought in to deal with. But one of the things we do is also organizational innovations. Those can be some of the most powerful, as you were hearing a minute ago about standards. One of the first things I did when I came here was to participate in the development of the Research Data Alliance. That now has thousands of people; it's a volunteer organization, so now what is the sustainability there? But for right now, that's a model where people who have a problem get together, and in an 18-month period, we either fish or cut bait: we come up with new things or we move to the next activity.
So there's a lot of different kinds of things that one can call cyberinfrastructure. That's what I'd like to start out by mentioning. And then, basically, we sit now in the computer science organization, so part of what you'll see here are the things that are just beginning to emerge from machine learning and a whole array, excuse me, as you'll see in a moment, of different kinds of computer science, information science, and data science activities. But also, our lifetime is basically spent in meetings like this. So the slides I'm going to go to are the ones that I presented recently at AGU, because I religiously go there, because our job is to be speaking with the community, because that's where the interesting problems come from. My life basically cuts across all the different science and engineering and education activities at the Foundation. It's basically a 24-7 cocktail party. It's a very interesting lifestyle. So okay, two minutes on machine learning. Here we go, team. All right, so what this picture shows is basically how the use of machine learning has grown over time. In 2010, nobody used the words machine learning in their NSF abstract. That's changed. Obviously the large rust-colored piece is the computer science directorate; for almost all of them, it's like you now take the shaker and sprinkle the words machine learning, deep learning, and neural networks into the abstract. But I would note that the geo community, which is that sky blue stripe, and all of the different directorates, are using more machine learning. And in some places it's growing very rapidly, and some of them might be a surprise. For example, the social and behavioral activity, simply because they're using more in classrooms: they're using video, audio, and click kinds of interactions, and those are growing extremely rapidly.
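The trend chart described here, how often machine-learning terms appear in award abstracts by year, can be sketched as a simple keyword tally. The abstracts below are invented placeholders for illustration; an actual analysis would pull abstracts from NSF's public award database.

```python
from collections import Counter

# Hypothetical award abstracts keyed by year; invented for illustration only.
abstracts = {
    2010: ["We study mantle convection with numerical models."],
    2018: [
        "We apply machine learning to seismic waveform classification.",
        "Deep learning and neural networks for cloud detection.",
    ],
}

# Terms whose rising frequency the slide's chart tracks.
TERMS = ("machine learning", "deep learning", "neural network")

def term_counts(texts):
    """Count how many abstracts mention at least one of the terms."""
    return sum(any(t in a.lower() for t in TERMS) for a in texts)

counts = {year: term_counts(texts) for year, texts in abstracts.items()}
print(counts)  # e.g. {2010: 0, 2018: 2}
```

Grouping the same tally by directorate instead of (or in addition to) year would reproduce the stacked bands, such as the sky-blue GEO stripe, described on the slide.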
I think that in the geo community, with all the remote sensing going on and all of the large real-time data sets, the ability to sort through information reasonably rapidly has a lot of potential. So this is an overview of where things are and how they're growing over time. A couple of examples, again from all over the foundation, since we use it more than we stop to think about it. There's the brain picture there. You were just hearing a moment ago about how the geo community is going from very high-resolution, small spatial footprints to larger and larger regional units of analysis. Same with the brain: it goes from basic chemistry to what's going on at the molecular level, in the various organs of the brain, and in human behavior. So they too are wrestling with different spatial and temporal units of analysis. In astronomy and physics, the huge pipelines of data coming from the large telescopes resemble some of what you're seeing from remote sensing instruments now. I spent a lot of years at NASA, where it was often my privilege to work with the 10-year science plans and try to turn those into instruments. Same thing here: given a wide variety of problems, how would you measure the phenomenon, and how would you turn that into a reliable measure of what you're trying to do your science on? Also for the foundation, education: every directorate trains the next generation, and, as Ken was just saying, those young people are going to have different expectations than those of us with our spreadsheets and all the question marks we always see when you come and do reviews for the foundation.
So, a lot of different awards. Machine learning is only a very small part of what's going on in the whole world of artificial intelligence. In this puzzle picture, if you go from the left side to the right, a lot of what the geo community is trying to deal with starts at the left. You have very heterogeneous data sets. How do you combine the ship-track data with the overflight of the satellite? How do you find the data that is meaningful to you? How do you show it visually? So there are a lot of tools and techniques. Again, having been involved with Eva and the EarthCube folks over time: five years ago, when you had those 27 charrettes across every discipline and the facilities, it at first seemed like a very heterogeneous community, and then they got talking with each other and it was like, you know, streaming data is something we both have problems with. Data privacy might be another; the social sciences community in particular, as you move into situations with real-time responses and community-facing services, runs into data privacy issues. So there seem to be these larger problems that more than one community faces. The yellow machine-learning piece includes deep learning as a subset, and deep learning is often used as a synonym for neural networks. There are many kinds of neural networks; if any of you have looked at those papers, everybody's naming their own new improved vitamin-fortified neural network, but basically it's a very small subset. And then it continues on to more and more robust intelligence: self-driving cars, intelligent interfaces, wired-up cities. One of the things we have is the Array of Things, which is basically instrumentation deployed across the city of Chicago.
And again, that's real-time city data being provided to the community and being used. But if you look across that whole area, one of our divisions at the foundation, the Division of Information and Intelligent Systems, spends $200 million every year across those puzzle pieces. So there's a lot going on, and that's only one of the four organizations in the computer science directorate; we in the Office of Advanced Cyberinfrastructure are another. Later on, when we talk about the biggest barrier to knowing things, it's this: how do you find out about everything that's out there? There's so much going on; how do you find the things you need? Anyway, I think there's a lot of potential for machine learning. The nice thing is that what most of the people around this table consider a problem is a great thing for machine learning: oh look, more data. That's actually very valuable; machine learning does better with more data, and with faster and specialized hardware, and we're finding that Silicon Valley and lots of other industry organizations are pumping out amazing new techniques and capabilities that will make a difference to our science as we have more and more data and different ways of using it. Algorithms are coming out especially for what's called unsupervised learning. In the supervised kind, the "is this a cat?" kind, you need to show the model labeled examples and say: is this a cat, is this spam, is this a car, is this a certain kind of vegetation, what are we looking at? For the unsupervised kinds of things, where you don't have those labels, better algorithms are making a big difference.
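The supervised versus unsupervised distinction described above can be made concrete with a toy, dependency-free k-means: the algorithm recovers two groups of points with no labels at all. This is a minimal sketch for illustration; the data and function names are invented, not code from any NSF program.

```python
# Toy unsupervised learning: k-means finds cluster structure without labels.
import random

def kmeans(points, k, iters=20, seed=0):
    """Cluster 2-D points into k groups without any labels."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            i = min(range(k),
                    key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
            clusters[i].append((x, y))
        # Update step: move each center to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers

# Two obvious blobs; no labels are ever supplied.
data = [(0.1, 0.2), (0.0, 0.0), (0.2, 0.1),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers = sorted(kmeans(data, k=2))
```

A supervised version of the same task would instead require a "blob A" or "blob B" label on every training point; that labeling cost is exactly why better unsupervised algorithms matter.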
So, the kinds of things you're looking for in your imagery, especially the hyperspectral data associated with minerals as you take these overflights: what's going on there? But substantial challenges still remain. One area we think might be very promising: machine learning works better with more data, but if you can bound it with the physical models you have around this room, you can get to your answer more rapidly, or start to understand what's going on inside the model. So there are a lot of possibilities there. That's the two-minute overview of machine learning. Let me now go to the two programs it is my privilege to lead at the foundation. One of them used to be, I think when you were there, the data infrastructure activity. We had both a data activity and a software activity, and, well, you need data to do the software and the software works on data, so they were combined. The activity is now called Cyberinfrastructure for Sustained Scientific Innovation, and the most recent solicitation is the number there. I can't talk about the '19 round, because we just had the panels and don't yet know what we've funded this year, but we can talk a little in the discussion about some of the things we funded last year. What's going on here is that this is not just a new experimental technique that someone elsewhere in the computer science organization would put together to see if it works. This is, as you were hearing earlier: can it do the job without breaking all the time? Can it do it reliably, quickly? What happens when you have sparse data? What happens in these kinds of cases?
So we're basically looking for robust, reliable tools and techniques that can handle streaming data, heterogeneous data, or whatever capability they're being put together for. There are a half dozen things we're always looking for. The number one thing in this solicitation is the science: the driving science challenge it's trying to address. And on those panels we have members of the community, so if a project is supposed to be working on some fill-in-the-blank science problem, somebody from that community is there, and everybody kind of swings around and says, well, hey Eva, are they really doing this? And if not, we move on to the next one. So: science challenge, then innovative infrastructure. It's not just building large piles of blades for computing or storage. It's: would this technique work, or would this be a new way of deciding where you put the storage, where you put these things, such that it's faster for people to do the science? Then cross-organizational work: not just in the reviews, but in the projects themselves, the science community with the question working as equals with the people who say, hey, I know a little bit about that problem in storage, or about that kind of software tool and technique. And then a lot of the management things: how are you going to measure success? It shouldn't be floating-point operations per second. It should be: how do you change the time to science? If you do this, is it going to be faster? Is it going to let you get measurements you never could get before? So how do you measure it, and what's your plan? It has more of a business model to it: what's your timeline, what's your plan, and things like that.
The next thing on the list is leverage: you're building on what exists. You may be using an existing community data repository for your data rather than building your own. And the final thing is the issue of sustainability. What happens when the grant is done? If everything evaporates, or it stays in my lab, or it's on a thumb drive and you just have to ask for it, then what have you accomplished? Where is it going? How are you making sure the rest of the community can use it? Those are the underlying ground rules. It's a big program, over $40 million a year, and we have two sizes of awards. Elements can be up to $600,000 over a three-year period; they're kind of like pilots, where a particular community tries something for a couple of years to see if it can happen. If it starts to be a real capability, there are Framework Implementations, which run up to five years and up to $5 million. Those are the bigger activities. Every year we have had awards to every community in the foundation, and we can go into that in excruciating detail at your convenience, but that's the idea of this one. Okay, next one. There are 10 Big Ideas that the director has come up with. One of them is Harnessing the Data Revolution. There are others, like Windows on the Universe, the multi-messenger astrophysics idea; a lot of different communities have things going on. For Harnessing the Data Revolution, there are a handful of activities under way, but the goal is to address a wide variety of key science challenges.
So: real-time results, whether it's your telescope suddenly picking up a blip in the sky that you have to follow up on now, or real-time emergency response. Earth system science is obviously really important to this activity. So there are a whole bunch of different science problems, but inside this circle of Harnessing the Data Revolution activities there are three major threads right now, and those are the three main bullets you see down the one column. The first is the foundations of data science: the Division of Mathematical Sciences and the computer science side driving new tools, techniques, and technologies. The word you'll hear for that is TRIPODS; it's an acronym, and that's one round of solicitations that's been going out. There's also education and workforce: training the community, because people are having a tough time finding folks who understand the software and the machine learning tools and techniques that are out there. So there's a Data Science Corps activity going out. The last one is one that I and a colleague in the Division of Mathematical Sciences are lucky to be co-leading. The idea there is hard to name; we're calling them Institutes, but what they are is more of a capability. The word we don't use is "center," because that tends to connote bricks and mortar at a single institution, and what these are attempting to be, and they can be virtual, is outward-looking. Somehow we're trying to develop capabilities that will allow the community to do better in a subset of areas: real-time streaming data, missing data, and so on. Basically there was a series of "what is the scientific problem and why is it important to your science" activities.
For the next two years it's almost a pilot phase: okay, let's see if we can get some of those activities going. Then in the longer term, after about two years, there would be larger ones, potentially more of them, working across disciplines, because none of us starts out knowing much beyond our own discipline. Over the next couple of years, the attempt is to have the groups chosen in this round speak with each other, and in the next round to have a larger capability that includes the same people, or subsets of them, or different ones as things change over time. What we're trying to build, in a sense, is a fabric of capabilities: here's my problem, and here are some people who can help me. So that's where we're heading. And the last chart: what we've done, and it's actually still going on today back at the foundation, is set up two paths. One is the Ideas Lab. If you have a problem, or a tool or technique you think is valuable, but you don't have a team yet, Ideas Labs are one of the ways we try to introduce people to each other and get some proposals going. The other, the orange arrow, is for existing teams: look, I've been working with these companies and these universities, I think I have a well-defined problem, and it's not just that I'm going to solve it once, it's a continuing issue for our community, and I think we have the core of something that could continually help and improve and work with that. That's the second arrow. And then, probably out in a couple of years, the winners, which we're working on now, will be interacting with each other a lot; there will be a lot of PI meetings and things like that.
This is meant to be a capability where, if I can't solve it, I can do the three-degrees-of-Kevin-Bacon thing and point you in the right direction for getting it solved. That's what we would be doing in the out years. So I think that's it; I'm done. Okay, thanks. So I guess we're here to answer your questions now. How do you want to run the show? Yes, you think you're done, but we're just getting started. No, thank you very much, that's great. So the first thing we're going to do, I'm going to start with a couple of questions. Actually, I'm going to back this up a step; I should have said this at the beginning and I meant to, and you're probably going to laugh when I ask this question, but I'm going to ask it anyway. And while I'm talking, Remy, could you go back to the slides and bring up the one that Amy showed with the puzzle pieces, please? Many of you touched on this at some point in your presentations, but explicitly: could you define for us what the words cyber infrastructure mean, what it is and what it is not? Some of you mentioned this in passing, but I want to make sure, for the sake of the committee, that there's no confusion on this before we go into all the other questions. So collectively: what does it include and what does it not? Go for it. Take it away, Steve. So I think at a basic level, it's the tools and digital resources that are required for science to happen. And that spans a broad range of topics, as Amy mentioned, from computational facilities to data resources to the networks and standards you need. There's an additional part of this equation, the part that drove CIF21, and that's the people part: you need people who are enabled to both develop and use those types of resources. Anyone else want to add to it? No? Then I'm going to open this question up to the committee in a moment. Just one piece, one piece. Yeah.
Cyberinfrastructure extends beyond science, right? That's the only thing I would add. It's not just for science; that's what we're talking about here, but it overlaps into almost every facet of the way you live your life now: the way food is provided to you, the way your water is managed, and so on. Is that Kevin trying to cut in? Can you say that again? Are you trying to answer? No? No, I wasn't trying to say anything. Okay. One follow-up question I have regards the upper left-hand puzzle piece, which is databases. Steve, you mentioned this explicitly for geoinformatics; is it not for databases, do those belong in the core programs? And my personal understanding from EarthCube is that I've gotten the same message there: we're not funding databases, go somewhere else. Would you agree, though, that databases are an important part of cyberinfrastructure? And I can speak to my own personal experience: trying to get databases funded in core programs, when you're competing against people wanting to collect new data, is really, really difficult. So I guess I ask this for myself but also for the community: is the answer still "go to the core programs" for databases? So, databases are probably the biggest topic that comes up when PIs ask us about projects for the geoinformatics solicitation. Databases are definitely fundable through geoinformatics. What I think the challenge is, and this has been said by several people here, is that they have to be driven by the science. They have to be driven by a recognized community need, and they have to result in something; you can't just say "I'm going to build a database" and hope they will come, right? For it to be valuable, it has to have value added rather than just data collection, I think.
It's got to be something that at the end of the day will produce or enable science that wouldn't have been enabled before. So absolutely, databases are a component of it, but they're not the end of the game, I think, is the way to think about it. Can I add to that too? I agree with this completely. I think databases are possible in several of these different pots. What is not available through geoinformatics is making a database for your own science. Geoinformatics is about providing a resource that a specific community is going to use to do their science. And similarly, we see databases as part of things that come into CSSI or even into EarthCube. For EarthCube, you're building tools that make them interoperable, but they're still supposed to be outward-looking to the community, not inward-looking to a specific science question or goal. So I just wanted to add something to this discussion. Making your data findable and accessible, whether through a quote-unquote database or some other mechanism, is super important, but databases have a dark side as well, which is what we call stovepiping. To a data scientist, most applications they would like to enable using heterogeneous datasets are blocked or locked up by the fact that data sit in databases with archaic structures, with incomplete metadata describing how the data are structured and what they include, and with variations in how variables are described. So, critically, we need to think about our data in the geosciences in a system-of-systems approach, where it isn't super important where data actually reside, so long as they reside in a way that lets an individual user compile components of the greater quote-unquote database that drives all of our research, so that it looks like one database. I'll give you an example. In the very early days of geochemical data for igneous rocks there was something called PetDB. PetDB doesn't exist anymore as a standalone database.
It's something you can pull out of EarthChem, which is a much broader library with a whole bunch of different kinds of information, simply by tagging the data as relevant to that workflow or application. And so what NSF needs to be supporting is generic data structures that let us put our information in a place where other people can find it in an intelligent way; that doesn't mean a database for this particular application or that one. I think we've gotten away from that. Look at things like Figshare, a generic place where any scientist can put information from any presentation they've made at any conference or meeting, or pre-publications, into a place that has not-super-rich metadata, but at least it's there and people can find it. So I think that to harness the data revolution we need to move away from structured databases that aren't cognizant of the broader ways in which data are used. That's my caution about the word "databases." Yeah, so I was just going to ask, are there follow-up questions from the committee on this particular question? Yeah, on this in particular, because on the database issue I understand what you're saying, but PetDB and EarthChem are a really good example. Over the last 50 years we've collected an enormous amount of data, and most of those are not going to have the structures or metadata that are useful. And yet we can't throw them away, and we can't reinvent the wheel, because now we have, you know, the architecture to deal with it, right? So I think there has to be a combined effort to fold in these legacy databases, and that should be just as fundable and just as important as what's coming down the pipeline. Absolutely. I don't mean to imply that any existing data, whether it's in a database or not, doesn't have value and shouldn't be preserved. And there are in fact machine learning mechanisms for this.
For instance, GeoDeepDive was a program funded through EarthCube to use Big Blue, the big IBM supercomputer, to read the geosciences literature and compile, on its own, the metadata structures for how data are related to each other. They started with a paleontological database, and I've forgotten the exact statistics, but after a couple of weeks of reading thousands of papers it found all sorts of data structures. It found errors in the Neotoma database, places where people had typed things in wrong. So this is a place where, yes, absolutely, machine learning and existing, what we call long-tail, data can be brought to bear on our problems. It's just a question of where the data is stored, how it is stored, and how it is structured. I think the database modality is old-fashioned, that's all I'm saying. I'm not saying the data that make up those databases aren't super important, and even more important are the data that don't make it into databases. For instance, I as an individual researcher may decide to put X number of data points into PetDB and keep aside some other stuff, because I don't understand it, or the analysis didn't go well, or whatever. That shouldn't be siloed away on my hard disk; there are other people for whom the failures, or the complicated cases, or the outliers are the data. So we need to be thinking about the way we make our observations, and part of the workflow support I would like to see EarthCube and the foundation provide are things that make this transparent to you. When you're sitting down at a mass spectrometer or a microprobe, or you're in the field collecting the measurement, the workflow should collect the metadata and the data for you, so you don't have to sit there, or have your graduate student sit there, typing things in. It's all collected, it's all available, it's all stored somewhere in the cloud, it's all accessible.
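The "workflow collects the metadata for you" idea can be sketched as a thin wrapper around whatever acquisition routine an instrument exposes. This is a hypothetical illustration; the field names and values below are invented, not any EarthCube or community schema.

```python
# Hedged sketch: bundle provenance metadata with each measurement
# automatically, so nobody has to type it in later. Field names are
# hypothetical examples, not a real community standard.
import json
from datetime import datetime, timezone

def record_measurement(values, instrument, operator, sample_id):
    """Wrap raw instrument values with machine-captured metadata."""
    return {
        "data": values,
        "metadata": {
            "instrument": instrument,
            "operator": operator,
            "sample_id": sample_id,
            # Captured automatically at acquisition time:
            "acquired_utc": datetime.now(timezone.utc).isoformat(),
            "units": "wt%",  # would come from the instrument config
        },
    }

rec = record_measurement([47.2, 15.1], instrument="microprobe-2",
                         operator="kjohnson", sample_id="HAW-2019-014")
payload = json.dumps(rec)  # ready to push to a shared (cloud) store
```

The point of the sketch is that the researcher supplies only the science-relevant identifiers; timestamps, units, and instrument identity ride along without any manual typing, including for the "failed" runs that would otherwise stay on a hard disk.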
And that's when we're really going to start to have rich information, so we can approach problems we aren't even able to approach now. Are there other follow-ups on this? Go ahead, Kate. Thank you. My question is that other geoscientists who work in the government, like at the USGS, have many of these same goals. For example, they require release of data into ScienceBase. Recently I tried to say, oh, we've got this community EarthCube template, and took it with all the data to a USGS collaborator, and they said, well, I'm not sure I can use that, because it has to go into this other thing. So I was curious: the different Earth-science-related government agencies sound like they're saying the same things about their goals. They want machine-readable data, interoperable, et cetera. And yet I don't know that actual practitioners are aware of how to tap into this. What's your perspective on aligning those goals? I'll give my perspective; it's a very good question, and I think we're all struggling. As Ken said, there's this old mode of doing things, and we're moving towards a different, more distributed way of accessing our data, so I'm not sure we're there yet. Different communities work with the USGS with varying levels of success. In hydrology, for instance, they've been fairly successful, and in seismology as well. But as a whole, there isn't a common way that we in EAR work with the USGS. I would say this is where the idea of GeoCODES could be transformative. Because if everyone can adopt these industry standards in a common way, and EarthCube is trying to make that easy to do, it doesn't matter where the data is. You can search for it from your favorite platform and you may be able to find it in different places. And that opens up the ability to look at more creative and sustainable solutions for curation of that data in the future.
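The "industry standards" referred to here are schema.org-style dataset descriptions of the kind GeoCODES-style search builds on. As a hedged illustration, a minimal record can be emitted with nothing but the standard library; the dataset itself, its URLs, and its field values are invented for the example (only the `@context`/`@type` keys follow the JSON-LD convention).

```python
# Hedged sketch of a schema.org "Dataset" record that a crawler-based
# search tool could index. The dataset described here is hypothetical.
import json

dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example river-discharge time series",
    "description": "Hypothetical daily discharge, 2015-2018.",
    "keywords": ["hydrology", "discharge", "time series"],
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data/discharge.csv",
    },
}

# Repositories typically embed this JSON-LD in the landing page markup,
# which is why "it doesn't matter where the data is" for discovery.
jsonld = json.dumps(dataset, indent=2)
```

Because the description travels with the landing page rather than living in any one agency's catalog, the same record is discoverable whether the data sit at a university, the USGS, or a cloud store.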
One of the big differences between NSF and many of the other agencies is that NSF data often belongs to the university where the activity was funded, whereas for USGS, NOAA, and a lot of the NASA Distributed Active Archive Centers, that's where the data is supposed to go. So one of the issues we often have is that we are funding some crucial data set for the community, but the other major activity at the foundation is research. Do you do a little more data cleaning, or do you fund something new? At some point, funding the data archives can become a huge portion of a division's or a directorate's budget, so we have that trade-off. But I would go back to Eva's comment: is it a bug or a feature? Because the National Science Foundation gets to work with the university community, and there are 4,200 colleges and universities out there. So in a sense, we have the world's best distributed database. If we add clever standards for how to communicate with each other, and tools and techniques that allow you to combine data sets, we could be in the vanguard of solving a problem rather than worrying about it. The only other thing I'll add, which I'm sure we're all aware of, is that different organizations and different governments have different levels of data embargo, different levels of access to data. The USGS has a different governance system for its data than various universities or various sources, and that's something you come up against all the time. At a very simple level, for imagery or overflight data collection, if you go to a different country or a different area, the level of openness is completely different. So this is something you're always fighting with. Just to add to this: the geosciences, the sciences, are international, right? And the US is not the only keeper of information.
For those of you who have done fieldwork in other countries and tried to gather topo maps or LiDAR data sets or seismological information or whatever, it's very different in different countries. So being aware of how these problems are being addressed in other parts of the world is also important, and this is where information and idea exchange, and developing a community of people who are aware of what the problems are, is a super important part of enabling this kind of global cyberinfrastructure. It can't just be the US. And as people have emphasized, the difference between NSF and a lot of the other holders of data within the US government structure is that NSF is much more individual-PI focused, which all of us who are funded by NSF love, right? We love to have our idea and get money. On the other side, data science is definitely a team sport; it's not an individual-PI sport. So whereas people at the survey are essentially forced into a specific data model, because that's what the survey has decided to do, we aren't, at NSF. There are pluses and minuses there, and I think some of the minuses we've learned we have involve overcoming this individual-PI kind of thinking about what they should or shouldn't be able to do with the information they collect as part of their research. Bill, you want to follow up? My question is actually about the first thing you said, Steve, which is that you used the term sustainable, but not permanent. And this is a major issue. We are saying we should follow some structure. I'm the director of the Eel River Critical Zone Observatory. We have had a full-time data manager for five years, and we're still trying to agree upon a structure. We're now working with HydroShare, but we don't know if HydroShare will be there in a few years. We're going to XSEDE, but we don't know if XSEDE is going to be there in a few years.
So we're putting enormous effort into trying to preserve the data, with no idea whether there's going to be a future in that preservation. How do you address this issue, when you started off by saying there isn't a guarantee of sustainability? I realize it's a financial problem among other things, and this is a frontier activity, but you can see it from our side: it's a very costly endeavor, and it's not clear whether the end product will be available in 10 years, because the program may get shut down. So I'm going to partially punt on this question and partially throw it back to you. This is a community problem, and as a community we have to get together and address it. I think there are multiple avenues and multiple models of sustainability, which are probably applicable to different communities. My comments about sustainability come, selfishly, from a small focus: geoinformatics is a small program. We are not in a position to fund large facilities in perpetuity if we have a flat funding model, and we absolutely want a component of the program that supports innovation. So we have to have projects and PIs thinking about ways of continuing a valuable resource beyond what we can do in our program. It might be that you go to different NSF programs; that's a possibility. It might be that you go to other places in the community. I don't have the answer, but I encourage the discussion to continue. We have people in the room who run facilities with different types of sustainability models, and the whole point is to foster that discussion. Unfortunately, I don't have the answer, right? So the widespread theme is that this is a new frontier; I realize it's a whole new phenomenon unlike other things. As I was saying, in the end, your goal is to preserve and make it available.
Preservation means it's got to be there. And I understand NSF as a dynamic institution that says, well, we're going to change gears and go to this now. So what's the concept at NSF in terms of it being there in the future? I would just say a couple of things. It's not a solved problem, and we're not the only ones facing it. And so we look to OAC to help us. Yeah, right. I won't say that they've got the answer either, because it's not answered at the moment. And yet at the same time, I think the foundation does have a history of supporting and sustaining things for a long time. You have to continue to be evaluated for the science that's coming out of what you're doing. And I would say, in some ways, the sustainability track for Geoinformatics helps the PIs or the awardees to reimagine what they're doing, to take stock. Are there better methods out there that could be incorporated into what we're doing? I think what we've learned from the past is that when you stagnate into a certain model, it's necessarily going to be harder to change in the future. This is one way to have that change incorporated into how we build the cyber infrastructure for earth sciences. It's not an answer though. Yeah. Yeah, so this is a place where I think the community can help drive the foundation. The foundation, in my opinion, should be supporting cloud storage for all of the data products that it supports into the future, without limit, forever. And I know that's a lot, but that's right. In the end, I have this sort of bridge-of-the-Starship-Enterprise vision of data science, so that I should be able in 300 years to ask about my project and find out how it relates to whatever it is I'm asking about. And this is why I say that the foundation isn't anywhere near being able to support findable, accessible, interoperable, and reusable data. And yet it's so foundational to everything that we're trying to do.
And so I think migrating data centers away from using a lot of hard, fixed infrastructure, where they're storing all their information at their place and constantly updating it, is necessary. I mean, there need to be backups somewhere, obviously, but they're trying to migrate into the cloud, and having the foundation support that financially, I don't see any way around it, right? And cloud storage is super cheap, right? It's a question of the science community saying to the foundation, we're not asking for this, we're telling you that we want it, and then having the foundation provide it, right? And if we back off and say, oh, well, maybe just the important data gets stored or whatever, that's not going to be a good solution. So can I ask a related question for Geoinformatics and EarthCube: what is the timeline of your program? Geoinformatics has been inconsistent, which for PIs is kind of hard to plan around. You mentioned something about every two years in perpetuity, or maybe just definitely this one and the next one. And then for EarthCube, is there a timeline, a sunset for EarthCube? And if so, what is that, and what is the plan for the future of that? So for Geoinformatics, it's a competition every two years. I have no idea how long that'll continue, but that's the current plan. EarthCube was on a 10-year cycle, so we're coming to the end of that. We've got three years of the program left, but two funding years. So '19 is the third of that, then '20 and '21. At the moment, this is what happens to most initiatives: they've got a term limit on them. The outputs of EarthCube may determine how parts of it get supported at the foundation in the future. It's possible that there's a clear pathway and it could be submitted to something like CSSI, or it may be that there's an outcome that's really well aligned with HDR or another initiative that's coming down the pipeline. We don't know, but some of it will depend on the outcomes.
As part of the end of the program, is there going to be an assessment phase conducted within NSF? I don't think there's a formal assessment that will happen, but certainly we would evaluate the outcomes and see what stage it's at. And it's up to our senior management to decide what's the value of continuing these types of activities. And, sort of like the vision that Ken laid out, we'd be looking for what's the vision coming out of the program too. Yeah, so we're thinking a lot about metrics, and there are lots of different ways to measure success, and compiling that information is something that we're trying to do for the next couple of years. I think I would look at this a slightly different way, which is to say that the needs that the various programs described here attempt to address aren't going away. Those will continue into the future and only become more and more prevalent in our science. And so it's up to committees like this, and communities in general, to apply pressure to the foundation to say, we want these kinds of activities to continue. Maybe there are things about EarthCube that people don't like; maybe it's a successor program or a different program gets spun up. But that's how we're choosing to look at this: we're trying to lay a groundwork of standards and practices that will extend well beyond the program, that other groups can pick up and continue to use. We need to be in a place where things are interoperable. We need to be in a place where all of our data is discoverable and usable. I don't think most people deny that. How you make it happen, how you make it happen in the foundation, whether it's better to do it at the division level or the directorate level or across the foundation, I think these are all important questions, but the pressure to make sure this stuff continues to happen has to come from the researcher community.
And I guess I would also suggest that I'm fine with sunsetting. I think competition is our friend. I think there are an awful lot of archives where you have to ask, what is the cost-benefit of them? And I think the foundation saying that it's for a certain number of years, and Steve makes a good point here, means you know that every few years you're going to be reevaluated. That focuses minds. And I think that has made a big difference at the foundation. The kinds of things that Steve was mentioning, that this sunsetting question maybe gives an organization a chance, if for whatever reason they came up for renewal at a time that something was going on, to do another year or two or another round, that's got a fair amount of equity associated with it. But beyond that, whether it's a science center or anything else, we have to allocate our scarce resources; they're not going to get more plentiful in the near future. So how do we get the most benefit out of them? There are some new things coming out. That's part of the piece that I get to be in. And some of those won't work. Some of those will be terrible experiments, but that's useful to know too, before you get an entire active archive full of those things. So I think that, again, sunsetting and competition are meaningful and important to all of our communities. Okay, so I know a number of you have questions. Yes? Kevin, are you there? I just have a quick comment to add onto this discussion. It'll be quick. Rather than thinking about whether these programs are going to go on into perpetuity, or words to that effect: at the National Science Foundation, certainly we in the different directorates and the different divisions evaluate the programs on a fairly constant basis. And that's why new solicitations come out. Things evolve rather than go on into perpetuity.
Programs evolve, and that's what happened with the geoinformatics solicitation. That happens; you can name every single one of them. So what we look to the community to provide, and that's certainly one of the impetuses of this CORES group here, as well as community workshops and other venues that Ken has alluded to, is input so that we can modify. I mean, as every single panelist has pointed out, the technologies and the state of the art change constantly. And keeping up with it is a challenge, but it's something that we're always going to be responding to at NSF as much as we are innovating. We look to the community to innovate, and we try to keep up with that. So the CORES working group can actually be of great benefit to us here, to help get us back up to, not the forefront necessarily, but to keep pace with what's going on, and letting us know what the various sectors of the community want and need from us. That's all I have to say. Great, thanks. And please jump in whenever you have a comment. So Doug, do you want to ask your question next? Just a quick one. Mindful of the comment that this isn't solved yet, but also of Andrea's comment about trying to make sure you've got a level playing field, because you're almost creating a disincentive to do exactly what you're trying to do: would it make sense to have some kind of percentage, sort of like a tax, on every proposal, something prescriptive, that either ensures that that data is preserved long-term or goes to a permanent fund of some sort that serves the same purpose? Here's a technical answer to this, which may not be helpful, but the foundation recently updated its policy so that costs associated with making your data available are allowable within a grant. So you are allowed to put those costs within your grant request, your proposal.
The flip side of that is that in some ways it's a zero-sum game, right? If you're coming to EAR, our facilities are funded out of the same pot as our core programs, and some of those facilities have that mandate already built into them, that they take the data from awards that are funded out of EAR. But that is definitely a mechanism that could be used for meeting your data management plan. Jim, you wanna go? Yeah, I guess my question is that DOD and the Homeland Security and intelligence communities spend massive amounts of money on cyber, and is there a way, or maybe there already is a way, to interact with that community in a non-classified way, so that you can take advantage of some of the things they've learned? Maybe that's ongoing, but it seems that with all that money they spend, they've gotta learn things that would be useful to the science community. One would hope, that's right. Well, one of the things that has just happened at the foundation is that the head of science for the Department of Homeland Security is now at the foundation, leading the convergence activity that is just gearing up, and that convergence activity is again another one of these interdisciplinary activities. Stay tuned, because a lot of these things are coming as our panel season winds down and solicitation season starts to wind up. You'll start to see more out of Doug and his people. It will probably look DARPA-esque, and for those of you who have worked in that world, it's like: very fast turnarounds and timelines and metrics, and you make your metrics or you don't. So I think that that process is coming here. That will be, as you say, a slightly different approach than the foundation has had in the past, but it is yet another experiment. Again, if you're trying to get the maximum output, is that the way to do it? Michael. This is mainly for Steve. I wanted to first of all make sure that I understood the nature of the sustainability grants, and then ask a follow-up.
So those grants are essentially providing funding for groups to develop a plan to become independent of NSF funding. Okay, so the one example I know of a group trying to become independent is the consortium between the Paleobiology Database and Neotoma, where they're trying to start a private, non-profit foundation to keep those enterprises going. Would your program fund a grant to help them make that happen? That's the kind of thing you would do? Okay, with the caveat that I can't say that yes, we would fund something like that, those sorts of initiatives are what we are thinking of in the sustainability track. Yeah. I thought it could be useful for the discussion on sunsetting and sustainability if, Steve, you could tell us a little bit more about the geoinformatics hiatus. You said it was partly to align with NSF goals, agency goals. I'd just like to understand a little bit more about why the hiatus was necessary, and whether that is the kind of model to follow in terms of evolving for the needs of the community. So I'm gonna ask Eva also to pitch in on this, because her longevity at NSF is longer than mine, but it's my understanding that the program was deemed as needing some changes to respond to the community. And you can probably speak to that a little bit more. Yeah, that's exactly it. I think, you know, after it had moved out of the Instrumentation and Facilities solicitation, it had last been updated in 2011. So after the 2015 call, or whenever the last one was, we felt it was necessary to revamp that solicitation. You know, that was before data management plans, and before things like EarthCube and other things had happened. So it seemed like a good time to revisit that solicitation. And then, you know, it was just a matter of having the time, and before Steve and Kevin stepped up, there wasn't a geoinformatics program director to oversee it. Okay, Paul?
I saw a list of products go by early on, but I'd like your opinion on what you think your most successful product has been, one that has been of use to the general science community, especially the earth science community. This is very difficult, because from the geoinformatics list, there are several that have been ongoing for quite some time and have really made a significant impact in their communities. And, you know, they kind of span the range of disciplines. So it's really difficult to say whether CSDMS is more valuable than CIG, though they do the same thing, right? They provide access to models and codes for their respective communities. And I find that both of them have been sort of transformative for their respective communities. Yeah, I think almost everything that was up there on the list from geoinformatics, from a subdisciplinary perspective, would have been viewed as being very successful. And this is part of the issue, right? I mean, you want to support these communities and these initiatives and these resources, but how do you do it for the long term? It's one of the challenges. I figure there are several of these now that are very, very successful. And there are some that are clearly ramping up and have a future that looks like it's going to be successful, things like StraboSpot. So you want to make sure you're facilitating these building initiatives as well. Yeah, another way to frame your question is to think about any individual program at the foundation. I get funded through hydrology and geochemistry and marine geology and geophysics. And if someone were to say to me, okay, pick the one most useful, most important, most impactful project that's been funded in the last five years by any of those programs, I'd have a very difficult time doing it. I mean, of course they're mine, but it's a very hard question.
And I think the reason it's difficult to answer is that there have been lots of impactful activities, and understanding that some of these activities are laying the groundwork for advances that will be mostly felt by the community in the years to come, rather than right now, is part of why it's difficult to say for sure that this or that implementation of a particular thing is, right now, the most impactful thing. And thinking about not only the needs of the community now, but projecting into the future, at least on the five-year timeframe, if not decadal, is also really important. And I feel like, at least within EarthCube, we're on the right track there, but the results remain to be seen for a lot of the activities that are ongoing now. So, speaking as an outsider in some sense, rather than picking the most beautiful of your children: there's a string of investments that EarthCube has made in cloud services, trying to address the issue of, if we're not gonna store it here, if we're not the leaders anymore, there's all this stuff going on out there in the commercial world. So I'm diving into the commercial question: can we take advantage of that? Tim Ahern had put together sort of an evaluation of the cloud resources that were available. And Pangeo used Google Cloud; I can get into that in a second, but basically it looked at using Google Cloud credits as a way to handle data. So there's the whole idea of, and it is getting away from, if it's not at my university or my organization, is there a way to do this? And I think that's kind of tackling that question of, we need this data, but our university can't guarantee it, and certainly the foundation can't guarantee it. What are some ways to do it? And there are these incredibly high-value industries sitting out there, churning out storage facilities everywhere. Would that be a possibility?
So there's a whole string of activities that have been funded, and I'll let you dive in a little bit, or I can speak to the sort of three-step thing that's going on in my world, which is: what's going on in the world of industry? Are we working with Amazon? Are we working with Google? Should I take that little sidebar here for a second? So a couple of years ago, there was a series of big data solicitations that came out, and at one point, and it's a little bit like your question about credits, the most recent iteration of that solicitation was NSF 18-539. Basically, they had four cloud storage and computing organizations, AWS, Google Cloud, IBM, and Microsoft Azure, participating in the program, and those companies were willing to provide cloud credits or resources to projects, so that if you had a question or a problem, you could use them. And they were working together to do that. Most of the people chose AWS and Google, but in any case, that was the first round of that. And again, Ryan Abernathey's Pangeo award was basically working on that: he had a lot of open-source software tools, and he was trying to tackle the whole petabyte scale. How do you do earth system modeling at these giant levels? So he had a series of use cases that he was trying to look at, and he was working with industry on that. So that was something they did for a couple of years, but there are a couple of problems with that. One is, never assume that industry will continue to provide this charity for the rest of eternity. That's not a sustainable model. And another thing that was going on is that when these university grants were coming in to do these things and were requesting X million dollars or X hundred thousand dollars of Google or Amazon resources, the universities were taking their usual overhead on that. So it's like, is there an alternative?
So recently, and it's still in process and we can't announce anything yet, there was a cloud access solicitation, and that one was NSF 19-510. You can look at the details of that, but basically the idea was: can we designate an organization, or some organizations, that can take on, rather than NSF, the interaction with industry, the interaction with the community, the training, the getting people up to speed, those kinds of capabilities? And is that a more sustainable model? So that's something we're on the verge of being able to say something about. It's a pilot, so we'll have to see, but it could work, because there are some very large organizations, and they're competing with each other, that may wanna stay involved with a wide variety of the science community that is likely to be pumping out large quantities of interesting modeling results to the world. Is that a sustainable model? So that's something that's going on. That's gonna take a while, because if you do set up some organization or organizations to do that, it's gonna take a while to set them up and make sure they're running in a meaningful way. In the meantime, with both Google and Amazon, all the directorates at the foundation have memoranda of understanding with those two organizations, so that they would again continue to have these sort of cloud-credit-like possibilities. So that's, again, an experiment, but this one could work: if what you're trying to do is go to the cloud, and you know that there are huge, many-acre storage facilities that these organizations are building all over the globe, can we take advantage of that in a sustainable way? And charity, where X million dollars of their resources fund your grant, is not a sustainable way. What is? I don't know the answer to that, but that's, I guess, the current experiment that we're doing. And what is the beautiful child? Okay.
So it's already 2:35, so we're gonna try to wrap this up in the next five to ten minutes. One of the guiding questions that you were asked about, that I don't think we've really touched on, though, is to think about what the current obstacles are for collaboration between different NSF units, and whether there is a way that this committee could help to lower any of those barriers. Does anyone have any comments on that question? I do. Yeah, so the foundation's all about barriers, right? It really is. The way programs are funded, the way they compete with each other for funds and proposal pressure, the way the ideas that are or aren't implemented at the individual program level depend a huge amount on the individual program managers, the way that directives from the foundation's upper management filter down into the programs, and the large disconnect, at the data-science level, between what research communities want and the foundation's perspective of what we need. I mean, Amy has said now several times that universities, university researchers, should be keeping the data. That's less sustainable than relying on the charity of large corporations like Amazon, and we know that that's not a sustainable model. And so part of, and this is difficult, this is the hardest part, but part of the education effort that needs to happen about data stewardship, at the scientist level and at the programmatic level within NSF, at the core of this issue, is having everyone on the same page about what the challenges are, what the potential solutions are, and how to enable them in a cross-disciplinary way, in a foundation that at its core is fundamentally balkanized, right? I mean, like I say, each one of the programs competes with each other, and it's not healthy.
It makes it much more difficult than at some of the other major organizations that fund scientific research on the globe to put together a coherent framework for dealing with data issues, all of them, across the foundation. And I don't have an answer, but I know for a fact that it's not going to develop organically out of the foundation, right? Just look at how lame FastLane is, right? For 20 years, we've been dealing with an incredibly weak tool. And the impetus for change, as I say, is not gonna come out of the foundation. There are ways forward once we're all speaking the same language, once we're all recognizing what our issues are. And certainly the sustainability of data and data resources into the future is one issue, but of all of them, it's probably the easiest one to solve; it just takes money. The harder part is literally getting people on the same page. And I think that's going to take a significant effort at all sorts of levels, right? It's a major challenge, and I wouldn't put it all on the foundation. There are a lot of people out in our user communities that know what they want. They don't know how to achieve it. They don't really even want to learn about what the challenges are or what the existing mechanisms are. And not everyone has to be an expert in data science, but they need to at least be a little bit aware, right? And so that's one of the things that we're hoping to do: bring more education to people about what these challenges are. I would say one of the biggest challenges is workload at the foundation. There's a lot of stuff that goes on. I think there are excellent working relationships across the directorates, especially with OAC and HDR. You know, as Amy said, there's a cohort of program directors that work on that from across the foundation. But it's not really sustainable to manage these types of large-scale programs as well as the other programs that we're all responsible for at the same time.
Just to provide a little different perspective, I'm the shortest-term person at NSF here. One thing that really struck me when I came to the foundation was how much people do talk to each other, and how much interaction there is between programs, and how much collaborative funding there is between programs. And I understand where Ken's coming from, and that there are definitely some barriers that can be challenges to overcome, or to bridge, maybe that's the better word. And that's something that we have to, for sure, keep working on and keep working towards. But Ken said something else which I think is important here. He said that a lot of these initiatives need to come from the community. And that's really what the NSF mission is: it's not a top-down organization like some other organizations are. It should be driven by the community. And so whatever we can do, or perhaps whatever this committee can provide in the way of suggestions, for getting better community input into the directions that we should be going, and who should be talking to whom, and how to facilitate that. There are government limitations, for sure, but those sorts of outside input are what move things forward. Okay, so with an eye on the clock and wrapping up, are there any last questions from the committee, or, having been through this, thoughts from our panelists that you want to include before we finish here? Bill, did you have something to say? Well, one thing came up that I wanted clarification on. Something Amy said, which is, I think you said that the universities have the data. That's not the case. It's in individual files of individual professors, who might retire, and the data is gone. The universities don't have the data, and they don't want it. Certainly, I would call it a disconnect, right? Because from NSF's perspective, the awardee is the institution, right? And they have ownership of all products. That's also, I think, a real mess.
I don't know the best way that we can work better with the institutions to outline responsibilities for these things, but it is a disconnect. There's no directive for how the overhead should be used. So just a quick question for Ken and Eva. You were talking about the EarthCube community of users. How big is that community versus the EAR community, or versus the GEO community? Like, can you count them? So what I can tell you is that we have about 2,500 people who subscribe to our monthly newsletter and vote in our elections. And of those, maybe a tenth are hyper-engaged and come to our meetings and participate in governance and so forth. And I don't have any idea about the numbers from, you know, the earth sciences, numbers of PIs that are funded and so forth. But it's a pretty big cross-section, and it does include people from atmospheric sciences and ocean sciences and polar as well as earth sciences. So, it's pretty big.