Thanks for having me. Yes, I am honored to be NOAA's first-ever chief data officer, which means I'm a scientist and a bureaucrat. It's a position I started last year, and it's new in government to adopt these kinds of structures, so I'm a little different from some of your other speakers. Here's a little background about me. Again, I have a science background, and I use technology to accomplish the science. Gosh, I started using UNIX probably back in 1987, pre-Linux. This will date me right here: I actually started on a VAX VMS machine using a UNIX emulator called Eunice. Some of you out there will remember that God-forsaken product. But it got me into the right groove. I had a colleague at the time who told me, look, as an earth scientist, people are going to tell you to learn Fortran and learn VMS. And he said, no, no, learn UNIX and learn C. That was great advice. So how many of you have heard of NOAA before as a government agency? That is wonderful. OK. Our brand recognition isn't quite as good as NASA's; we've done the studies that point that out. But that's OK. We have an operational mission. It's science-based like NASA's, but NASA's mission is research. Our mission is operations: to get the American people the quality information they need to run their businesses, to protect their lives and property, to manage their water resources, and to manage their ocean resources. We take this mission very, very seriously. We've got 12,000 people working this mission right now, and about 7,000 of them are scientists, federal employees. In our mission statement, which I've circled here, our mission is to share that information. That's very important. We don't hoard the information; we try to get it out to others. And we've been doing big data since before the term was coined. It's part and parcel of doing oceanography, and I'm an oceanographer.
In oceanography or meteorology, you're dealing with large quantities of observations from which you're trying to glean some insight about how the environment works. It's inherently a difficult problem. We've got way too much data: about 30 petabytes right now in our archives, and roughly another 200 petabytes in the working data stores within our organization. It's an enormous amount of data. We have a hard time moving it around, and a hard time managing it. Surprise. If you want to see how popular this data is, there's a site you can go to called analytics.usa.gov, and the NOAA websites and data sources are usually among the most popular, typically in the top three or top five. This is a snapshot from last week covering the last 30 days. At this time of year we usually get beaten by the IRS, because everyone wants to go to a site called Where's My Refund; that's usually number one. The Post Office is up near the top two. But right up there are forecast.weather.gov and weather.gov, which are always near the top. People really need to consume this information. There's another site called data.gov. Be careful if you go there, because admittedly it's really hard to use. But it's the first step the government is taking to get our data stores into the public consciousness. If you go there now, you'll see about 70,000 data sets that are publicly available. That's actually not all of them, but it's most of them. Those of you who can read this up close can see part of the problem we have: if you're not an oceanographer or meteorologist and you can actually understand these descriptions of our data sets, good luck to you. So we have this problem of translating our data, making it actually useful to people who are smart and want to use this stuff but aren't necessarily subject matter experts.
But we've got all this data about weather, climate, ocean, coast, fisheries, and ecosystems, and almost all of it is open. If you ask for it, you get it. Most of it is accessible on the internet today if you know where to look and whom to ask to translate it for you. Getting to openness, for open source and open data: NOAA is a big consumer of open source, of course, but I'll talk mainly about open data today. It's built into our organization to share, share, share. We share everything openly. I think a lot of this comes from the fact that we're scientists. We come out of the academic science community, where sharing has always been central to the success of the science, and we've brought that with us from the 80s and 90s into our organization. Our organization was created in the early 1970s, and today you still see a reflection of that open culture in an operational government agency. It's a very interesting culture that we have. On the world stage, NOAA has been a leader for the entire United States in brokering data sharing among countries. One of the most successful efforts has been through a United Nations organization called the World Meteorological Organization, WMO. After about 10 years of arguing and negotiating, there is an international agreement today that says environmental data for the protection of lives and property, for the good of society, will be shared by all countries as openly as possible. This is one of the foundational concepts built into our organization, and we have been successful in getting other nations to follow suit. Within the weather industry inside the United States, there's always been tension with the commercial interests that exploit this data for information products and create jobs. We are part of the U.S.
Department of Commerce, so we love it when companies take our data, create jobs, and resell the data or make new information products. That's what we want to see happen. But the fine line between what the government should do and what industry should do has always been a little contentious; there's a little bit of tension there, and that's good. Way back when, my first job with NOAA was in 1996, at the National Data Buoy Center, which has buoy data and wave data. If you're a surfer, you're familiar with the website. When I got there, availability of the data was an issue. There was an old HP Unix machine sitting around, so I said, well, I'll throw Apache on it and put all the data on there. Then I threw gnuplot on there, which is still running there today (there's probably a backstory to that too). The idea was, okay, now you can get all the data and you can see graphs of it. True story: the next day a phone call came in from Washington saying, take the website down. Somebody had called their congressman and complained that this government website was overstepping its bounds and taking away a business opportunity from somebody else. Cooler heads prevailed after a couple of days; the site came back up, and it's been up ever since. But this demonstrates the kind of healthy tension that exists between government and industry over who provides the data to whom. NOAA has responded to recommendations from the National Academy of Sciences and others to really define what this partnership model looks like. Built into this partnership policy is: we're committed to the mission; we're committed to consulting with industry; if we're going to change our data systems or our observations, we'll let everybody know ahead of time; and we disseminate this information as openly as we can.
Equity: fair and equitable access. If you have access, you get the same kind of access as everyone else; we don't play favorites. And we recognize that industry is an important partner in delivering information products to the American public. One thing I'd like to change going forward: we've tended to default to making our products available in the public domain. We don't license our data or our software; we give them to anybody who wants them, with no restrictions on use. Interestingly, that actually restricts our ability to distribute some products that have shared IP. We have partners in academia and industry who have contributed to the development of these products, and sometimes, when the IP is shared, we can't distribute those products in the public domain. So by adopting some modern licensing practices, I think we could actually share even more information with the public. I know the Linux Foundation has made progress on the Community Data License Agreement, and this is one of the things I'd like to consider adopting for our organization. What's the value of this data? This is from last year, 2017: how many billion-dollar weather and climate events there were in the country. You're familiar with a lot of these, including the hurricane season. One of the great success stories about getting data out to the public and adopting some of the new modern technologies was related to Hurricane Irma, which, as you remember, hit Florida. As you can imagine, a lot of American citizens were hungry for information and were hitting the NOAA websites and data sites very hard. Typically, we have a hard time keeping up with that kind of demand.
In this case, we took the website of the National Hurricane Center, which is part of NOAA's National Weather Service, put it on different infrastructure with Amazon CloudFront in front of it, and we were able to handle over a billion hits a day during the peak of the hurricane season. The service never went down. People got the information they needed and were able to get out of harm's way. So it's a great success story, but, as our Home Depot colleagues were saying, we're just starting to learn how to adopt some of these more modern technologies to do our job better. As for the value of this data to the economy: the size of the ocean economy usually comes as a shock to people who don't realize how many hundreds of billions of dollars are riding on it. It includes fisheries, ocean transportation, and oil and gas. It is a very, very big industry, and it's projected to keep growing over the next 20 years. The weather enterprise, as we call it, is similarly continuing to grow. It should be about $10 billion a year right now; that's the value of the products and services the commercial industry is selling. Two years ago, for example, IBM bought The Weather Company, the parent company of The Weather Channel, for $2 billion. That really caused a lot of us at NOAA to sit up and say, wow, this is even more valuable than we thought, because The Weather Company at the time was about 200 people and a little bit of infrastructure. It's a significant investment, but IBM realized weather touches every part of their company and every part of their services. Industry knows the value. So what we're doing now is an experiment. We're asking: can we find a way to leverage the value that's inherent in our data products?
Can we leverage that to make the data available to an even wider group of people and make it more easily understood? Because there are two big challenges. The technical one is how we get the data to you, and the second is how we get you to understand the data. Both are big challenges. So we're doing something we call the Big Data Project right now, and I'll describe it for you. To sum up the problem: if I show you any graph of our data volume, they all look like this. They're all growing exponentially. This one happens to be the archive down in Asheville, North Carolina, where I live. That was storage, 30 petabytes right now. If you look at how much data is actually going out, you'll see the same kind of exponential growth. More and more people are trying to pull more and more data. Our systems are all throttled because we have to protect them from overuse. We're getting hammered every day by a lot of companies that want the data, and we have to restrict it, because we need to provide fair and equitable access and can't let the resource be monopolized by a few users. We have to make sure everybody has an equal chance to get the data, and as you can see, that's becoming a harder and harder job. So in general, our demand is growing exponentially, and our costs are going up because we need more servers and more network. Costs grow proportionally and continuously, which is a problem, because our budgets look like this, and that's kind of optimistic. It's not a great budget environment right now. That's okay. So what we're trying to do is take this foundation of the data and the expertise that NOAA has, partner with industry, and have them help facilitate the delivery of data to third parties: other companies, the public, and any other information consumers we have.
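The talk doesn't say how NOAA's servers actually throttle requests. One common way to enforce the "fair and equitable access" rule described above is to give every client an identical token bucket, so a heavy bulk downloader exhausts only its own allowance. The sketch below assumes that scheme; the `FairThrottle` and `TokenBucket` classes, the client IDs, and the rate and capacity values are all hypothetical, chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenBucket:
    """One client's request allowance: refills at `rate` tokens per second, up to `capacity`."""
    rate: float
    capacity: float
    tokens: float
    last: float

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to the time elapsed since the last request, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the limit: the caller should reject or delay this request


class FairThrottle:
    """Hypothetical per-client throttle: every client gets an identical bucket,
    so a heavy bulk downloader cannot crowd out anyone else."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the behaviour can be demonstrated deterministically
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(client_id)
        if bucket is None:
            # Every new client starts with the same full allowance.
            bucket = TokenBucket(self.rate, self.capacity, self.capacity, now)
            self.buckets[client_id] = bucket
        return bucket.allow(now)


if __name__ == "__main__":
    fake_time = [0.0]
    throttle = FairThrottle(rate=1.0, capacity=3.0, clock=lambda: fake_time[0])
    print([throttle.allow("bulk-downloader") for _ in range(5)])  # [True, True, True, False, False]
    print(throttle.allow("small-user"))  # True: unaffected by the bulk downloader
    fake_time[0] = 2.0
    print(throttle.allow("bulk-downloader"))  # True: two tokens refilled after 2 seconds
```

The design choice is per-client rather than global limiting: a single global limit would let the busiest companies consume the whole allowance, which is exactly the outcome the talk says NOAA must prevent. The catch, as the talk notes, is that throttling only rations a fixed capacity; it doesn't help serve exponentially growing demand.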
Jim showed some figures with gears yesterday, and I thought, oh great, I've got gears in my slide too. But it really is true: it's an interlocking system, and NOAA's expertise is the oil can. None of this works unless expertise is being supplied to help people understand what the data are. So we signed what we call Cooperative Research and Development Agreements, or CRADAs, with five organizations: Amazon, Microsoft, IBM, Google, and a nonprofit out of the University of Chicago called the Open Commons Consortium. The agreement is that the NOAA data will remain free and open. They can't be sold, but the partners host the data at no cost to the taxpayer, and they are allowed to monetize services around those data, for example by making the data easier to consume. They can charge for that, but they can't sell the data itself. So you can go to these sites (I'll give you some URLs in a second) and download the data for free if you want to, but the partners would like you to use the data in place, with their tools, and build products on their systems. It's a three-year agreement; actually, we're going into a fourth year, since we're doing an extension right now. We're trying to see whether this business experiment can actually work. What we're doing here, and this is schematic, is reaching into the NOAA government infrastructure and taking out one copy of the data. Right now we typically distribute thousands of copies of the same data from our federal systems to everybody who wants it. Instead, let's take one copy out and move it out. We're actually using a data broker right now, an academic partner that's helping us split this data and figure out how to drop it into each of our collaborators' cloud platforms, and they turn around and serve many consumers from that.
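To make the contrast between the old one-to-many model and the broker-based one-to-few-to-many model concrete, here is a back-of-the-envelope sketch of how much data each hop carries. The data set size and consumer count are hypothetical illustration values, not NOAA figures; the partner count of 5 matches the five CRADA partners mentioned in the talk. It is an idealized bound that assumes every consumer switches to a partner cloud, which won't happen in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Egress:
    """Terabytes transferred at each hop of a distribution model."""
    from_noaa_tb: float    # leaving NOAA's own federal servers
    from_broker_tb: float  # pushed by the data broker to cloud partners
    from_clouds_tb: float  # served by cloud partners to end consumers


def one_to_many(dataset_tb: float, consumers: int) -> Egress:
    # Every consumer pulls its own full copy directly from NOAA.
    return Egress(dataset_tb * consumers, 0.0, 0.0)


def one_to_few_to_many(dataset_tb: float, consumers: int, partners: int) -> Egress:
    # NOAA hands a single copy to the broker, the broker places one copy on each
    # partner cloud, and the partners serve all consumers from there.
    return Egress(dataset_tb, dataset_tb * partners, dataset_tb * consumers)


if __name__ == "__main__":
    # Hypothetical numbers: a 10 TB data set, 1,000 consumers, 5 partner clouds.
    before = one_to_many(10, 1000)
    after = one_to_few_to_many(10, 1000, 5)
    print(before.from_noaa_tb, after.from_noaa_tb)  # 10000 10
    print(after.from_broker_tb, after.from_clouds_tb)  # 50 10000
```

Note that the total delivered to consumers doesn't shrink; what changes is who carries it. NOAA's egress drops from one copy per consumer to a single copy, and the partners absorb the rest at no cost to the taxpayer.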
We went from a one-to-many model of distribution to a one-to-few-to-many model, if that makes sense. Here are the URLs you can get to right now. As I was telling Mark, Azure hasn't released any public-facing services to date, which is within their rights under the agreement. I don't mean to leave them out; they just have nothing to offer right now. I'll show you some examples of the other ones in a second. Everybody's working hard on this. As an example, take weather radar data. A lot of you are familiar with seeing weather radar on the evening news or in the app on your phone. This was the first data set to move within this project, and it went to several of our collaborators. On AWS alone, we've seen more than a twofold increase in the number of people using the data, and a 50% reduction in demand on the NOAA servers. That's a good sign too. We've also seen other uses of the data pop up that we didn't expect, which is what we were hoping to see, but we can't predict these things. If you're a meteorologist and birds are flying past your radar, that's noise, and you remove it from your signal. But if you're a bird migration biologist, that's exactly what you're looking for. Some of the first folks to come in and compute heavily on AWS were bird migration scientists, who were able to track birds across the United States using this 20-year time series. And now, of the people who come looking for NEXRAD weather radar data, 80% go to AWS to get it, because it's faster and easier for them. Google has loaded a lot of our climate data into BigQuery.
These data sets are terabytes in size, and people have moved petabytes of this data in just a few months, because the data have been loaded into a tool people are already using, which lowers the obstacle of understanding. You don't have to understand a scientific data format. You can go right into BigQuery, join on the data, and do analyses, and people are doing very interesting work there. The OCC, the Open Commons Consortium, is using Jupyter Notebooks to put the recipe for how to use the data right next to the data. I don't think this is going to play, but there's actually a video. After Hurricane Irma, a couple of reporters at the Tampa Bay Times wanted to tell a story about the evacuation of Florida and how close Florida came to a worst-case scenario: the storm coming up the east coast, where the majority of the population is. They were able to use the satellite imagery and the code provided on the OCC platform to create a video storyboard for their newspaper. These are not scientists. These are people who said, hey, we have an interest, and we need to put some context around this. They reached out, got the real data, and were able to integrate it. It's a great example of how this kind of thing can work. IBM has created a whole bunch of new APIs that tie into their Bluemix platform. I'm running out of time, so I'm moving ahead a little quickly here. One of the problems we're encountering is this: people trust NOAA's data today because they get it from a NOAA data service. But now we're asking them for something else. We're asking them to trust NOAA data that exists outside the federal system, on a partner system. I think we're still a little bit ahead of the problem. But going back to the Irma example again, this was a fake forecast that showed up on Facebook, and it's got the NOAA logo on it.
Somebody, I guess, meant it as a joke. The National Weather Service does not think this is very funny. They do not play around with lives and property. They were very concerned because there was no way to validate the forecast against the actual NOAA data; we don't yet have a mechanism in place that makes that easy. This thing spread like wildfire very quickly. It was supposed to be a joke, that Irma was going to hit Houston again or something like that. But it definitely caught NOAA leadership's attention, and in general, with this Big Data Project, we are seeking ways to make sure that the trust that comes with the NOAA brand is conveyed with the data, so people know, when they're using the data, that it's the real thing and they can trust it. So in summary: we have a very proud history of open data leadership, we're continuing on that path, and we're trying to see how we can amplify it. Can we take it to even greater levels, making this very valuable data more useful to any of you out there who has an idea of how to use NOAA data, if it's made available through these modern cloud platforms? Does that make it easier for you to create information products of value to you, your company, and your customers? Of course, we're also looking for other ways of doing our business better. Right now, by definition, this is an experiment for a defined amount of time. We're looking to see, with partners like Azure, Google, and Amazon, whether we can actually do this sustainably going forward, as part of our normal operations, and we'll probably know that by early next year. All right, well, thank you very much for your time.