Hadoop Summit in San Jose. Beautiful sunny San Jose, as far as I know, was nice this morning. I haven't been outside in a while, but we're. You're still very nice. Yes, I've got my co-host, Abhi Mehta, filling in. Thank you. From Tresata. A guest co-host? A guest co-host. It was new for me. We're trying new things on theCUBE all the time. And our guest now is Chris Mattmann from the Jet Propulsion Laboratory at NASA. Very interesting stuff. Nice to meet you guys. Nice to meet you, welcome to theCUBE. It's a painless environment, no worries at all. You haven't hurt me yet. So you were just coming from a talk you gave? Yeah, yeah, I was just coming from a talk. Why don't we share a little bit with our audience kind of what you were talking about, and then maybe we can go into some of the use cases, things you're doing at NASA. Yeah, no problem, yeah. So I basically was talking about some of the challenges, like some of our big data scenarios that we have at NASA, like for example, the Square Kilometer Array, which is an international project, the next generation radio astronomy instrument that's going to generate about 700 terabytes of data per second. Oh my God. Which we really don't understand. Say that again, 700 terabytes per second. Yep, per second. So that's 15 times more than the Large Hadron Collider. Yeah, and your math is better than mine, so, because you came up with that on the fly. Yeah, absolutely, it's the largest, I mean, we don't know how to even deal with that. There's got to be decades of research to support that. So I was talking about projects like that, the US National Climate Assessment, a lot of the work that we're doing related to that, and just trying to motivate the development of technologies and the use of technologies like Hadoop and other things.
So can you talk about, and Jeff knows this, we've seen Hadoop come from the web, this cottage industry project six years ago, 2,000 people behind us now saying no longer cottage industry, this is a real industry. How does it change the game for you? You have a lot of, what were you doing before? And what can you do now that you could never do? Yeah, so I mean, what we were doing before, well, what I was doing was helping be one of the guys that helped make Hadoop. I was one of the original Nutch committers working with Doug Cutting. Oh wow, okay. Yeah, in the open source world. Well, thank you for it. Before Hadoop got spun out, but yeah, what we were doing before was basically building ad hoc systems, having people haul a lot of data around, crunch on it locally, instead of doing it in kind of a distributed environment and stuff. Hadoop, I think originally it was very difficult to use and install, but it was awesome, only the smartest people in the world could use it. I think Hadoop is really moving into a generation where a lot of people can deploy it, people are getting a lot of value and data and just insight out of it, and just anybody in the organization is being able to do that. And then that's got to be where Hadoop evolves to, and it's something I really, I think it's awesome, you know? So you know, obviously NASA has a rich heritage of, you know, in the technology world, developing technologies that eventually kind of make their way out into the wider world. So talk a little bit about some of the experiences you've had and how they would relate to some of our audience, who maybe, you know, they're not at NASA, they're at, you know, a little less sophisticated organization. Exciting job, yeah. You guys are giving us too much credit. No, for us, like, I bet you it's pretty similar even to business or to, you know, other IT industries. You know, for us it's just different data file formats and it's different things that are generating the data.
You know, a lot of people or a lot of companies like Yahoo or whatever, the data's being generated by users, like hundreds of millions of users that do clicks or whatever. But you can think of that in a way as something that is being observed and then some derived analysis that happens from that. For us, the thing that's doing the observing a lot of times, at least at JPL and at NASA, are remote sensing instruments that look at the Earth, that try and measure different parameters. You know, they're looking at surface albedo, you know, reflectance and so on, and they're looking at the measurement of snow in a particular pixel of the Earth. You know, and then it's basically, the things that are different are the data file formats, the, you know, the tools that operate on them. A lot of people here in the Hadoop community are, you know, they're R users, they're Python users. We're not as sophisticated a lot within, you know, NASA, because a lot of the people that write our algorithms and our codes to crunch on the data, they're scientists that aren't trained programmers. You know, they pick up a little bit of programming, you know, along the way. And so that's, those are the main sort of differences. The differences are in kind of the variety, the velocity, the typical big data, you know, things, but also just in the nuts and bolts. But in the end, it's the same thing. It's generating data, generating, you know, in our case, files or records, processing them, getting some insight. Our insights go to policy makers, they go into science research, they go into monetary decisions and things like that, but. The team at, I have a really good question for you. The team at Wikibon and SiliconANGLE have done a great job on theCUBE. We see the emergence of what we're calling this predictive analytics movement. That it's not about what has happened, but being able to predict what can happen, what can you do and bring with the data.
I'm assuming you at NASA, with all the information, do a lot of predictive analytics, but to the point you just made, that it's tough to find the skill set. Scientists are not programmers or vice versa. At least not always. A corollary to that is the open data movement. Which is, as you're collecting several terabytes a second, I still can't get my head around that. Several terabytes a second of information. Are you looking at open sourcing the data itself and inviting the smartest brains from the world to contribute to the intelligence you're seeking to find in that data? What do you think? I think so, I mean NASA's data, technically all of NASA's data is public, it's science research data. So all of the data produced by NASA or science missions by planetary, it's all eventually made public. For us, when it's not public typically, it's because it's protected ITAR information or it's just something that people wouldn't be able to crunch out in the open anyways. But all of NASA's science research data is public. The difficulty, what we see a lot, is that a lot of that data is specific to the instrument. You have to understand a lot about it. But as far as the data sets, all of the time people pick up NASA data and they figure out how to crunch on it. It's just not, I don't know, I can't explain why people haven't just picked it all up and said, oh, let's put all this onto Amazon and start dealing with it. But a lot of it has to do with politics and so forth and also just making sure the data stays where it was originally generated because a lot of the science expertise is there. Okay, interesting. So for us laymen, could you maybe give us a couple examples of where big data analytics has really had an impact, give us the end result. What are some of the insights maybe you've gleaned or some of the analytics you've done that a lay person, a non-NASA person, could actually understand? Sure, so I mean I'll give you one example by a guy named Dr.
Tom Painter, who's one of my inspirations at JPL. He's a snow hydrologist. Basically he's working with the Bureau of Reclamation, he's working with the water managers in the Western US, and he's trying to figure out how to generate more accurate measurements of snow melt and snow pack so that we can get better measurements of water, so that water managers understand how much water to release for the coming season, or just people know what to predict in terms of recreation and parks. Like parks and recreation is a $10 million industry in Colorado and the Western US, just people going and skiing and things like that. And so if we don't have snow around, people can't do that. And a lot of people that are in the money-making profit stream for that hurt. So being able to generate more accurate measurements of snow and being able to use data systems and big data and analytics for that is just something where I think we're seeing a lot of value. It's contributing to the US National Climate Assessment. You know, it's going into all of the climate reports and so forth, so that's something that's really fresh in our minds. So you think, to be a little bold, that big data can help solve global warming? It's a big data problem? I absolutely think it's a big data problem and I will be bold and I think it can do that. I think basically most of the work for running climate models is shipping data around, shipping computation around, all the types of things. And then making, like you said, your predictive analytics was really great because a lot of NASA research is retrospective. I'm trying to learn a model from what has already existed. And people are kind of gun-shy to make predictions a lot of times, especially when they deal with policy and money and things like that. But I think I'll just make a statement. I think that that's a big data problem and that's what we need to get there. I'm totally, yeah.
I think predictive analytics is really the next step, rather than just looking back, and you know this as well as anybody, obviously. All right, great. Well, thanks so much, Chris, for coming on. Really appreciate it. Thanks guys. Inside theCUBE. We will be right back in a moment with our next guest.