Hi everyone, and welcome to the Big Data Deep Dive with the Cube on EMC-TV. I'm Richard Schlesinger, and I'm here with tech industry entrepreneur and Wikibon analyst Dave Vellante, and SiliconANGLE CEO and Editor-in-Chief John Furrier. So welcome to you guys. I have a prediction to make: this segment will be of great interest to our audience, because we're talking about predicting the future, and that really is the goal, right? Strip away all the other stuff, and that's what big data is about, right? Yeah, and we talk a lot about this being the tip of the iceberg. We're now at a point, Richard, where we can predict the future, and, as we've said a number of times, we're in the early innings. Now it becomes a matter of, okay, what do we do with that information? How do we mobilize and take action in an anticipatory way? And how big of an issue is that in the community? It's the number one thing we're seeing right now. We're seeing analytics as the hottest trend in tech right now, and underneath that, if you think of it as a car, under the hood there's a lot of tech geek stuff going on. But what analytics really means is that you can get real-time information delivered to people at the right time. Predicting where a hurricane would make landfall, for example, could have saved New Orleans. I mean, this is the kind of impact that big data can provide. Well, because if you think about it, sifting through mountains of data on all sorts of things can help predict economic, social or even natural trends. But if we were able to predict natural disasters, right, if we were able to predict earthquakes, for instance, then we're talking about not just making money, we're talking about saving lives, and that's huge. So look at this story that EMC-TV did a little while back about a group of scientists who are trying to predict volcanic eruptions.
The destructive force of Mother Nature. Deadly. Indifferent. Overwhelming.
But these natural disasters can generate massive quantities of data, data we can use to mount our first line of defense. This is Popocatépetl. Located just 40 miles southeast of Mexico City, it's one of North America's most active volcanoes. Volcano expert Carlos Gutiérrez Martínez is alarmed by Popocatépetl's recent spike in seismic activity. This is a fragment thrown from the volcano during a minor explosion. He warns that this volcano could be building up to a major eruption. If this volcano explodes, it can scatter debris and fragments up to 30 miles to the south. For André Castillo Castro and his family, a major eruption could mean total annihilation. They live in Santiago Xalitzintla, one of several small villages near the base of the volcano, right in the path of destruction. In the case of a big explosion, it could be something similar to Mount St. Helens. Despite the danger, André says most villagers won't uproot their families or abandon their farms until they're ordered to evacuate. That means thousands of lives could depend on Mexican authorities knowing when this volcano might be ready to blow. This is the National Center for Disaster Prevention, or CENAPRED, in Mexico City. It's their job to inform the government when an evacuation may be necessary. Researchers here rely on a constant stream of data to tell them whether events like this are just a harmless release of pressure or the beginnings of a catastrophic volcanic eruption. If we see all the bars going up, it means that there's a lot of seismic energy being released. The volcano is really trembling a lot. CENAPRED works with the National Autonomous University of Mexico, or UNAM, to operate a network of remote sensors and digital cameras on and around the volcano. We have collected a lot of information related to the seismicity, the ash emissions, the explosions.
The remote sensors feed researchers thousands of data points per second, enabling 24/7 data analysis that can sense a pending eruption in real time. Now we can look at much more information. We can pull it out right away and do a lot of analysis. With all the data we have right now, we can definitely make better decisions. Fortunately for the residents of Santiago Xalitzintla, any decision to evacuate will be driven by data. But at least for now, researchers say, it's safe for families like the Castillo Castros to get on with their lives near the base of the volcano.
So the best part about the work they're doing down there in Mexico is that they're not only talking about predicting individual volcanic eruptions, they're also building a database of the behavior of volcanoes that could be used in further research. So there's a lot going on there, and major breakthroughs are possible in that science. And I guess there are also major breakthroughs elsewhere. We talked a little bit about outer space, but you guys don't only talk to data scientists, you talk to rocket scientists, and you've done that recently, so tell me about that. Well, we also recently saw the last NASA shuttle fly over Silicon Valley. It was great fun: we went out in front of the Googleplex with all the other geeks in Silicon Valley. It didn't bother them that it was 1960s technology. It was like watching history, NASA's final victory lap. And now NASA prepares to go to the next level. And Dave, I think it was exciting to talk to all the people we talked with at NASA. Yeah, well, we talked to NASA in the Cube. Chris Mattmann, a computer scientist at NASA's Jet Propulsion Laboratory, or JPL, talks about how they're essentially capturing the universe and what else they're doing with big data around weather and other instrumentation. So take a look at this.
Welcome back to the Cube.
We're here at Hadoop Summit in San Jose. Beautiful, sunny San Jose, as far as I know. It was nice this morning; I haven't been outside in a while, but... It was still very nice. Yes, I've got my co-host, Abhi Mehta, filling in. Thank you. From Tresata. A guest co-host? A guest co-host. We're trying new things in the Cube all the time. And our guest now is Chris Mattmann from the Jet Propulsion Laboratory at NASA. Very interesting stuff. Nice to meet you guys. Nice to meet you. Welcome to the Cube. It's a painless environment. No worries at all. You haven't hurt me yet. So you were just coming from a talk you gave? Yeah, yeah. Why don't you share with our audience a little bit about what you were talking about? And maybe we can go into some of the use cases, things you're doing at NASA. Yeah, no problem. So I was basically talking about some of the challenges, some of the big data scenarios that we have at NASA, for example the Square Kilometre Array, an international project, the next-generation radio astronomy instrument that's going to generate about 700 terabytes of data per second. Oh my God. Which we really don't understand. That's, again, 700 terabytes per second. Yup, per second. That's 15 times more than the Large Hadron Collider. Yeah, and your math is better than mine, because you came up with that on the fly. Yeah, absolutely. It's the largest; I mean, we don't even know how to deal with that. There's got to be a decade of research to support it. So I was talking about projects like that, the U.S. National Climate Assessment and a lot of the work that we're doing related to it, and just trying to motivate the development and use of technologies like Hadoop and other things. So obviously NASA has a rich heritage in the technology world, developing technologies that eventually make their way out into the wider world.
So talk a little bit about some of the experiences you've had and how they would relate to some of our audience, who maybe aren't at NASA; they're at a somewhat less sophisticated organization. Exciting job, yeah. You guys are giving us too much credit. You know, for us, I bet it's pretty similar even to business or to other IT industries. For us it's just different data file formats and different things generating the data. At a lot of companies, like Yahoo or whatever, the data is being generated by users, hundreds of millions of users doing clicks or whatever. But you can think of that, in a way, as something that is being observed, and then some derived analysis happens from that. For us, the thing doing the observing, a lot of times at least at JPL and at NASA, is a remote sensing instrument that looks at the Earth and tries to measure different parameters. They're looking at surface albedo, you know, reflectance and so on, and they're measuring the snow in a particular pixel of the Earth. So the things that are different are the data file formats and the tools that operate on them. A lot of people here in the Hadoop community are, you know, Python users. We're often not as sophisticated within NASA, because a lot of the people who write our algorithms and our code to crunch the data are scientists who aren't trained programmers. They pick up a little bit of programming along the way. So those are the main differences. The differences are in the variety, the velocity, the typical big data things, but also just in the nuts and bolts. But in the end, it's the same thing: generating data, in our case files or records, processing them, and getting some insight.
Our insights go to policy makers, into science research, into monetary decisions and things like that. So for us laymen, can you give us a couple of examples of where big data analytics has really had an impact? Give us the end result. What are some of the insights you've gleaned, or some of the analytics you've done, that a layperson, a non-NASA person, could actually understand? Sure. I'll give you one example from a guy named Dr. Tom Painter, who's one of my inspirations at JPL. He's a snow hydrologist. He's working with the Bureau of Reclamation and with the water managers in the western U.S., and he's trying to figure out how to generate more accurate measurements of snowmelt and snowpack, so that we get better measurements of water, so that water managers understand how much water to release for the coming season, or just so people know what to expect in terms of parks and recreation. Parks and recreation is a $10 million industry in Colorado and the western U.S., just people going skiing and things like that. So if we don't have snow around, people can't do that, and a lot of people who make their money from it get hurt. So being able to generate more accurate measurements of snow, and to use data systems and big data analytics for that, is something where I think we're seeing a lot of value. It's contributing to the U.S. National Climate Assessment. Wow. It's going into all of the climate reports and so forth, so that's something that's really fresh in our minds. So, to be a little bold, do you think big data can help solve global warming? Is it a big data problem? I absolutely think it's a big data problem, and I will be bold: I think it can do that.
I think basically most of the work of running climate models is shipping data around, shipping computation around, all those types of things. And then, like you said, predictive analytics is really great, because a lot of NASA research is retrospective. Interesting. Trying to learn a model from what already exists. And people are often gun-shy about making predictions, especially when they deal with policy and money and things like that. I'll just make a statement: I think that's a big data problem, and that's what we need to get there. Absolutely. I think predictive analytics is really the next step, rather than just looking back, and you know this as well as anybody. All right, great. Well, thanks so much, Chris, for coming on; really appreciate it. Thanks, guys. We're inside the Cube. We'll be right back in a moment with our next guest.
So, you know, listening to Mr. Mattmann, or probably Dr. Mattmann, I'm wondering: what's the relationship in your world between the data scientists and the other kinds of scientists, the rocket scientists, the volcano scientists? Is there a close relationship? Well, increasingly those two worlds are coming together. You're seeing data hackers and statisticians and business analysts essentially become data scientists. What's interesting to me about that segment is that you're taking a lot of data and making inferences from it. In the old world, prediction was about building models, and those models were the be-all and end-all: whatever the model said was the answer. Now it's about taking sort of fuzzy data and drawing inferences from it, and that's different. But the real-time nature of big data would also make it hard to have models that you don't change, because things change.
Yeah, we're seeing a lot going on with that. One is on the personnel side: it's going from the PhD math geek, because there's a lot of math and science involved in a lot of this big data stuff, to more of the general-purpose, common-person analyst type, or any employee or any individual for that matter, as we showed in the examples there. The trend is clearly moving towards abstracting away the complexities, so that it's easy to use and anyone can be a data scientist and add value. Because that's how change comes, right? You have to put the power of the data in the hands of somebody who knows that field, whatever the field is. That's what's happening right now, and that's the most exciting thing. We've seen this before with Intel: they made processors faster, but no one really knew how a processor worked; it just worked. That's what's going on with big data, and you're seeing it happen fast. And there's a big discussion around domain expertise: can the data essentially replace the domain expertise? Right now we're in a world where you can't take the human out of the equation, but there's conversation about whether you'll eventually be able to do that. That's another very scary thought. You know, there's a lot of work going on in climate research, as Chris Mattmann talked about. There's a company called 3TIER doing some really interesting stuff on climate research, using really decades of data to help companies make investment decisions: what kind of energy production to invest in, and how the climate affects those decisions. We did a little story on them, so take a look at this.
Renewable energy is generally driven by natural phenomena like sunshine, wind and rain. They show up when Mother Nature decides to deliver them, and not necessarily when you need them.
It's very challenging to commit many, many millions of dollars to an energy portfolio without really understanding the risk associated with having a less windy year than normal, or having more sun than average. 3TIER provides our customers with a way to manage the risk around their wind, solar and hydropower projects. We do data everywhere. We have 40 years of climate data from the entire globe, so every corner of the globe, wherever you might even imagine putting a wind farm or a solar plant. We distill all of that data down to the critical decisions that customers need to make. Do I buy now? Do I sell now? Do I invest in this? A yes or no answer. For us as a company, even though at the end of the day we may be sending out something that is a few thousand bytes to the client, the data it takes to get there really piles up. Our supercomputing cluster is typically putting out about a terabyte a day of data that we actually retain and keep. The EMC Isilon system provides us with the reliability, scalability and performance that give me access to everything I need out of a storage system. We have this long laundry list of things that people would like to say we'll include someday, but increasingly that day is today or tomorrow. Mother Nature has an infinite supply of data for us, and to me, the more we can get our hands on that data and make use of it, the more we can work with Mother Nature in terms of providing a sustainable future.
One nice thing, of course, is that 3TIER has been around a pretty long time, so it's good to see that they're doing well and talking about stuff that's hot now: renewable energy, hydro and wind, as you saw in that piece. So it strikes me that climate is really fertile ground for this type of research, this type of science. What else are you guys hearing about what's going on in that universe? You mentioned it.
Renewable energy and space exploration, I think that's all going to come together, and some of the smart people think it's all part of one equation. But what's out on the fringe right now, what I call the lunatic fringe, where it's really happening, is agricultural investment. There's a new trend in Silicon Valley that's been getting a lot of traction recently, and that's heavy-duty tech investment in agriculture: to help with farming, making clean food and clean energy, but also a lot of advancements in agriculture and farmland, using technology to manage soils, manage crops, all kinds of new ways to get more out of your farmland. It's pretty big business, with billions of dollars being invested every year in agriculture.
Lots more to talk about. I wish we had more time. The agricultural thing will be something to watch, I would think, because that's another, pardon the expression, fertile area for investment. So thanks again to both of you guys for stopping by and giving us your insight and your really deep knowledge of all this stuff. We of course have more installments of the Big Data Deep Dive coming up, so stay tuned to the conversation with my new best friends from the Cube, right here on EMC-TV.