Big Data SV 2014 is brought to you by headline sponsors WANdisco, we make Hadoop invincible, and Actian, accelerating Big Data 2.0.

Okay, we're back here live in Silicon Valley. This is Big Data SV. This is where all the action is happening. Go to the hashtag BigDataSV, go to crowdchat.net, the new application, large-scale group conversations pumped directly into theCUBE here. That's crowdchat.net/BigDataSV. That's the event we're covering: all the Big Data innovation here in Silicon Valley, and the Strata conference happening right across the street. All the news, a lot of action, a lot of entrepreneurs, a lot of VCs, a lot of CEOs, a lot of researchers, a lot of great stuff happening. The commercialization of Big Data is happening now. I'm John Furrier, the founder of SiliconANGLE. I'm joined by co-host Jeff Kelly, chief analyst at Wikibon.org for Big Data, and our next guest is Steven Gustafson, an R&D manager and Knowledge Discovery Lab lead at GE Research Lab, here inside theCUBE, about to go onto the Strata conference stage tomorrow to give a big talk about the industrial Internet, Big Data, a lot of action around data. Steven, you're at the center of it for GE, which is a small player in the Big Data space, or a big player in the small data space. You guys are really doing some amazing work: industrial Internet, industrial clouds. Welcome to theCUBE.

Thank you. Thanks for having me. I have a lot of small data.

I made kind of a little pun there, a little joke. In reality, GE is about all kinds of data, sometimes little data, but massive amounts of it. So, talk a little bit about your research, and then we'll share with the folks some of the things you're working on and are about to talk about tomorrow.

Okay. Sure. So, I lead a research team in New York, and we're focused on research that enables the company to extract insights from data, and to do that faster and with more accuracy.
And so, in some ways, we like to think that we're doing research to help data scientists as well as the production type of analytics that run at M&D centers and things.

So, talk about GE, just to get the GE stuff out of the way. For folks who haven't been following at home: folks that do know us know that we've been all over the industrial cloud and the industrial Internet that you guys are part of. We co-hosted a panel with your CEO at Minds + Machines in Chicago, an amazing event with amazing partners, and then here with the developers in San Francisco. Share with them the vision of the GE industrial Internet and why it's so important to GE and also to the market.

Sure. It's actually a very amazing story. If we think about what the consumer Internet did for the world starting about 10 years ago, you have things now like Amazon putting bookstores out of business and Netflix doing amazing things with streaming, and the list goes on and on. And so what we started to look at about four years ago is how we take those amazing technologies and successes and transformative things from the consumer space and bring them into the industrial space. GE has always been doing a lot of very advanced services, like remote monitoring and diagnostics on this very complex equipment. And as we were watching the consumer Internet and things like the Internet of Things trend, having everything connected, it became very apparent that the future of all those advanced services that we do would be in what we call the industrial Internet: bringing together all the devices, connecting people into that loop, and looking at advanced technologies for managing data, more predictive analytics, and, what's kind of a cool thing that's focused here in the Bay Area, the new way of doing software. And so for me, working at the research center, it's been very exciting, because GE doesn't do things small.
So, the joke you had before: we used to be called the most well-funded startup in the Valley. And so it's been very beneficial to me, because they bring in great leaders like Bill Ruh, who's leading the center out here, and a whole staff of people from across the company and from other companies to help bring the industrial Internet to life for our customers.

Well, this is potentially a really transformational moment for GE, because it seems to me that in the future GE is going to be competing less on the machines you're creating and more on how you manage and analyze the data coming off those machines. That looks to me to be as much of a differentiator and as much of a competitive advantage for GE as the actual equipment itself. We were having some conversations last night with some folks. One gentleman was on the board of a company, and they were looking to buy a jet engine, and they actually made the decision not based on how good the jet engine was or the cost, but on the fact that GE can actually analyze all that data. So, talk about this transformation. It's a long process, of course. But how do you see this really changing the way GE operates and its whole business?

Oh, for sure. Aviation is a fantastic business and they've done extremely well. I was just reading an article online, it was actually in The Economist, I believe, that they have something like 70% of the engine market share right now in commercial. And they've always been a great user of data and analytics, but it's really changed recently. It used to be that an aircraft takes off and they would collect a few data points from when it was taking off, when it's cruising, when it's landing. And they'd use that to build prognostic algorithms and help future flights look for things that might need some maintenance.
And what they have done now, in this new era of the industrial Internet, is collect all the data that they can and figure out the ways to use it best for the customers. And I think it really is transformative. It changes the way that people think about building and doing design: what kinds of sensors do you need, what new services could you offer that the customers are going to like even more? They already have fantastic hardware, and they've shown that the last couple of years, especially with all the new orders that we've been seeing. And now they're taking this huge step into even deeper data collection and analysis. I think the customers want that and they're stepping up to it. So it's pretty exciting.

Right, and this spans all the different lines of business. It's aviation, but it's also healthcare, it's transportation, it's energy. And you mentioned some of the new ways you can use this data, some of the ways you haven't even thought about yet. So the idea is you collect all this data, and at some point you're going to bring in smart people who are going to figure out new ways to use this data to the customer's advantage. So tell us a little bit about the research you're doing specifically. What are some of the exciting things you're working on? Let's get into that.

Sure. So I have a great team of researchers, and I also have lots of colleagues at the center and out here in the Bay Area too. The particular area that my team focuses on is what we call big data systems research, so very applicable for Strata, and then also knowledge representation research. The latter is really about helping GE capture its domain knowledge using contemporary knowledge representations, things like the semantic web, if you've heard of that, or linked data.
And so we've been doing some really interesting projects around making it easier for people to analyze data, to query data, and to build analytics faster. I think that's very important work, especially for a company that builds very complicated things: to look at the medium to gather and host all that knowledge and make it digital and executable. The other part is big data systems research. This is a fantastic time to be a research scientist in the space. If we go over to the hall at Strata and look at all these technologies, all those companies are making these really cool things, and they're focused on multiple areas, some for finance, some for e-commerce and things. And when we take one of those technologies and want to bring it into the industrial space, maybe it's a security thing that we have to add to it. Maybe it's horizontal scalability, because we're working with a massive amount of time series, which would really dwarf a lot of the transactional, e-commerce type of data. And so we do research to figure out how to make those things scale. For example, tomorrow I'm going to talk a little bit about how we looked at taking time series data and building horizontal scalability on top of Hadoop. That has now turned into a product that's being sold by our GE Intelligent Platforms business, called Proficy HD. So it's a really nice example of going from research to product.

Yeah, absolutely. Steven, talk about what you're going to talk about at Strata on stage and connect that to the research side, because it's exciting that you can essentially apply research in a way that brings stuff to market fast.
Talk about what you're going to talk about tomorrow, and then talk about what's going on at GE on the research side and some of the trends around how fast it's changing. The old days were, hey, see you later, research, having long lunches, who knows what they're working on, and years later the output is some products. You guys are on a different timetable. I want to touch on that.

Well, one of the very cool things about what they're doing here in the Bay Area is bringing the concept of platform to the company. That's very important for research, because typically industrial research centers work with the industrial partners, their businesses, let's call them, and help develop new technologies for customers. But that technology, that research, still needs to be brought to market somehow. And for every corporate R&D center, of which there are a few left out there today, it's always difficult to make that transition, that translation to the product. So by bringing in platform, the team out here is really helping the company build software, build more complex software, and do it faster. And from the research side it's very exciting, because that means that we can develop research directly on the platform and have a faster and more seamless transition to customers.

What's the coolest thing you're working on right now?

One of the things about GE is that we work with the healthcare business, we work with aviation, transportation. So there are a lot of different segments that we work in. I know that my team right now is really excited about some of the things that we're doing with healthcare, because it's got that really direct link to helping people. And we work on cool things like aviation. That's got a great thing around doing transportation safely.

Healthcare right now is a big deal, with all kinds of new things you can do. Exploration is huge, too, if you think about it from an energy standpoint.
Yes, absolutely. There's some work that has been going on at the center, which I've always been very fond of, which is looking at taking wind turbines, optimizing them, and helping customers get more value out of those turbines. That's a great example of using advanced data analytics.

I mean, Jeff Immelt was saying, when we were doing a panel in Chicago, that a small percentage, a one percent change in innovation, is billions. Billions. That's not a stretch. United Airlines was up on the panel, and he literally shared a story with me in the green room. He's like, look, I'll be straight with you. Safety's cool and everything. We love that. But gas prices alone are huge. So for them, the little things like gas prices, transportation costs to get there, the data involved in streamlining the processes, process improvement, have been fantastic.

Yes. One of the things I'll mention tomorrow in the talk is that a lot of people think this era of improving productivity is sort of over, and GE pioneered some of the things like Lean Six Sigma and process improvement. But having more data around manufacturing processes, any kind of process, is really transforming the transparency that you have. And then when you start running analytics on it, you have a real-time look at the manufacturing process, and in real time you can look at how to make adjustments, how you could get better yield, more throughput. And that's really exciting.

I love the ability to collect this data and get that insight and this constant feedback. That whole thing has been overplayed: Drucker, process management, all that stuff, it's maxed out. But when you look at the new data, this is what Jeff and I talk about on theCUBE all the time. What we get excited about, like you, is that it's radically changing the value chains, because the activities are also being disrupted.
So every activity in the value chain is now radically disrupted. So again, it's like, OK, it's just evolution, I guess. We're going to call it instrumented data, tied into the users now. It's just so exciting. I think that's why I always laugh. It's like, oh yeah, we're pretty much maxed out, when the opportunities are massive.

Supply chain is maybe a good one to finish on for this topic, because it's extremely critical to a company like GE to have good, solid supply chains. We make things that are made out of very complicated materials and work with suppliers that make very high-precision components. So having a good supply chain is very, very important. And in the past, it's been hard to see with great transparency and real-time data how the supply chain is performing. Now, with advanced data and software and analytics, the insight into that is going to really change things. And I think it's good for everybody. The stuff that Walmart used to do, and still does, I think, giving analytics back to their suppliers: I think we're going to see similar stories or similar kinds of things in our industry.

So we've talked a little bit about how the industrial Internet is going to transform GE, and we've talked about some of the use cases. How is it going to transform your customers and the way they operate, in terms of maybe using this data for new types of services, new lines of business that they haven't even thought of? You mentioned just now a good example, Walmart, who shares this data with their suppliers. There are so many different ways you can use this data. How do you see this changing your customers' business? And is there a shift in mindset that's going to have to happen with your customers? Do they understand the possibilities here? Is there a kind of education that has to happen from GE's perspective?
Yeah, as a researcher, I always get worried about talking about GE's customers too much. But I think one of the things that we talk about is giving customers zero unplanned downtime. That basically means if we're supplying gas turbines for a power plant, we want to avoid at all costs having those turbines, or whatever the assets are, shut down when the customer doesn't expect it. That's zero unplanned downtime. And I think if we deliver on what we want to do, it will give the customers the trust that the plant will be running all the time, and that they can be very methodical and have a lot of lead time for when they need to do maintenance and things. So I think that would change. And I imagine the partnership between our company and customers would also change. I think it's already probably very symbiotic, but it would become even more mutually beneficial. The things that people talk about, like consumption economics: I think data and advanced analytics bring consumption economics into new industries, and that's probably going to happen.

So talk a little bit about the challenges that you're facing. What are the biggest challenges in terms of working with industrial data? We did some research, David Floyer and I, along with GE, around some of the requirements of the platform. There were security concerns, of course. And there are challenges with scale, as we've talked about. What are some of the biggest challenges from your perspective, and in the research that you're doing, in terms of really getting this into the hands of end users and companies that want to use it to improve their business?

So I look at this very much with a focus on the analytics-building component. Previously I was doing a lot of machine learning work. And one of the big challenges, especially in our industry, is that the data isn't really simple, right? It's not web pages.
It's not somebody purchasing something on a website. It's data from a pressure sensor inside an aircraft engine. And as a modeler, somebody who needs to build analytics to give our customers lead time and help us do maintenance, I need to use that data to build a model, to build some analytics. And it's not easy to use. You really do need to be a mechanical engineer to understand what that temperature measurement or pressure measurement means and how to leverage it. So I had seen firsthand that challenge of working with this complex data. It's very beneficial to be at such a good company, where you can work side by side with aviation engineers. But to scale these analytics, to scale the use of data, I really believe that we have to conquer this challenge of making data more self-describing, as I like to call it, using semantics. I can still call that engineer if I need to, but if I can just look up in an ontology or something what that data means and how to process it, I think that will be game-changing. When I look at the technologies here, it's really cool to see so many people, so many new products around HDFS and Hadoop and the ecosystem. And I imagine all this data being put into these big data lakes or whatever, and somebody still needs to keep track of it. How do we handle all that metadata? I think that's one of the challenges that would keep me up at night: thinking about, if all of GE gave us all their data in a big Hadoop distribution, what am I going to do to manage that? That's where I think we need some new kind of technology.

Yeah, that's a good point. The governance: you've got compliance issues you've got to deal with, you've got to keep track of that data, you've got to know who accessed it, when, and how they're using it, that kind of thing. And that's not always the first thing you think about, with some of the cool stuff that gets the front-page headlines or the cool analytics that are happening.
But ultimately, you've got to manage that as an asset, and you've got to comply with regulations, et cetera. So tell us a little bit more about your take on the conference. You've seen all the vendors over there. It's a crazy market right now. There are so many vendors in this space, from the huge players like IBM and EMC, and then you've got Pivotal, which I know GE is partnering with, down to little startups that have maybe a little niche feature in Hadoop that they're trying to market. So what's your take on what's going on over there?

It's really interesting. I'm enjoying it. There's a lot of cool technology. I like that everybody's got a play on the elephant, on their shirts, or stuffed animals, or things. It's a very exciting time. And it's very hard to keep all these different things straight. But I think it's good. There needs to be this mass expansion of these ecosystem infrastructure tools. Some of them probably would complement each other, and you start to see that. You see a lot of Hadoop plus Hive, SQL on top of different NoSQL stores. When I first started looking at it, it was just Apache Hadoop, when Doug Cutting had just kind of started to do that work, and we were using that. It was difficult to get those first things up and running on Hadoop. The field has come a tremendous way, and it's really great. We internally are looking at these technologies and using them in production environments. And that's fantastic.

Yeah. Not that this is your area specifically, but you're partnering with Pivotal, and you're also partnering with companies like, I think, Cisco and others. How do you navigate that?

With very good people.

With very good people, yeah. That's got to be a challenge. How do you make those decisions around who you partner with, and when it's time to start integrating and co-innovating with some of these other companies?

Yeah, there's somebody on it.
There are some great guys out here. And we do some work where they'll connect us with some of the new companies and we'll do some technology evaluation. But we have a great team of guys doing venture activities, partnership ecosystem plays. It's not my area of specialty, but I certainly enjoy it when they bring in the CTO from a startup. Being privileged to talk to them: I don't know of another company that gives me access like that to some of these really cool startups.

Yeah, that must be great when you've got some of these really interesting companies coming in there. Obviously, they would love to partner with GE. So you get access to their technology and their know-how.

Sometimes I wish I had a bigger checkbook.

Steven, thanks for coming on theCUBE. GE, big player. We enjoyed working with you guys and had a lot of fun on both events, in Chicago and here in San Francisco. Great work, good investment. You guys put a lot of wood behind the arrow, and it's really important to your business and your customers. Great to see. The big data revolution is happening, continuing to grow, and this is theCUBE, present at creation, at the beginning, continuing to ride the big data wave. We are here in Silicon Valley, where all the action's happening this week, for the Big Data SV event, and we'll be right back with our next guest after this short break.