York City, it's theCUBE at Big Data NYC 2014. Brought to you by headline sponsor WANdisco, with support from EMC, MarkLogic, and Teradata, with hosts Dave Vellante and Jeff Kelly. Okay, we're back. This is Big Data NYC. We're here in the Big Apple at the Times Square Hilton. We had a great event last night. Jeff Kelly gave a really terrific presentation with a lot of substance about the state of the Big Data market. Then we had a fantastic panel, and we're going to be replaying that panel and sharing it with you over the next several days, weeks, and months, because there were a lot of sparks flying and a lot of insight around the innovation that is Big Data. Mark Grabb is here. He's the technical lead for the analytics practice within GE, which is leading the charge on the industrial internet. Mark, thanks very much for coming on theCUBE. Yeah, you're welcome. So, as I was saying, we covered that event, that launch. Jeff Kelly did a lot of research around it. It's really fascinating to see what's happening with this whole notion of the industrial internet. Love your branding around that, and there's some real substance behind it. At the same time, you come from the classic GE side of things, not the software world coming in and saying, hey, we're going to take over the industry. You come from the industrial practitioner side. So I wonder if you could tell us where you're at with the whole industrial internet initiative and what your role is there, and then we can really get into it. Yeah, absolutely. So I'm responsible for analytics research across GE, which is fantastic, because we work with all the different business units, and that gets us involved in all these different verticals, including what you're talking about and what we launched out in San Ramon. Of course, we hired Bill Ruh as a leader, and he has over 1,000 people there now in software and analytics. We work very closely together.
We're all one team, but he's changed the culture of GE, which has been fantastic, bringing that new big data analytics feel into an established company. But it's an established company that makes gas turbines, aircraft engines, locomotives, all these incredibly high-tech pieces of machinery. We call them assets. And what do the customers want? They want zero unplanned downtime. They want highly reliable assets, and they also want those assets to run economically and efficiently within their larger ecosystem. So analytics is becoming incredibly important, because we've got those assets fully instrumented with sensors, we stream all the data back, and we compute analytics to provide guidance to those customers. And a tremendous amount of physics, materials science, and thermodynamics has to go into those kinds of analytics, along with your traditional data analysis. Can you add some color to that? Help us understand what you mean by, for instance, thermodynamics going into the analytics. Yeah, absolutely. Let's maybe start with healthcare, for example. We have the luxury of working across all the different business units. A couple of decades ago, medical imaging, which GE was at the forefront of, almost put an end to exploratory surgery. And how did it do that? By building an image of the inside of the body in a non-invasive way. Then, as medical science advanced, there was a strong desire to get to personalized medicine. So even if there's a small biopsy to take a sample from a person, we work with molecular pathologists, cell pathologists, and people in optics to create systems that automatically analyze and diagnose pathology slides. And we've created a business out of that. What that allows us to do is direct the therapy optimally for that specific patient. So that's the concept of personalized medicine.
We're taking that same concept into the industrial space. Even though our parts are manufactured to very tight tolerances, and you might think that every single part is exactly the same, at the microstructure level these parts are actually different. It's like a snowflake: the way the microstructure of that special alloy is assembled is different for each specific part. And if you're going to understand exactly how that part is working inside that gas turbine or that aircraft engine, then we do things like pathology. We'll take a part out after some time in service and cut it up. People use very specialized electron-based imaging devices to go in there and understand the microstructure of that exact part. Then we'll also collect a tremendous amount of data using nondestructive techniques like ultrasound. Then we're able to correlate the two, so now we can understand, without cutting the part up, exactly what the microstructure of that part is. So it's like giving the part a health check every time it comes into the shop. So the previous strategy was brute force: let's do surgery on the component. And now you're saying you can instrument it and infer from other data, presumably to a high degree of accuracy. Absolutely. And we're also using borescopes, so you don't necessarily have to take machines completely down or take aircraft engines off wing, and we collect imagery and very precise types of information all along the life cycle of the asset. We make very important service decisions with that information. We feed that information into our new designs, and that creates better designs. Better designs create better service. Better service creates better designs. It's just this virtuous cycle. So can you talk about the engineers' cultural attitude toward this whole initiative when it first started? I can see engineers going, wait a minute, I don't want you near my stuff.
It's working. Yeah, right. Was that an issue, and how did you get through that knothole? Yeah, we got through that knothole pretty darn quickly, I would say. Was that top-down? Well, it was a little top-down. We're lucky to have a chairman who's really driving this effort, right? So there was some top-down. But the scientists, the people I partner with so closely, it's definitely a team sport. It's very collaborative, working with the materials scientists and the physicists and all those folks. They welcome the data. At times, we'll take purely data-driven techniques and very provocatively ask questions of the materials scientists and challenge their theories. They love that, because it pushes the state of their art. So benefits flow in both directions: from the data to them, and then of course, we use their equations, their theoretical understanding of materials science and computational fluid dynamics, and we integrate that into our analytical solutions. So they see it as adding value? No doubt about it. Mark, I wonder if we could talk a little bit about some of the challenges. In our research, talking to big data practitioners, not specifically in the industrial sector but generally, there are a number of challenges they come up against, whether technology-related or people- and process-related. One of the issues around GE, I could imagine, and we covered this a little bit in the research Dave alluded to, is that you're dealing with some industries that are very highly regulated. Yes. Whether it's airlines or nuclear energy.
And one of the challenges there, of course, is that it's one thing to be able to access this data and do analytics around it, but there may be regulations that prevent you from doing certain things. We talked to some nuclear power plant folks as part of our research, and I remember them saying something like, well, we can't even connect to the internet; that's not something we're allowed to do. How are the regulatory environment and industry norms, which maybe aren't even encoded in regulations, impacting your ability to bring analytics and big data to these industries? Right, so I would take a step more general than that and say one of the initial challenges was just getting the data together: realizing that all the data we need and can use to do these kinds of analytics had to be brought together. We partnered with Pivotal to build our Predix platform. That's getting all of our data centralized in one place. That certainly raises the regulatory question, whether it's in the healthcare space or the aviation space. So there are people who work out those agreements with the customers, and sometimes with the government too, so that we have security around the data. Sometimes we need to create solutions that are on-prem because the data can't be taken out. But from an analytics person's perspective, having access to all of that data is driving so much innovation and speed. I know people have said this many times, but I've been an analytics guy for 20 years, so I'll tell you it's true: we used to spend 80% of our time just trying to get the data in order. And then all the cool stuff, the stuff that I'm all jazzed up about... You're out of money. You're out of money. It's like 20% of the time, right?
So now what we're seeing is a dramatic shift: all that exciting stuff is turning into 80% of the time, 90% of the time. And when you need to be collaborating with materials scientists and physicists, when you need to understand heat transfer to integrate it with the analytics, you cannot afford to spend time messing around with getting your data in order. That's why the Predix platform and the partnership with Pivotal have been vital to us. I want you to expand on that a little bit, because we've heard that number: 80% of the time is spent either getting the data or massaging it into a form you can actually do something with. But you mentioned that's now less of an issue. How have, not just Pivotal specifically, but Hadoop and Big Data generally, helped solve that problem? Because I still hear that it's still an issue for data scientists, even if they're using things like Big Data and Hadoop. How has this helped you? Hadoop alone isn't going to solve that problem, right? But there's a rapid tipping point. Once you start getting your data together, you can address data quality. You know where your problems are. And you also know the business case: the value that comes from having all of that heterogeneous data and being able to create analytics for customer value. Now there's a huge accelerator. Now there's no excuse. And people see that that's the bottleneck. So then you can systematically start attacking the data quality issues. You can systematically make sure you've got the right data models and architectures in place. And that's the shift that I see dramatically happening over months. You can watch it happening. So I remember when Pivotal came out, and I was actually down here at the financial analyst meeting when EMC announced that whole federation. And I remember I was skeptical, right?
Because I've seen so many me-toos over the years, and I said, how are these guys going to differentiate? Who needs another Hadoop distribution? And then I saw the GE investment and I went, wow. Smart guys are doing some stuff behind the scenes. This wasn't just a press release. This is some serious business. So I wonder if you can talk about the connection points with Pivotal. What technology are you getting from them? How does it feed your technology? What's the relationship like? And, I know there are a lot of questions here, what's the impact on other people in the ecosystem that you naturally have to work with? Maybe we can start with the Pivotal relationship and the technology connections. Right, absolutely. I'll answer the question, of course, from an analytics person's perspective. Sometimes you need to create analytics that move very quickly. You want to get that algorithm right down to the data, and bringing in Pivotal, in that partnership, with the style of Hadoop they bring to the table, has allowed our Predix platform to do that quite nicely. But there are other times when the analytics are so complex that you're not going to MapReduce them, and you do need to selectively bring the data up to the analytics. Pivotal has solutions for that, developed at industrial strength in partnership with GE, and that is the Predix platform. So it's about jump-starting General Electric. As you said at the outset, even though General Electric has over a hundred years of experience, it's largely big-iron experience. So we really had to jump-start the big data software side of it, and getting a partner like Pivotal early on in this journey has been very important. I want to follow up on industrial strength, because in our world, industrial strength means a mainframe and recovery. Oh, there is.
So from the software perspective, what does industrial strength mean? Well, industrial strength goes back to your regulation question, right? When you're creating analytics that work with aircraft engines, or in the healthcare space, you have to systematically go through processes so that the software passes through the regulatory processes and procedures. We have always done that. Since GE entered the Bay Area, we're learning new techniques to develop software much more quickly, but still at industrial strength, still meeting the regulatory requirements. Now, as an analytics person, I wonder if you could help us understand where the data goes. Does it go into some data lake? How do you deal with things like data quality, consistency, and data governance? Could you talk about that a little bit? Absolutely. So we've formed data lakes; that's our strategy. We're bringing all of the data into data lakes, and that has also allowed us to bring a lot more data in. We've had very sophisticated sensors on our assets where we would just sample the readings and then do our analytics on those sampled readings. Now we're able to bring in the full bandwidth of some of these sensors. Sampling's dead in that world. Yeah, it's very rich information, and we can bring it into a data lake, and then we have the ability to get the data we need at any point in time to do the analytics. That's the power of having a well-formed, well-architected data lake. Talking about Predix a little bit, I understand you've now opened it up; you're essentially selling it, making it available to other companies that want to purchase it. Talk a little bit about that transition. You're in the big data analytics software business as well. Right, yeah, we're working with some select partners, and we're going to be making it available in 2015.
It's really quite natural, from my perspective, because across the various GE business units there's a lot of rotating machinery, in the locomotives, the gas turbines, the aircraft engines, the oil and gas machinery, but we also have healthcare, we do work for GE Lighting and the whole variety of GE businesses, and we do a lot of analytics in financial services for GE Capital. So taking it out to the rest of the world, what we're finding is that it scales very naturally, and bringing it into the agriculture sector and these other areas is actually working out quite naturally. One of the things I'm curious about, since we obviously do a lot of video here, is video analytics. Can you talk about that a little bit? Cameras and sensors, the metadata associated with that? Of course, computer vision and video analytics have been around for a while, and we've seen face recognition and other applications, but I'm really seeing, even from our GE labs, some tremendous technology where the camera, if you think of the installed base of cameras as a sensor, is really starting to understand anything that a human expert would be able to understand with his or her eyes. It's incredibly powerful. Some of this is right in the people-interaction space, where we have computer vision applications; we're doing work for the GE businesses, and we're also doing work for government agencies that want our expertise, and we're able to read body language and posture in quite an advanced way.
So the whole observable world is now being turned into fairly efficient data. And when you think of the vast amounts of heterogeneous data being brought into data lakes, video is going to be a big part of it. You could just stream raw video into a data lake, but you've got to process it to get the value out of it, and computer vision is taking tremendous strides in that area, both in industrial inspection applications and in pure human interaction. What's the state of facial recognition? I'm not familiar with what happens on Facebook, but if you try to apply it broadly, in a situation where you don't have everybody's pictures, it's got to be a bigger challenge. Where are we at with that technology? So many companies have focused on perfecting facial recognition, and there have been tremendous strides there. But what's taking off now is that just as I'm moving my head as I look at both of you, still cameras can create a 3D rendering of my head through structure from motion as I move around, and they can also build super-resolved versions of my head as they collect more and more data. So over time, even though it's just one camera, it can create a very high-resolution 3D model of my face. And when you feed that into a face recognition engine, you can get dramatically better results. We're starting to see things like that. Well, and I would imagine one of the applications and use cases is security, of course. No doubt. And I can see the arms race starting, where the bad guys try to reconfigure their faces with mouthpieces so they don't have to wear masks, because masks may be too obvious. Are we seeing that yet? Is there an arms race starting here?
Well, there's no doubt that we're seeing that, and there are a lot of use cases and a lot of money where people want to advance the state of the art in measures and countermeasures. But there are so many other applications that fall out because that technology is there. When you walk into a store, what signage exactly are you looking at? And after looking at that signage, what did you actually buy? What product did you look at and not buy? All of that information can be collected on every person who enters the store, right from the security cameras that are already there in the installed base. Yeah, now some people are watching and saying, oh boy, I'm getting a little concerned that there's a privacy question. Sure. And that's another issue related to regulation. Maybe you could touch on those privacy concerns, but also apply this to GE's lines of business: what are some of the applications where this could be applicable? Yeah, absolutely. So let me tell you about GE's applications. For what I was just describing, we have a lot of people coming to us wanting to license our technology to do those kinds of applications. GE isn't in that business. In our healthcare sector, we provide hospital optimization, so we have cameras set up in a hospital area. There are certain protocols, talk about regulation, that need to be followed. Do you wash your hands when you enter the area? Have you gone through those steps? If you install cameras and use that installed base as a sensor, you can start layering on a large number of applications. You can alert the doctor if he or she did not wash hands before seeing the next patient. You can gather statistics to see whether that's happening.
Hospitals are getting increasingly interested in this technology, as the whole payment model moves toward making people healthy, not just billing for their time in the hospital. Wow, and look at what's happening right now with the Ebola scare and the CDC basically saying, hey, these hospitals aren't prepared to apply the protocols. You're not too far away from putting forth a vision where you could actually help automate, or at least adjudicate, that in near real time, right? Right, right. Mark, I wonder if I could ask you a broad question: as an analytics pro, what do you want from the core technologies, the infrastructure that supports the analytics mission? Can you describe your basic requirements there? Right, yeah. Well, my feeling is it's happening because we need to keep our gas turbines running with no unplanned downtime. That's what our customers want, that's what we want to provide to our customers, and that's what's driving it. But how are we getting there? We're getting there by treating every asset as an individual, which means you have to collect a tremendous amount of detailed information. Just as doctors look after us, we're the doctors looking after our assets out in the field. And that tremendous amount of information, from anywhere we can get it, needs to come back in a very prescribed way and be available for analysis. And it needs to be available for analysis in a very collaborative way, because we need to bring in our technical experts and our analytics experts, and then we need to create these solutions and help make decisions about what exactly we're going to do next time there's a scheduled service outage. Those kinds of decisions are what get you to zero unplanned downtime.
And so that requires vast amounts of information, it requires collaboration, and it requires making those strategic decisions. It requires automating as many of those decisions as possible. And think about all of this information in people's heads: General Electric has over a hundred years of brainpower on all of this big-iron rotating machinery. The more we can get that into a knowledge graph, digitized so that those smart people can work on top of it, the more of this job we can do automatically and with analytics assistance, in a very efficient way. And that's how we provide these more advanced services to our customers. Interesting. We put forth this notion of a digital fabric that leaders are riding on top of, creating new business models and new value, and one of the characteristics Jeff brought up yesterday was the ability to personalize the user experience. You're talking about personalizing the asset experience. So there's a real parallel there, Jeff. And I think the other really important thing you touched on is collaboration. Some practitioners I've talked to think they can hire a PhD genius data scientist and that's going to solve their problem. It's not about hiring one data scientist, or even a team of data scientists. It's about enabling them to collaborate not just with each other, but with the business units and with the other technical people who are not necessarily the data geeks, if you will. It's very much a collaborative effort. Yeah, we absolutely see that, and we've been living it very closely. All the analytics people in GE are always working shoulder to shoulder with some domain expert. And another great addition to General Electric ever since we moved out to the Bay Area is user experience.
So now we have quite a strong user experience competency. Having that design thinking come in and meet the engineering thinking has been quite magical. Excellent, Mark. Well, listen, thanks very much for coming on theCUBE. Great segment. Good luck with the initiatives, and keep in touch; we'll be watching, and we'd love to have you back to give us a progress update. Thanks a lot, I really appreciate it. Keep it right there, everybody. Jeff Kelly and I will be back with our next guest right after this. This is theCUBE. We're live from Big Data NYC. We'll be right back.