Okay, we're back here live inside theCUBE. We're actually in a room here; we're not in the venue itself, here in San Francisco for GE's industrial cloud event. This is where they're introducing all their big data for the industrial business and their business model. I'm John Furrier, and I'm joined by my co-host.
Hi everybody, I'm Dave Vellante of Wikibon.org. Edd Dumbill is here. Edd's a good friend of theCUBE. We first met Edd, of course, at the O'Reilly Strata Conferences, where he's the co-chair with Alistair Croll. And Edd, you've got a new role as well, I understand: Silicon Valley Data Science. So congratulations.
Thanks very much. I figured it was time to jump in and enjoy the water as well as watching from a distance.
So what's that all about? Fill us in.
Yeah, well, we're a new company, very young, but we're a bunch of folks who've got a track record in data science consulting. So the main idea is that we're going to go after hard, gnarly problems in big data and data science. Our CTO is John Akred, who used to run the big data labs at Accenture.
Awesome, well, congratulations. So you've thought about, written about, talked a lot about the Internet of Things. You saw this GE announcement today. What's your take?
I'm very excited, actually. I think it's a very sensible move to create an ecosystem. GE is obviously aware that what they provide goes into companies. It goes into an industrial ecosystem. It goes into a commercial ecosystem. If they want to enable a vision of getting more productivity and better use out of these machines and saving costs, then they need everybody to play with them. And what we heard today wasn't just an announcement about a product or two of theirs, but really a massive setting of an agenda for an entire industry.
It was interesting how they juxtaposed it against the consumer Internet and basically put forth this vision of higher value, higher stakes, and enormous potential market opportunity.
Yeah, it's huge and there's a lot of work to do, but what they're trying to bring in, particularly from the consumer world, is not only the data and analytics, which we heard from Bill Ruh first and foremost, but the agility too. This is something industry just doesn't have as a rule, the ability to move quickly, and that's why Paul Maritz's and Pivotal's involvement is so important to them.
So I tweeted out, and I don't know if you coined this term, but it was the first time I ever heard somebody say you can't take the humans out of there, the humans are the last mile in big data. And we've been asking questions about potential headwinds. One of them we put forth is conservative engineers who don't want to go hyper-automated. What are your thoughts on some of the barriers to seeing this vision through, and what has to change in order for this to really get uptake?
I think there is a lot of change. I mean, obviously there's cultural change, and we see that with data and analytics everywhere: the tools are half of the problem and the culture is the other. I think GE is doing very well at addressing some of that by trying to create a movement, right? Jeff Immelt's going around and telling all his peers and all the companies that they need to get real with data. The second half is the tooling, the platform. One of the reasons engineers are so conservative is the systems they've got: if you try and fiddle with them, they can break, man. So a new, more platform-based, more agile approach to the technical underpinnings is a big deal too.
So Edd, obviously you're in the data space, obviously you're in a new company with data science.
So the data science stuff is obviously compelling, we've talked many times about that, but I want to get your take on the underlying infrastructure. You saw Paul Maritz on stage here with Werner Vogels from Amazon, and they talked about the consumer internet moving in this direction, and now the industrial internet's moving too. That's great from a trend standpoint; we can all check the box and say, yeah, that's going to happen. But there's still a lot of change going on on the consumer side. We've got the cloud, for instance, and the stacks: do you go full vertical stack? And the stacks are changing relative to the software, because the developer market is booming. So I want to ask, with all that context, what about this internet of things, this industrial cloud, this industrial internet, what does it look like compared with other paradigms? It's an emerging market. I was on Twitter last night with a friend of mine, Scott Lemon in Utah, and we were talking about our days in wireless back in the 2001 timeframe, and it's the same kind of thing: LAN, local area networking, systems, complex event processing. A lot of those paradigms are coming back into the stream here, but not exactly in the same way. So what do you see this revolution looking like? If you can point to the past and say it's a little bit like that, that and that: is it systems management, is it more like local area networking, is it a little bit of something else? What's your take on that?
It's definitely a paradigm shift, I totally get it, and I think it's interesting in that some of the responsibility for where computing is done has shifted, so you could draw parallels to the client-server revolution, for instance, which is definitely one of the biggest shifts that we had. But I think the real big deal, to be honest, is what we heard before, and Paul Daugherty from Accenture led with it: the digital business thing. So to me the shift is actually more about where you see computing applied, no matter what the technology. Up until five or 10 years ago, computers were basically faster paper, that's what it was, and now we've seen the means to expand the digital influence into product delivery. Obviously more happens on the web, there's more automation in factories and so on. The more you can control, the more you can automate, the more you can experiment. So I know you're asking for gnarly tech things.
It's just observation, there's no real wrong answer, it's kind of like, where are we? It's totally a new space. I mean, Paul Maritz obviously has the perspective that IT has to change, which it does; that's basically saying the infrastructure and the dev has to change. So yeah, it's totally a paradigm shift. I'm just trying to tease out, where's the lever, where's the tech, where's the disruptive enabler? I guess what I'm trying to get to is, can you point to it? In the old days, TCP/IP enabled a bunch of stuff to happen, the web, HTTP enabled web pages, HTML for publishing. You know what I'm saying?
I'm going to say it's actually a social and collaborative one, and it's an agile app one. So look at factories before robots: lines and lines of people, right? Walk into any startup here, what do you see?
Lines and lines of people writing code and doing, frankly, very repetitive tasks. There's very little leverage on a programmer, and if you think that's the case here with the best talent in the world, well, look in the shops out in industry where they can't attract Google-level software developers or whatnot. The amount of leverage they need off these people to drive their businesses forward is the key thing. So it's about productivity and things like social, easy sharing and rapid application development.
You know, it's an interesting point, I like how you always come back to that one point. I think it's the human element and it's what humans are doing with the technology. In fact, when we were at the HP Discover event last week in Las Vegas, one of the epiphanies David and I had, something we've been thinking about for a while and which we talk publicly about, is that the standards bodies of yesteryear, the IETF and all these things, would make these global decisions around standards, and that is now happening in open source, right? So the communities themselves are developing the standards. I mean, we had Beth Comstock talking about the Kaggle work that she's done for her data science, and the work you're doing. So that's the new standard, like OpenStack, the proliferation of that kind of validation.
And you know what, there is a technical shift that really matters, and this is basically the breaking apart of data silos and the emergence of Hadoop as an obvious place to put everything. The shift here is to delay breaking up and cleaning data from the beginning to the last moment. And I see that as the thing that's starting to really transform businesses at the technical level. If you're looking for a connection, it is absolutely Hadoop. Even more properly HDFS, the data storage layer, frankly, and that as a standard the different data tools agree upon.
I agree, it's a huge enabler. In fact, one of the questioners in the audience kind of pooh-poohed the whole Hadoop piece of it, but I don't see this as possible without Hadoop or some construct like Hadoop.
I don't at all. I wonder if Hadoop is a loaded term, as you know, used to mean this batch MapReduce thing. Frankly, it's an ecosystem now and it's all about having data in one place and accessible.
And shipping the function to the data.
Well, Jeff Kelly pointed out on the panel, plug for our analyst, Jeff Kelly at Wikibon, that it's about interoperability. Again, that's an old word that's been kicked around, so there's no one store for everything. Hadoop might be the popular one for storing stuff and for low latency, whether it's Impala or Hortonworks, but I think there are clouds working together, data sources. So breaking down the data silos, I think, is a very good way to look at it.
What about the data science of all this? You have a new role here. How do you see things like this so-called industrial internet changing the data science activities?
Well, I think to do data science, you first need to get the data, which sounds trivial, but you talk to anybody in big data and they're telling you, man, I'm spending 80% of my time getting data out of databases, extracting it, cleaning it, getting it where it needs to go. So my philosophy on data science is, we can get a lot of leverage out of algorithms, but we need to get the data there in the first place. And I think that's one of the key things we're hearing here: we get the data out of the machines so we can get the leverage from the science.
So data quality, it sounds like, is a bigger theme. We're actually doing an MIT data quality conference in July, and it's in the boring-but-important category. It sounds like it's boring but vital.
That's one of the areas where we've really got to see more tools, more automation, absolutely.
Okay, we're inside theCUBE. Edd Dumbill, congratulations on your new role, Vice President of Data Science, startup here in Silicon Valley, and also the co-chair of Strata, a very successful big data conference, going on its, what, third year, third season. Huge success. We'll see you there, at the O'Reilly Strata Conference and Hadoop World. This is theCUBE, we'll be right back with our next guest after the short break. John Furrier and Dave Vellante, we'll be right back.