Howdy, and welcome to theCUBE. My name is Peter Burris, and I'm broadcasting from our SiliconANGLE Studios in Silicon Valley. Very excited here to talk about some of the research that's going on in Wikibon Weekly, especially in anticipation of next week's Hadoop Summit. I'm here with George Gilbert, who's the analyst at Wikibon who looks very closely at big data. George, welcome. Good to be here, Peter. So this is gonna be an ongoing series of conversations between Wikibon analysts, as part of theCUBE's effort to increase programming, that we're going to call Wikibon Weekly. So we've got a couple things to talk about. George, I think the first thing we want to talk about is the event next week. We're at a crossroads in the world of Hadoop. It's been out for a decade now. We've got Spark that just hit and is generating a lot of excitement. We've still got a search for business models that are gonna generate and sustain the right type of investment. But we're also getting some indications from our conversations with clients that the adoption of Hadoop and related technologies, at least in production, is a little bit more segmented than many people might have thought. What do we need to see from the community next week to kind of kick this back into gear? Well, every enterprise technology goes through a process of hardening and streamlining to make it more accessible. And the organizations that are most technically sophisticated can adopt it first and get the most benefits out of it. Standard technology adoption dynamics. But we haven't seen a step function in the accessibility of Hadoop at any time really in the last several years. So in other words, we're trying to reach beyond the Fortune 500, but we haven't really changed the product to reach companies who can't put five specialized admins, and by specialized I mean admins with different domain skills, on a 10-node pilot.
And where I think we're gonna see that showing up is slowing adoption, or more difficult traction, in mid-market companies. And mid-market, let's just say, is probably below the US Fortune 500, because they just don't have the numbers of admins in their IT department to specialize to the degree that Fortune 500 companies do. So typically we're talking about a company that might have an IT budget of $50 to $60 million, maybe $100 million. That's a mid-market company by that definition. That's still a pretty sizable shop. Yes it is, but as we all know, 80% plus of that money is going towards maintaining the existing systems. We call them legacy systems; they call them the systems that work. Everyone knows that something like Hadoop is necessary as a sort of next-generation strategic data platform or hub, but we have to reduce the friction to install it, to deploy it, and to widely utilize it. And so we're hearing more noises about the relatively faster adoption of Hadoop in the cloud, where some of that friction is reduced. So in other words, there are a couple of paths going forward, at least for the mid-market. And the mid-market is where a lot of things end up going mainstream. One is that we can see the mid-market moving toward the cloud, where they can run scripts and have Amazon or Microsoft or somebody else actually handle the administrative duties. Or we might start seeing applications take more center stage, with new forms of packaging that speak directly to a particular business problem, whether it be horizontal, vertical, or what else it might be. You've recently done some research on the pain that the market's going through as it tries to develop these application concepts. Where are we right now?
Okay, so there's a variety of different business models being experimented with, frankly, because the numbers aren't big enough to say we've settled on a dominant one for this point in the adoption cycle. And as I like to say, as Captain Renault said in Casablanca, we can round up the usual suspects and put them in their buckets. The vendors that are using consulting-led, semi-templated solutions are IBM, Palantir, Accenture, Teradata, among the largest. Teradata? Yes. Let's come back to that one. Okay. So we keep going. So Teradata, I went to their influencer summit last week, and I was expecting to see a lot of speeds and feeds and, hey, we've still got the biggest, baddest decision support database, and people who can't stretch Oracle properly will come to us. They were very surprising, because they talked about an entirely different way of thinking about big data and transformational business solutions. It wasn't even really big data. It was: if we want to be able to rethink how we manage business processes that cross functions, without regard to the source systems or the types of data, here's the underlying infrastructure we have to put together. So it's good to see Teradata, obviously one of the granddaddies of the data warehousing BI era, has basically jumped on board and is now becoming a thought leader in some of the things that are happening with big data and the broader sense of the overall analytic space. Yeah, their message was really holistic in the sense that they said, look, you're going to have a lot of data that you can't tie together right away. You put that in the data lake, which is what all the heretics have been saying for years. And then you do discovery on that and you find where the relationships are, and you ask these questions in relation to the business, these transformational business opportunities that you might have, where you have to track behavior across a website.
You want to track which members, which customers or prospects, are most influential in the community. And you might also want to understand manufacturing lots, to see which items down at the lot level needed recalls. You know, that crosses many systems. So we've got the idea of the consulting and services led play, where basically somebody rolls in and says, don't worry, I'll build it for you, and maybe you'll even let me administer it in some sort of outsourcing relationship. What's the second business model? Before we go off that one, it's worth saying that there's a spectrum even within that model, in that you can't go in upfront with a whole portfolio of solutions other than slides, but as you do more of the engagements they get more repeatable, and the templates get better and better with each engagement. The next customer gets the benefit of an even richer, more repeatable engagement, and it gets closer to a supportable application as opposed to a consulting engagement. But the key thing is you're starting with smart people who have good knowledge of tools, as opposed to starting with promises about great tools that are easy enough for normal people to use. 100%, but also worth mentioning is that the people who walk in with that knowledge of solutions and transformations are very different from the people who sell tools to infrastructure buyers.
Sure, so let's talk about that second class, which really is, to anticipate what you're gonna say, the emerging software guys who are delivering many of the new platforms and trying to bring greater usability to bear on the platform set: companies like Microsoft, SAP, Oracle, and even Amazon and Google, where to varying extents they have big data platforms and tools. Let's say SAP with its HANA database and the HANA Vora database, which bridges HANA and Hadoop; Amazon with its range of databases and tools, which they are progressively integrating so that if you use one, it's actually much easier and much more functional to use them all; and frankly the same with Microsoft. And Microsoft has done the most work, and is going to have the largest advantage, in bridging what's in the cloud and on-prem in a completely compatible way, with tools that make it transparent where your workload's going. So these are companies that, unlike the service guys who start with people and then use tools, start with the tools and accrete new value into those tools to make them more accessible to different people. We mentioned earlier the idea that the other side of this is going to be packaged up in applications that serve a more specific type of business purpose. What's going on with the application space? Okay, so this one's kind of puzzling, and we've talked about this before, because when you're SAP or Oracle and you own both the applications and your platform, the development tools, the database, you are in the catbird seat, being able to say, I control that whole stack. So if I want to write applications that really make great demands on predictive analytics and large data sets, and I want that to be realizable out of the box, you would expect those vendors to have done it first and be way ahead of the rest. And they haven't.
And I think there's a cultural problem, partly at SAP, without putting names to anyone, since I have or used to have relationships with some of them. Their mindsets are systems, even though they lead an application company. And the other part is that when you built traditional ERP and enterprise applications, you built a data model, a bucket for all the data, and then the UI or the workflow to manage that data. The data was static. You spent years modeling all that. The way these new big data applications work is you have data coming in and the data shape is changing all the time, but you're extracting the logic for the application out of what the data tells you. It's the inverse. So let's talk about that, because we've actually mentioned this, and it's in some of your research: the historical norm in the application world was to start with a process associated with a relatively rigid data model. Tune the heck out of it so that you can scale it and deliver it broadly. That's kind of the operational approach. But when we start talking about some of these new applications for engagement, for IoT, now we're talking about the process not being as well understood. The algorithms and the workflows and the patterns emerge out of the data, and we have to rapidly respond to the changes they reveal. That's the key word you said in there, emergent. And that might be why the big companies can't wrap their heads around it, because it's the inverse of what was going on. It's not top down. It's not cathedral. It's bazaar. Very much so. And I don't mean bizarre as in weird. I mean bazaar, you know, as in The Cathedral and the Bazaar. Right, going back a number of years. Okay, so we've got those three. We've got service, we've got platform, and we've got this emerging application space, even though it's very much in transition right now. We're still learning about it, but it's very exciting. The last one is this converged infrastructure.
You know, slamming infrastructure together so that the infrastructure itself is simpler to use. You don't need all those administrators to pilot it with n number of nodes. You still need somebody, but you can put more of your professional talent into administering the higher-level parts of the stack. Yeah, the best quote I heard on this was, when you have a completely software-defined network, you don't need to take a network cable and unplug it from one place and plug it into another place. That's an API. And once you have that software-defined infrastructure, and this is where our colleagues Brian Gracely and David Floyer have done so much really cool work, the operational characteristics of your private cloud resemble more of what's in the public cloud now. Or just your data center. Okay, your data center. You're not matching the public cloud in terms of costs or the operational automation of the software above it, but that data center operation, the hardware and the operation of it, becomes much more standardized. It's much more like what you see from the public cloud. And some of the vendors here include Dell. HP, or Hewlett Packard Enterprise, the enterprise version now that it's been spun out. Sure, Oracle; IBM's still a big part of that play. So as we think about these efforts to cohere or converge around a set of business models that have the potential, anyway, to start translating investments in the various open source Hadoop-oriented big data stacks into returns, what are you looking for next week from some of these vendors? Or what do you think is gonna start getting people excited and get them waiting for 2017? Okay, so there's a couple things. We really, really need to do better with cost of ownership. I like to say, and I don't know if this was tongue in cheek, that the Hadoop guys name every project in the Apache continuum, or portfolio, after a different animal.
And then the product that actually keeps them coordinated is called ZooKeeper. So the zoo and the zookeepers really have to learn to play well together. Are we gonna see some of that next week at Hadoop Summit, you think? I think we're gonna see incremental progress. Okay, so the first recommendation is: look for incremental progress in how the packaging is simplifying the interfaces and the activities of administering these things. And just as an example, at Spark Summit we had Doug Cutting of Cloudera on, actually as our kickoff speaker. On theCUBE. Yeah, on theCUBE. And we asked him about that interoperability challenge, because his VP of engineering had told us previously that they budget 50% of their R&D cycles for interoperability and only the other 50% for functionality. And Doug was like, yeah, it's some basic stuff like security, making sure we have common permissions. So this is important to the customers, the users and the doers that are gonna go there. But it's also important for the members of the ecosystem. All right, what else, in addition to ease of use and simplification? We'll wanna see how much of a difference a cloud deployment can make in terms of the friction. And I think the best way to measure that is, how many specialized admins do you need for a pilot and then for production? In other words, we've heard numbers like five or six differently skilled admins for a 10-node pilot. That's, you know, like an 80% OPEX kind of workload. We wanna hear what that number becomes when you go to 25, 50, or 100 nodes. In a cloud setting. Well, no, first on-prem, and then to see what that number looks like in the cloud. And that's gonna be crucial to making the mid-market happen. I know we're gonna be looking a lot for the new models that some folks have about thinking about applications, and the evolution of applications as a technology set within the big data world.
Because like Cloudera has always said, and Hortonworks as well, all the distro vendors have said, look, we want a common platform. We will differ at the edges in terms of how we make it, you know, more manageable, or more industrial strength at the storage layer, like with MapR. But we want this to be a common platform, like Linux, so application vendors can write to it. But the common platform has forked beyond the ability of any sane application vendor to support. Well, there's a big difference between an operating system, a control program for hardware, and a set of technologies that are intended to capture extremely complex data from a wide variety of sources to solve extremely complex problems. You are probably putting your finger on the very weakness of this notion of a common platform: one abstracts hardware, which is an operating system, and the other is essentially an application server, you know, or application services. And those have now forked to the point where it's very difficult to support more than one. Yeah, so I'll have one more to add to that, George, and that is I'm also going to be going there, and I know you're going to be on theCUBE there a fair amount, and I intend to see if we can discover more about the role that data value is playing in guiding some of these decisions. How are businesses, first of all, are they starting to think in terms of data value as a discrete thing to factor in when they make investment decisions in technology? And how are they then using that concept to inform their other investment decisions, their operational decisions, and their strategic decisions? We're getting evidence in our research with clients that that's starting to happen, so I'm going to be looking for that.
It's funny that you say that, because Teradata, you know, who I would have picked as the number one sort of big iron kind of infrastructure vendor, said, when we go and engage with a customer and we assess the first business case that we're going to do together, the first thing we do is figure out what the payback is on turning that into an analytic solution, and we only go for the big ones. All right, so let's wrap this up. Once again, my name is Peter Burris, here with George Gilbert, analyst at Wikibon, and we are broadcasting from our studios in Silicon Valley in anticipation of next week's Hadoop Summit. theCUBE will be there, so by all means come and check us out at SiliconANGLE TV, or if you're at the show, come to the booth, come to the studio, meet George, have conversations, grab a cup of coffee, whatever else it might be. This is theCUBE and Wikibon. Thank you very much for listening, and let's make progress on moving the needle on big data adoption and utilization.