It's May 26 and you're with theCUBE. My name is Peter Burris. I'm the chief research officer of Wikibon Research and SiliconANGLE Media, and I'm joined today by George Gilbert. Now, because we are committed to serving our communities with the absolute best signal we possibly can about the big happenings in Silicon Valley, on a weekly basis we intend to bring you a CUBE conversation around significant Wikibon research findings. And we'll do that here in Silicon Valley with so many of the members of the community, some of the thought leaders who are so proximate to where George and I sit right now. So George, today we're gonna talk about big data. Recently we did a significant piece of research on what was going on in the big data marketplace. Wikibon's done this for four years now, and the big change this year was that we expanded it to look at new types of application patterns that are emerging in the overall big data marketplace. Why don't you take us through some of those seminal changes and how we see those waves evolving over the course of the next 10 years? Okay, and let me put it this way: rather than taking the last couple of years as a fact base and extrapolating sort of in a straight line, markets evolve with inflection points as we hit major advances. And the first inflection point, Patrick, why don't you give us slide one, was going from data warehouses to data lakes. Data lakes were sort of the antithesis of the data warehouse: you collected all the data in all its messiness, because you needed that data to create the models. The second stage was applying those models to typically consumer-facing applications in real time, so that you could anticipate and influence and personalize each interaction. That was a major advance. A canonical example of that is managing and optimizing a customer's experience across digital channels.
The next level, and each of these is an S-curve, but when you pile them on each other, successively they become one big S-curve. And so the third one is what we call intelligent systems of engagement. Well, it's the one after that, I'm sorry: self-tuning systems, where the key is it's not necessarily an application interacting with a consumer. It could be two or more applications interacting with each other. Fraud is a great example, where someone will request a credit authorization and a whole bunch of different applications essentially have to collaborate transactionally and say, this is a good request, or this is something that should be denied. And that requires much deeper integration between the predictive analytics and this new type of technology. So George, as we think about some of these different waves, what we're really talking about is that with each successive generation of these technologies, we're taking on more complex applications, or more complex business problems, with a toolkit that is ideally more coherent. And so as we look forward over the course of the next 10 years or so, those are the two trends that we're really focusing on: the problems become better understood, and the tooling to provide solutions for those problems becomes more coherent. But that ultimately requires that we look at things from a framework perspective. Patrick, why don't we bring up the next slide? Brian Arthur is a Stanford professor; he's now down at the Santa Fe Institute, the complexity research center down there. And for those of you watching, if you have not read any Brian Arthur, he talks about things like path dependencies in the economy. He was one of the original thinkers around the whole notion of network effects within the economy, and one of the first people to coherently describe how technologies evolve over time. He was the one who came up with the whole notion that it all comes down to one winner.
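The fraud pattern George describes, a credit-authorization request scored collaboratively and in real time by business rules plus a predictive model, can be sketched roughly like this. All function names, rules, and thresholds here are hypothetical, purely to illustrate the shape of the decision, not any vendor's actual system:

```python
# Minimal sketch of a real-time credit-authorization decision that combines
# hard rule checks with a predictive model score. Everything here is a
# hypothetical illustration; a real system would call out to separate
# collaborating services and a trained model.

def rule_checks(txn):
    """Hard business rules: any failure denies the request outright."""
    if txn["amount"] > 10_000:          # unusually large purchase
        return False
    if txn["country"] in {"XX", "YY"}:  # placeholder embargoed-region codes
        return False
    return True

def model_score(txn):
    """Stand-in for a trained fraud model: returns a fraud probability 0..1."""
    score = 0.0
    if txn["amount"] > 1_000:
        score += 0.4
    if txn["hour"] < 6:                 # odd-hours activity
        score += 0.3
    if txn["new_merchant"]:
        score += 0.2
    return min(score, 1.0)

def authorize(txn, threshold=0.6):
    """Approve only if rules pass AND the model score is below threshold."""
    return rule_checks(txn) and model_score(txn) < threshold

ok = authorize({"amount": 120, "country": "US", "hour": 14, "new_merchant": False})
bad = authorize({"amount": 5_000, "country": "US", "hour": 3, "new_merchant": True})
print(ok, bad)  # True False
```

The point of the sketch is the "collaborate transactionally" part: several independent checks must all agree before the single yes/no answer goes back, within the latency budget of one authorization.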
Winner take most. You know, and in this book he talks about going from simplicity to adaptive stretch, when you try to take a technology almost beyond its original principles and it gets all gnarled in on itself. The example we use, I think in the slide, is a sapling turning into one of those gnarled old trees that's all bent in on itself. And in fact, he introduced this in a book called The Nature of Technology, which was very much like Darwin's evolution book, but applied to technology. And the key point is, at some point these platforms that we stretch to take on new use cases collapse on themselves out of complexity, and we have to start anew. So let's make sure we understand that, George. So we have a particular platform, a particular technology, and people apply it to a problem. As we gain experience with the application of that to the problem, people specialize, refine, and stretch it into different shapes until it becomes obvious that it's no longer gonna work and we have to start over. Let me give you an example. Well, we all know the story of Hadoop, which actually started at Google, where it wasn't called Hadoop. It was MapReduce and the Google File System. But it was designed to build a web crawl index. Then what happened was, Yahoo and Facebook and others said, hey, this is really good for huge-scale data warehousing. And they added a whole slew of engines and utilities to make it usable in that scenario. But that's when it got incredibly complicated, and it really couldn't be maintained by companies other than those of that size. And so we're now back to rethinking the platform in simplicity terms. So the collapse has taken place and now we're on a new platform. Well, as we'll see, the collapse hasn't completed, but it's starting to. We'll see evidence of that. Okay, great. So that leads ultimately... why don't we go to the next slide, Patrick? Our producer, Patrick, is here in our Silicon Valley studio.
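To ground the Google lineage George mentions: MapReduce's original job was building a web crawl index. A toy inverted index in plain Python shows the map, shuffle, reduce shape of that workload. This is not Hadoop or Google's code, just a minimal sketch of the pattern with made-up documents:

```python
from collections import defaultdict

# Toy map/shuffle/reduce building an inverted index (word -> doc ids),
# the original MapReduce use case, sketched in plain Python.

docs = {1: "big data lake", 2: "data warehouse", 3: "data lake tools"}

# Map phase: emit (word, doc_id) pairs from each document.
pairs = [(word, doc_id) for doc_id, text in docs.items() for word in text.split()]

# Shuffle + reduce phase: group the emitted values by key.
index = defaultdict(set)
for word, doc_id in pairs:
    index[word].add(doc_id)

print(sorted(index["data"]))  # [1, 2, 3]
print(sorted(index["lake"]))  # [1, 3]
```

The stretch George describes is exactly this: the same map/shuffle/reduce machinery, built for indexing, later got pressed into service as a general-purpose data warehouse engine.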
And thank you very much for helping, Patrick. So if you take a look at this next slide, George, this is really how this plays out from a revenue standpoint. What we're looking at here is how the successive generations of technology that you discussed are gonna be manifest as revenue. So we've gone through, we're in the midst of, the data lake orientation, which is about, as you said earlier, moving our analytics approaches from extract-transform-load to extract-load-transform: getting the data in one place, in the data lake, so we can run a lot of tools against it, typically in batch mode. And the critical point, and sometimes we overlook it because we talk about the volumes and the variety, but the key point, and this is where Yahoo hit the wall, was: hey, if we're trying to build a model of what's likely to happen, we want all the data, not the summarized or refined data, because it's the outliers that make the models work really well. And then the second class of applications, these intelligent systems of engagement, they need the models from these data lakes. And so the difference there is we have to take those models into some sort of production pipeline and connect the consumer-facing applications with the systems of record. And then ultimately we're going to get to, as you said earlier, this notion of the self-tuning systems of engagement, where we're not only refining against a model, but the models themselves are being refined by the system through machine learning, et cetera. And a great example of this is GE's Predix industrial IoT platform, where they've got models in the cloud, collecting all the data from an ecosystem of devices and gateways and sites. And that gives you the big picture. And then they take those models and they push them down to the site level or the gateway level or even the device level.
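The ELT shift Peter describes, load the raw data first and transform it inside the lake, can be sketched with SQLite standing in for the lake. The table, rows, and query are hypothetical; the point is only the ordering: messy data lands untouched, outliers included, and cleaning happens later at query time.

```python
import sqlite3

# ELT sketch: load raw records as-is first (the "data lake" step), then
# transform with queries inside the store. SQLite stands in for the lake
# here; in practice this would be HDFS/S3 plus a query engine.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (user TEXT, amount TEXT)")  # raw, untyped

# Extract + Load: dump messy source rows untouched, outliers and junk included.
rows = [("alice", "10.5"), ("bob", "not_a_number"), ("alice", "99999")]
conn.executemany("INSERT INTO raw_events VALUES (?, ?)", rows)

# Transform: clean and aggregate only when a question is actually asked.
total = conn.execute(
    """SELECT SUM(CAST(amount AS REAL)) FROM raw_events
       WHERE amount GLOB '[0-9]*'"""
).fetchone()[0]
print(total)  # 100009.5
```

Note that the 99999 outlier survives into the aggregate; in a classic ETL pipeline a cleansing step might have discarded it before the model ever saw it, which is exactly the loss Peter's "we want all the data" point warns about.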
And those models then collaborate, because you wouldn't have time to get the data all the way up into the cloud and all the way back to fine-tune the operation. So here's the big challenge, George. We have a forecast that shows enormous promise within the overall big data marketplace, and we have a lot of examples of companies starting to generate some real returns out of this. But we are still early in the adoption of a lot of these methods and approaches and tools, and new forms of business and business models. So Patrick, why don't we go to the next slide? George, you found some interesting data from one of the members of the Wikibon community that shows where we are, for real, in the adoption of some of these technologies. Why don't you take us through it really quick? Okay, this was actually the most astonishing market research data I've come across in a long time. And it's fact-based. It's not based on surveys; it's not based on SurveyMonkey. This is a company called Spiderbook that does, in a way, social CRM, where they can help companies target who their most likely prospects are, down to the priority of the product or service you're selling and the key people you wanna call on. But they've now turned it onto the Hadoop market to find out which companies are in Hadoop, and how deep, to sum it all up. And there are fewer than 500 companies in the United States who have either 100 terabytes or more, or 100 nodes or more, who have 12 Hadoop engineers on staff. Hadoop is all we've been hearing about for five-plus years. And the reality is even more astonishing: if you go into the industry cohorts where Hadoop has taken root, it's something like, oh, 486 of the top 713 companies are in tech. I mean, you break it out into software, into professional services, ad tech. But if you add all those up and then you look at the few who are in, like, telco or oil and gas or whatever, it's like a rounding error.
The key point to take away from this is that tech companies have the technical wherewithal, the skills, the people to operate this stuff. Hadoop has a long way to go to simplify itself into a platform that mainstream customers can consume, because if not, they'll choke on the complexity. Now, this is a statement about installations and companies with installations. It's not a statement about the depth or the profundity of the way that these applications are being used. So oil and gas may be a limited slice, but there may be significant dollars there, and that is reflected... go ahead. I do want to address that, because it's not in the charts, but it turns out that most of the money, almost a third, appears to be in fraud applications, and almost a third in IoT. So, great data. Now let's see if that's reflected also on the sell side. So we talk about a degree of immaturity, or early adoption, of Hadoop, even though this has been in place for quite some time, but the data on the sell side reflects the same thing. So Patrick, why don't we go to slide number five? If we look at this, we can see that the marketplace today, while important, is about $22 billion. So it's sizable, but we are not seeing a significant amount of concentration. It's still a very, very diffuse marketplace across a large number of suppliers in the software, hardware, and services arenas, and that's indicative of a marketplace that requires some maturing, some seasoning, if you will. But that's happening very quickly. Now George, what I want to do is turn our attention to the reasons why we're still early in the process, and we think our research shows that there are two reasons.
Reason number one is we've got a long way to go on how we support administrators who are trying to set these jobs up and ensure that they work predictably within a business, but also a long way to go on the developer side, where developers still have not jumped into the Hadoop, or the big data, pool in a big way and started generating significant new types of value out of this big data ecosystem. There are two classes of administrators and two classes of developers. The developers are ISVs and then the corporate developers, and the corporate developers need even more help on the simplification and unification side. It's essentially the same with administrators, where you've got the on-site folks, maybe moving to hybrid, but then there are those cloud-hosted service providers, who've created some repetition around their deployments. And just a quick update: we're going to be releasing this month our big data in the public cloud research. That's not here now, but we will have that later this month. So let's take a look at the challenges, the fundamental challenges, first for the administrator of big data applications or platforms, and then looking at the developer side. So Patrick, why don't we go to the next slide, slide number six. So George, what are we looking at here? Okay, so this one's really interesting. It lays out, from an administrator's point of view, on the y-axis, all the levels of the stack that need management, starting from the network, the data center, compute and storage, middleware, applications. And these don't stay static over time, because the notion of compute and storage changes when we have ephemeral, elastic compute with storage associated with a node, as opposed to the network, or things like that. So what this first slide shows is that if we really want to simplify administration of this big data stack for an admin, you want a single pane of glass.
Now, Patrick, if you can flip us to the next slide, I want to show you an example of adaptive stretch. On an airplane flight, before I ran out of batteries, I collected a bunch of consoles from half a dozen or a dozen products before I got bored, and I stuck them into the box for performance management. These are performance management consoles. They don't address security, they don't address availability or change management. You're just looking at the Tower of Babel that exists in performance management, and I don't want to cast aspersions too widely, but even Apache projects come with their own native consoles. I mean, we do have efforts to unify them, but I might add that one of the major Hadoop distributors assigns 50% of the development headcount for each project that they add to the release to interoperability, and if that's not an index that indicates these projects were not designed to work together out of the box, I don't know what other index we could provide. And that's just the administrator side. And that's just performance, for the application-level administrators. Right. And I should add one more thing: the fundamental business model of most open source companies is to make money helping their customers run the software, because that's the hard part now. Infrastructure software has been commoditized for the most part, but the problem is, when you have boundaries between the components, that's where all the things go wrong. You have a different security model, you have a different way of handling faults, that sort of stuff. The consoles only help you within a component. There are precious few that help you across components, precious few if any.
So let me summarize very quickly, George, and then we've got to get on to the developer side. Today we have a very incoherent stack that is put forward by a very incoherent and diverse group of providers, each of which is making most of its money by providing administrative tooling, and therefore they have an incentive for these tools not to work together, because they want to keep everybody within their own stack. Yes, for the ones who are trying to make a deep stack. All right, so let's go to the developer side. So moving on here, on the developer side, if we go to the next slide, Patrick, we're talking about really where the rubber is going to meet the road in liberating value out of these technologies. It's nice for the administrators to have coherent tooling, but that's really a way of lowering the cost of administration so that they can do more. Here on the developer side, we're talking about how we're going to generate the new levels of business value and solve these very complex problems that everybody's talking about big data solving. But the problem is, that's where things get really ugly. Yes, and really ugly would be a euphemism. The problem right now is that, just the way these open source projects come with their own sort of admin models, they also come with their own development models. And so what we started out with, what we thought was this great liberation: in traditional analytic DBMSs, they had every type of analysis that they thought they could fit in a single-purpose engine, so it would be easy to administer and easy to develop for. Hadoop blew that wide open and said, let a thousand analytic-engine flowers bloom. And that sounded wonderful. But when we got to the promised land, we realized we'd introduced a level of complexity that we never imagined. The mix-and-match had a cost, and that has to be rationalized.
And so if we think about the implications of this, and I'm going to skip the next slide, let's go straight to slide number 10, Patrick, let's talk about ultimately where we see this playing out in the dynamics of the marketplace. The thing to point out here is, first off, that the hardware market is not growing as fast as historical enterprise application businesses have grown; it's staying relatively flat. But we are seeing an explosion in the professional services opportunity here, precisely because of the need for really smart, really high-quality labor to make this stuff work. But we do believe, George, that over the course of the next couple of years we're going to start seeing vendors, suppliers, and professional services people delivering software that will do a better job of bringing more coherence to the marketplace. Let me add two anecdotes to that. I spoke with Bob Picciano at IBM Insight, who I think runs their analytics business, and then I also spoke with the guy at Accenture, just yesterday or the day before, who runs their analytics business for EMEA. And both had the same comments about how to package these applications. They said it's very difficult. What we really deliver ultimately is a template. And that requires that the customer has at least a lowest common denominator of data to fulfill what they call an analytic record, which is analogous to the old data warehouse data models. And the point there is, I think as you're going to touch upon later, we can put some minimum amount of data together, but we don't really know the whole process for how it works. And so when you look at the mix between professional services and software, or particularly packaged applications, it's going to be much heavier on the professional services side, because it's harder to replicate this and stamp it out like, you know, SAP R/3 or Oracle apps. Right, but we still think it's going to happen to some degree. It will just take a while.
And George, you've actually written some research on this. We've identified thus far three different, what we'll call, archetypes. We're still studying whether and how the characteristics of different applications are going to come together. But we've seen clients do what we call big data micro-apps, where they're taking something like fraud detection and injecting it into existing operational apps. Another class is what we call big data departmental apps, which is effectively a shared-service type of application that's then exploited by a lot of different parts of the business, cybersecurity being a great example of that. That is another pattern or template. And then the third template we're calling, right now, big data ecosystem apps. These are the grand-unified-theory kinds of applications, where we're doing dynamic pricing against e-commerce and demand management; very early, early, early-on stuff. But it starts to point out how some of these applications are likely to come together. And what we're really looking for, let's go to slide number 12, Patrick, is that over the course of the history of the computing industry, enterprise software has followed a repeatable process, a repeatable pattern, of how it's matured. So if you go back many, many years, it started out in batch, because the technology supported that. Punch cards. Punch cards would be an example; that's where batch came from, right? We had batches of cards that we then ran through the machines, and over time, as the networking and the hardware, but especially the system software, the reliability of the system software, improved, we were able to go to interactive, and we were able to start tying systems together, and that was the basis for OLTP. And eventually we started doing things that looked like streaming applications: process control, very, very complex OLTP-like systems that ran with minimal human intervention and just kept going.
We're going to see a similar type of dynamic play out in the big data marketplace as well. Why don't you take us through that very quickly, and we'll show where it's all going to end up. Okay, so the big lesson is that when you move from one era in computing to another, whether it's the programming model or the relative price-performance of hardware components, it's not that any one goes away. It's the shift in the mix that changes how you do things. And this is the concept of adaptive stretch again. Yes, excellent point. So we took batch about as far as we could, and when we went to the first online applications, where you wanted, say, a travel agent being able to buy an airline ticket rather than having to go to the airline's office, that was basically a stretch for batch and required a whole new infrastructure for interactive. Anyway, then we saw the whole ERP, CRM, supply chain era, where interactive really took off. Now what we're seeing is what started, yes, in the process control area, but more fundamentally in managing software and hardware infrastructure with streaming machine data. Now we're taking that and applying it very broadly to the Internet of Things, where it's continuous. And I should say, streaming is the real-time acquisition and processing of data. Continuous apps are ones that combine batch, interactive, and streaming. So Patrick, slide number 13. And so what you're basically saying, George, is that we're going to see the big data marketplace follow a similar type of pattern, where the technologies associated with batch and interactive and streaming start coming together and create a coherent stack that administrators have a straightforward way of administering, and developers will have a more coherent way of building applications on top of. It may never be as simple as it was in the OLTP database arena, because of the complexity of the problems that we're fundamentally trying to solve.
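George's definition, that continuous apps combine batch, interactive, and streaming, can be sketched as a running aggregate that is bootstrapped from batch history, updated per live event, and queryable at any moment. This is a deliberately tiny, hypothetical illustration of the shape, not any particular engine such as Spark or Flink:

```python
# Sketch of a "continuous" aggregate: bootstrap from batch history, keep the
# same metric current as streaming events arrive, and answer interactive
# queries at any time. Purely illustrative.

class RunningMean:
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def load_batch(self, values):
        """Batch step: initialize state from historical data in one pass."""
        for v in values:
            self.update(v)

    def update(self, value):
        """Streaming step: fold one new event into the state incrementally."""
        self.count += 1
        self.total += value

    @property
    def mean(self):
        """Interactive step: query the current value whenever asked."""
        return self.total / self.count if self.count else 0.0

m = RunningMean()
m.load_batch([10, 20, 30])   # overnight batch history
print(m.mean)                # 20.0
m.update(40)                 # a live event arrives
print(m.mean)                # 25.0
```

The design point is that batch and streaming share one code path (`update`), which is exactly the "layer that hides all of that from you" that the conversation turns to next.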
But nonetheless, we'll see all those technologies come together over the course of the next 10 years into what we're calling continuous big data processing. And if I would add only one thing: we used to deal with streaming over here and batch over here and interactive over here, and we're moving towards a layer that hides all of that from you. Got it. So the next slide quickly summarizes what we mean by this. If we take a look at how the marketplace is going to evolve over the course of the next 10 years, basically that yellow line with the red dots on it is how we anticipate seeing the workloads that are being invested in in big data moving towards this streaming, continuous, coherent platform, which we think is going to start picking up speed from a developer, delivery, and vendor standpoint over the course of the next few years. And it cuts across the different classes of apps that we talked about, such as data lakes, which can feature continuous processing; then intelligent systems of engagement; and finally self-tuning systems of intelligence. All of those can see a shift: the application designs can be the same, but underneath we're moving towards continuous processing. So George, why don't you take 30 seconds and tell us what's on the horizon for your research. What are you looking at over the course of the next few months? Basically, what we haven't seen a lot of is trying to create a framework for how big data applications might emerge. It just hasn't gotten a lot of attention; there's been an intense focus on machine learning and streaming, so, the infrastructure technologies. And what you're focusing your attention on increasingly is that critical path for how we're going to move from problem to operating software.
But I would add that we want to come at it from two angles, because we're getting really good bidirectional information flow with Databricks, the Spark guys, and they've started to talk about how a fraud app would change when you're using continuous processing. And then we can go to the fraud guys, to some of the fraud application developers or vendors, and say, well, what would you do differently with this capability? All right, that was my timer, George. We're done. That's your timer. So this has been a CUBE conversation with Wikibon. We want to do this on a weekly basis as SiliconANGLE evolves, as the application of the technology that SiliconANGLE delivers evolves. We will bring you the signal from the noise here in Silicon Valley, in SiliconANGLE Media's new studio on the corner of San Antonio and Charleston, with the phenomenal crew from theCUBE. So thanks very much for watching theCUBE and Wikibon Weekly, and we'll talk to you again soon. Thank you very much.