Live from San Jose, it's theCUBE. Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. What am I gonna spend the next 15, 20 minutes or so talking about? I'm gonna answer three things. Our research has gone deep into three questions. Number one, where is the big data community going? Number two, how are we gonna get there? And number three, what do the numbers say about where we are? So those are the three things. Now, since we wanna get out of here, I'm gonna fly through some of these slides. But again, there's a lot of opportunity for additional conversation, because we're all about having conversations with the community. So let's start here. The first thing to note when we think about where this is all going is that it's inextricably bound up with digital transformation. Well, what is digital transformation? We've done a lot of research on this. This is Peter Drucker, who famously said many years ago that the purpose of a business is to create and keep a customer. That's what a business is. Now, what's the difference between a business and a digital business? What's the difference between Sears Roebuck and Amazon? It's data. A digital business uses data as an asset to create and keep customers. It infuses data into operations differently to create more automation. It infuses data into engagement differently to catalyze superior customer experiences. It reformats and restructures its concept of value proposition and product to move from a product orientation to a services orientation. Data is the centerpiece of digital business transformation, and in many respects, where we're going is toward an understanding and appreciation of that. Now, we think a number of strategic capabilities will have to be built out to make that possible.
First off, we have to start thinking about what it means to put data to work. The whole notion of an asset is that it's something that can be applied to a productive activity, and data can be applied to a productive activity. Now, there are a lot of very interesting implications that we won't get into now. But essentially, if we're going to treat data as an asset and think about how we can put more data to work, we're going to focus on three core strategic capabilities that make that possible. One, we need to build the capability for collecting and capturing data. That's a lot of what IoT is about. It's a lot of what mobile computing is about. There are going to be a lot of implications around how to do some of those things ethically and properly, but a lot of that investment is about finding better ways to capture data. Two, once we're able to capture that data, we have to turn it into value. That, in many respects, is the essence of big data: how we turn data into data assets in the form of models, in the form of insights, in the form of any number of other approaches to appropriating value out of data. But it's not enough to create value out of it and have it sit there as potential value. We have to turn it into kinetic value, to actually do the work with it. And that is the last piece. Three, we have to build new capabilities for applying data to perform work better, to act based on data. We've got a concept that we're researching now that we call systems of agency, which is the idea that there are going to be a lot of new approaches, new systems with a lot of intelligence and a lot of data, that act on behalf of the brand. I'm not going to spend a lot of time going into this, but remember that term, because I will come back to it. Systems of agency are about how you're going to apply data to perform work with automation, augmentation, and actuation on behalf of your brand.
Now, all this is going to happen against a backdrop of cloud optimization. I'll explain what we mean by that right now. Very importantly, how you create value out of data, and how you create future options on the value of your data, is increasingly going to drive your technology choices. For the first 10 years of the cloud, the presumption was that all data was going to go to the cloud. We think a better way of thinking about it is: how is the cloud experience going to come to the data? We've done a lot of research on the cost of data movement, both in terms of the actual out-of-pocket costs and in terms of the potential uncertainty, the transaction costs, et cetera, associated with moving data. How we think about data movement is going to be one of the fundamental elements of how we think about the future of big data and how digital business works. I'll come to that in a bit. But our proposition is that, increasingly, we're going to see architectural approaches that focus on how to move the cloud experience to the data. We've got this notion of a true private cloud, which is effectively the cloud experience on or near premises. That doesn't diminish the role the cloud is going to play in the industry, or say that Amazon's AWS and Microsoft Azure and all the other options are not important. They're crucially important. But it means we have to start thinking architecturally about how we're going to create value out of data, and recognize that we have to start envisioning how our organization and infrastructure are going to be set up so that we can use data where it needs to be, or where it's most valuable. And often that's close to the action. So if we think about that very quickly, because it's a backdrop for everything, increasingly we're going to start talking about where the workload is going to go.
Where's the workload going to be against this kind of backdrop? We believe, and our research pretty strongly shows, that a lot of workloads are going to go to true private cloud. But a lot of big data is moving into the cloud. This is a prediction we made a few years ago, and it's clearly happening; it's underway. We'll get into what some of the implications are. Now again, when we say that a lot of the big data elements, a lot of the process of creating value out of data, are going to move into the cloud, that doesn't mean that all the systems of agency that build on or rely on that data, the inference engines, et cetera, are also going to be in a public cloud. A lot of them are going to be distributed out to the edge, out to where the action needs to be, because of latency and other types of issues. This is a fundamental proposition, and I know I'm going fast, but hopefully I'm being clear. All right, so let's now get to the second part. That's kind of where the industry's going: data is an asset; invest in strategic business capabilities to create those data assets and appreciate their value; and utilize the cloud intelligently to generate and ensure increasing returns. So the next question is, well, how will we get there? Now, not too far from here, Neil Raden, for example, was on the show yesterday. Neil made the observation that as he wandered around, he only heard the term big data two or three times. The concept of big data is not dead. Whether the term is or is not is somebody else's decision. Our perspective, very simply, is that the notion is bifurcating, and it's bifurcating because we see different strategic imperatives playing out at two different levels. On the one hand, we see infrastructure convergence: the idea that increasingly we have to think about how we're going to bring data together and federate it, both from a systems standpoint and from a data management standpoint.
And on the other hand, we're going to see infrastructure or application specialization, which is going to have an enormous implication over the next few years, if only because there just aren't enough people in the world who understand how to create value out of data. There's going to be a lot of effort over the next few years to find new ways to go from that one expert group to billions of people, billions of devices. Those are the two dominant considerations in the industry right now. How can we converge data physically and logically? And on the other hand, how can we liberate more of the smarts associated with this very, very powerful approach, so that more people get access to the capacities, the capabilities, and the assets being generated by that process? Now, at Wikibon we've done probably, I don't know, 18, 20, 23 predictions overall on the changes being brought by digital business. Here, I'm going to focus on four of them that are central to our big data research. We have many more, but I'm just going to focus on four. The first one: when we think about infrastructure convergence, we worry about hardware. Here's a prediction about what we think is going to happen with hardware. We believe pretty strongly that future systems are going to be built around the question of how you increase the value of data assets. The technologies are all in place: simple piece parts that more successfully bind together, specifically, the roles that storage and network are going to play. Why? Because increasingly, that's the fundamental constraint. How do I make data available to other machines, actors, sources of change, sources of process within the business? Now, we are watching, before our very eyes, new technologies that allow us to take these simple piece parts and weave them together into very powerful fabrics or grids, what we call UniGrid.
The result is that there is almost no latency between data that exists within one of these, call it a molecule, and anywhere else in that grid or lattice. Now again, these are not systems that are five years away. All the piece parts are here today, and there are companies actually delivering them. If you take a look at what Micron has done with Mellanox and other players, that's an example of one of these true private cloud-oriented machines in place. The bottom line, though, is that there is a lot of room left in hardware, a lot of room. This is what cloud suppliers are building and are going to build. But increasingly, as we think about true private cloud, enterprises are going to look at this as well. So: future systems for improving data assets. The capacity of this type of system, with low latency among any source's data, means that we can now think about data not as a set of sources, each individually exercising some control over its own data, and sinks woven together by middleware and applications, but literally as networks of data. As we start to distribute data, and the control and authority associated with that data, more broadly across systems, we now have to think about what it means to create networks of data, because that, in many respects, is how these assets are going to be forged. I haven't even mentioned the role that security is going to play in all of this, by the way. But fundamentally, that's how it's likely to play out. We'll have a lot of different sources, but from a business standpoint, we're going to think about how those sources come together into a persistent network that can be acted upon by the business. One of the primary drivers of this is what's going on at the edge. Marc Andreessen famously said that software is eating the world. Well, great. But if software is eating the world, it's eating it at the edge. That's where it's happening.
Second, there's this notion of agency zones. I said I was going to bring that word up again. How systems act on behalf of a brand, or on behalf of an institution or business, is very, very crucial, because the time necessary to do the analysis, perform the intelligence, and then take action is a real constraint on how we do things. Our expectation is that we're going to see what we call an agency zone, or a hub zone, or a cloud zone, each defined by latency, and that we'll architect data so that the data necessary to perform a given piece of work gets into the zone where it's required. Now, the implication of this is that none of it is going to happen if we don't use AI and related technologies to increasingly automate how we handle infrastructure. And technologies like blockchain have the potential to provide an interesting way of imagining how these networks of data actually get structured. It's not going to solve everything; there are some people who think blockchain is kind of everything that's necessary. But it will be a way of describing a network of data. So we see those technologies in the ascendancy. But what does this mean for the DBMS? In the old world, the old way of thinking, the database manager was the control point for data. In the new world, these networks of data are going to exist beyond a single DBMS. And in fact, over time, the concept of federated data actually has the potential to become real. When we have these networks of data, we're going to need people to act upon them. And that's essentially a lot of what the data scientist is going to be doing: identifying the outcome, identifying the data that's required, and weaving that data through the construction, management, and manipulation of pipelines, to ensure that the data as an asset can persist for the purposes of solving a near-term problem, or for whatever duration is required to solve a longer-term problem.
Data scientists remain very important, but as a consequence of improvements in tooling capable of doing these things, we're going to see an increasing recognition that there's a difference between a data scientist and a data scientist. There are going to be a lot of folks who participate in the process of manipulating, maintaining, and managing these networks of data to create these business outcomes. But we're going to see specialization in those ranks as the tooling becomes more targeted to specific types of activities. So data scientist will remain an important job, but it's going to lose a little bit of its luster as it becomes clearer what it means. Some data scientists will probably become more like, let's call them, data network administrators, or networks-of-data administrators. And very importantly, as I said earlier, there just aren't enough of these people on the planet. So increasingly, when we think again about digital business and the idea of creating data assets, a central challenge is going to be how to turn all the data that can be captured into assets that can be applied to a lot of different uses. And there are two fundamental changes on the horizon to the way we currently conceive of the big data world. One is that it's pretty clear Hadoop can only go so far. Hadoop is a great tool for certain types of activities and certain numbers of individuals. So Hadoop solves problems for an important but relatively limited subset of the world. Some of the new data science platforms I just talked about, which are going to help with a degree of specialization that hasn't been available before in the data world, will certainly also help. But they, too, will only take it so far.
The real way that we see the work we're doing, the work the big data community is performing, turned into sources of value that extend into virtually every single corner of humankind, is going to be through these cloud services that are being built, and increasingly through packaged applications. A lot of computer science still has to be done between what I just said and when it actually happens. But in many respects, that's the challenge for the vendor ecosystem: how to take the idea of packaged software, which has historically been built around operations and transaction processing, with a known process and some technology challenges, and reapply it to a world where we don't know exactly what the process is, because the data tells us at the moment the action is taking place. It's a very different way of thinking about application development, a very different way of thinking about what's important in IT, and a very different way of thinking about how business is going to be constructed and how strategies are going to be established. Packaged applications are going to be crucially important. So in the last few minutes here: what are the numbers? This is kind of the basis for our analysis. Digital business and the role of data as an asset are having an enormous impact on how we think about hardware, how we think about database management or data management, how we think about the people involved in this, and ultimately how we think about how we're going to deliver all this value out to the world. And the numbers are starting to reflect that. So I want you to think about four numbers as I go through the next two or three slides: 103 billion, 68%, 11%, and 2017. Of all the numbers you will see, those are four of the most important. So let's start by looking at the total marketplace.
This is the growth of the hardware, software, and services pieces of the big data universe. Now, we have a fair amount of additional research that breaks all of these down into tighter segments, especially on the software side. But the key point here is that we're talking about big numbers: 103 billion over the course of the next 10 years. And let's be clear, that $103 billion actually has a dramatic amplifying effect on the rest of the computing industry, because a lot of the pricing models, especially for the software, are tied back to open source, which has its own issues. And very importantly, the services business is gonna go through an enormous amount of change over the next five years as service companies better understand how to deliver some of these big data-rich applications. The second point to note here is that it was in 2017 that the software market surpassed the hardware market in big data. Again, for the first number of years, we focused on buying the hardware and the system software associated with it, and the software became something that we hoped to discover. I was having a conversation here on theCUBE with the CEO of Transwarp, which is a very interesting Chinese big data company. I asked, what's the difference between how you do things in China and how we do things in the US? He said, well, in the US, you guys focus on proof of concept. You spend an enormous amount of time asking, does the hardware work? Does the database software work? Does the data management software work? In China, we focus on the outcome. That's what we focus on. Here, you have to placate the IT organization and make sure that everybody in IT is comfortable with what's about to happen. In China, we're focused on the business people. This is the first year that software is bigger than hardware, and it's only going to get bigger over time. That doesn't mean, again, that hardware is dead or that hardware is not important.
It's gonna remain very important, but it does mean that the locus of the industry's focus is moving. Now, when we look at the market shares, it's a very fragmented market: 68% of the market is still "other." This is a highly immature market that's gonna go through a number of changes over the next few years, partly catalyzed by that notion of infrastructure convergence. So within four years, our expectation is that that 68% is gonna start going down pretty fast as we see greater consolidation. Now, IBM is the biggest one, on the basis of the fact that they operate in all of these different segments: hardware, software, and services, and especially because they're very strong in the services business. The last one I wanna point your attention to is this one. I mentioned earlier that our expectation is that the market is increasingly gonna move to a packaged application orientation, or packaged services orientation, as a way of delivering expertise about big data to customers. Splunk is the leading software player right now. Why? Because that's the perspective they've taken. Now, it's perhaps for a limited subset of individuals, or of markets, or of sectors, but it takes a packaged application, weaves these technologies together, and applies them to an outcome. We think this presages more of that kind of activity over the next few years. Oracle is taking kind of a different approach, and we'll see how that plays out over the next five years as well. Okay, so that's where the numbers are. Again, there are a lot more numbers, and a lot of people you can talk to. Let me give you some action items. First one: if data were a core asset, how would IT, how would your business, be different? Stop and think about that.
If it wasn't your buildings that were your asset, if it wasn't the machines, if it wasn't your people by themselves, but data was the asset, how would you re-institutionalize work? That's what every business is starting to ask, even if they don't ask it in the same way. And our advice is: do it, because that's the future of business. Not that data is the only asset, but data is a recognized central asset, and that's gonna have enormous impacts on a lot of things. The second point I wanna leave you with: tens of billions of users, and I'm including people and devices, are dependent on thousands of data scientists. That's an impedance mismatch that cannot be sustained. Packaged apps and these cloud services are gonna be the way to bridge that gap. I'd love to tell you that it's all gonna be about tools, that we're gonna have hundreds of thousands or millions or tens of millions or hundreds of millions of data scientists suddenly emerge out of the woodwork. It's not gonna happen. The third thing is, we think that big businesses and enterprises have to master what we call the big tech inflection. The first 50 years were about known process and unknown technology. I've got an accounting package: do I put it on a mainframe, or a minicomputer, or client/server, or do I do it on the web? Unknown technology. Well, increasingly today, all of us have a pretty good idea what the base technology's gonna be. Does anybody doubt it's gonna be the cloud? What we don't know is what new problems we can attack, what we can address with data-rich approaches to turning those systems into actors on behalf of our business and customers. So I'm a couple minutes over. I apologize. I wanna make sure everybody can get over to the keynotes. If you want to, feel free to stay. theCUBE's gonna be live at 9:30, if I got that right.
So it's actually pretty exciting if anybody wants to see how it works. Feel free to stay. George is here, Neil's here, I'm here. I mentioned Greg Tario, Dave Vellante, John Greco. I think I saw Sam Cahane back in the corner. Any questions? Come and ask us. We'll be more than happy to talk. Thank you very much for being... Oh, Dave Vellante has a question. Yes, we have time. Yeah. All right, so the question was: the most valuable companies in the world are companies that are well down the path of treating data as an asset. How does everybody else get going? Our observation is you go back to: what's the value proposition? What actions are most important? What data is necessary to perform those actions? Can changing the way the data is orchestrated, organized, and put together change the cost of performing that work by changing the cost of transactions? Can you create new services along the same lines? Then architect your infrastructure and your business to make sure that the data is near the action, in time for the action to be absolute genius to your customer. So it's a relatively simple thought process. That's how Amazon thought. Apple increasingly thinks like that: they design the experience, and then they think about what data is necessary to deliver that experience. That's a simple approach, but it works. Yes, sir? [Partly inaudible question from an audience member from Snowflake.] What a great question. So the question was: when you look at the companies that are catalyzing a lot of the change, you don't see a lot of the big companies in leadership positions, and someone from Snowflake just asked, well, who's gonna lead it? That's a big question with a lot of implications, but at this point in time, it's very clear that the big companies are suffering a bit from what I'd call the RCA syndrome. I think Clay Christensen talked about this as the innovator's dilemma.
RCA was one of the early leaders in the transistor and held a lot of the original patents on it. But back in the 40s and 50s, they put that incredible new technology under the control of the people who ran the vacuum tube business. When was the last time anybody bought RCA stock? The same problem exists today. Now, how is that gonna play out? Are we gonna see, as we always have, a lot of new vendors emerge out of this industry and grow into big vendors, with IPO-related exits to try to scale their business? Or are we gonna see a whole bunch of gobbling up? That's what I'm not clear on, but it's pretty clear at this point in time that a lot of the technology, a lot of the science, is being done in smaller places. The moderating factor is the services side. Because there are limited pools of expertise, the companies that today are able to attract that expertise, the Googles, the Facebooks, the AWSes, the Amazons, et cetera, are doing so in support of a particular service. IBM and others are trying to attract that talent so they can apply it to customer problems. We'll see over the next two years whether the IBMs and the Accentures and the big service providers are able to attract the kind of talent necessary to diffuse that knowledge into the industry faster. So the key is the rate at which the idea of internet-scale computing, the idea of big data being applied to business problems, can diffuse into the marketplace through services. If it can diffuse faster, that will have an accelerating impact for smaller vendors, as it has in the past, but it may also have a moderating impact, because with a lot of the expertise that comes out of IBM, IBM's gonna find ways to drive it into product faster than it ever has before. So it's a complicated answer, but that's our thinking at this point in time. Yeah. It's not gonna be a winner-take-all market, no.
And to a great extent, this is a supply-side view of the market. I can look at the big five, which are also applying big data to their own business. The real winners are the five I mentioned; the practitioners' view is really clear that cloud economics are driving big data. Yeah, I think that's true now, but I think the real question, not to argue with Dave, this is part of what we do, the real question is: how is that knowledge gonna diffuse into the enterprise broadly? Because Airbnb, I doubt, is gonna get into the business of providing services. So I think the whole concept of community, partnership, and ecosystem is gonna remain very important, as it always has been, and we'll see how fast the service companies that are dedicated to diffusing knowledge into customer problems actually do it. Our expectation is that as the tooling gets better, we will see more people able to present themselves as truly capable of doing this, and that will accelerate the process, but the next few years are gonna be really turbulent, and we'll see which way it actually ends up going. Yeah, so they don't like talking about it too much, but they're doing it. Exactly. And that will have an impact on how the market ultimately gets structured and who the winners end up being. Yeah, well look, at the end of the day, Red Hat isn't even the Red Hat of open source. So the bottom line is, the thing to focus on is how this knowledge is going to diffuse. That's the thing to focus on, and there are a lot of different ways. Some of it's gonna diffuse through tools. If it diffuses through tools, it increases the likelihood that we'll have more people capable of doing this, and IBM and others can hire more, and Citibank can hire more. That's an important play. So you have something to say about that, but it also says we're gonna see more packaged applications emerge, because that facilitates the diffusion.
We haven't figured it out. I don't know exactly, nobody knows exactly, the shape it's gonna take, but that's the centerpiece of our big data research: how is that diffusion process gonna happen and accelerate, what's the resulting structure gonna look like, and ultimately how is the enterprise going to create value with whatever results. Yes, sir. So to recap the question: you see more people coming in and promising the moon but being incapable of delivering, partly because the technology is uncertain, and for other reasons. So here's our observation. We actually did a fair amount of research on this. Take what we call an approach to doing big data that's optimized for the cost of procurement, i.e., let's get the simplest combination of infrastructure, the simplest combination of open source software, the simplest contracting to create that proof of concept. You can stand things up very quickly if you have enough expertise, and you can create that proof of concept, but the process of turning it into an actual production system extends dramatically, and that's one of the reasons why the Clouderas did not take over the universe. There are other reasons. George Gilbert's research has pointed out that Cloudera is spending 53, 55% of their money right now just integrating all the stuff that they brought into the distribution five years ago, which is a real great recipe for creating customer value. The bottom line, though, is that if we focus on time to value in production, we end up taking a different path. We don't focus as much on whether the hardware is gonna work, and the network is gonna work, and the storage can be integrated, and how it's gonna impact the database, and what that's gonna mean to our Oracle license pool, and all the other things that people tend to think about if they're focused on the technology.
And so, as a consequence, you get better time to value if you focus on bringing in the domain expertise, working with the right partner, and working with the appropriate approach to go from: what's the value proposition? What actions are associated with the value proposition? What data is necessary to perform those actions? How can I take transaction costs out of performing those actions? Where does the data need to be? What infrastructure do I require? So we have to focus on time to value, not time to procure. And that's not what a lot of professional IT-oriented people are doing, because many of them, I hate to say it, still acquire new technology with the promise of helping the business but with a stronger focus on what it's gonna mean to their careers. All right, I wanna be really respectful of everybody's time. The keynotes start in about five minutes, which means you've just got time. If you wanna stay, feel free to stay. We'll be here; we'll be happy to talk. But I think that pretty much closes our presentation broadcast. Thank you very much for being an attentive audience, and I hope you found this useful.