Hi, and welcome to this week's theCUBE Conversations with Wikibon. This is a regular program that we're going to be producing from our studio here in Silicon Valley, in which we use current events, either generated out of the Wikibon research stream or just things that happen in the Valley, to get together and discuss some of the big changes happening in digital business today. I'm Peter Burris, and today I've got George Gilbert, who's an analyst with Wikibon, and we're going to spend some time talking about some recent research we just completed that looks at big data in the public cloud.

Now, you would think that there's a natural affinity between big data and public cloud, George, because on the one hand, public cloud suggests that it's available always, available everywhere, and available to anything, and big data says data can come from anywhere, at any time, in any form. And I would add two other things: part of what makes big data valuable is the ability to add context in terms of other big data sets, and to process it at scale, both of which are characteristics unique to the public cloud. So there is a strong affinity between the two, but for the first 10 years, most of the big data investment has been on premises. It's gone into Hadoop, on-premises storage, on-premises processing, on-premises Hadoop and related tools. And I would say the primary reason for that is probably that when you move to the cloud, you have a fundamental shift in the management model to shared infrastructure. That upends everything: when we moved from the mainframe to client-server distributed systems, that was all dedicated infrastructure, so the shift back is difficult.

We've been doing a fair amount of research on this topic, and for those of you who have been paying close attention to Wikibon research, not too long ago we came out with our aggregate big data forecast. It showed that the marketplace for big data, which is not small, is going from around $20 to $22 billion today and growing over the next 10 years to about $92 billion, just south of $100 billion. Big numbers. The public cloud marketplace, meanwhile, is going to grow from somewhere in the vicinity of $80 billion in 2015 to an enormous $490-plus billion, almost $500 billion in spend. And then the question we asked ourselves as we thought about some of the dynamics of the big data marketplace, specifically looking at the applications we've been examining over the course of the past few months, George, was: what percentage of big data spend is going to end up in the public cloud? To get right to it, today it's a relatively small percentage. By 2026, we think about 25% of big data spending will be in the public cloud, which is a not inconsequential amount of money, but it's not the dominant, everything-else-goes-away kind of spending. Why is that, George? What do you think?

Well, as we were going back and forth on the underlying assumptions, I'd say it's not just the raw infrastructure management, how we manage shared access to hardware. We also have to solve some thorny problems in data protection. In other words, when you're using data from different sources, some of that has to be secured in motion, when it's on the wire, and at rest. It has to be cordoned off, so that if some company, let's say Salesforce, is running a multi-tenant application, they have to carve off not so much the application as the parts of the data that belong to different customers. And that's still a new skill.

Well, it's a highly complex job to segment data in that way.
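To make that segmentation challenge concrete, here's a minimal sketch of the kind of tenant-scoped data access a multi-tenant service has to enforce. It's purely illustrative: the table, tenant names, and helper function are hypothetical, and a real platform would layer encryption on the wire and at rest on top of this.

```python
import sqlite3

# Hypothetical multi-tenant table: every row is tagged with the tenant
# that owns it, so the platform can carve off each customer's data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (tenant_id TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [("acme", "acme order #1"), ("globex", "globex order #1")],
)

def query_for_tenant(conn, tenant_id, pattern):
    # Every query path goes through this helper: the tenant_id predicate
    # is applied by the platform, never left to application code, so one
    # tenant can never see another tenant's rows.
    return conn.execute(
        "SELECT payload FROM records WHERE tenant_id = ? AND payload LIKE ?",
        (tenant_id, pattern),
    ).fetchall()

print(query_for_tenant(conn, "acme", "%order%"))    # acme's rows only
print(query_for_tenant(conn, "globex", "%order%"))  # globex's rows only
```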
The other reasons, going back to some of the remarks you made in the introduction, are that big data presumes you don't necessarily know where the source is, which source is going to be most valuable, or what security and other constraints it's under. And the simple reality is that there are two physical constraints that still dominate architecture today. One is that it costs something to maintain the state of data. Now, we don't have quite as many problems with that in big data, because it's a very write-once, read-intensive kind of environment. But the second one, which is not going to go away anytime soon so long as the laws of physics are in place, is that it costs something to move data. You can't move data faster than the speed of light, and there are restrictions on how much data you can move at any given instant. We're talking about getting up to hundreds of megabits, even gigabits, per second on a lot of the networks available to corporations of all sizes, but we are generating data at multiple gigabytes a second in sometimes simple systems.

So take that a step further. Are we originating that data in the cloud, or in systems that are easily connected to public clouds? Or is that data coming into enterprise infrastructure and enterprise applications and then having to be commingled for context in the public cloud? That's one of the great questions about the types of applications on the horizon that will exploit big data. For example, today when we're looking at ad tech or web analytics, these systems are generating enormous amounts of data, but they tend to generate it in the cloud or pretty close to the cloud. An individual acquiring some resource from a business is probably doing so by connecting to some mechanism up in the cloud and then doing a bunch of work, so that data is in the cloud and it's easier to keep it in the cloud. But when we start talking about the internet of things, that data may be generated hundreds or thousands of miles away from a particular cloud-based resource. And then the question is: are we going to take all the data out of that video camera sitting at the corner of Park and Main, overlooking the front of the bank, tracking who's going in, who's going out, and what the demographics of the individuals are, and stream it back 2,000 miles to Los Angeles somewhere so it can be stored, processed, and queried using some sort of big data analytics? That's what we're not sure about at this point in time.
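To put rough numbers on that mismatch, here's a back-of-the-envelope calculation; the camera bitrate, fleet size, and uplink speed are assumptions for illustration, not figures from the research.

```python
# Back-of-the-envelope: can we backhaul raw camera data to the cloud?
# All figures are illustrative assumptions.
camera_bitrate_mbps = 25   # assume one high-quality video stream per camera
cameras = 200              # assume a modest city-wide deployment
uplink_gbps = 1            # assume a 1 Gbps link back to the cloud

demand_gbps = camera_bitrate_mbps * cameras / 1000
print(f"demand: {demand_gbps:.1f} Gbps vs uplink: {uplink_gbps} Gbps")
# demand: 5.0 Gbps vs uplink: 1 Gbps -> the raw streams don't fit;
# something has to be filtered, summarized, or processed at the edge.
```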
Maybe there's an analogy we can use from the emerging architecture of online video games, where the low-latency, immediate processing has to happen locally, on your console, your PC, or even your mobile device, but when you want to render the big environment and optimize the whole ecosystem, that happens in the cloud. And the latency is low enough to tell the local device how to render the new environment. Tying this back to the internet of things: the low-latency, quick-response decision about how to update or control a device happens locally, but some data goes back into the cloud to optimize how the ecosystem performs over time. So I think there might be some complementarity there, but we don't have the software architecture, or for that matter the hardware infrastructure, in place to do that yet.

I think that's a great example, George, and at Wikibon we think that's exactly how it's going to play out. The broad rules about customer experience, fraud, cybersecurity, and pricing will live back in the cloud, because they have ecosystem elements associated with them. But once the prices, the general rules of customer experience, and the identity of the customer have all been made present and stateful locally, then the actual action that takes place proximate to the customer can be handled locally, without an enormous amount of data being shipped back and forth.

So the key question is: when we look at those local applications and the centralized, cloud-based scenarios, if we had to weigh them in terms of data volumes, is it possible to do a global estimate, or is it really use-case specific at this point?

You know, George, I'm not sure; I don't think we really know. But there's a rule of thumb of automation that I developed probably 20 years ago that seems to be playing out pretty well: with every successive generation of automation, the number of transactions required to support that automation increases sevenfold. And that multiple might actually be going up. The point is that as we try to automate behavior locally, chances are the amount of data and the number of transactions needed will go up sevenfold relative to automating it back in the cloud. Think about it with banking as an example. In the old days of branch banking, you'd go into the branch, let's pick a number, once a week. Then with ATMs or online banking, you'd touch your accounts, or the bank would touch your accounts, roughly seven times more frequently, and you can see how it explodes over time. An activity that used to be relatively simple and happened once every couple of weeks, deposit a check, withdraw money, now takes place through these big debit card systems hundreds and hundreds of times every couple of weeks. And as we push more of the responsibility for governing those behaviors closer to these local IoT systems, utilizing more data, because we're going to have to validate faces and use biometric kinds of security, we're talking about transactions that are not a hundred kilobytes in size but multiple megabytes. That's a lot more data. So I think what we'll be doing over time is using network speed as a reasonable constraint on how those architectures get set up.
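As a quick illustration of how that rule of thumb compounds across the banking example, here's the arithmetic; the starting frequency and payload size are assumed for illustration, and the payload is held constant even though, as noted above, it grows too.

```python
# Compounding the "sevenfold per generation of automation" rule of thumb.
touches_per_week = 1.0   # branch era: assume one visit a week
payload_kb = 100.0       # assume ~100 KB per interaction (held constant)

for generation in ["branch", "ATM/online", "debit/mobile", "IoT/biometric"]:
    weekly_mb = touches_per_week * payload_kb / 1024
    print(f"{generation:>13}: {touches_per_week:6.0f} touches/week, "
          f"{weekly_mb:7.1f} MB/week")
    touches_per_week *= 7  # each generation multiplies activity ~7x
```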
But what we can be certain about is that as we do more automation, we're going to be putting at least linearly more work on these systems, and perhaps it will increase even faster over time.

When you say "at least linearly more work," are you talking about the edge side or the cloud side?

I'm talking about the overall active system, which is why there's going to be increasing pressure to do more work at the edge.

Okay. So we'll see how it plays out, because no doubt new types of compression algorithms will become available, and we'll come up with ways of pre-processing a lot of the different media we might utilize, to distill the signal out of the noise so that we can move data back and forth more effectively and efficiently. But it's fair to say that, again, as we automate more, we're going to use more data, requiring more transactions, and that will provide a reasonable governor on where we actually do the work.
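One simple form of that pre-processing is a deadband filter at the edge or gateway: forward a reading only when it differs meaningfully from the last one sent. A minimal sketch, with an assumed threshold; real deployments would use codecs and models suited to the media involved.

```python
# Minimal edge pre-processing sketch: a deadband filter that forwards
# only readings that change "enough". The threshold is an assumption.
def deadband(readings, threshold=0.5):
    """Yield only readings that differ from the last forwarded value."""
    last_sent = None
    for value in readings:
        if last_sent is None or abs(value - last_sent) >= threshold:
            last_sent = value
            yield value  # this is all that goes upstream to the cloud

sensor = [20.0, 20.1, 20.2, 21.0, 21.1, 25.0, 25.1, 25.0]
sent = list(deadband(sensor))
print(f"forwarded {len(sent)} of {len(sensor)} readings: {sent}")
# forwarded 3 of 8 readings: [20.0, 21.0, 25.0]
```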
So our forecast looks out 10 years, and there are a lot of assumptions and milestones in a forecast like that. What are some of the assumptions and milestones we might revisit each year, or every couple of years, to see where we're right in our assessments?

Well, I think the big ones, George, and you've talked about this in your research, so in many respects I should be turning it over to you, but the big ones are pretty clear. We expect there will be a few milestones in the development of the big data marketplace that are tied back to the rate at which application-packaging technologies evolve and improve. We talked about this last week: the evolution of data lake technology, and how fast it's going to evolve, both as a technology unto itself and as a way of perhaps offloading more complex work, what is today called data warehousing, even data prep and data movement, the whole ETL marketplace, as it gets affected by what we can do with data lakes. We also expect that increasingly we're going to see intelligent systems of engagement start to evolve, where engagement decisions are made more by the system itself, looking at pricing, looking at the types of offers, making recommendations, and so on. And the last one is this notion of self-tuning systems of intelligence, which really comes back to how the applications themselves take more responsibility for formulating and refining the models we use to lay out the set of options we present to customers.

All right, let me take you back along that same set of step functions and applications from a different angle, which is how you would carve things up; this isn't something I've really addressed in that research taxonomy, so we don't have anything published on it yet. We're moving away from the era of centralization with semi-rich clients, really end devices, back toward some fairly beefy processing closer to the edge. How would you divide up the capabilities of those applications between what is now smart device and big cloud, when we will soon have gateways and edge processing in between? Do we have patterns we can look at?

Well, I think the most obvious pattern right now is that, as you said, the edge devices are getting a lot more potent, both in terms of their processing power and their memory capacities. We're getting very, very large physical memories on some of these devices, and we're only going to get more. The data capture technologies they have, very high quality cameras in all these devices, for example, generate greater pressure to build software that can take advantage of those cameras, and we're seeing, for instance, a lot of new competition in the smartphone industry around local video capture, local video editing, presenting things, and then uploading them any way you want. My guess is that over time a lot of that will be presented as automated services available to applications. A lot of the things we're doing on phones today will be rendered as services that applications can exploit for certain kinds of what we might call edge behaviors. The same thing is going to happen when we start talking about sensors that gather enormous amounts of information and have a feedback-loop control capability that can do increasingly complex processing and then affect the behavior of a physical device somewhere through a control relationship.

Okay, so that's the part I want to push you on, because it sounds like that's another tier between the mobile device and the cloud. Are we going to see one tier, two tiers, and what functions might be on them, if that's visible or discernible at all?

You know, historically a lot of attention has been paid to different tiers and different platforms. And while I think that's important, in all honesty, George, I think the data is going to decide. The data is going to decide, along with the speed of the network and, very importantly, some of the emerging approaches to conceptualizing and building these new applications. For example, there are reasons to suspect that the middleware-oriented, library-rich approaches to application development are going to have to be replaced by some set of technologies that allow the network to do even more of the decision-making about where a device is, how to format data for it, and how to present data to it. The same basic things we do in computer science are going to have to be done, but more than likely the network is going to take over more of them. Look, we know we're going to have big databases. We know we're going to have server farms somewhere processing those big databases. We know there's going to be a network, with varying stages of control points throughout it to handle formatting, control, synchronization, and state, things that look like super context-rich data networks that just do a lot more processing. And we also know we're going to have increasingly rich devices. So from an architecture standpoint, what we need to do is look at the nature of the problem, understand what work is trying to be done, who's going to participate in that work, and what capabilities we need to build so that we can do it better than anybody else, and then look at the availability of the technology from an infrastructure standpoint, a data management standpoint, a development standpoint, and ultimately a management and automation standpoint. That's going to be a moving target for the next few years.
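Pulling together the feedback-loop point from a moment ago with that architecture sketch, here's a minimal picture of a local control loop that acts on a device immediately and ships only periodic summaries back to the cloud; the setpoint, gain, and batch size are hypothetical.

```python
import statistics

SETPOINT = 21.0   # assumed target temperature (deg C)
BATCH = 4         # assumed number of readings per cloud-bound summary

def control_action(reading):
    # Local, low-latency decision: a simple proportional response,
    # clamped to the actuator's range. No round trip to the cloud.
    error = SETPOINT - reading
    return max(-1.0, min(1.0, 0.5 * error))

buffer = []
for reading in [19.5, 20.0, 22.5, 23.0, 21.0, 20.5, 18.0, 21.5]:
    command = control_action(reading)  # applied at the edge, instantly
    buffer.append(reading)
    if len(buffer) == BATCH:           # the cloud sees summaries, not raw data
        print(f"to cloud: mean={statistics.mean(buffer):.2f}, "
              f"min={min(buffer)}, max={max(buffer)}")
        buffer.clear()
```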
Do you agree with that?

Yeah, meaning it's all in flux: we need a new platform for these self-tuning systems of intelligence, or for IoT, and it's not 100% clear what it's going to be. But the future never shows up all at once, and we see a couple of precursors. There's GE's Predix platform, which is not just a PaaS for industrial objects, or ecosystems of them; it creates something analogous to the content distribution networks of the old web generation, which served up web pages close to where they were being consumed. Here, they create digital twins so that they can monitor and optimize the behavior of each individual industrial device. Then IBM told us that they didn't buy The Weather Company just so they could inject better weather forecasting into industrial activities like oil and gas exploration; they bought it because they wanted a capability similar to what GE is doing, a very distributed processing and data collection infrastructure. And I guess one big question is: will relative newcomers to that space, like GE and IBM, define that next tier of platform for intelligent internet of things apps, or will we see Amazon and Microsoft and Google spread their footprints out?

We've got to wrap up after this, so here's what I'd say, George: it's going to be all of the above. What we do know is that whatever happens, physics is not going to change. Computer science will have to evolve, vendors will take advantage of the gaps to introduce new technologies, and ultimately users will decide what to adopt and how to adopt it, and through their activities they'll dictate the shape the market takes over the next 20 years. But it's going to be a really, really interesting place to be, certainly one of the most important things happening right now.

So with that, George Gilbert, thank you very much. We're at the top of the hour. I'd like to thank everybody for joining us for another CUBE Conversation; we intend to do this on a regular basis, as I said, hopefully weekly. My name is Peter Burris, Wikibon, SiliconANGLE Media; George Gilbert, Wikibon. Thanks very much for watching.