Live from Boston, Massachusetts, it's theCUBE, covering Spark Summit East 2017, brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert.

Welcome back to Spark Summit in Boston, everybody. This is theCUBE, the worldwide leader in live tech coverage. We've been here two days, wall-to-wall coverage of Spark Summit. George Gilbert, my co-host this week, and I are going to review part two of the Wikibon big data forecast. Now, it's very preliminary. We're only going to show you a small subset of what we're doing here. So let me just set it up. These are preliminary estimates, and we're going to look at different ways to triangulate the market. At Wikibon, what we try to do is focus on disruptive markets and forecast those over the long term. We try to identify where the traditional market research estimates, we feel, might be missing some of the big trends. So we're trying to figure out, for example, what's the impact of real time? And what's the impact of this new workload that we've been talking about around continuous streaming? We're beginning to put together ways to triangulate that, and we're going to give you a glimpse today of what we're doing. So, if you bring up the first slide, which we showed yesterday in part one, this is last year's big data forecast. What we're going to do today is focus in on that line, that S-curve. That really represents the real-time component of the market. Spark would be in there, the streaming analytics would be in there. Add some color to that, George, if you would.

Okay, so for 60 years, since the dawn of computing, we had two ways of interacting with computers. You put your punch cards in or whatever else, and you come back and you get your answer later. That's batch.
Then, starting in the early 60s, we had interactive, where you're at a terminal. And then the big revolution in the 80s was that you had a PC, but you still interacted either through a terminal or in batch, typically for reporting and things like that. What's happening now is the rise of a new interaction mode, which is continuous processing. Streaming is one way of looking at it, but it might be more effective to call it continuous processing, because you're not going to get rid of batch or interactive; your apps are going to have a little of each. And since this is early in its life cycle, we're going to try to look at that streaming component from a couple of different angles.

Okay, as I say, that's represented by this ogive curve, or the S-curve. On the next slide, when you think about these continuous workloads, we're at the early part of that S-curve. And of course, many of you know how the S-curve works: it's slow, slow, slow; for a lot of effort, you don't get much in return. Then you hit the steep part of that S-curve, and that's really when things start to take off. So the challenge is that things are complex right now, and that's really what this slide shows. Spark is designed to reduce some of that complexity; we've heard a lot about that. But take us through this. Look at this data flow from ingest to explore to process to serve. We talked a lot about that yesterday, but this underscores the complexity in the marketplace.
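The S-curve shape described above (slow at first, then a steep middle, then flattening) can be illustrated with a logistic function. This is only a minimal sketch: the forecast's actual model is not given in the discussion, and the `ceiling`, `midpoint`, and `steepness` parameters and the 10-year window are hypothetical values chosen for illustration.

```python
import math


def s_curve(t, ceiling=1.0, midpoint=5.0, steepness=1.0):
    """Logistic function: slow start, steep middle, flattening near the ceiling.

    All parameter values are illustrative assumptions, not forecast inputs.
    """
    return ceiling / (1.0 + math.exp(-steepness * (t - midpoint)))


# Adoption level at each year of a hypothetical 10-year window.
adoption = [s_curve(year) for year in range(11)]

# Year-over-year gains: small early on, largest around the midpoint, small again late.
gains = [later - earlier for earlier, later in zip(adoption, adoption[1:])]

for year, level in enumerate(adoption):
    print(f"year {year:2d}: {level:6.1%}")
```

In this sketch the early years show little movement for each step, which matches the "a lot of effort, not much in return" phase, while the largest gains cluster around the midpoint, the "knee" where things start to take off.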
Right, and while we're mostly just looking at numbers today, the point of the forecast is to estimate when the barriers representing complexity start to fall, and when we can put all these pieces together, ingest, explore, process, serve, so that it becomes an end-to-end pipeline: when you can start taking the data in on one end, data scientists can turn it into a model and inject it into an application, and that process becomes sort of automated. That's when it's mature enough for the knee in the curve to start.

And that's when we think the market's going to explode. Now, how do you bound this? When we do forecasts, we always try to bound things, because if they're not bounded, you have no foundation. So if you look at the next slide, we're trying to get a sense of real-time analytics. How big can it actually get? That's what this slide is really trying to do.

Yeah, so this one was one firm's take on real-time analytics, where by 2027 they see it peaking just under 80 billion.

When you say one firm, you mean somebody from the technology industry.

Well, it's publicly available data, and since they didn't publish a lot of assumptions, we took it as one data point. Then we're going to come at it with some bottom-up and top-down data points and compare.

Okay, so the next slide, we want to drill into the DBMS market. Of course, when you think about DBMS, you think about the traditional RDBMS and what we know: the Oracles, SQL Servers, IBM DB2s, et cetera. And then you have these emergent NewSQL and NoSQL entrants. Obviously we talked today to a number of folks. The number of suppliers is exploding. The revenue's still relatively small, certainly small relative to the RDBMS marketplace, but take us through what your expectation is here and what some of the assumptions are behind this.
Okay, so the first thing to understand is that the DBMS market overall is about $40 billion, of which $30 billion goes to online transaction processing, supporting real operational apps. $10 billion goes to OLAP or business intelligence type stuff. The OLAP one is shrinking materially. In online transaction processing, new sales are shrinking materially, but there's a huge maintenance stream. So...

Yeah, which companies like Oracle and IBM and others are living off of, trying to fund new development and...

We modeled that declining gently and beginning to accelerate more going out into the latter years of the 10-year period.

And what's driving that decline? Obviously you've got the big sucking sound of Hadoop, in part, driving that, but increasingly it's people shifting their resources to some of these new emergent applications and workloads, and to new types of databases to support them, right? But those new databases you can see here, the NewSQL and NoSQL, are still relatively small. A lot of it's open source, right? But then it starts to take off. What's your assumption there?

So here what's going on is that if you look at dollars today, it's actually kind of interesting. If you take the NoSQL databases, DynamoDB, Cassandra, Hadoop, HBase, Couchbase, Mongo, Kudu, and you add all those up, with DynamoDB it's probably about $1.55 billion out of a $40 billion market today.

Okay, but it's starting to get meaningful. I mean, we're approaching $2 billion.

But where it's meaningful is the unit share. If that were translated into Oracle pricing, the market would be much, much bigger.

So the point is, 10x?

At least, at least.

Okay, so in terms of work being done, if there's a measure of work being done, operations per second, et cetera, it would be enormous.

Yes. But that's reflective of the fact that the data volumes are exploding, but the prices are dropping precipitously.
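The gap between revenue share and unit share in the figures above can be made concrete with a little arithmetic. This is a rough sketch using only the numbers quoted in the discussion; applying the "at least 10x" remark as a flat multiplier is an assumption made here for illustration, not a method the speakers describe.

```python
# Figures as quoted in the discussion, in billions of US dollars.
dbms_total = 40.0
oltp = 30.0
olap = 10.0
nosql = 1.55

# NoSQL share of DBMS revenue today.
nosql_revenue_share = nosql / dbms_total

# Hypothetical: treat the "at least 10x" remark as a flat price multiple
# to see what the same workload might cost at incumbent pricing.
price_multiple = 10
nosql_at_incumbent_pricing = nosql * price_multiple

print(f"NoSQL revenue share: {nosql_revenue_share:.2%}")
print(f"Same work at {price_multiple}x pricing: ${nosql_at_incumbent_pricing:.1f}B")
print(f"...as a fraction of today's DBMS market: {nosql_at_incumbent_pricing / dbms_total:.0%}")
```

Under that assumed multiplier, a segment that is under 4% of revenue would represent well over a third of today's market in equivalent dollars, which is the sense in which the unit share is described as meaningful.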
So do you have a metric to demonstrate that? We're obviously not going to show it today.

Yes.

Okay, great.

On the business intelligence side, without naming names, the data warehouse appliance vendors are charging anywhere from $25,000 per terabyte up to, when you include running costs, as high as $100,000 a terabyte, by what their customers are estimating. Mind you, that's not the selling cost; that's the cost of ownership per terabyte. Whereas if you look at, let's say, Hadoop, which is comparable for offloading some of the data warehouse workloads, that's down in the $5K per terabyte range.

Okay, great. So you expect that these platforms will have a bigger and bigger impact?

Yeah.

What's your pricing assumption? Is price going to go up, or is it just volume going to go through the roof?

I'm actually expecting pricing... It's difficult, because we're going to add more and more functionality. Volumes go up, and if you add sufficient functionality, you can maintain pricing, but as volumes go up, typically prices go down. So it's a matter of how much these NoSQL and NewSQL databases add in terms of functionality. And I distinguish between them because NewSQL databases are scale-out versions of Oracle or Teradata, but they're based on the more open source pricing model.

Okay, and NoSQL, don't forget, stands for "not only SQL," not "not SQL."

Right.

Okay.

So the point, if you look at the slide, is that big existing markets never fall off a cliff when they're in decline. They just slowly fade, and eventually that accelerates. But what's interesting here is that the data volumes could explode, but the revenue associated with the NoSQL, which is the dark gray, and the NewSQL, which is the blue, doesn't explode. You could take, say, what's the DBMS cost of supporting YouTube? It would be many, many billions of dollars. It would support half of an Oracle itself, probably. But it's all open source there, so.
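The per-terabyte figures quoted above imply a wide cost gap, which a short calculation makes explicit. This is a minimal sketch using the quoted numbers (presumably US dollars); the 100 TB workload size and the helper name `ownership_cost` are hypothetical, introduced only for illustration.

```python
# Cost of ownership per terabyte, as quoted in the discussion.
appliance_low = 25_000    # data warehouse appliance, low end
appliance_high = 100_000  # data warehouse appliance, including running costs
hadoop = 5_000            # Hadoop, for offloaded warehouse workloads

# How many times more expensive the appliance is per terabyte.
low_ratio = appliance_low / hadoop
high_ratio = appliance_high / hadoop


def ownership_cost(terabytes, cost_per_tb):
    """Total cost of ownership for a workload of the given size."""
    return terabytes * cost_per_tb


# Hypothetical workload size, chosen only to illustrate the gap.
workload_tb = 100
print(f"Appliance is {low_ratio:.0f}x to {high_ratio:.0f}x the Hadoop cost per TB")
print(f"{workload_tb} TB on Hadoop:    ${ownership_cost(workload_tb, hadoop):,}")
print(f"{workload_tb} TB on appliance: ${ownership_cost(workload_tb, appliance_low):,}"
      f" to ${ownership_cost(workload_tb, appliance_high):,}")
```

A gap of this size is consistent with the point made in the discussion that data volumes can grow sharply while the associated revenue does not.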
Right, so that's minimizing the opportunity, is what you're saying. You see the database market as flat, certainly flat-ish and even declining, but you do expect some growth in the out years as part of that innovation, that volume, presumably.

And that's the next slide, which is where we're seeing that growth.

Okay, so let's talk about that. So the next slide, again, I should have set this up better. The y-axis here is worldwide dollars, and the horizontal axis, of course, is time, and we're talking here about these continuous application workloads, this new workload that you talked about earlier. So take us through the three.

There are three types of workloads that are in large part going to be driving most of this revenue. Now, these aren't completely comparable to the DBMS market, because some of these don't use traditional databases, or if they do, they're sort of toy databases. And I'll explain that.

Sure, but if I look at the IoT edge to cloud and the microservices and streaming, that's a tailwind to the database forecast in the previous slide, is that right?

It's actually interesting. The application and infrastructure telemetry, this is what Splunk pioneered, which is all the torrents of data coming out of your data center and your applications while you're trying to manage what's going on. That is a database application. And we know Splunk for 2016 was 400 million in software revenue, Hadoop was 750 million, and then there are the various other management vendors, New Relic, AppDynamics, startups, and 5% of Azure and AWS revenue. If you add all that up, it comes out to 1.7 billion dollars for 2016. So we can put a growth rate on that, and we talked to several vendors to ask, okay, how much will that workload be compared to IoT edge to cloud? And the IoT edge to cloud is the smart devices at the edge and the analytics that are in the fog, but not counting the database revenue up in the cloud. So it's everything surrounding the cloud.
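The bottom-up telemetry total quoted above can be tallied to see what the unnamed components must add up to. This is a rough sketch using only the quoted 2016 figures; the breakdown of the remaining vendors is not given in the discussion, so it appears here only as a single residual.

```python
# 2016 figures as quoted in the discussion, in millions of US dollars.
total_2016 = 1_700
splunk = 400
hadoop = 750

# Implied combined revenue for the remaining components (New Relic, AppDynamics,
# startups, and 5% of Azure and AWS revenue). Individual values are not stated.
other_combined = total_2016 - splunk - hadoop

components = {"Splunk": splunk, "Hadoop": hadoop, "Other (combined)": other_combined}
for name, revenue in components.items():
    print(f"{name:18s} ${revenue:>5,}M  ({revenue / total_2016:.1%} of total)")
```

The residual is simply what is left after the two named figures, so it carries whatever rounding is in the quoted total.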
And that actually, if you look out five years, is maybe 20% larger than the app and infrastructure telemetry, but growing much, much faster. Then the third one, where you were asking, is this a tailwind to the database? Microservices and streaming are very different ways of building applications from what we do now. Now people build their logic for the application, and everyone then stores their data in this centralized external database. In microservices, you build a little piece of the app, and whatever data you need, you store within that little piece of the app. So the database requirements are rather primitive, and that piece will not drive a lot of database revenue.

So okay, if you could go back to the previous slide, Patrick. What's driving database growth in the out years? Why wouldn't database continue to get eaten away and decline?

Because in broad terms, the overall database market is kind of staying flat, because as prices collapse, the data volumes go up.

But there's an assumption in here that the NoSQL space actually grows in the out years. What's driving that growth?

Both the NoSQL and the NewSQL.

Right.

The NoSQL probably is best at capturing the IoT data, because you don't need lots of fancy query capabilities and concurrency.

So it is a tailwind in a sense, in that, so you've got the...

IoT, but that's different stuff.

Yeah, sure. But you've got the overall market growing, and that's because the new stuff, NewSQL and NoSQL, is growing faster than the decline of the old stuff.

Right.

And in the 2020 to 2022 timeframe, right, it's not enough to offset that decline, and then you have it start growing again. You're saying that's going to be driven by IoT and other edge use cases?

Yes.
IoT edge. And the NewSQL, actually, is where, when they mature, you start to substitute them for the traditional operational apps, for people who want to write database apps, not people who want to write microservice-based apps.

Okay. All right, good. Thank you, George, for setting that up for us. So now we're going to be at Big Data SV in mid-March. Is that right?

Middle of March.

And George is going to be releasing the actual final forecast there. We do it every year. We kind of use Spark Summit to look at our preliminary numbers, some of the Spark-related forecasts, like continuous workloads. And then we harden those forecasts going into Big Data SV. We publish our Big Data report like we've done for the past five, six, seven years. So check us out at Big Data SV. We do that in conjunction with the Strata event. So we'll be there again this year at the Fairmont Hotel. Got a bunch of stuff going on all week there. Some really good programs going on. So check out siliconangle.tv for all that action. Check out wikibon.com. Look for new research coming out. You're going to be publishing this quarter, correct? And of course, check out siliconangle.com for all the news. And really appreciate everybody watching. George, been a pleasure co-hosting with you as always, really enjoyable.

All right, thanks, Dave.

All right, so that's a wrap from Spark Summit. We're going to try to get out of here, hit the snowstorm and work our way home. Thanks everybody for watching. Great job, everyone here, Seth, Ava, Patrick, and Alex, and thanks to our audience. This is theCUBE. We're out. We'll see you next time.