Over the years, as I said, we've got a great program. John, why don't you say a few words and then I'll introduce George. I want to thank everyone for coming. Really appreciate it. One of the things that we've been doing here, certainly a Big Data NYC event, is that we've been at every Hadoop World, it's our sixth year, mostly on site, and now we have our own event. But our focus is to get the data to you guys as fast as possible out on the network: programming, live videos, tweets, research, and specifically getting high-quality research out there for free. And really sharing that data with everyone. I think our focus is live, in the moment, get that out there in real time. And I'm really proud to be part of that Wikibon team and lead that effort with Dave, because it's just a really needed service. People want to know where the signal is. There's a lot of noise, and our goal is to extract the signal from the noise. And it's really fun to do. I love what we do and we're glad to share. We really appreciate that you guys have come tonight. So thank you very much, and we're gonna enjoy the presentation. Some new information being shared, and then a party at seven, so thanks for coming. Okay, George Gilbert, one of our newest analysts, well, not our newest analyst, but on the chairs here you see his systems of intelligence piece, his sort of platform foundation, his version of a manifesto. George is gonna be giving an update to that today, sharing his ideas, getting your thoughts. So without further ado, George Gilbert. So just by way of context, Dave said he wanted me to review the presentation with him before I gave it, and he wanted to review it like a week ago. So I showed up with 45 slides and I said, I can't do it in less than three hours. And he said, you have 20 minutes.
And I was like, well, that doesn't really work, until he said your end-of-year bonus is gonna be inversely proportional to the length of your presentation. So this is gonna be very short. You're clapping already. All right. So just by way of context, for 50 years we've been building systems of record. Many of you are familiar with the term from Geoffrey Moore, and it's been associated with ERP and CRM apps, but it's really gone on much longer. These are the mainstay apps that we've been building really since the 1960s, whether they're mainframes with terminals or client-server systems with Unix backends or even SaaS apps. They're all about automating business processes. Now, it's become kind of fashionable to talk about systems of intelligence or systems of insight. And I'm gonna do the same, but I wanna add five things that I think are different. First, I wanna talk about what customers should be investing in. There are two analytic pipelines that are different from what we have in legacy systems. Second, there are some trade-offs that every customer has to take into account when they build these systems. And the trade-offs depend on your particular circumstances, and I'll explain what that means. Then, as part of considering the trade-offs, your customer journey: it's not like there's one customer journey for everyone. The path is gonna be dependent on some of the choices you make when you start out. Then I wanna touch on industry trends and competitive dynamics, and how they relate to systems of intelligence. And then sort of action items for users, which is: what platforms should you be considering? So with that as a lead-in, I wanna start with looking at systems of record. On the left side, this should be somewhat familiar: the ERP systems, the operational data. What's kind of new are the new channels of customer interaction. But still, automating business processes.
The key things to remember are that the analytics in systems of record were really about historical performance reporting. I'm showing a phone bill, but it could be a look at historical sales numbers. The key thing is it's backward-looking, and the limitation of this type of analytics is sort of like trying to steer a ship by looking backwards at its wake. So there are two pipelines here, and I wanna point them out so that I can highlight what's different about systems of intelligence. The first pipeline, on top, is production ETL. These are hardwired to the reports and business intelligence visualizations that the end-user organizations asked for. These are high latency. These take a while to crank. And there's no closed loop, because the results come in so far after the fact that they can't reasonably feed back into the operational apps and improve them, for the most part. The second pipeline is for when the end-user organizations ask for new reports or new visualizations. But the production ETL pipelines are so brittle and hardwired to deliver just the data and just the answers that were originally requested that you kind of have to unravel the whole thing and start over. Not entirely, but the point is it's very brittle. And in terms of agility, it's pretty poor. Now, just to name a couple of companies and their heritage in this old world, we had the Teradatas and the Oracles dominating data warehouses, and Informatica's heritage was in the pipelines. But I want to use these as a stepping-off point to look at what's new about systems of intelligence. Now, here, with systems of intelligence, I gray out what's the same. The operational applications don't really go away, but we modernize them by putting two new analytic pipelines on them. The first one is the production pipeline, sort of like the one that generated the reports. Only this one is about automated and high-speed predictions or recommendations.
You might hear about products like Spark Streaming, DataTorrent, Samza, Flink. The point here is you have very, very low latency, milliseconds to seconds, to make a recommendation or decision. In this case, I'm talking about fraud prevention. So, do I accept this credit card transaction or do I decline it? You don't have a lot of time. The human is gonna walk away if you don't do it within a certain time limit. Same with an ad: you have like 700 milliseconds round trip to get an ad bid in. The second pipeline is equally important. This is the pipeline that says, I'm gonna get better continuously at making those predictions or recommendations. So, this is like that ETL design pipeline from before, where someone wanted new reports. And here, it's again about agility, but rather than taking days, weeks, months to design new reports or visualizations, here the predictions have to get better continuously, whether in seconds or in days. And it's when these two pipelines together are operating efficiently and fast that you get competitiveness. Now, there's no free lunch here. Building these systems actually comes with a set of trade-offs. And this is really the heart of the presentation that I wanna talk about. I thought about describing this as a sort of systems budget, where you had at the top line the incremental revenue that you get from building a much more intelligent system, in other words, how much fraud you'll save, and then all the costs. But in thinking about it, it actually turns out that the knobs, which are the accuracy of the prediction, the speed of the prediction, the agility, and then in the underlying platform, the operational complexity, the development complexity, and your existing infrastructure and skills, all these things are related. So if you turn one of these knobs, it'll affect the others.
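The accept-or-decline budget described above can be made concrete with a minimal plain-Python sketch. Everything here is a hypothetical stand-in (the 50 ms budget, the toy model, the field names); it just illustrates scoring a transaction inside a hard time limit and falling back to a safe default if the deadline is blown.

```python
import time

# Hypothetical latency budget: an ad bid gets roughly 700 ms round trip,
# so the model itself gets far less. 50 ms is an illustrative number.
BUDGET_SECONDS = 0.050

def fraud_score(txn):
    """Toy stand-in for a trained model: score transactions by how far
    they sit from the cardholder's typical spend (0.0 = normal, 1.0 = bad)."""
    typical = txn["avg_spend"]
    return min(1.0, abs(txn["amount"] - typical) / (10 * typical))

def decide(txn):
    """Score within the budget; if scoring blows the deadline, fall back
    to a default (accept, and queue for offline review)."""
    start = time.monotonic()
    score = fraud_score(txn)
    elapsed = time.monotonic() - start
    if elapsed > BUDGET_SECONDS:
        return "accept-and-review"
    return "decline" if score > 0.8 else "accept"

print(decide({"amount": 2500.0, "avg_spend": 80.0}))  # far from normal: decline
print(decide({"amount": 75.0, "avg_spend": 80.0}))    # typical spend: accept
```

The second pipeline in the talk is what would retrain `fraud_score` continuously; this sketch only shows the serving side.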
Now, mathematicians have a way of describing this, but I was a humanities major, so that kind of all went over my head. All I know is if you turn one knob, it'll affect the others. The key thing is: the top one, accuracy of prediction, is related to incremental revenue. We talked about the speed of predictions and the speed of improving the predictions; those were the two pipelines that we used to modernize systems of record. And then the choice of the platform is related to operational complexity, development complexity, and your existing infrastructure. So now, with this guide of trade-offs, let's take a look at what this customer journey might look like. If you add up the knobs and trade-offs from the last slide, your journey really depends on two big factors. One is the skills that you've accumulated in your enterprise, your skills inventory. And the other is the progress or maturity of the technology platform. That's somewhat out of your control. You can choose which one you want, but your choice is constrained by your skills. So again, they're related. But partly what makes the journey up and to the right is that the trade-offs of today get progressively less constraining as your skills improve and as the technology matures. Now, I talked about the fraud prevention application in the lower left. Pretty popular, pretty mainstream today. This is identifying spending behavior that's out of the norm. I wanna point to intelligent systems management, because in our research we see a lot of people sort of gravitating to that as an application. But we know from many decades, and I'll get into this more later in the presentation, that systems management is analogous to trying to walk through the La Brea tar pits. It's sticky. But where you would have intelligent systems management working is that it would look at the network and the services that are operating on it, and it would figure out what's normal or baseline behavior.
And then when there's a failure, the key thing that's so hard about systems management is that when something goes wrong, there are alerts showing up everywhere, because something upstream broke what the downstream components depend on. And in fact, sometimes that cascades back up. So it's very hard to figure out the root cause. And intelligent systems management, frankly, is a research project. These are not commercial products yet, but this is sort of what the vision is as we move up and to the right. It would do auto-remediation for you and keep your system sort of in equilibrium. Now, in a few more slides, I'll explain why this is so difficult. But before that, I wanna dive into the here and now of industry trends and competitive dynamics. Now, I might lose some friends with this slide, and probably a few more with a couple of others coming down the pike. I wanted to divide this into two different categories. Hadoop 2.0: pretty much everyone agrees that once we had YARN, we had a multi-tenant platform. We could host multiple processing engines on one common dataset, whether on HDFS or HBase. But the application model, for the most part, was: dev tool, engine, do your work, drop it in the storage, whether it's HBase or HDFS, hand it off to the next tool, it does its work, a sequential kind of chain. And I know chain is an overloaded term, but essentially it's a batch process. Now, people can point out that there's Storm and there are other tools in there that don't have to drop into storage. But the fundamental paradigm is handing off. It's not as bad as MapReduce, where every step is going to disk, but it's still fundamentally batch. Now, Big Data 3.0. This is not meant to say Hadoop is left out. I'm using this as a framework for what I contend is a bunch of themes that are coming together. First, we need to dramatically simplify operations and development. It's a bit of a mess right now.
I mean, anyone who's participated in this: our recent survey work, which we just got back last week, 300 practitioners, showed that a pilot Hadoop cluster, which is about six nodes, required four admins on average. So the productivity wasn't exactly very high. Now, obviously people are learning how to operate this, but that number has to improve. So, the three principles we see happening. Storage is consolidating, where we don't have 150 storage or database managers. The next one is gonna be controversial, but I contend that we're gonna see more and more open APIs, as opposed to purely open-source products. And I'll take you through a couple of examples. And then the most controversial, and I put it up here just because capturing the nuance won't fit in the headline: Spark is hollowing out a good chunk of the processing components of Hadoop. It doesn't mean Hadoop goes away; it still uses sort of Hadoop's management and storage infrastructure and maybe security. But a lot of the processing is being substituted for. So let me explain. I'm gonna skip over Hadoop 2.0 and talk about storage consolidation in Big Data 3.0. Where I say polyglot access, you're seeing more databases show up that support JSON, that support key-value stores. Dave's giving me the signal. I can't tell if I'm six minutes in or have six minutes left. Okay. Yeah, there goes my bonus. All right. I guess I got a little carried away there. The key thing to remember is we've got storage consolidation. The thing that I do wanna focus on, the dev tools: this is about Spark. We put streaming up there. Streaming has always been good with machine data, time series, but if you wanted to do analytics, up until recently it was about as good as an abacus. SQL was good for joining and filtering and aggregating, but on time series it was abysmal.
And then machine learning was really powerful if you knew how to clean up your data. So what Spark's doing really well is combining all of these. It may not be the best at each one, but the power of reinforcing them all together is quite remarkable. I do wanna say one thing about APIs to storage: Microsoft just announced something called Data Lake, which is not the most inspired name, but it's got access via SQL, Spark, Hadoop. And, sorry about this, I'm a little new to PowerPoint 15. Anyway, it's got multiple APIs, and it definitely represents an example of storage consolidation that's not open source. Now, I wanna race through. We talked about the knobs. There's operational simplicity. If you look at the left-hand side, this is a network operations center run by a telco. This is AT&T: a single pane of glass. That's the ultimate in simplicity. If you look on the right, I've got probably half a dozen consoles. Every tool helpfully ships with a console. If you put all of those together, it would probably fit in one square, because in terms of operations, you have change management, availability management, performance management, security management, and then one for each level: apps, infrastructure, compute. And so if you took all those consoles and multiplied by six, that would fit into one of those squares. That's complicated. Okay, same thing with development. The reason I showed Spark before: one of the things that's very compelling about Spark is that the integration they present comes through in a notebook, where you can have access to the machine learning, the visualization, the statistical analysis. If you tried to do that on your own, you'd have a bunch of different tools that you would have to stitch together. So now I wanna look at the more actionable parts of what all these trends indicate.
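That combination, SQL-style joining and aggregating, time-series handling, and a learned model reinforcing each other, can be illustrated with a plain-Python sketch. This is deliberately not the real Spark API, and the traffic numbers are made up; it just shows the shape of a windowed aggregation feeding a learned baseline in one flow.

```python
from statistics import mean, stdev

# Toy clickstream: (second, requests). In Spark you'd express the same
# steps as SQL over a stream plus MLlib; plain Python keeps it readable.
events = [(t, 100 + t % 7) for t in range(60)] + [(60, 900)]

# "SQL" step: filter/aggregate the time series into 10-second windows.
# (Relies on dicts preserving insertion order, Python 3.7+.)
windows = {}
for t, reqs in events:
    windows.setdefault(t // 10, []).append(reqs)
traffic = {w: sum(v) for w, v in windows.items()}

# "ML" step: learn a baseline from the historical windows, then flag the
# latest window if it sits far outside it, the kind of time-series
# analytics that was painful to bolt onto SQL alone.
history = list(traffic.values())[:-1]
baseline, spread = mean(history), stdev(history)
latest = list(traffic.values())[-1]
if abs(latest - baseline) > 3 * spread:
    print("anomalous window:", latest)
```

The payoff George describes is that both steps live in one notebook over one dataset, instead of being stitched across separate tools.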
I've got a chart with two axes: development simplicity, which we talked about, where I showed the spreadsheet versus a bunch of different tools; and operational simplicity, where I showed the network operations center single pane of glass. If you start at the bottom left, you've got a bunch of data managers. This is where you sort of pick best of breed, because if you're an ad tech company, you have two milliseconds to look up a profile. And I'm dead serious: your budget is two milliseconds. So in that case, you're probably not gonna wanna rely on Spark. If you go all the way up to the other end, you have the most integration, the most simplicity, with Spark. Now, let me take a quick look before I get the hook from Dave. This is the lower left, where it's pick all the best-of-breed components. This is an amazing chart done by the 451 Group. I give them credit for segmenting this to the ends of time. I do have to add, I don't think there's a whole lot of actionable information in there. But the con here is complexity. If you decide to choose this, say you're an ad tech company, you are responsible for designing, developing, integrating, testing, delivering, and operating this platform. All the components. It's not one component, it's all of them. So you are a software company. I may be dating myself, but I don't know how many of you remember when Evel Knievel tried to jump the Snake River Canyon on a motorcycle? Well, don't try this at home. Not unless you're Netflix or Airbnb or someone like that. The most important part here is where we're looking at the Hadoop ecosystem today and then what I was calling Big Data 3.0 tomorrow. And what I didn't really highlight when we looked at Big Data 3.0 is Azure, Amazon, Google Cloud Platform. Those guys are building a bunch of native services that are designed to work together. This is where they design it, develop it, integrate it, test it, deliver it, and operate it to work as a unit.
So they're the software company, not the enterprise. That's really important. Now, the point here is, as you move up and to the right, the platforms democratize access to this type of application. To recap, I just want to look back at the knobs and the trade-offs. Systems of intelligence create incremental revenue or cost savings through automated, accurate predictions. But you can't really create a line-item budget for all of these things; they're all interrelated. As an example, say you want to squeeze every last ounce out of prediction speed, like I was talking about with ad tech. I'm getting the signal, and he doesn't have one of those hooks, so he's gonna break my neck. So there, you have to build a custom platform from best-of-breed components, and that means greater operational and development complexity. In other words, again, it's a matter of trade-offs. So in the end, as the technology matures, the relevant skills also spread, and that democratizes access to these systems. I've given you a very brief overview. I've probably lost my bonus, but I will write this up in excruciating detail, and I'll probably get axed for that, too. So with that, I'd like to turn it over to the far more interesting panel that comes next. So no axe here, but I have to say, a typical Cube crowd wants to know when the bar opens, so we open the bar early. Feel free to wander over and help yourself to a drink as we continue. But before we do that, we're gonna set up for another panel, so we'll have a small break in between. As I say, feel free to go get a drink, and the food will be out shortly. But I was wondering if there are any questions for George. I wanted to cut you off because I wanted some time for audience interaction, because that was a major big data injection. And so, questions? James. Actually, you know what? Can I get you the... I was intrigued by that last point.
So you seem to be implying that the definition of agility is the speed of improving predictions. Is that your definition of it, or what is agility to you in the model? What is agility? I guess, think of it as if you took your historical data and used that to run through the machine learning engine and update your predictive analytics or your recommendations. If you did that weekly or monthly, that's what I would call probably not very agile. If you're in the fraud prevention business, those patterns can change rather quickly. And so agility there might be measured where, with every transaction that comes in to accept or decline, you should be adding to your knowledge. That would be very agile. Does that answer your question? I'd never seen it put that way. I think it's an intriguing way of defining agility. It's all about next best actions as calculated by predictive accuracy, and being able to always hit the next best action at every moment in time across every touch point. So I like where you're going with this. That was a good way of framing it. I agree with the next-best-actions comment. That is what I'm trying to say. The next best action is: authorize or decline. From IBM, former Forrester analyst. Thank you. Other questions, comments, please. Robert Novak with Cisco. Long-time big data veteran, survivor, whatever you want to call it. I was kind of curious about your number of four admins to run a six-node Hadoop cluster. Having deployed a number of them with far fewer admins, like one, I'm wondering if that includes other groups within an IT operation: network security, storage, server deployment, politics, whatever you may have, or where that kind of number comes from. Thanks. Yeah, just to repeat the question: where did we get that number of four admins for a six-node cluster? We're still cleaning up the data. And part of it is time series; we did a similar survey last year.
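The per-transaction notion of agility in that answer can be sketched in a few lines of Python: instead of a weekly batch retrain, every transaction folds into a running baseline. Welford's online mean/variance update is my illustration here, not a claim about how any particular fraud product works.

```python
class OnlineBaseline:
    """Running mean/variance via Welford's algorithm: every transaction
    updates the model, so knowledge grows per event, not per weekly retrain."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, amount):
        # Standard Welford update: numerically stable, O(1) per transaction.
        self.n += 1
        delta = amount - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (amount - self.mean)

    def is_outlier(self, amount, k=3.0):
        # Flag spend more than k standard deviations from the baseline.
        if self.n < 2:
            return False
        std = (self.m2 / (self.n - 1)) ** 0.5
        return abs(amount - self.mean) > k * std

model = OnlineBaseline()
for amount in [50, 55, 48, 52, 51, 49, 53]:
    model.update(amount)          # baseline improves with each transaction

print(model.is_outlier(51))       # typical spend: False
print(model.is_outlier(5000))     # out of the norm: True
```

The "not very agile" alternative would be recomputing the baseline from scratch over all history once a week; the contrast is the point of the question.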
So we kind of tried to keep many of the questions the same. But I would agree that that sounds hefty, although I have heard, maybe not four for every six servers, but that it is labor intensive, because there are a lot of different types of servers that are necessary. Yeah, Tyler Radke with Citi. One of the comments you mentioned was, I think you said Spark was hollowing out some of the processing elements of Hadoop. I just wanted to get your take on the new project from Cloudera, Kudu, which kind of attempts to bridge the gap between HBase and HDFS. Yeah, actually, Tom Reilly was in here just doing an interview. That was the last one before we got ready for the panel. You know, I have to take a closer look at it. The core of it has to do with that pipeline that reduces the latency between the operational system and the analytic system. That's ultimately what it's trying to solve. Exactly how it goes about doing it, I'm not sure I'm ready to tell you yet, because it looked to me like it was an MPP columnar database that was sitting under an MPP columnar database called Impala, so I don't fully understand it. Anybody else want to comment on that? Anybody have thoughts? Anyone want to fill in the gaps? No, okay. A question or a comment? Please, please do. George Chow from Simba Technologies. My reading of Kudu is that it seems to be the missing third piece, because if you think about Impala, that's the query execution. They built Parquet initially to try to help it, but Parquet's not enough, because they really want the storage layer to do more, and Parquet as a format is actually geared toward read-only access. So Kudu is actually, you can say, a read-write-optimized, balanced implementation. Oh, so in other words, would you call it a substitute for both? In other words, it's got the point-query capability of HBase with the scan-query capability of Impala. Yeah, that's why they say it's a balance.
They say if you measure it against raw HDFS, it will lose, because it's not geared toward optimal scanning. Right. And if you do the same against HBase, it's not optimized for writes either, but it is a balance in the middle. So that's why they position it as operational analytics, because it allows you to do both. Okay. So if you have a mixed workload, that's why they position it as a better choice. And if you take a look at the timeline, they claim it's like three years of history, right? Which actually is exactly the same timeline as Impala, because, if you remember, Impala was announced in October 2012. Yeah, but the development started when they announced Impala. Would this suggest that they would de-emphasize Impala? This is complementary. This is complementary. I mean, Impala is the query execution; Kudu is just the storage side. Okay. Which enables Impala to do better. Okay. Thank you, George. Appreciate it. Other questions, comments? Okay, we're gonna wrap. Great, George, thank you very much. And no, you don't get your bonus cut. You don't get fired. Thank you, great job. Really appreciate it, George. All right. We're gonna do a two-minute stage swap. You guys wanna take a quick break? Grab a drink? All right, and then we're coming back. So please come back.