Change everything — software, the market came around that. And so I was comparing it to cloud, which I don't see as a revolutionary market. I think it's an extension, a more linear extension of the data center, et cetera. But big data is a completely new industry. Do you agree with that? And if you do, could you as well share your perspective? Because big data actually puts productivity back in the hands of the user, the analyst, the C-level, the business manager, from military to retail, not just one industry. So can you give your perspective on that?

If you take what Michele just said, it's absolutely true. What is happening is what we call almost a new business model, where because of what big data empowers you to do, which is deliver actionable insight, it doesn't really matter how you deliver it. Because you have a platform where you can store all the data and process all of it, the structure really doesn't matter. What the business user wants is actionable insight. And if you deliver the actionable insight, all the factors you just mentioned become enabling factors: the cloud is an enabler of doing that as a service. The fact that Hadoop is the scalable, commodity-hardware-based solution enables companies like ours to build that at scale at incredibly, incredibly economical cost. And the last point — I agree with your comment — just like the PC put the power of computing at the fingertips of individuals like you and me, big data is finally bringing the business back into business intelligence, where the business user can get the answer he or she wants incredibly fast, for insights that really could not be delivered before. People have struggled with writing reports and changing a column on a report over the next three months. And we're trying to change that paradigm and say, forget looking at the past.
You have to be able to use the data you have to do predictive analytics and look at what will happen if you took an action based on insights that you can get and harness at massive scale. I think that's the massive leap that we all see.

The other key thing I think is important there is that it's not just using — so when I talk with customers, everybody looks and says, look, there's this nirvana. I buy into the idea that predictive analytics, optimization, and simulation technologies are going to get me game-changing business value. And that's great. But how do I get there? And what they typically do is go back and lean on what they already know, right? They want to use their existing infrastructure. They want to use their existing data. They want to use their existing tools. They don't want to do anything different. And I say, well, if you don't do anything different, you're not going to get any different results, right? I mean, this is kind of the definition of insanity. So you've got to take some additional risks. You've got to infuse your applications, you've got to infuse your data with net new information if you want to have additional insights. Now, that could mean taking your information and going to a deeper level, down to the transaction or granular level. Because think about it — even today, people say to me, predictive insights, we've been doing predictive insights for some time. So how is this different? Well, the difference, the aha moment, okay, is not taking it and aggregating it and looking at big segmentations, big classifications, but now going down to, you know, micro-segments, right? Segments of one. I love that phrase, okay? So how do you get to doing that? You have to have all the big data to build your analytics. Or predictive insights into things you don't know yet — whereas in the old model, they knew what they were predicting.
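The "segments of one" idea above can be made concrete with a minimal sketch (toy data, not from the interview): the same transaction log aggregated at a coarse segment level versus a per-customer level, the latter only being feasible when every granular record is retained.

```python
# Minimal sketch contrasting coarse segmentation with "segments of one".
# The data and field names are illustrative, not from any real system.
from collections import defaultdict

transactions = [
    # (customer_id, region, amount)
    ("c1", "midwest", 120.0),
    ("c2", "midwest", 40.0),
    ("c1", "midwest", 60.0),
    ("c3", "coastal", 300.0),
]

# Classic approach: one average per big segment (here, region).
by_segment = defaultdict(list)
for cust, region, amt in transactions:
    by_segment[region].append(amt)
segment_avg = {r: sum(v) / len(v) for r, v in by_segment.items()}

# "Segment of one": a profile per individual customer, which requires
# keeping every granular transaction rather than pre-aggregating.
by_customer = defaultdict(list)
for cust, _region, amt in transactions:
    by_customer[cust].append(amt)
customer_avg = {c: sum(v) / len(v) for c, v in by_customer.items()}
```

The coarse view collapses c1 and c2 into one "midwest" number; the per-customer view keeps them distinct, which is the granularity the micro-segmentation argument depends on.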
When's the next thing going to happen that they know will happen? And I think one of the things that I'd like to get both of your comments on is a concept that would be—

So before you move there, I just want to add one thing. One of the things that I see around big data is what the big data is also not telling you. So there's a distinction, right? Today when people are talking about big data, they're essentially saying that there's a bunch of noise out in the ether, and what you've got to get good at is finding the signal in all of that noise. And that definitely applies, right? But there's also the inverse of that, which is sometimes the absence of information is actually telling you quite a bit, right? Think about it — it's the white space, right? It's all the stuff that you don't know about. So I was visiting with a large chemical company and I said, hey — they're chemicals, right? They're like, why are you talking to me about sensor data? And I said, well, because you make laminate countertops, okay? Do you have any idea where those are used? Do you know how they're sold through your distribution channel? Do you know where there's net new opportunity for revenue in just the United States? And they said, we could not possibly answer that question. And I said, you could put a chip in all of your countertops, right? And it could be anonymized — there's no privacy issue — but now all of a sudden you realize that you have a great void in the Midwest, as an example. And then you have something you can actually go and do about that: create additional distribution, do all kinds of activities. That non-existence of information should tell you a lot as well. So sorry to interrupt.

No, no, no, this is theCUBE. It's ideas — you fight for air time. And it's hard with me.
It's like that movie Jodie Foster was in, Contact, where that one little piece opens up into a massive amount of data. It's hard to see — you mentioned white space, and that kind of popped into my head, but that is really about big data. It's finding that one piece of data, the grain of data in the overall noise, and going, wow, and unpacking it, going, look what's behind it. So with that — for the folks who don't know, go check it out, it's a movie, showing my age there. But the concept that we've been kicking around on this point is the notion of derived data. You can't derive anything if you don't have data. So one of the things that we actually talked about at IBM Edge, this past event where the storage group was actually reorganizing around analytics — which is smart — is that ingestion of data, capturing data, is a real big part of it. So I'd like both of you guys to talk about the trends in capturing data, because obviously unstructured and structured is the way to go. But what is your view on the current trends around the emphasis on capturing data? Knowing that derived data might be...

Here's what we tell our clients. I think everybody agrees with the concept that storage is free, John. Storage absolutely is free. And it's not just free sitting on tape, right? We need to stop talking about that — it is free sitting on disk as well. Our advice always is, you have to be able to understand, very quickly, what are the problems you need to solve. The challenge becomes, to the point you just made, that you actually don't know what the ROI could be if you actually had all the information. So here's how we start. We say, look, these are common problems. An average large financial institution, or really any company with an online presence, will generate between 10 to 20 terabytes of data a day on online and mobile platforms that they actually throw away. They literally can't keep it.

Why? Is it too expensive to even keep it?
On disk, or just managing it?

In general. In general. Just storage — forget analytics.

And you and I both know that there are companies built around managing online data, right? Google being the name that comes to mind. Profitable companies as well. And you look at it and you go, well, that doesn't really make sense. Why would you not keep this data where you need to keep it? Store all of it. And to the point you just made, and the point Michele made, we may be seeing this flicker of what I call a data point that may seem out of normal range today but may become a signal very quickly. You will not be able to catch it if you can't keep it. We are seeing a universal acceptance of that fact. Almost everybody is going in and saying, unless I have a platform powered by the HDFS ecosystem, and I'm capturing every single piece or trace of information — whether it's from a sensor, an inanimate object, a product or service, my customers themselves, or my employees and my internal data — and putting it together to find the knowledge that is hidden in it, the gold that's sitting in there will not be mine. So there's a whole concept of: you should store everything you can keep. I think the game on that is over. Storage is easy. HBase is great, and so is HDFS.

And the thing we're telling people is — you guys are the masters of tribal knowledge in big data, right? I call you the tribalists. You are the new tribal leaders of the ecosystem.

All right, that's us. A little kumbaya there. Thank you.

If companies and products and analytics are not being built on the HDFS ecosystem, it's game over. On the batch side, no doubt. Okay, done deal. Put a fork in it, it's done. The big question that's on everyone's mind at this show this year is, how do I get the data out of that? Near real time is cool, but I need it faster. I need it in bigger chunks.
So there's a big emphasis on the analytics. That was Abhi's point — a lot of that data today is almost like exhaust, where it gets generated out of some device, out of some system, and it's not being used, or it's used for a small portion of its actual value. So what you're starting to see evolve is all different types of stream computing engines, right? And it's a little bit of a misnomer, okay? Because with the stream computing engines, what everybody's looking to do is the real-time analytics. But you're not actually going to be building the model on the fly, which everybody seems to have this notion around — there's not enough data to actually build the model on the fly. You could adapt it on the fly, but you're not going to be able to build it. So what you see is all that stuff coming in, you'll have a real-time model that will score it in real time and take some kind of action on it. And then you're going to see that data feed into back-end systems — and Hadoop and HDFS are perfect for that, because you don't have to worry about the value or quality of it yet, right? You could just store it en masse for free. And then, to come back to your point earlier, John, it's a data liberation period, right? What you want to do is expose this to all of the users, so we're not going to bind ourselves anymore to these prescribed analytic insights for the organization, right? So the platforms of the future, just like with Tresata, are going to have prescriptive types of analytics in them, right? What I mean by prescriptive is not just that they're going to predict the insights, okay?
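The stream-computing pattern just described — score events in real time against a pre-built model, adapt it on the fly but don't rebuild it, and archive every raw event for later batch modeling — can be sketched minimally (the names and the threshold rule are illustrative assumptions, not any real streaming API):

```python
# Minimal sketch of the real-time scoring pattern described above.
# "archive" stands in for cheap bulk storage (the HDFS role); the
# threshold model is a toy stand-in for a model pre-built offline.
archive = []          # raw "exhaust" kept for later batch analysis
threshold = 100.0     # pre-computed offline from historical data

def score(event):
    """Score one event with the pre-built model; adapt it, never rebuild it."""
    global threshold
    archive.append(event)                      # keep every raw event
    flagged = event["amount"] > threshold
    # Adapt on the fly with a small exponential nudge toward recent data:
    threshold = 0.99 * threshold + 0.01 * event["amount"]
    return flagged

events = [{"amount": 50.0}, {"amount": 500.0}, {"amount": 60.0}]
flags = [score(e) for e in events]
```

The key design point mirrored here is the split of responsibilities: the streaming path only scores and nudges, while the archived events are what a batch system would later use to rebuild the model properly.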
And they're going to solve known business problems, but they're also going to be open formats where people can actually do that self-discovery, because the platform has brought together internal data, real-time data, and external data that's going to evolve, and all of it's going to be there, ready and transparent for them to use.

So Michele, are you saying then that the first step in the journey is to get some sort of vertical, purpose-built application nailed first, with the flexibility for data transport? Is that—

It's a subtler point; I'm glad to add to it. A great example of this is Nathan Marz, right? We all love Nathan — the BackType guy, super smart, now at Twitter, has built Storm, right? Storm is a real-time processing engine that now works off HDFS. But there are two concepts — I'm going to use two "pre" words that are very important. One Nathan used, and I agree with him, and one we use at Tresata. The first is the concept of pre-computation. Even with Storm, you have to pre-compute certain insights. If you and I were to split big data analytics into two parts — what I call discovery and analysis, and delivery — the delivery has to be real time. There are very few tools, John, even in the old relational world, where discovery and analysis is done in real time. Even in the trading environment, you don't discover and write algorithms sub-second. You don't. You build and write algorithms at best, at best, on a daily level. That's pre-computation. And once you figure out the triggers that give you the advantage, you then implement and deliver on a near-real — actually, sub-second — response time. Storm, right? The best project for real-time analysis on Hadoop. And you look at how Nathan describes it — same concept. He has a pre-computation layer, where he pre-computes on certain factors, does discovery and analysis, finds new factors, with a three-hour lag. But once he's found that out, he is three hours behind.
So it's not near real time. Discovery and analysis always has a lag. Delivery can absolutely be made real time. Transaction processing, aka delivery, is always real time. That's one part. The other part that we believe in is what we call pre-curation. I think that is where you — and I want you to do this, when you get people coming on, ping them on it — there is a lot of existing data. It's all publicly available. Whether it's the government, whether it's education. Marc Andreessen nailed it, right? Software is eating the world. I think data is making that software monster even more hungry. And he picks the verticals out, right? He says education, financial services, government, healthcare. You look at all of those four key verticals — a lot of the data is already publicly available. What has not been done to it? It hasn't been pre-curated. What I mean by that is, you can go and get census information. You can go and get weather information. You can go and get healthcare information. What you now need to do in a platform is be intelligent and pre-curate it. And if you have actually pre-curated the data — put it in one place, found the common linkage, found the correlation — and then done the pre-computation, I can deliver you real-time insight that no one else can. Those two main concepts — who is pre-curating the data and who is pre-computing the analytics — whoever's bringing that together is truly a game changer.

Well, who do you see bringing that together?

Well, I've got to be a little humble with you, but—

Yeah.

In the financial services space, that is our goal. I think Splunk has done a phenomenal job, by the way. So, you've heard me say this before. In the last 10 to 15 years, you saw horizontal data stacks being built. IBM is a master at that, right? The hardware, the operating system, the server operating system, and the data operating system.
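The two "pre" concepts above can be sketched together in a minimal example (the datasets, keys, and the "cold market" metric are all hypothetical illustrations, not Tresata's actual pipeline): pre-curation links already-public datasets on a common key, pre-computation derives the insight offline, and then real-time delivery is just a lookup.

```python
# Minimal sketch of pre-curation + pre-computation + real-time delivery.
# All data below is toy/illustrative.
census = {"10001": {"population": 25000}}    # e.g. census data keyed by ZIP
weather = {"10001": {"avg_temp_f": 55.0}}    # e.g. weather data keyed by ZIP

# Pre-curation: put the datasets in one place, joined on the common linkage.
curated = {
    zip_code: {**census[zip_code], **weather.get(zip_code, {})}
    for zip_code in census
}

# Pre-computation: run the slow discovery/analysis step ahead of time.
# (Toy rule: count a ZIP's population as addressable if it's a cold market.)
insight = {
    z: rec["population"] if rec.get("avg_temp_f", 70.0) < 60.0 else 0
    for z, rec in curated.items()
}

# Delivery: sub-second, because everything hard already happened offline.
def cold_market_size(zip_code):
    return insight.get(zip_code, 0)
```

The design choice mirrors the argument in the discussion: discovery and analysis tolerate a lag (here, whenever the dictionaries are rebuilt), while delivery is a constant-time read of a pre-computed answer.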
We are now seeing the emergence of what we call vertical data systems — end-to-end data systems. Splunk calls themselves the Google of machine data. All they do is machine data and nothing else, right? Google is the Google of online data. Facebook is the same for social data. Our ambition is to do the same for financial data, and we really hope the same thing happens in healthcare, in education, in government. Because the tools exist, the model exists, the data exists. You've got to put it all together, pre-compute, pre-curate, and then the last and most important part: deliver actionable insight. And that's where all these concepts — the SMAQ stack, analytics, cloud — come together, because they're all enabling factors. They're not the main factor, which is: if you and I can build a company to deliver actionable insight, to solve healthcare problems, to solve financial problems, to solve education problems, that company, that model, will absolutely be the winner.

Yeah, and that'll be proof positive. No one can deny that.

No one can deny it, that's correct. So there will be lots of vertical companies that'll spring up, and you'll see, kind of like we did in the ERP era, where there were a lot — you'll see best of breed come out of that. There'll be a path littered with those that have died along the way, because doing what Tresata has done is not easy to do, and then you're going to start to see some consolidation in terms of the plays that are available out there. I think what you're also going to see is that the delivery mechanisms for these types of platforms are going to be three-fold, okay? They'll be in the cloud, both public and private ones — so an extension of the data center, as you mentioned. We call them as-a-service; some clients are cloud-scared, or not even aware of it. But it's the same concept, right?

Exactly. Platform, software, and appliance — that's the stack, right?

Exactly, exactly.
You go through an appliance, you get everything.

Exactly, so that's what it's all about. An appliance — okay, it's everything underneath.

Exactly. So what you'll see is, there'll be some people that'll come to market around horizontal types of applications, and there'll be people that'll come to it around vertical. I actually think what Tresata has done is not really a vertical play as much as it is a disruptive business model, okay? Because what they're focused on is a total view of the customer, okay? And that is very different. So, yes, your first go-to-market is around financial services, but it parlays very well into retail, into CPG, into many other vertical industries. And what I think they're actually doing is breaking down the barriers, the walls, those functional siloed walls, because they're focused on this different perspective. And I think—

The business model advantage for you is a short-term enabler to get to the marketplace.

That's exactly right. And because you're a startup and you're growing—

Right, and we can take, to your point, risks that no large company would take.

Yeah, you're nimble.

Yeah, but look, Geoffrey Moore is talking tomorrow. It's the classic crossing the chasm. I agree with you, and thanks for the compliment. I think it's the ability to break down internal silos — but if you're crossing the chasm, right? Yeah. You've got to prove it out. You've got to land on your feet when you jump. Yeah, exactly. You know what I'm saying? Hopefully it's not a big Grand Canyon, but you know— Yeah, and hopefully— It's not an Evel Knievel jump. It's not a Snake River Canyon. And hopefully it's not a false floor, where the first and the second jump you make, it's broken, right? So I think the crossing-the-chasm approach is optimal. You have to find the signal from the noise, and find companies that are enabling this next wave of innovation, where you're breaking the business model. You're breaking the mold.
You're delivering the answers. But you have to pick — and this is my advice to every single entrepreneur — you have to pick a business problem to solve. Because once you've solved it, whether it's pre-computation, pre-curation, whether it's the appliance, whether it's HDFS-based analytics, whatever it may be, you have to show that someone's willing to buy it, that someone agrees with you — a check-writer agrees with you — that there is value in it. And a large implementation has to come in a vertical, proving that being truly disruptive to existing business models can be done. And then you can expand, because I do agree with you: I think our total-view-of-customer engine has the potential to be applied to multiple areas.

So the other way that you see this evolving in the marketplace — and IBM does this — is, in our BigInsights stack, we have all these accelerators that come as part of the stack. Some of the tough problems being solved here are: how do I actually write analytics in an HDFS environment, in a MapReduce paradigm, and make it easy? So we have all these predefined accelerators. We have text analytics, we have machine learning analytics that come out of the box. And then what we're doing is moving up the stack to application accelerators — model accelerators. So churn is churn is churn — well, your churn might be a little bit different. In telco, it's rotational churn. But it's going to be somewhere between a 60 to 80% fit, to get them started more readily, right? As opposed to: I need to go build the algorithms, then I have to build up from that to the model, then I have to figure out all the other components, right? And so all of this is moving toward a much faster pace.

I think the other thing — and this is good news for us — the biggest question we get when we walk into a client on big data analytics is security.
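The accelerator idea described above — a generic model that ships as a rough 60-80% fit, with each vertical overriding only what differs — can be sketched minimally (the class, feature names, and weights are hypothetical illustrations, not IBM's actual BigInsights accelerators):

```python
# Minimal sketch of a "model accelerator": a generic churn scorer with
# default weights, specialized per vertical by overriding only the deltas.
# Everything here is illustrative.
class ChurnAccelerator:
    # Generic starting weights, intended as a rough out-of-the-box fit.
    weights = {"support_calls": 0.5, "days_inactive": 0.3, "price_hikes": 0.2}

    def score(self, features):
        """Weighted sum of whatever features the caller supplies."""
        return sum(self.weights.get(k, 0.0) * v for k, v in features.items())

class TelcoChurn(ChurnAccelerator):
    # Telco overrides only what differs (e.g. a rotational-churn signal),
    # rather than rebuilding the whole model from scratch.
    weights = dict(ChurnAccelerator.weights, plan_switches=0.4)

model = TelcoChurn()
risk = model.score({"support_calls": 3, "days_inactive": 10, "plan_switches": 2})
```

The point of the pattern is the starting position: the vertical team tunes a handful of weights and features instead of building algorithms, model, and plumbing from zero.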
And I smile when I hear that, because what that tells me is that all the other questions you and I were seeing even 12 months ago are gone. No one questions HDFS anymore. No one questions the ability to actually run analytics on any sort of data. I mean, security is table stakes to be enterprise-ready. And I think everyone knows it. Doug Cutting was just on here saying it.

So we're going to use the last two minutes of our segment here. I'd like you guys to share your observations about what's happening here. We're at the Hortonworks Hadoop Summit in San Jose, in the heart of Silicon Valley. Describe for the folks who aren't here, who are watching: what's happening, what's the vibe, what are your observations? What can you share with the folks who aren't here roaming the hallways, bumping into all the different conversations?

Okay, so the first observation is that there's a lot of buzz. There are over 2,000 people here. That shows you this is coming of age, right? So that's number one. Number two is that you see people moving up the stack in terms of the types of tools, the types of applications that are coming about. And number three, as Abhi just said very well, the questions people were asking even six months ago have fallen by the wayside. Those are all assumptions now, and everybody's moving closer to enterprise-ready. So those are the three big observations that I have seen so far today.

First thing: if you're not here, you've got to be here. You and I have done this now — how many years, three years? I've seen this grow from 200 people to 2,000, right? Exponential growth. Secondly, a very reassuring thing for me — you know, we were among the earliest companies to go out and say the action's going to be in the application space. There's money to be made. A lot of money to be made.

The number — sorry, go ahead.

It's a money machine.

It's a money machine.
The number that Sean shared in the morning: $100 billion, from my old friends at Bank of America Merrill Lynch — the analyst report, a $100 billion market. There's a lot of money to be made. Third for me — and personally this is reassuring — it's all about use cases. I've seen the conversation go from technology, pieces, architectures, to: where are the use cases, right? So there's a lot of money to be made, but people are saying, show me the money as well. There's so much more buzz. And they're actually showing it. And they're showing it. There's so much more activity, and a real knowledge base around real use cases outside the web. Just incredibly reassuring.

Folks, this is a groundbreaking event. Obviously it's the inaugural Hortonworks version of Hadoop Summit — last year Yahoo ran it, but they spun out Hortonworks at that time, and Hadoop World is now run by O'Reilly Media. And let me just tell you my view of what's happening here. As I said earlier, this is about tech conversations. A lot of developers here, not a lot of suits — a lot of tech geeks, the alpha elite forces, the tech athletes as we call them, are here because they're solving hard problems. And with the use cases and the dollars, it's like everyone's at the top of the mountain looking down on a valley of wealth creation and value creation in a way that's gettable. It's a very disruptive market, and it's great to have you guys on here. Again, great conversation. Michele Chambers from IBM Netezza, Vice President and General Manager, and Abhi Mehta, the founder of Tresata, a growing startup doing really cutting-edge work around analytics and — what are you, is it predictive? Pre-curation? Pre-curation. Got it. I'm John Furrier, the founder of SiliconANGLE. We'll be right back with our next guest after this break.