Live from Boston, Massachusetts, it's The Cube at the HP Vertica Big Data Conference 2014. Brought to you by HP, with your hosts John Furrier and Dave Vellante. Okay, welcome back, everyone. We are here live in Boston, Massachusetts. This is The Cube, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, with my co-host Dave Vellante. Our next guest is Cube alumnus Brian Weiss, who's the VP and Global Head of Subject Matter Experts at HP Autonomy. Welcome back to The Cube. Thanks a lot, great to be here. Love to read that title, subject matter experts. Yes, a lot of syllables in there. This is a great conversation to have. Let's just kick it off with subject matter experts. I wish we had a CrowdChat up on the main stage during the keynote, which is kind of a flash-mob group discussion with thought leadership going on. But subject matter experts in this new era of social is interesting. Since this is a big data conversation, is that a big part of how things get done now? Obviously there are two things you look at. One is that people love to talk to experts about stuff. Two, with metadata and big data, curating and data modeling require experts, ontologies, whatnot, right? So what do subject matter experts mean to you? Yeah, that's a good question, right? So think about subject matter experts and the value they can bring to solving a business problem, whether the specific problem is in big data, health care, or electronic discovery and compliance. What my group does, really, is go look for people who have real depth and experience in what they do, to help solve those problems for our customers. So you're talking about people with 10-plus years in their domain.
To your specific question around big data, I think there's something happening right now in this industry, and maybe five years from now we'll be able to come back and check whether we were right about this. Right now everybody's talking about getting insight out of information, getting value out of data. And frankly, there's still a lot of human work involved in that. Everybody's hyped up about data scientists and the era of the data scientist, and everybody's trying to tune their resume. There's a joke: what's the difference between a data scientist and a statistician? Salary, right? So we've got this era where there are tools. Got that, John, right? Yeah, it's going into the CrowdChat right now. It came up this morning, and that is a very good point. But this is the age of the data scientist, right? People who can understand data, make sense out of it, and help their line of business get value out of it are becoming valuable right now in the marketplace. For me, what that means is that there's a short window where you're going to use all kinds of different tools, whether it's structured or unstructured or Hadoop, et cetera, and there's this huge learning curve to get up there. So there are going to be a lot of people in there, and then over the next five years someone will say, why am I spending $500,000 on people to do this work when I can really do it with software? In the same way that software has started doing what people were doing, companies like HP and others are increasingly figuring out how to automate that process. But we are still at the very early stage of it. There's so much exciting work to do. So, in answer to your question, I look for people who are very deep in their domain, real experts, and bring them to bear on customer problems.
I mean, I just think it's a really key thing, and I bring it up only because it was in your title, but Dave and I have been talking about this data-first market, companies born on big data, and subject matter experts are where the sales leads are converting. Literature is gone with social. You also have automation and orchestration happening with artificial intelligence, and some of what Autonomy and Vertica do is high-performance stuff. So I see this metadata thing happening, and certainly the Snowden thing highlights it at a global level. What do your customers think? Where are they on that spectrum? Honestly, that's a more advanced conversation, but at the end of the day they're saying, I just want to get close to my business, I want to orchestrate things, make money, solve my customers' problems. Yeah, I think as we get into these conversations with companies that have big data projects, and who know they can get value out of their information and gain competitive advantage, et cetera, a lot of times they simply don't know where to start and how to get there. So for us, the idea of 'I want to get closer to my business' is: how do I take out the infrastructure? How do I take out the stuff between the data I've got and the insight I need to get, right? And that was our conversation. Jeff was here talking about IDOL OnDemand, right? The fastest way to get there is to not have to set up IDOL, ingest a bunch of data, and figure out what questions I want to ask. How about we host that for you and you ask the questions? So how do I take out the stuff in between the insight and the data itself? There's a lot of technology in there; how do I speed that up? And so what we ask people is, what are you trying to do, and why? What's the point of this big data? Are you playing with Hadoop because you think it's a good thing to do, because you've got some budget and everybody says it's cool?
Or do you have a business reason for executing on this? And at that point, what are the right tools? Let's bring that up, because this is something Dave pointed out earlier with Jeff Kelly: his study shows there's a big schism between IT and the business users. The business says, I want to see the meat on the bone in terms of the project. The IT guys think they're successful, but 18% in his survey said it's not. So there's a mismatch. IT is geeking out on big data and checking the boxes. They go, great, we're done. But the business guys are saying, wait a minute, where's the beef? So what's the point? This is an interesting conversation. We talk about retargeting, short-term versus long-term gain. These are the challenges. What looked good in the short term might not be the outcome. Well, what I like about this conference is that we were here a year ago, right, when we sat down and talked about the explosive interest in big data. Compared to then, I'm hearing more conversations that are driven by what the business outcome needs to be, as opposed to, let me go play with the tools. I don't know if you're having the same experience talking to people here at this conference, but the customers who are coming here are beyond 'I want to mess around with technologies that can host more, manage more information, and distribute compute across it.' They're coming with a business problem. Yeah. They're saying, I need to look at my customers, understand what they're going to spend, and really drive more sales out of it, not just say, okay, they're interested in X, Y, and Z. Yeah, two years ago there was a lot of kicking the tires, last year a lot of proofs of concept, and now it's real business value.
I wanted to ask you: think about Autonomy's ascendancy. Things like the Federal Rules of Civil Procedure certainly helped. Yeah. And at the time you guys were exploding, the general counsel was really driving the bus. Information was a liability. Yep. Now we're talking about information as an asset, as value, so who's in charge these days? Is it still the GC? Is it the CMO? Where are we? That's a great question. You throw the CMO in there, which is a bit of a wild card, but you've got to break it down into two different poles. One of them is cost and risk, right? You've got a whole group of people who look at information from a cost and risk perspective, and that's where your GC sits. And then there's a whole group of people saying there's business value and money in that data, right? I can learn something from it. So the pendulum is either on cost or on value, and they're actually fighting. Yeah, I was just going to ask, does value trump risk or not? No. It doesn't. No, it depends on the industry, I guess. It depends on the industry, but see, here we get back to the point. So, great value, but how much value, right? How much value is it to not get sued, or to win that case? Historically, what we're seeing is that the folks who are really, really good at mining data and understanding it are, funnily enough, the people doing the risk and cost management. They're the ones looking for fraud in communications around stock trades. They're the ones using advanced tagging and categorization and what we call automated coding for electronic discovery. The funny thing is that those guys are really, really good at finding a needle in a haystack cheaply, okay? That's the same business proposition as trying to figure out what's going on in information so I can capitalize on its value. So you're saying they stop fighting?
So here's what we're seeing when we get together: I go to these discovery conferences and the lawyers are there talking about the value of the data. So the beautiful thing is projects that delve deeply into, say, categorization. It's a great example. Instead of reacting to the information once I've been sued, help me go find the data, and do it cheaper. Organizations are moving toward the middle and saying, help me categorize my data before the fact. Think about that. How are you going to categorize your data? You've got half a petabyte of data and I want to know what each piece means. I want you to automate that for me. And the reason I want you to automate that for me, one side of the aisle says, is that there's cost and risk in there. I want to delete it, right? I want you to automate it because I want you to show me what's trash, and I don't want to host it anymore. Or I want to know what's on legal hold, right? I want the potential smoking gun. And I don't have time to read it, so let the computer do the work. And then there's the other side, which says, I'll fund that project because I know there's business value in there. I can see the patterns of behavior in my workforce. I can see what they're writing about. I can see what they're saying. I can mine that. And so we're seeing these beautiful conversations between people who wouldn't ordinarily get together. Yeah, I buy the thesis there, but I want to add a complication to it, and this is the interesting one. I agree with you that there's my risk and cost to store the data, more of a compliance view, versus the upside that's not coming home yet, not, quote, crossing the finish line in value, which is what you're saying. Okay, I buy that. However, let's throw in another caveat and change the game a bit.
So suppose you change it to: I will pay HP or IBM or another vendor for their prepackaged, approved global platform, versus open source, which is more of a lagging time-to-value equation, which you're pointing to. Now that complicates it. Now I can buy Vertica or Autonomy or other solutions and get time to value faster. So do you optimize for time to value, or do you optimize for compliance and risk management? Ultimately, with open source I can see that lagging, but when you say, okay, I've got Vertica, Autonomy... Yeah, is your question whether people will do it themselves with open source faster than they will take a productized approach? Because folks like HP are the ones who are going to productize and bulletproof this stuff, and we're going to build solutions around it that a CIO will actually buy. Well, I'm bullish on open source. My point is that open source by itself means a longer time to value. That's kind of what people are agreeing on. It's free, but you get what you pay for, and you're filling the holes in security and the global issues. But if I buy Vertica, I get to save time with more standardization, the Ferrari, and I've still got open source support. So I have that luxury. But now I'm buying a time-to-value equation: one that's faster by buying versus free open source. That's kind of what we're hearing from customers. And that's a separate dimension from what you're saying, which is risk versus the upside. So if I can buy a faster time to value with HP, what do I optimize for? Now it's all about the focus. What am I optimizing for? That's program management and policy. Am I optimizing for risk management and compliance, or am I optimizing for time to value? I just don't know the answer. It's interesting to hear. I think we end up in this kind of lovely place where we're almost doing both.
So we see people wanting to do legacy data cleanup, whether it's structured or unstructured data; they want to just get rid of systems and all that. But they can't, because they don't know what the information means. This is the unstructured stuff. I can always do a structured analysis and say, it hasn't been touched for three years, it's three years old, and it's a Word file. Metadata. I can do metadata management. But what's it about? What's the actual document about? Is it a lunch menu, or is it a critical document for my business? Being able to categorize and tag things is the same technology I would need for a big data conversation with my CMO as I would for one with my compliance folks. So basically you get two pots of money to dig into. The compliance guy comes to the table and says, I'm going to buy this technology to figure out what I should and shouldn't keep and what might be on legal hold. And by the way, you guys on the big data side, look what you're going to get for my dollar. But there are differences, because the risks of getting it wrong are higher. Yeah, absolutely. On the legal side than they are on the big data side. On the big data side you can iterate and say, oops. But I suppose you can iterate on the legal side as well. Yep. So where are we in terms of being able to auto-classify, auto-categorize data at scale? You guys obviously do search. Yeah. It's the heart and soul. Yeah. But in some cases you're kind of using search as a blunt instrument, right? I still need to pull a lot of data in as the data grows. If I'm paying lawyers to do searches, it's expensive. So if you could auto-categorize, it might save me some dough, and then I could search on a small... Well, that's really the secret sauce of what we're doing with unstructured information. It's having the computer automate insight, right? What you might ordinarily require a person to do, the machine can do a lot of that work.
It can tell you what the document's about. It can, and here's a phrase, find similar. So if I hand you a document and say, go get me things that are like this, first you have to make a decision about what it's about. And maybe it's not a two-sentence tweet; it's a 40-page document. So you have to read it, figure out what matters and what doesn't, and figure out what's similar. And you don't just go look for things that have the same words in them. It's conceptual. That underlying technology, the ability to get insight out of information in human ways, is really what Autonomy is about. And we're using it broadly to power the big data proposition. How do you address the fact that data is distributed in nature now, with mobile more so than ever? I don't want to put it all into a single repository, right? That doesn't work. I think the 1.0 version of that has tripped over that. Really, the way we think about it and talk about it is that you need connected intelligence. You need intelligence, and you need connectivity. You've got to solve the connectivity problem. Nobody's going to move it all into a big bucket in order to make sense of it; nobody's going to do that in their business process. You need to be able to connect to data sources, and you need to be able to understand the variety of information: the video, the audio, right? That's unstructured, as well as free text. And how do I look at all of that in the same context and interrogate that same set of data with one intelligent view? And by the way, I've got to do it securely. I still have to honor the underlying security model, the underlying security framework of the documents and the repository. So I've got to be able to do all of that in the enterprise at scale and give you human insight across it.
And as I mentioned, we're moving from tire kicking into the mode where the business problems are becoming front and center in these big data conversations. We've got some really exciting ones in there. This afternoon we're going to be talking about our healthcare initiative. This is a great use case, an example of where there's an awful lot of very rich information, and that rich information is structured, semi-structured, and frankly really unstructured. That's patient information, all right? When a doctor makes a diagnosis, they make notes about you coming in. Sometimes those notes have structured information and sometimes they don't; they're dictations, and there's all kinds of stuff in there. It's very difficult to make sense out of that information. Historically, what has happened in the medical industry is that it all gets put down to one code, which is the billing code. It's the travesty of ETL in the medical world: at the end of everything I've learned about you, I have to say, I'm going to bill it to the following code. So read a 500-page book and give me one code to describe it. Now, what if I could take all that other information, what you said and how you said it, and take those descriptions and correlate them with other types of information, whether structured or unstructured, right? There are all these data systems out there. If I can connect to those and also analyze the notes themselves, we can look at patterns of not only how people are talking about what's happening, but what's actually happening.
We've got a project with Stanford Children's Hospital that we've been running for about two and a half years now, and they're doing exactly this. They're able to look at increasing their standard of care and increasing their efficiencies. It's really a tremendous project, and, you know, it's not on TV, but we'll be talking about it later this afternoon. Well, you're going to speak to this audience with Dr. John Palma of Stanford, right? So can you take us through more of that case study? What are they actually doing? First of all, as a backdrop, I'm in Palo Alto, near Stanford. I know they're doing a ton of big data across all their disciplines. It's pretty much infiltrated everything, from cancer research to all kinds of stuff. So they are really doing some great work. So go ahead. They're going through all of the records they have that are associated with patient care, and they're looking at the way in which the system describes things, right? The tag that's been associated with the record. But they're digging deeper and looking at any of the free-text fields associated with that: the way the customer, see, here I'm saying customer, the way the client might have described what's happening to them, and the way the doctor may have described what's happening to them. They're able to draw correlations out of that information which are perhaps very different from the way the information was coded, right? Because people don't talk about things in the same way. So difficulty breathing might be abbreviated DOB, which is also date of birth, right? Then there are all the problems of understanding context and semantics around, say, 'he denies having any chest pain.' Well, if I'm just looking for chest pain and I haven't understood that it's in the context of denying something, right?
So the way people talk about things is not always coded into rows and columns, so they're pulling this data out and running really lovely analytics on what's more likely happening than the way things have been coded. That feeds their standard of care. And what's beautiful about it is that they're getting to a point where they could really do this across not only their data, but other healthcare data, right? It's not just Stanford's hospital data; if you could do this writ large, you start to do really proactive medicine in a different way. All right, so I've got to ask the question. It sounds eerily similar to IBM Watson in a way. What's similar, what's different? Well, there are significant technology differences, right? There's what Watson does and what it can do at scale, and IDOL has a whole different set of capabilities. It's based more on a probability model than on natural language processing. Although when you look at healthcare, you kind of have to do all of it, right? You have to do not only what IDOL does with probability modeling, but natural language processing as well, plus ontologies and lexicons from SNOMED and the dictionaries, et cetera. So it's a combination of everything we have, including some of the work coming from HP Labs, using IDOL, using Autonomy, and as this progresses, we'll get into much deeper, larger data lakes of information. So it's really apples and oranges, but to me it's a very exciting place where we really could potentially change the way we do medicine. On that note, we had a question from the audience today: when are we going to hear about big data not getting people to click on ads and drive revenue, but changing the world and improving the human condition? How would you answer that question? Are we seeing it already? Are we on the cusp of that? Well, I think we're at the beginning of it.
To be honest, I think we're at the very beginning of that process. And there are innovators like Stanford Children's who clearly see that. If I ask you to describe something with one field, it's not right. There's so much rich information out there. And if we could tap it in a way that we could make use of it for every clinician out there, we could really change things. So there are innovators out there, Stanford and others, who see this very clearly. Yeah. It's human information. It's human information. It's real value add. It changes your life. So big data is changing the world, helping people live longer. Trends like Fitbit are here. Well, you add that to the Internet of Things and all the data we're pulling off of people, whether they checked into Stanford Children's or not, and that all becomes a data source and a worldwide data set, and it's very tantalizing. If you free the data, good things can happen. That's what we've got to do: unleash the data. Medicine is really interesting right now because the pressure on it has moved away from fee for service to fee for performance. And the personalization of medicine means we really need to be proactive and manage somebody before they get to the emergency room. Both of those things drive analytics. The only way you're going to achieve them is by getting better information out of the data you have and adjusting your systems accordingly. Brian, thanks for coming on The Cube. We really appreciate it. Always my pleasure to talk. You guys are doing some great work. Obviously, human information for business, commercial, and enterprise, all of the above, the Internet of Things, all happening. Great to have you here, great to hear all those stories, and love the epic tweet contrasting the statistician and the data scientist: salary. Great job. Thanks for that quote. We'll be right back.
It wasn't mine, it came up this morning. We'll be right back with our next guest after this short break. Take care.