Live from Boston, Massachusetts, extracting the signal from the noise. It's theCUBE, covering HP Big Data Conference 2015, brought to you by HP Software. Now, your hosts, John Furrier and Dave Vellante.

Okay, welcome back everyone. We're live here in Boston, Massachusetts. This is SiliconANGLE's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier of SiliconANGLE, joined by my co-host Dave Vellante of wikibon.com. We're here at the HP Big Data Conference; HP Big Data 2015 is the hashtag. Our next guest is Sudeep Venkatesh, VP of Solutions Architects at HP Security, part of Voltage Security. Welcome to theCUBE.

Thank you so much, it's great to be here.

So we had the new worldwide sales guy on yesterday and he comes from the security group.

Yes.

And again, security is a big data problem. Waiting, I mean, just sitting there. We can't go to a Big Data Conference without talking about it. I mean, security's headlining. Big data is really the key weaponry right now to stop cyber attacks and other threats.

Exactly, exactly. But Voltage Security, and now of course HP Security Voltage, takes a slightly different approach to this field. What we're seeing with big data essentially is that deployments of big data are extremely hard to protect for a number of reasons. So that's the approach that we take. Some of the reasons for that include the fact that you have data from a number of disparate sources coming into your Hadoop deployments, for example. You have a number of analysts, for example, that need access to that data to get real value out of it. And the types of data that we're seeing go into Hadoop systems are very different as well. So we work with telcos that put things like GPS data along with other personally identifiable information into it.
We work with a large healthcare organization out of Connecticut, for example, that puts a lot of healthcare information, things like social security numbers, prescription information, et cetera, into Hadoop as well. So our approach essentially is to protect that data at the data level itself, but still make it usable to the vast majority of consumers.

So are you trapping on stuff, or is it more detection or pattern recognition, or both?

Right, neither, neither. So when you think of security, you can come at it from various angles. I mean, of course you have components like firewalls, you have components like malware detection, anti-virus detection, you can do IPS as well. But our whole value proposition is that you can build walls as high as you want, but attackers will still get beyond those. So our whole approach is to take data and move security close to the data by encrypting the data or tokenizing it, depending on what your needs are. So rather than being in IPS or malware or firewalls, all of course very essential technologies in the market, we actually go straight to the source: we go to the data and then encrypt or tokenize it.

So would it be safe to say that for companies that are looking at perimeter-less security models, this is one answer?

Exactly, so it's the old cliche that the perimeter no longer exists, data is moving into the cloud.

API economy, people are accessing all kinds of variety. I mean, Stonebraker said in the keynote here, variety of big data is a problem, which means, if you believe that, which is true, connecting to data sources means you're connecting.

Exactly.

And inside and out.

Exactly, exactly.

So where's the vulnerability? Is it the encryption key, or do you somehow distribute the encryption key?

Right.

Can you talk, when I decrypt, can you talk about that?

Right, that's a great point, that's a great point. So encryption is fairly easy to do with today's technology.
The hard part is, how do you make sure that the keys are properly managed? So you need to make sure that the right people have access to the right keys. You need to make sure that if someone's changed their role, they no longer have access to that data by having access to the keys. So along with encryption at the data level, we also provide a key management service that's typically deployed at the customer side itself. The key management that we provide is called stateless key management. So in this case, we are not required to store keys, but we can generate keys on the fly as and when required. So this can make it extremely scalable for even the largest organizations. So we solve the problem of encryption using this technology called Format-Preserving Encryption. That's very interesting. And then we also solve the problem of the complexity that key management brought into an organization using the stateless key management concept.

What is your take on this conference here? A lot of engineers here. It's not a security conference, it's not a developer conference, it's an engineer conference. A little bit DevOps, got a little bit of this, got a little bit of security. What's the conversation like here within the security paradigm that is needed out there? Is it embedded naturally in every conversation, or is it specifically brought out in certain use cases?

Right, so what we are hearing a lot, especially when I'm talking to customers who come here, so not only the vendors obviously who come here but also customers, is that big data projects are stalling because of lack of security. Data that gives you the most value is also the most sensitive data within an organization. You can get maximum juice out of credit card number usage, out of prescription medication usage, out of people's GPS information, but that is also the most valuable information to an organization and also the most sensitive.
So what we are hearing from customers, especially here, is that big data projects are being stopped by security teams because of lack of security. And that's where of course our products and our solutions come into play.

So what's your take on Vertica's performance? We're hearing that's becoming a part of the reporting piece of it. Security to me seems like it needs to be real time. Or can you have slower-moving layers of data, real time? You know what I'm saying? I'm trying to get at the Vertica piece of it. Where does that all fit in? Where are you seeing the deployments of, say, the HP software in security the most?

Yeah, right. So one of the problems with traditional ways of doing data-level security was that the format of the data changed. So, right, if I took your social security number or your name and I encrypted it, then the format changed, which meant that it was extremely disruptive to the underlying Vertica deployment or to the underlying database deployment. And also you would need to constantly encrypt and decrypt that data to access it, which led to a lot of performance problems, which I think you're talking about. But what we can essentially do from the HP Voltage side is we can take data elements and we can encrypt them, but retain the format. And what we're seeing is that 90 to 95% of business processes, where you're running analysis, you're counting, you're doing searches, indexes, joins, et cetera, can happen on the protected data itself. And in that case, of course, security through encryption has zero impact on performance, because you can run a lot of your queries on protected data, because the format is retained and the properties are retained as well.

So you preserve the format, you dynamically change the keys.

We could do that, yes, yeah.

So is that fundamental to the architecture, or not necessarily, as an optional sort of thing?

That's fundamental to the architecture.
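
The two ideas discussed above, deriving keys on the fly instead of storing them and encrypting a value while keeping its format, can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the function names (`derive_key`, `protect`, `unprotect`), the HMAC-based key derivation, and the toy Feistel construction are not taken from the interview and are not HP's implementation or the NIST-standardized FF1 algorithm. It only shows why an encrypted social security number can still look like one, and why equal inputs produce equal outputs, which is what lets counts and joins run on protected data.

```python
import hashlib
import hmac

DIGITS = "0123456789"
ROUNDS = 10  # even, so both halves return to their original positions


def derive_key(master_secret: bytes, key_name: str) -> bytes:
    """Recompute a key on demand from a master secret; no per-key storage."""
    return hmac.new(master_secret, key_name.encode(), hashlib.sha256).digest()


def _round_value(key: bytes, round_no: int, half: str) -> int:
    """Keyed pseudo-random integer for one Feistel round."""
    msg = f"{round_no}:{half}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def _feistel(digits: str, key: bytes, decrypt: bool) -> str:
    """Toy Feistel network over decimal strings (illustrative, not FF1)."""
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    order = range(ROUNDS - 1, -1, -1) if decrypt else range(ROUNDS)
    for i in order:
        m = u if i % 2 == 0 else v  # width of the half rewritten this round
        if decrypt:
            c = (int(b) - _round_value(key, i, a)) % 10**m
            a, b = str(c).zfill(m), a
        else:
            c = (int(a) + _round_value(key, i, b)) % 10**m
            a, b = b, str(c).zfill(m)
    return a + b


def _apply(value: str, key: bytes, decrypt: bool) -> str:
    """Transform only the digits; separators such as '-' stay in place."""
    digits = "".join(ch for ch in value if ch in DIGITS)
    if len(digits) < 2:
        raise ValueError("need at least two digits to protect")
    out = iter(_feistel(digits, key, decrypt))
    return "".join(next(out) if ch in DIGITS else ch for ch in value)


def protect(value: str, key: bytes) -> str:
    return _apply(value, key, decrypt=False)


def unprotect(value: str, key: bytes) -> str:
    return _apply(value, key, decrypt=True)


if __name__ == "__main__":
    # The master secret and key name are made-up demo values.
    key = derive_key(b"demo-master-secret", "ssn-field")
    token = protect("123-45-6789", key)
    print(token)                  # same layout: three digits, dash, two, dash, four
    print(unprotect(token, key))  # original value
```

Because the key is recomputed from the master secret and a name each time, nothing has to be looked up in a key database. A production system would use a peer-reviewed, standardized format-preserving mode rather than this toy construction.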

What do you make of these emerging approaches that are inspired by, like, the Bitcoin blockchain? Enigma from MIT and others are trying to do that as well.

Sure, so I mean, Bitcoin is obviously, you know, a slightly different topic with the currency that it tries to match. But, you know, there are several companies out there that provide cryptography and that provide encryption which is the equivalent of being developed in the back of a garage, right? So there's a lot of proprietary encryption out there. There's a lot of proprietary tokenization out there as well. One of the things that we've been almost fanatical about is the standardization of the encryption that we bring to the market. So what we advise customers on is that not all encryption is the same, right? You know, look for encryption technologies that are open, that have been peer reviewed, and most importantly, that have gone through the NIST certification process as well. So you're right. I mean, there are some, you know, weak encryption technologies out there that claim to do some fantastic things. But we always caution customers to really look at the provenance of that encryption and make sure it's coming from the right place and has been looked at by the right people.

Yeah, well, these are more sort of whiteboard technologies now, you know, coming not so soon, right? But the idea of distributing, you know, using a blockchain-type approach has been getting a lot of attention in different circles. Okay, so the other question I want to ask you, as a security professional: a lot of times people say, well, security, we've failed, you know, it's a do-over. Is that fair? I mean, are you part of the do-over?

That's a great question actually. Obviously, you know, security at the end of the day is a control, right? It's providing a control to a business problem.
What we're seeing now more and more is an increase in attacks, you know, by folks who want to take your data and want to commercialize it illegally. You're seeing an increase in attacks by things like foreign intelligence agencies as well. So I think security has always been making steady progress in response to what technologies and what architectures it's protecting. But nowadays you're getting a lot of publicity, especially with the recent breaches that we've heard about in the enterprise space as well as with the federal government.

Yeah, and I feel like, I mean, Stuxnet was a new high-water mark and opened up Pandora's box.

Right, exactly.

So what's your take on the show here today? Share with the audience out there why this show is so special.

Yeah, I think it's been very interesting, talking to a lot of customers, seeing what innovations customers are doing. I know you spoke to one of our customers in Mar this morning, having good. So they are a great example of where someone is really trying to monetize big data, make sense of it, but use security not as an inhibitor, but as a friend that helps you make progress there. So also I think there is a lot of desire amongst customers to move from, you know, demo or proof-of-concept systems of Hadoop to more production-grade Hadoop. So they're sort of hungry, I would say, to look for new use cases to monetize their existing Hadoop deployments and also grow them into more production systems.

So we heard from Robert Youngjohns about practicality, you know, trying to really level set the audience and the world that there's a lot of hype out there, a lot of noise. So as the solutions architect here, they're all working under the covers.
How does someone prepare to kill two birds with one stone, so to speak, to where they can architect the big data platform, be scalable at large scale, whether it's bare metal, data center or cloud, enable the analytics to scale, and check the box on security? I mean, it's a tall order, but people are doing it. Can you share your thoughts on that?

People are doing it, absolutely. And, you know, we are seeing a lot of customers doing it. We know that there is a lot of hype in this industry, obviously that's been obvious for the past couple of years, but we are seeing people make really big investments in big data and benefit from it as well. So for example, the healthcare organization that I was talking about in New England, they are building towards a thousand-node deployment, a thousand-node Hadoop deployment, and they're primarily using it to pick up prescription fraud and also catch Medicare overpayments as soon as possible. The large telco that we spoke about is working towards a 2,000-node Hadoop deployment, and they're putting in all sorts of information. So they import something like 330 million records into Hadoop with 17 sensitive data elements in them that they protect with our technology, and this is everything from GPS information, location information, et cetera. We work with some large banks and they're putting a ton of credit card data into Hadoop. The marketing team obviously benefits from it to give their customers new offers, and they're also feeding that data into different fraud analysis as well. So yes, there's a lot of hype and a lot of talk about big data, but we're really seeing deployments that are in the hundreds of nodes, if not thousands, adding real business benefits and value.

And speaking of, John mentioned cloud, you've got to ask the cloud question.
So with a lot of people, and specifically as it relates to data, what a lot of the public cloud guys are doing is trying to build data management pipelines that are integrated and deliver them as services, and you get the simplicity of that service, but maybe lack some of the functionality and maybe some of the ability to customize from a security standpoint. What's your take on the state of security for cloud, specifically as it relates to data management?

Yeah, absolutely. I mean, once data leaves an organization's perimeter, then it is vulnerable to a whole different set of attacks outside as well. For example, if a cloud provider gets breached, then that means that sensitive data from a particular enterprise would get breached as well. Sometimes we're seeing things like blind subpoenas, where different governments would require a cloud provider to supply all VMs or all hardware from a particular environment for forensic purposes. So that only plays into our story of data-centric security: that if I take sensitive information like your social security number or your email address or other behavior patterns, and if I protect that using things like encryption and tokenization, it doesn't matter if that is proliferated into a thousand-node Hadoop deployment. It doesn't matter if that is sent temporarily into AWS for doing Elastic MapReduce jobs. As you make your Hadoop more elastic, the security moves along with that data. So, you know, with large big data deployments, especially those in the cloud, we're only seeing a validation of this data-centric approach that we have.

And it doesn't matter if some government's looking at it, other than the symbolic implication.

Right, they only get tokenized data. Right, if a cloud provider gets breached and they get to all the VMs, all they get is not your SSN, but a surrogate of it.

Okay, well, thanks for sharing your insight on theCUBE. We really appreciate it. What do you expect to see in the next year?
Last question.

I think in the next year, we'll see a couple of things. So we'll see more and more organizations really figuring out what to do with their big data. So of course, everyone has gone. You know, one of the interesting terms that I heard yesterday was the data swamp, right? So we start off building a data lake, but it turned into a data swamp. So I think we will see a lot of organizations finally figuring out how to turn some of those swamps back into a very usable data lake. That's one thing that I hope to see in the next year.

Well, we will see data oceans, that's our prediction: more dynamic, a lot more threats, a lot more currents, a lot more unpredictability. You've got those rogue waves, obviously a security threat. Dave, I couldn't help that. Again, we're live in Boston, so we'll be right back after this short break.