Live from the Julia Morgan Ballroom in San Francisco, extracting the signal from the noise, it's theCUBE, covering Structure 2015. Now your host, George Gilbert. This is George Gilbert. We're live at the Julia Morgan room at Structure 2015, the iconic conference. And we're with Paula Long, founder and CEO of DataGravity. Paula, good to have you. Thank you for having me. So, for those who don't know your fame and fortune and glory at EqualLogic, tell us what made you found DataGravity and what its mission is. Okay, so I was the co-founder of EqualLogic, which we sold to Dell in 2008. In 2011, I found myself sort of unemployed and trying to think about what to do next. And I had a great ride with EqualLogic. And I said, I'm never going to do IT again unless I can come up with a really good idea about a new problem. And so, right around that time, big data was coming out, there were security issues, there were storage issues. And I started to wonder if maybe storage could help you solve your big data analytics issues and your security issues right at the point of storage. So, could the storage array tell you about the data it held? And could it help you secure it, protect it, and service it in a way that was effective for the end user? And I said, yes, it can. And so, DataGravity was founded. Okay, so take us then into that first use case. What type of data, and who was the zebra customer, the one you could see from 100 yards away? What were you solving for them? Okay, so what I then observed is, lots of focus on structured data, not a lot of focus on unstructured data, yet unstructured data was growing 80% faster than structured data. It was 80% of your data, and it was the most likely to be mismanaged. So I said, we're going to go after unstructured data for the first iteration. And then I said, your customers are going to be people who have regulatory requirements or sensitive data.
So sensitive data can be personally identifiable information, like credit cards or Social Security numbers, but it can also be company IP like software or plans for manufacturing, et cetera. So our customers really span a pretty broad set of vertical markets: we have customers in legal, customers in healthcare, customers in education, customers in state and local government. And the common theme is, they need to be able to understand and protect their data, both to make sure that they are in compliance, but also to get leverage from that data. Okay, so let's start with security, with the sort of constructs many people are familiar with. They log in, a directory says, okay, your ID is... You get a SID, so if you're an Active Directory user, you log in, you get a SID, you get an identification, it tags to track your address. And you're authorized to get to these resources. So that's sort of the basic infrastructure. Now what do you layer on top of that? So what really happens is you come in and you have a set of data that you're allowed to get to. A lot of security is done by obfuscation. So a lot of times you're able to actually see stuff that you didn't know you could find, right, and then you'll just stumble into it. This is what Bill Joy used to call security through obscurity. Absolutely, security through obscurity. So what we do is we let you understand where your classified information is and what the permissions on it are, and you decide what's classified. We have some built-in what we call tags or classifications, but you can build your own set as well, and we have very easy tools to do that. So then you log in and you start looking at stuff, and you'll see us over time start to create profiles. So even if you had Active Directory permissions to read something, if you weren't in a classification profile that matched the classification of the data, you'd still be denied access. So that's in the future, but it's coming. Give us a more concrete example of that.
So for example, I'm working in a database and I cut and paste some things out of the database that happen to have some Social Security numbers and happen to have some credit card numbers from an employee health form. And I cut and paste that and it ends up in a temp directory somewhere, and it inherits the permissions from the directory. So now it's wide open with your personal information in it. What you really want is for that file, even though it's wide open, to know that George doesn't have rights to read it, because he doesn't have rights to read that classified information. So you want to shut him down, even if Active Directory or even if the ACLs, access control lists, I'm sorry, I hate to use shorthand, tell you he can. Now these are all things we'll be doing. Some of the things we do now and some of the things are on the roadmap for the future. But what you really want to do is not just do things by plain rule; you want to do things content-aware and people-aware as well. Okay, tell us what that, I mean, a directory with an access control list says these people have this type of access to these documents. So what did you have to layer on top of that? Now being a little bit more abstract than the, I can't read that temp file, tell us what did you put on that new layer to make it more general? So what we do that no one else really does today at the point of storage is we understand people, and we understand how they are identified from either Active Directory or LDAP or NIS, which are the name services you use to get into storage. And then we understand content. So we look at over 400 different unstructured data types and we extract content, whether that's in a file share or in a VM or in a block device. And then we look at time. So what we have is a map of you, a map of content, and we layer on top of that classifications to say who can look at what and who can do what.
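As an editorial illustration of the layering described above, here is a minimal Python sketch in which a read is allowed only when both the ordinary ACL and a classification check pass. It is not DataGravity's implementation (the speaker says content-based access checks are a roadmap item); the names `User`, `FileRecord`, `can_read`, and the `"*"` wildcard for a wide-open directory are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    """Hypothetical user: a name plus the classifications they are cleared for."""
    name: str
    clearances: frozenset = frozenset()


@dataclass
class FileRecord:
    """Hypothetical file: path, ACL readers, and content-derived tags."""
    path: str
    acl_readers: set
    tags: set = field(default_factory=set)


def can_read(user: User, record: FileRecord) -> bool:
    # Step 1: the plain rule. "*" stands in for permissions inherited
    # from a wide-open directory (an assumption of this sketch).
    acl_ok = "*" in record.acl_readers or user.name in record.acl_readers
    # Step 2: the content-aware rule. Every tag on the file must be
    # covered by the user's clearances.
    cleared = record.tags <= user.clearances
    return acl_ok and cleared


# The temp-file scenario from the interview: wide-open ACL, sensitive content.
temp_file = FileRecord("/tmp/paste.txt", {"*"}, {"ssn", "credit_card"})
george = User("george")
hr_admin = User("hr_admin", frozenset({"ssn", "credit_card"}))

print(can_read(george, temp_file))    # ACL allows it, classification denies it
print(can_read(hr_admin, temp_file))  # both checks pass
```

The design choice in this sketch is that classification can only narrow access: a file with no tags falls back to the ACL decision alone.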
So what we've built is, and I hate the word platform and I always rag on people who use the word platform, but we've created this set of metadata and data that we can use to transform and make rules about how the data is accessed. And they're not arbitrary rules, they're rules based on information. So if you were to take an MRI of your hand, for example, you can see some bones are very healthy, some bones aren't. If you look at the hand it seems fine, but when you look inside you can see that this finger was broken once before. We can do the same thing with your data. We can take an MRI of your data and we can map to that the users and we can map to that the access. So we can do amazing things based on what we put together for data transformations and actions. Would it be fair to call this like a forensic analysis of how your unstructured information is being accessed and used? And then... Sort of, except forensic analysis implies kind of a post-mortem; it kind of sounds like somebody already died and then weeks later you solve the case. We're actually actively participating; we're at the crime scene if you will, storage is at the crime scene. So we're actually seeing what's happening in the data as it's happening and we can be much more near real time. Nobody's really real time, right? But near real time, whereas forensics is sort of more of a passive thing. So we were talking about this earlier. So SharePoint is like the sort of new file server, and people have to actively go in and classify information and set permissions and things like that. Is there a way for you to sort of automatically assign who can get access, but also tag that information so that when you build these policies for access, those could feed into a sharing environment to help surface more information more easily? Yeah, so anything that relies on the end user tagging things is destined to have problems.
Unfortunately, humans, including myself, make errors. And to be honest with you, if you look at the number of tags companies have for how they're going to classify data, no one's going to do that. So what we do is we let you create a set of, you know, basically regular expressions or ways to scan the data to tag, and we'll auto-tag for you. Now we aren't doing the access check based on content permissions today, but you'll see us doing that in the future. Okay, replay that last little bit because it sounded important and I'm not quite sure I got it. Okay, so end users being responsible for tagging things is destined to be problematic. What we do is we're actually scanning the content, and based on rules you provided and rules we have, we're creating what we call tags or classifications on the data dynamically. Almost like we're... We're learning about your data and we're helping you put your rules on your data. Okay. And then today we are reporting on who's accessing what and when, and you'll see us in the not-too-distant future have capabilities to be able to control who can actually access things based on content. We do not have that today, but we will have that soon. That sounds like not only who would be able to access things based on content, but you might be able to have performance tiering even behind it. You can imagine, so for example, today's storage tiering, and with EqualLogic we were one of the people who did tiering, was based on who was asking for performance. So the more you ask, the more you try to give it. Well, if this is a movie everybody's watching, you might not want to give that performance. What we'll know is we'll know about the content, we'll know about what's coming next because we know the structure, and we'll be able to do performance work knowing about content.
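The auto-tagging described above (rules, "basically regular expressions," applied to scanned content) can be sketched in a few lines of Python. This is an editorial illustration under stated assumptions, not the product's rule engine: the two patterns, the tag names, and the helper names `luhn_ok` and `classify` are hypothetical, and the Luhn checksum is added here only to cut down false positives on card-like digit runs.

```python
import re

# Hypothetical rules: US Social Security number format and 13-16 digit
# card-like runs with optional space or hyphen separators.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> set:
    """Scan text and return the set of tags whose rules match."""
    tags = set()
    if SSN_RE.search(text):
        tags.add("ssn")
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_ok(digits):
            tags.add("credit_card")
            break
    return tags


sample = "Employee 123-45-6789 paid with 4111 1111 1111 1111"
print(sorted(classify(sample)))
```

Because the tags come from the content rather than from the person who saved the file, a pasted temp file gets the same classification as the database it was copied from.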
We're not doing that today, but it's definitely a roadmap thing we can do, and we can start to say that that movie is not important, but this Oracle file is. Okay, so now let's jump to the value, how you deliver value. So it's au courant to talk about converged infrastructure and hyperconverged infrastructure, and you deliver an appliance. So tell us how that fits in. Our value-add is 100% software and it's delivered in a storage array. So we are a storage array that knows about your data, that can help you protect it, and give you information about it and how to manage it. So what we talk about is we're not converged in the traditional sense, but what we've done is we put a layer of data services next to the data that benefit from being next to the data. A lot of the converged stuff lets you put whatever application you want there, but there's no real performance benefit from living there; it's a packaging convenience exercise. They're not leveraging the fact that they're close to the data. We leverage the fact we're close to the data. So typically storage in a converged environment might be more about sort of patching and upgrade, simplifying that sort of thing. Well, it's about scale, and what you'll see in a converged infrastructure is, when storage and compute scale sort of in matched sets or linearly, it provides a really good value and it's an easy way to deploy. When they don't actually end up scaling that way, you find yourself with either way less capacity than you wanted, and you have to buy more compute than you want, or you find yourself with way more compute than you want because you've got too much storage. So I say storage is a couch potato because it keeps getting bigger and growing and isn't always active, whereas compute is always active. So the scaling and the economics are not always right for hyperconverged, but it is an important part of an overall IT strategy.
Okay, can you give us a sense of how you're going to take this? It sounds like primarily files right now. We have lots of customers, I'd say 95% of our customers are running some form of virtualization, and we're looking at unstructured data. Now remember, unstructured data is generated by applications and humans. So we are looking at application data and we're looking at human-generated content. Okay, so when you say application-type data, that means application aware and looking inside. Application aware and looking inside. Not application aware in that I can take a backup because they told me to; application aware in that we know what the content is and we know who the users are and we know what the purpose of the application was. Okay, so instead of a file being something that's opaque, you know, you can add value in the way you do. Okay. So we've heard a lot about, you know, the rage right now is all these flash SSD devices, and no one seems to be picking on you. Why is there so much activity there, and how did you stake out such a defensible position, seemingly on the other side? So when we started EqualLogic, and I will get to the point, I promise, we decided to go after mid-tier customers, much like DataGravity is going after mid-tier customers: you know, 50 to 2,000 employees, less than a quarter of a petabyte of data. So we're going after mid-tier customers. We defined a real customer problem and need in a market big enough, and then we did differentiated stuff. So in the case of EqualLogic, people couldn't get to a SAN because nobody knew how to manage storage. Fibre Channel was new and complicated, and storage required, like, a PhD in storage to manage it. We made it all self-managing.
With DataGravity, what we've said is not everybody's going to have a CISO, not everybody's going to have a security officer or an analytics officer, so why don't you make the storage do that for you? Right, so why don't you add the software to do that for you, as opposed to having to hire people to do it. Okay, and everyone was off with a shiny new toy. Everybody else was off doing Fibre Channel, trying to be the fastest guys on the planet, and we were saying, you know what, we're going to be fast, and we were fast and we are fast, but we don't have to be the fastest guy on the planet, because you can be a jock and an intellect at the same time. You don't have to be stupid to run fast. That never happened to me, just FYI. Me neither, but they don't have to be mutually exclusive. Okay, we've got to wrap it there. This is George Gilbert with Paula Long from DataGravity, and we're at the Julia Morgan Ballroom at Structure 2015, and we'll be right back in a few minutes. Thank you, George.