Welcome to the Moscone Center, everybody. This is theCUBE, SiliconANGLE's flagship production. This is our sixth year at VMworld; we're in Moscone North this year. David Flynn is here. He's the CEO of Primary Data, longtime guest, entrepreneur, founder. David, it's always a pleasure to see you. Welcome back to theCUBE.

I'm glad to be here on theCUBE again.

So, you know, we're here, and things just keep getting bigger and bigger, right? VMworld, 23,000 people. You're doing another amazing startup that we're going to talk about, but first of all, you look great.

Thank you.

How do you feel?

I feel awesome. And part of that is because I've now handed the CEO role to Lance. We've got a lot of the team from Fusion IO here together, and now I get to focus on the technology as CTO.

Right, so I said CEO. We talked about it beforehand, but yes, CTO of Primary Data. So this is at least the second company of yours that I've been watching: the very successful Fusion IO, where we saw the innovations that occurred, and now you're on to your next startup, Primary Data. Let's talk about what Primary Data is doing. It's one of the more interesting startups that we've seen in a while, and you're solving a very, very challenging problem. The way I describe it is a platform for data management for the cloud, and you're approaching it in a very interesting way, always trying to solve hard problems, you and Rick. So describe Primary Data; let's get into it. We really want to help our audience understand it.

This is a continuation of what got started with the introduction of flash in the enterprise. Flash proved that managing data via the storage silo can't address the full dynamic range of storage, from the ultra performance that you can get from flash, especially server-side flash, near to my heart, to the ultra capacity that you can get from cloud storage and object-based storage. Those things can't be put into a silo.
So the days of siloing data have to come to an end, but that creates chaos if you're trying to manage data in different storage systems, because the identity of the data is trapped in with it. So what we have done is we have basically freed data from the silo by taking the metadata and the control plane and serving them from the side: not with a box in the middle that reroutes the data, but with a service we call DataSphere, which manages all of the metadata and the control channel. That's combined with a universal client that's able to speak all of the various protocols and route the data natively and direct, whether it's over Fibre Channel, local-attached, or out over the cloud. This ultimately lets us solve the fundamental problem in storage: bringing all of these together, no matter the protocol, no matter the connectivity, into a single global data space across which the data is fluidly mobile and maintains its identity. See, today when you take a piece of data from one silo to another, it's not the same piece of data, it's a different piece of data. And we're trying to manage data, not the storage, and yet managing it via the silo means that we have this issue; I call it the identity crisis.

So when I create data, data has value. Some data has more value, some data has less value; some data I need to back up, some data I don't really care about; some data I need to have high performance. The characteristics of that access will change over time, and the challenge that we've had as an industry is there's been no way to manage that fluidly throughout the infrastructure. Some have tried to do it within their silos, but essentially that metadata that you talked about, if I understand it correctly, has been locked inside of boxes, and what you're doing is taking an approach to free that metadata and then leverage it in creating a value-based infrastructure out of it. So talk a little bit more about how you do that.
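The architecture described above, with metadata and control served out of band while data moves directly between client and storage, can be sketched in a few lines. This is a hedged illustration only: every name here (`MetadataService`, `read_direct`, the store labels) is invented for the example, not Primary Data's actual API.

```python
class MetadataService:
    """Control plane: knows where each logical object lives, never touches payloads."""

    def __init__(self):
        self._locations = {}  # logical id -> (store name, physical key)

    def place(self, logical_id, store, key):
        self._locations[logical_id] = (store, key)

    def resolve(self, logical_id):
        # Like a DNS lookup: answer "where", don't proxy the data itself.
        return self._locations[logical_id]


def read_direct(metadata, stores, logical_id):
    """Data plane: after one metadata lookup, the client reads storage directly."""
    store_name, key = metadata.resolve(logical_id)
    return stores[store_name][key]


# One logical object, initially on a flash tier; the client reads it directly.
stores = {"flash": {"blk-7": b"hello"}, "object": {}}
meta = MetadataService()
meta.place("/vol/db/foo.dat", "flash", "blk-7")
assert read_direct(meta, stores, "/vol/db/foo.dat") == b"hello"

# Moving the data is only a metadata update; the logical identity is unchanged.
stores["object"]["o-1"] = stores["flash"].pop("blk-7")
meta.place("/vol/db/foo.dat", "object", "o-1")
assert read_direct(meta, stores, "/vol/db/foo.dat") == b"hello"
```

The key property is the one the DNS analogy in this interview points at: the metadata service only answers "where", so relocating data never breaks the data's identity.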
Well, it starts with getting the data virtualized, and by that I mean abstracting the location of the data away, so that you as a data administrator see it as the same piece of data no matter where it resides, the application can reference it no matter where it resides, and it can even be referenced while it's moving, okay? So that's the underlying platform of data virtualization: abstracting the location. But that's really only the beginning, because once you've done that, you now have to schedule where the data should be at any precise point in time. That's where we're able to move to a world of objective-based data management, where the requirements that the data puts on the storage infrastructure are expressed through the language of objectives, which are mated to the service levels available in the infrastructure, and that's how it determines where things should be placed and moved.

All right, so that's very, very ambitious.

Yeah, it is.

So David, you know, we're here at VMworld, and we've spent the last decade trying to fix the challenges of managing storage in a virtualized environment. And gosh, when I joined Wikibon five years ago, David Floyer was saying it's the metadata, it's the metadata, but metadata is tough. Does it happen automatically? I really like what I'm hearing about what you guys are doing. The networking guys that I track are trying to understand what's happening with analytics, but this is very different; you're flipping the mindset. It's not about storing it, it's about the data, it's about information. So for the virtualization community, you know, we've got all these virtualization admins. Can they grasp this? Are they ready for this?

We're already familiar with the concept of abstracting the logical from the physical.
Now we work with logical machines, virtual machines, separate from the physical, and they can be instantiated on any piece of physical hardware, shut down, moved somewhere else. The lifetime of that virtual machine is no longer coupled to the physical. What we're doing is finally doing the same thing for the data objects, where those data objects can live on this piece of storage or that piece of storage. The vendor doesn't matter, the protocol can be object, file, or block, and it doesn't matter how it's accessed; the data is able to fluidly move between the different types of storage. So I think people familiar with virtualization get the concept and inherently understand the value of that decoupling of the logical layer from the physical. It just hadn't been done yet in the storage world. We've been too stuck on making bigger silos, or trying to stretch the data out front of the silo with a cache, or stretch it out back with backup and archival, or stretch it horizontally with scale-out. These are all expressions of the same underlying problem, but each of them is just fixing a symptom. Caching is trying to tap into lower-cost performance by lending the data out front. Backup and archival is trying to tap into cheaper capacity by lending it out back. It's the immobility of data that's really at the heart of all of these.

So I want to describe my perfect world, and knowing you, you're going to make my world even better, because I'll probably miss a few things. When data is created, I would like attributes associated with that data to be tied to it, ideally automatically, and then assigned and applied so that the policies, I'll use that sort of outdated term, can determine where it goes, how often it's backed up, how many copies I need, all kinds of different things, et cetera, et cetera.
When I need to defensively delete it, and as the attributes of that environment change, as the data changes, as the characteristics change, I want you to take action on that data. Can your system do that? Is that the vision? Is that available near-term, mid-term, long-term?

Absolutely, absolutely. First, it's the fact that we have split the control plane from the data access plane that allows us to add the sophistication in the control plane that you're talking about. See, today storage is broken because the metadata and the data are commingled, and the same agent which resolves a piece of metadata has to subsequently route all of the data. That's as if your DNS server had to proxy all of your traffic to Google. The internet wouldn't scale if you had to have your DNS servers proxy the traffic, yet storage is like that today: the same thing that opens the file "foo" has to pass all the data to it. So fundamentally, by splitting the control plane, we no longer have that false dichotomy of having to choose between performance in the data path and sophistication in the control plane. And once you've freed up the control plane to be more sophisticated, now you can introduce a language to make those things easier to manage. We talk about that as the language of objectives. These are the intents of the data: what does it need? All of the things you listed, whether it be durability, availability, or security, drive the physical placement of the data. So if you say I need higher durability, it might have to keep multiple copies. If you say I need disaster recovery, it might need to put some of those copies off-site. All of the placement is done in response to trying to meet these more abstract objectives. So we talk about objective-based data management.

Okay, and you have knowledge, you've got things like catalogs that can help me understand where it is, where it's moving around. Okay, so you are the control point.
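The objective-to-placement mapping described above (a durability objective implies extra copies, a disaster-recovery objective implies an off-site copy) can be sketched as a small matching function. This is a toy illustration with invented objective names and service levels, not DataSphere's real objective language.

```python
# Hypothetical service-level catalog: each tier advertises what it can deliver.
TIERS = [
    {"name": "server-flash", "iops": 500_000, "copies": 1, "offsite": False},
    {"name": "nas",          "iops": 50_000,  "copies": 2, "offsite": False},
    {"name": "cloud-object", "iops": 500,     "copies": 3, "offsite": True},
]


def place(objectives):
    """Return the tier names that together satisfy a set of data objectives."""
    placements = []
    if "min_iops" in objectives:
        # Performance objective: the fastest tier that clears the floor.
        fast = [t for t in TIERS if t["iops"] >= objectives["min_iops"]]
        placements.append(max(fast, key=lambda t: t["iops"])["name"])
    if objectives.get("min_copies", 0) > 1:
        # Durability objective: a tier that maintains enough copies.
        placements.append(next(t for t in TIERS
                               if t["copies"] >= objectives["min_copies"])["name"])
    if objectives.get("disaster_recovery"):
        # DR objective: at least one copy must live off-site.
        placements.append(next(t for t in TIERS if t["offsite"])["name"])
    return placements


# Hot database data: high performance plus an off-site copy for DR.
assert place({"min_iops": 100_000, "disaster_recovery": True}) == ["server-flash", "cloud-object"]
# Archival data: durability matters, performance does not.
assert place({"min_copies": 3}) == ["cloud-object"]
```

The point of the sketch is the direction of control: applications state intent, and the control plane, not the administrator, resolves intent against whatever service levels the infrastructure currently offers.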
Now, your vision is for cloud, obviously; we're living in a cloud world. Talk about how you're going to market, maybe partnerships that you have. How do you get this product out there? Is it direct sales? Is it a service?

You know, there are awesome opportunities for this to solve real, immediate pain points, because most organizations are faced with the question of how do I take advantage of the goodness of flash? When I operate in a siloed fashion, flash is too expensive to be the one silo, right? So maybe I use flash as a cache, but now I'm introducing caching technologies, yet more stuff to manage, these point solutions. On the flip side, look at how cheap and deep I can get object storage; but even if I'm only using it for my archival tier, I have to introduce cloud gateways or methodologies for moving the data, and those are point solutions that are yet more things for IT to manage. With this, we connect all the way from the cloud, through what are today the traditional primary storage systems, all the way into server-local storage, and make those all one data space where you don't have to manage them with separate point solutions. But to get more concrete on the go-to-market: VMware, with vSphere 6, has introduced the concept of vVols as a way to normalize the storage infrastructure. But every storage vendor is going to have their own VASA 2.0 provider and vVols implementation, if you're lucky enough to get one, because there's $300 billion of already-installed storage equipment supporting virtual environments, and a good portion of that will never support vVols. With Primary Data able to envelop all of those storage systems and virtualize the data across them, we become the universal vVol provider, providing VASA 2.0 in a uniform fashion regardless of the storage, whether it's file, block, object, what have you.
You don't care what the storage infrastructure is, and you obviously have a high-speed data mover that can get the data where it needs to be.

Correct, correct. That's implied.

So I'm wondering, for those that have been tracking storage for years, there were a lot of failed attempts at storage virtualization, or software-defined storage; we spent a couple of years talking about it and saying maybe this is the next generation.

But those were technologies, not really solutions. You're making bigger silos from smaller silos. Even hyperconverged is just making a bigger silo of your data, a bigger prison. I loved the fun competition with EMC when we were at Fusion IO, being the little upstart, right? EMC folks would tell the customer, yeah, that Fusion IO stuff, it's great, but it's an island of data that's trapped in your server and unmanageable. And when we would talk to those same customers, they would say, boy, that's the pot calling the kettle black, because that SAN is this island of data in the middle of my data center. Just a big island. It's the same problem, whether in miniature or at macro scale: the data can't fluidly move across these things while being accessed with the same identity.

So what about public cloud, Amazon, Azure? Where do they fit into your vision?

Well, we think of them on two dimensions. One, of course, is using cloud storage as an extension of your global data space that happens to have maybe very slow access characteristics, but it's very, very cheap and very deep, and somebody else can manage it. So we connect to cloud and object storage, whether it's on-premises object storage or cloud.

Just another target to you.

Right, and if the objective on the data allows for it, or requires it to be off-site and archival, then it may get posted up in the cloud. When you go to access it, it may pull it back down, right?
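The "post it up, pull it back down" behavior just described can be sketched as a tiny tiering loop: an archival objective lets data land in the cheap tier, and access recalls it to the fast tier. This is only an editor's sketch; the class and method names are invented, not Primary Data's interfaces.

```python
class TieredStore:
    """Toy model of cloud-as-just-another-tier with recall on access."""

    def __init__(self):
        self.flash = {}  # hot tier: fast, expensive
        self.cloud = {}  # cold tier: slow, cheap and deep

    def write(self, key, data, objectives):
        # An archival objective on the data lets it land in the cloud tier.
        tier = self.cloud if objectives.get("archival") else self.flash
        tier[key] = data

    def read(self, key):
        # On access, recall archived data back down to the fast tier.
        if key in self.cloud:
            self.flash[key] = self.cloud.pop(key)
        return self.flash[key]


store = TieredStore()
store.write("logs-2014", b"old", {"archival": True})
assert "logs-2014" in store.cloud                # archived off-site
assert store.read("logs-2014") == b"old"         # recalled on access
assert "logs-2014" in store.flash and "logs-2014" not in store.cloud
```

Either way the caller sees one key with one identity; which tier currently holds the bytes is the control plane's business.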
So it's just another tier, another storage with different service levels that can be used to meet the objectives. That's the cloud storage side. On the cloud compute side, that's where our DataSphere and the agents that are needed can run as virtual appliances, or even as software appliances if you want to run them bare-metal in the cloud. So by being able to deploy this in a software-defined model, you can put it up into the compute infrastructure and create this virtualized data in the cloud, where you're unifying file, block, and object and expressing it as file, block, or object.

Let's talk about the company a little bit. So, a California-based company. Where are your alpha geeks? Where are they located, and where are you getting them from?

Well, you know, we have teams.

Other than you. And I mean that in the most respectful way.

We have a super talented team. We've got engineering centers of excellence here in the Bay Area, in Palo Alto on El Camino Real, bordering Los Altos, and we have offices in Tel Aviv, Israel; a lot of storage talent there. And of course, my homeland of Salt Lake City, Utah, where we started Fusion IO back in the day; a lot of engineering talent there. And we have a very distributed team, too. We have, I think at this point, two official Linux kernel maintainers, guys who are gatekeepers on what goes into Linux around the NFS stack and the SMB stack, as well as contributors all the way from China to Europe who are part of this distributed team. It's the culture that comes from the Linux community.

And of course, you were very active at Fusion IO with contributions to Linux. I mean, we have had interesting discussions about how everybody forgot about paging and then how important it became after flash was injected into the world.

Yeah, yeah, exactly.
Well, you know, this is very intimate to what we've done: we've driven all of this in the standards bodies by leveraging extensions to NFS, so that we can elevate all of these other types of storage to have the full file-system kind of metadata where we can express objectives. So you're not working with these ethereal LUNs that are floating off nowhere. These are tethered in a namespace where you can see them; even though they're block underneath and they're consumed as block, they can show up in a file system namespace, and without slowing down the access. That's the beauty. There's no longer the false choice between the sophistication of NAS, with a nice namespace, and the performance of SAN, with high-performance block. You get both, by splitting the control plane and the data plane.

David, I'm curious. With a solution like this, over time the customer can change what storage they're using underneath. So I wonder if you can paint a picture of how you see the storage environment looking in the future.

With server virtualization, we've decoupled management of the physical servers from the logical servers, and that has made both jobs much easier, right? The same is true here, because now you can decommission, you can stage new hardware, you can take it down for servicing, and you can do all of that without interrupting service, because with data virtualization, if the data needs to move somewhere else to maintain its availability objectives, it will. So all you have to do is advertise that the storage system is not going to be able to provide availability because you're going to take it down, and the data will automatically migrate somewhere else. You can service it. So now you can service the physical without having to worry about interrupting the logical.
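The servicing workflow just outlined, advertise that a storage system is going away and let its data re-migrate so availability objectives keep holding, might look like this in miniature. All names here are hypothetical, purely to illustrate the idea.

```python
def drain(locations, tiers_up, fallback):
    """Re-place data whose tier is being withdrawn from service.

    locations: logical id -> current tier
    tiers_up:  tiers still advertising availability
    fallback:  where displaced data should migrate
    """
    return {logical_id: (tier if tier in tiers_up else fallback)
            for logical_id, tier in locations.items()}


locations = {"/vol/a": "nas-1", "/vol/b": "nas-2"}
# The administrator advertises that nas-1 is going down for service;
# its data migrates automatically, and nas-1 can then be serviced offline.
assert drain(locations, tiers_up={"nas-2"}, fallback="nas-2") == \
       {"/vol/a": "nas-2", "/vol/b": "nas-2"}
```

Because applications address logical identities, not tiers, the migration is invisible to them, which is what makes the hardware serviceable without downtime.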
Just to point on that: David Floyer has said, very conservatively, that we spend at least 30% of the storage budget on migration, just getting data on and getting it off. That's low-hanging fruit for saving huge amounts of money when it comes to our infrastructure.

That's right. So if you're talking about a C-Mode migration with NetApp, this makes it seamless, where you don't have to take it offline and copy data around.

So, just in the last couple of minutes, some non-CTO questions, but I know you know the answers. So you guys, Accel Partners, other investors. Who are the investors, how much have you raised, where are you at, what's your headcount these days? What can you share with us?

So we raised 63 million almost two years ago when we first got the thing started. We're about 75 people at this point, almost entirely engineering and product. We have a number of strategic investors, and I'm not sure I'm free to name them, but there are folks on the supply side and folks on the distribution side of things, so a good blend there. And then a number of individual investors: you know, folks that were part of Fusion IO and founded lots of the brand-name storage companies you're familiar with are also part of this one.

And product availability, where are you at?

We announced the product this week. We're in early access with proof-of-concept customers, and we'll probably take that through to the end of the year before GA. We're targeting the enterprise market to start with, going after the high ground. You can always, you know, go down into SMB, but you have to have your shit together before you go into the enterprise.

Sorry, I'm not sure I can say that.

You can say that on theCUBE. So we're holding ourselves to a high standard, like we did with Fusion IO.

Yeah, I've got to tell you my junk CLI story off camera.
And the product is called, what's the name again?

DataSphere. That's the name of the metadata control plane, and it's deployable as a virtual appliance, or as a physical appliance in a highly available couplet. So you can do it software-defined, or you can do it with hardware, where you get the maximum in performance and availability.

Excellent. All right, David, we'll have to leave it right there. Thanks so much for coming back, and congratulations on your next startup. Big lift, as we say, and it's just awesome to watch.

Really fantastic to see you again. Thank you so much.

All right, keep it right there, everybody. We'll be back with our next guest right after this. This is theCUBE, live at VMworld 2015. Be right back.