It's theCUBE, covering VMworld 2015. Brought to you by VMware and its ecosystem sponsors. And now your host, Dave Vellante.

Welcome back to Moscone Center, everybody. This is theCUBE, SiliconANGLE's continuous production of VMworld 2015. Brian Biles is here. He's the CEO and co-founder of Datrium. Brian, of course, from Data Domain fame. David Floyer and I are really excited to see you. Thanks for coming on theCUBE.

It's great to see you guys again.

So, been a while. Coming out of stealth, right? It's been a while. You've been busy, right? You did Data Domain, worked at EMC for a while, kind of disappeared, got really busy again. And here you are. New hats.

Got new hats, yeah, yeah.

So tell us about Datrium. I'm trying to catch up to you guys on ties.

Yeah, yeah. Well, we're big on ties on the East Coast.

Ah, you too. Well, he's even more East Coast than I am, even though he lives out in California. But yeah, tell us about Datrium. Fundamentally different?

Fundamentally different from other kinds of storage, and a different kind of founding team. So I was a founder of Data Domain, and Hugo Patterson, the CTO there, an EMC fellow, became the CTO for us. When we left EMC, we weren't sure what we were going to do. We ended up running into two VMware principal engineers who had been there 10 or 12 years working on all kinds of stuff, and they believed there was a market gap around scalable storage for VMs. So we got together. We knew something about storage, they knew something about VMs, and three years later Datrium is at its first trade show.

So talk more about that gap. I mean, it happens all the time, right? Guys, alpha geeks, no offense to that term, it's a term of endearment. Sorry, I'm a marketing guy. Tech athlete, right? So they get together and they identify these problems, and they're able to sniff them out at the root level. So can you describe that problem in more detail?

Sure. So broadly, there are two kinds of storage, right? There are arrays, and, emerging, there's hyperconverged. They approach things in very different ways. In arrays, there tends to be a bottleneck in the controller, the electronics that do the data services: the RAID, the snapshotting and cloning, the compression and dedupe, whatever. And increasingly, that takes more and more compute. Intel is helping every year, but it's still a bottleneck, and when you run out, it's a cliff. You have to do a pretty expensive upgrade or migrate the data to a different place, and that's sticky and takes a long time.

So in reaction, hyperconverged has emerged as an alternative. It has the benefit of killing the array completely, but it may have overcorrected, so it has some trade-offs that a lot of people don't like. For example, if a host goes down, the host has assumed all the data management problems that arrays used to have, so you have to migrate the data or rebuild it to service the host. And it doesn't fit very cleanly with, for example, a blade server, which has one or two drive bays, when in a hyperconverged model, if you look across the floor, the average number of capacity drives is four or five, not to mention the cache drives. So a blade server is just not a fit. There are a lot of parts of the industry where that model is just not the right model. And if everybody's writing to everybody, then there's a lot of neighbor noise, and it gets kind of weird to troubleshoot and tune. Arrays were better in some respects; things changed with hyperconverged, a little different.
We're trying to create a third path. In our model, there's a box that we sell. It's a 2U rack mount, a bunch of drives for capacity, but the capacity is just for at-rest data. It's where all the writes go, it's where persistence goes. But we move all the data service processing, the CPU for RAID, for compression, for dedupe, whatever, to host cycles. We upload software to an ESX host, and it runs on anybody's x86 server, and you bring your own flash for caching.

Gartner did a thing at the end of the year where they looked at discounted street price for flash, the difference between what you could pay on a server for flash, just a commodity SSD, and what you could pay in an array, and it was like an 8x difference. And we don't put RAID on the host; all the RAID is in the back end, so that frees up another, whatever, 20%. You end up getting an order of magnitude difference in pricing. So with what you can get from us in flash on a host, you don't aim at 10% of your active data in cache. It gets close to $100 a terabyte after you do dedupe and compression on server flash. So it's just cheap and plentiful. You put all your data up there. Everything runs out of flash locally; it never takes a network hit for a read. We do read caching locally.

Unlike hyperconverged, we don't spread data in a pool across the hosts. We're not interrupting every host for RAID, for writes, for somebody else. Everything is local. So when you do a write, it goes to our box at the end of the wire over a 10 gig attach, but all of the compute operations are local, so you're not interrupting everybody. Any resourcing you would do for an I/O problem is local, either cores or flash. So it's just a different model, and it's really well suited for blade servers. No one else was doing that in such a good way.

Unlike a cache-only product, it's completely, organically designed for manageability. You don't have a separate tier for managing on the host, separate from an array, where you're probably duplicating provisioning and having to worry about how to do an array snapshot when you have to flush the cache on the host. It's all designed from the ground up. So it means the storage that we store to is minimal cost: we don't have the compute overhead that you have with a controller, and you don't have the flash there, which is really expensive; that's just cycles on the host. Everything is done with the most efficient path for both data and hardware.

So if you look at designs in general, flash has either been a cache, or it's been 100% flash, or it's been a tier of storage. If I understand that correctly, there isn't any tiering, because you've got 100% of it in flash.

So we use flash on the host as a cache, but only in a sort of, and I use that word guardedly, degenerate sense. It's all of the data. It's a cache in the spirit that if the host dies, you haven't lost any data; the data is always safe somewhere else. But it's all the data.

It's all the data. And it's sitting on the disks in the back end, and I presume you're writing sequentially to those all the time, with log files, using the disk in the most effective way.

That's right, both sides. The flash is a log structure, and the disk is a log structure. And we had that advantage at Data Domain: it was the most popular log-structured file system ever, and we learned all the tricks about dedupe and garbage collection a long time ago.
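A back-of-envelope sketch of the server-flash economics described above. The raw dollar-per-terabyte SSD price below is an illustrative assumption, not a figure from the interview; only the roughly 8x array-versus-server gap and the near-$100-per-terabyte effective cost after dedupe and compression come from the conversation.

```python
# Rough 2015-era flash economics, per terabyte. Raw prices are assumptions.
server_ssd_per_tb = 600.0                 # assumed commodity SSD street price, $/TB
array_ssd_per_tb = server_ssd_per_tb * 8  # ~8x array premium per the Gartner comparison cited

data_reduction = 6.0                      # assumed combined dedupe + compression factor
effective_per_tb = server_ssd_per_tb / data_reduction

print(f"server SSD, raw:        ${server_ssd_per_tb:>6,.0f}/TB")
print(f"array SSD, raw (~8x):   ${array_ssd_per_tb:>6,.0f}/TB")
print(f"server SSD, effective:  ${effective_per_tb:>6,.0f}/TB after data reduction")
# With these assumptions, effective server flash lands near the ~$100/TB figure mentioned.
```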
So that CTO team is uniquely qualified to get this right. What about if it does go down? Are you clustering it? What happens when it goes down and you have to recover from those disk drives? That could take a bit of time.

So there are two sides to that. If a host fails, you use VMware HA to restart the VM somewhere else and life goes on. If the back end fails, it fails the way a traditional mid-range array might fail. We have dual controllers, so there's failover there. All the disks are dual-attached. There are dual networks on each controller, so you can have switch failover. It's RAID 6, so there's a rebuild that happens if a disk fails, but you could have two of those and keep going.

But the point I was getting at was that if you fail in the host, you've lost all your active data, I presume.

You've lost the cached copy in that local flash, but you haven't lost any data.

You haven't lost it. I meant you've lost it from the point of view...

Only from a standpoint of speed. So at that point, if the host is down, you have to restart the VM somewhere else. That's not instant; that takes some number of minutes, and that gives us some time to upload data for that host to the new cache. And the data is all laid out in our system, not for interactive use on the disk drives, but for very fast upload to a cache. It's all sort of sequentially laid out, unblended, per VM, for blasting to the host.

So what do you see as the key application types this is going to be particularly suited for?

So our back-end system has about 30 terabytes usable after all the RAID and everything, and then dedupe and compression, so figure two, four, six x data reduction. Call it 100 terabytes-ish; depends on mileage. So a 100-terabyte box will sell, that's kind of a mid-range-class array, and it will sell mostly to those markets. And our software supports only VM storage, virtual disks. So as long as it meets those criteria, it's pretty flexible. Each host can have up to eight terabytes of raw flash; post-dedupe and compression, that could be 50 terabytes of effective capacity of flash per host. And reads never leave the host, so you don't get network overhead for reads, and that's usually two thirds of most people's I/O. So it's enormously price- and cost-effective, and very performant as well, because of the latency.

And your IP is the way you lay out the data on the media, is that part of the IP?

Well, it's like two custom file systems from scratch. One for the box and one for the host, not to mention all the management to make it look like there's one thing. So there's a lot going on. It's a much more complex project than Data Domain was.

Yeah, so you mentioned you learned from your log-structured file system and garbage collection days at Data Domain, but the problem that you're solving here is much closer to the host, much more active data. So that was obviously a challenge, and that was part of the new invention required, or was it really just directly applicable?

I mean, it's at all levels. We had to make it fit, so we're very VM-centric. The software looks to ESX as though it's an NFS share, but NFS terminates in each host, and then we use our own protocol to get across 10 gig to the back end. And this gives us some special effects we'll be able to talk about over time.

Every virtual... like a Tintri design in some way?

Well, it's NFS, so you get to see every VM's storage discretely.
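A similar sketch of the capacity arithmetic in that answer: roughly 30 TB usable in the back-end box and up to 8 TB of raw flash per host, scaled by the 2x to 6x data-reduction range Brian mentions. Which factor actually applies is workload-dependent, so the loop below just shows the spread.

```python
# Effective-capacity arithmetic from the figures quoted in the interview.
backend_usable_tb = 30   # usable back-end capacity after RAID, before data reduction
host_raw_flash_tb = 8    # maximum raw flash per ESX host

for reduction in (2, 4, 6):  # dedupe + compression factors mentioned; mileage varies
    print(f"{reduction}x reduction: back end ~{backend_usable_tb * reduction} TB effective, "
          f"host flash ~{host_raw_flash_tb * reduction} TB effective")
# Around 3-4x reduction the back end reaches the ~100 TB "mid-range array" class quoted,
# and near 6x the per-host flash approaches the ~50 TB effective figure.
```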
It's sort of, you know, before vVols, there was NFS. And we support 5.5, so this was a logical choice. So everything's VM-centric. All of the management, it just looks like there's a big pool of storage and everything else is per VM: diagnostics, capacity planning, whatever; clones are per VM. You don't have to spend a lot of analytics to back out what the block clones look like with respect to the VMs and try to figure it out. That's all there is.

So Wikibon's been talking to a lot of flash-only people, and this is almost flash-only in the sense that all of the I/O is going to that flash.

Once flash is sufficiently cheap and abundant, then yes. And we write to NVRAM, which is the same as an all-flash array.

So one of the things that we've noticed is that what they find is they have to organize things completely differently, particularly as they're trying to share things. For example, instead of having the production system and then a separate copy for each application developer and another separate copy for the data warehouse, they're trying to combine those and share the data across there with snapshots of one sort or another.

To amortize their very high costs?

Just because it's much faster and quicker. It's the customers that are doing this, not the vendors; they don't even know what's going on. But because they can share it, you don't have to move the data.

Well, it allows the developers to have a more current copy of the data so they can work on near-production.

Right. So I was just wondering whether that was an area that you were looking at to, again, apply a different way of doing storage.

So it's a test/dev use case, you're saying?

Yeah, well, test/dev or data warehousing or whatever.

I mean, we're certainly sensitive to the overhead of having a lot of copies. That's why we dedupe and so on the way we do. So it's very efficient, and it allows you to, for example, if you're doing a clone, it's a deduped clone. It gives you a new namespace entry and it keeps the writes separate, but it lets the common data, the data with commonality across other versions, be shared.

So we've got to wrap, but in the time we have remaining, just a quick update on the company: headcount, funding, investors, maybe just give us the rundown.

Sure, we've raised series A and B, about 55 million so far, from NEA and Lightspeed, plus some angels: Frank Slootman, Diane Greene, the original founder of VMware, and Ed Bugnion, who was the original CTO. A little over 70 people.

Great. And this is our first trade show.

Yeah, awesome. Well, congratulations, Brian. It's really awesome to see you back in action. Not that you weren't in action, but now it's visible action.

Well, it's great to be here.

Thanks very much for coming on theCUBE. Congratulations. Keep right there, everybody. We'll be back right after this. This is theCUBE, we're live from VMworld 2015. Be right back.