Hi everybody, this is Dave Vellante, we're winding down day two of theCUBE's coverage of VeeamON 2022. We're here at the Aria in Las Vegas. Myself and Dave Nicholson, we've been going for two days. Everybody's excited about the VeeamON party tonight. It's always epic and it's a great show in terms of its energy. Danny Allan is here, he's CTO of Veeam. He's back, he gave the keynote this morning. I gotta say Danny, you look pretty good up there with two hours of sleep. I had three, I don't look that good, but your energy was very high. And I gotta tell you, the story you told was amazing. It was one of the best keynotes I've ever seen. Even the technology pieces were outstanding, but your weaving in that story was incredible. I'm hoping that people will go back and watch it. We probably don't have time to go into it, but wow. Can you give us the one-minute version of that long story? Sure, yeah. I read a book back in 2013 about a ship that sank off Portsmouth, Maine, and I thought, I'm gonna go find that ship. And so it's a long, complicated process, five years in the making, but we used data, and the data that found the ship was actually from 15 years earlier. And in 2018, we found the bow of the ship, we found the stern of the ship, but what we were really trying to answer was, was it torpedoed or did the boilers explode? Because the Navy said the boilers exploded, and two survivors said no, it was torpedoed, there was a German U-boat there. And so our goal was find the ship, find the boilers. So in 2019, sorry, it was 2018, we found the bow and the stern, and then in 2019, we found both boilers perfectly intact, and in fact, the rear end where that torpedo hit, there wasn't much left of it, of course, but data found that wreck. And so it exonerated essentially any implication that somebody screwed up in the boiler system, and the survivors, or the children of the survivors, obviously appreciated that, I'm sure. Yeah, several outcomes to it. So the chief engineer was one of the 13 survivors, and he lived with the weight of this for 75 years, 49 sailors dead because of myself. But I had the opportunity of meeting some of the children of the victims, and also attending ceremonies. The families of those victims received Purple Hearts because they were killed due to enemy action. And then you actually knew how to do this. I wasn't aware you had experience finding wrecks. You've discovered several of them prior to this one, but the interesting connection, the reason why this keynote was so powerful, was we're at Veeam, it's a data conference. You connected that to data because you went out and bought a, how do you say this? Magnonim, magnometer? Magnetometer. Magnetometer, I don't know what that is. And a side-scan sonar, right? I got that right, that was easy. But then you know what this stuff is. And then you built the model in TensorFlow, you took all the data, and you found anomalies, and then you went right to that spot, found the wreck with 12,000 pounds of dynamite, which made your heart beat. But then you found the boilers, that's incredible. But the point was, this is data, let's see, a lot of years after, right? Yeah, two sets of data were used. One was the original set of side-scan sonar data by the historian who discovered there was a U-boat in the area, that was 15 years old. And then we used of course the wind and weather and wave pattern data that was 75 years old to figure out where the boilers should be, because they knew that the ship had continued to float for eight minutes.
And so you had to go back and determine the models of where should the boilers be, if it exploded and the boilers dropped out and it floated along for eight minutes and then sank. Where was that data? Was it scanned? Was it electronic? Was it paper? How'd you get that data? So the original side-scan sonar data was just hard drive data by the historian. I wish I could say he used Veeam to back it up, but I don't know that I can say that, but he still had the data 15 years later. The weather and wind and wave data, that was all public information. And we actually use that extensively. We find other wrecks, a lot of wrecks off Boston, sunk in World War II. So we were used to that model of tracking what happened. Wow, so yes, imagine if that data weren't available, and it probably shouldn't have been, by all rights. So now fast forward to 2022, we've got, let's talk about just the cloud data. I think you said a couple of hundred petabytes in the cloud, 2019, 500 in, no, yeah, in 2020. 242 petabytes in 2020, 500 petabytes last year, and we've already done the same as 2020. So 240 petabytes in Q1. I expect this year to move an exabyte of data into the public cloud. Okay, so you got all that data. Who knows what's in there, right? And if it's not protected, who's gonna know in 50, 60, 70, 100 years, right? So that was your tie-in. Yes. To the importance of data protection, which was just really, really well done. Congratulations, honestly, one of the best keynotes I've ever seen. Keynotes are often really boring, but you did a great job, again, on two hours sleep. So much to unpack here. The other thing that really is, I mean, we could talk about the demos, we could talk about the announcements. So, well, yeah, let's see. Salesforce data protection is now public. I almost spilled the beans yesterday on theCUBE, caught myself. Veeam version 12, obviously. You guys gave a great demo showing the Island cloud with, I think it was just four minutes. It was super fast recovery and four minutes of data loss. I was so glad you didn't say zero minutes, because that would have been a lie. Live demos, which I appreciate and also think is crazy. So some really cool demos and some really cool features. So much to unpack, but the insights that you can provide through Veeam, it's Veeam ONE, was actually something that I hadn't heard you talk about extensively in the past, maybe I just missed it, but I wonder if you could talk about that layer and why it's a critical differentiator for Veeam. It's the hidden gem within the Veeam portfolio because it knows about absolutely everything. And what determines the actions that we take is the context in which data is surviving. So in the context of security, which we are showing, we look for CPU utilization, memory utilization, data change rate. If you encrypt all of the data in a file server, it's going to blow up overnight. And so we're leveraging heuristics in the reporting. But even more than that, one of the things in Veeam ONE, people don't realize, we have this concept of Veeam Intelligent Diagnostics. It's machine learning, which we drive on our end, and we push out as packages into Veeam ONE. There's up to 200 signatures, but it helps our customers find issues before they become issues. Okay, so I want to get into it because I oftentimes don't geek out with you and don't take advantage of your technical knowledge. And you've triggered a couple of things, especially on the analyst call, and you said it again today, that modern data protection has meaning to you.
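Before moving on, it's worth making the change-rate heuristic Danny described a little more concrete. Veeam ONE's actual implementation isn't public, so what follows is only a minimal sketch of the idea, flagging a machine whose nightly data change rate jumps far above its own baseline, which is what a file server looks like after ransomware rewrites everything overnight. All names, thresholds, and numbers here are hypothetical.

```python
# Illustrative sketch only -- not Veeam ONE code. All names and thresholds are hypothetical.
# Flags a host whose nightly data change rate jumps far above its own baseline,
# the pattern Danny describes for spotting ransomware-style mass encryption.
from statistics import mean, stdev

def change_rate_alert(history_gb, latest_gb, threshold_sigma=3.0):
    """history_gb: changed-data sizes (GB) from recent incremental backups.
    latest_gb: tonight's changed-data size. Returns True if anomalous."""
    if len(history_gb) < 5:          # not enough history to judge
        return False
    baseline, spread = mean(history_gb), stdev(history_gb)
    if spread == 0:                  # perfectly flat history; any big jump is suspicious
        return latest_gb > baseline * 2
    return (latest_gb - baseline) / spread > threshold_sigma

# Example: a file server that normally changes ~20 GB a night suddenly changes 450 GB
# (everything rewritten as ciphertext), which trips the alert.
history = [18.2, 21.5, 19.8, 22.0, 20.4, 19.1, 23.3]
print(change_rate_alert(history, 450.0))   # True -> investigate before it spreads
```

In a real product the change-rate signal would be correlated with the CPU and memory heuristics Danny mentioned rather than trusted on its own.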
We talked a little bit about this yesterday, but back in the days of virtualization, you shunned agents and took a different approach because you were going for what was then modern. Then you went to bare metal, cloud, hybrid cloud, containers, super cloud. You used it in the analyst meeting, you didn't use the table. Say super cloud and then we'll talk about the edge. So I would like to know specifically, if we can go back to virtualization, because I didn't know this, exactly how you guys defined modern back then. And then let's take that to modern today. So what'd you do back then and then we'll get into cloud. Sure, so if you go back to when Veeam started, everyone was using agents, you'd install something in the operating system. It would take 10%, 15% of your CPU because it was collecting all the data and sending it outside of the machine. When we went to a virtual environment, if you put an agent inside that machine, what happens is you would have a hundred operating systems all on the same server consuming resources from that hypervisor. And so we said there's a better way of capturing the data instead of capturing the data inside the operating system. And by the way, managing thousands of agents is no fun. So what we did is we captured a snapshot of the image at the hypervisor level, and then over time, we just leveraged changed block tracking from the hypervisor to determine what had changed. And so that was modern because no more managing agents, there was no impact on the operating system, and it was a far more efficient way to store data. So you leveraged CBT through the API, right? Is that right? Correct, yeah, we used the vSphere APIs for Data Protection. Okay, so I said this to Michael earlier. Fast forward to today. Your data protection competitors aren't as fat, dumb, and happy as they used to be. So they could do things in containers and we talked about that a little bit. So now let's talk about cloud. What's different about cloud data protection? What defines modern data protection and where are the innovations that you're providing? Let me do one step in between those two, because one of the things that happened between hypervisors and cloud was, let's offload the capture of the data to the storage system, because even better than doing it at the hypervisor cluster is doing it on the storage array, because that can capture the data instantly, right? So as we go to the cloud, we want to do the same thing, except we don't have access to either the hypervisor or the storage system. But what they do provide is an API. So we can use the API to capture all of the blocks, all of the data, all of the changes on that particular operating system. Now here's where we've kind of gone full circle. On a hypervisor, you can use the vSphere agent to reach into the operating system to do things like application consistency. What we've done for modern data protection is create specific cloud agents that say, forget about the block changes, make sure that I have application consistency inside that cloud operating system. Even though you don't have access to the hypervisor or the storage, you're still getting the operating system consistency while getting the really fast capture of data. Okay, so that gets into you talking on stage about how snaps don't equal data protection. I think you just explained it, but explain why. But let me highlight something that Veeam does that is important.
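The API-driven capture Danny describes is easier to picture with an example. The sketch below is not Veeam code; it uses one public instance of the pattern, the AWS EBS direct APIs via boto3, to pull only the blocks that changed between two snapshots, roughly the cloud-side analogue of changed block tracking. The snapshot IDs are placeholders and the backup sink is hypothetical.

```python
# A sketch of API-driven incremental capture in a public cloud, in the spirit of what
# Danny describes (no hypervisor or array access, just the provider's API). Uses the
# AWS EBS direct APIs via boto3; snapshot IDs are placeholders, error handling trimmed.
import boto3

ebs = boto3.client("ebs")

def fetch_changed_blocks(prev_snap, curr_snap):
    """Yield (block_index, bytes) for blocks that differ between two EBS snapshots."""
    token = None
    while True:
        kwargs = dict(FirstSnapshotId=prev_snap, SecondSnapshotId=curr_snap)
        if token:
            kwargs["NextToken"] = token
        page = ebs.list_changed_blocks(**kwargs)
        for blk in page.get("ChangedBlocks", []):
            data = ebs.get_snapshot_block(
                SnapshotId=curr_snap,
                BlockIndex=blk["BlockIndex"],
                BlockToken=blk["SecondBlockToken"],
            )
            yield blk["BlockIndex"], data["BlockData"].read()
        token = page.get("NextToken")
        if not token:
            break

# Usage (placeholder IDs): copy only what changed since the last capture.
# for block_index, payload in fetch_changed_blocks("snap-0aaa...", "snap-0bbb..."):
#     write_to_backup_repository(block_index, payload)   # hypothetical sink
```

The shape is the point: enumerate changed blocks through the provider's API and copy only those, so the backup engine never has to read the whole disk and never needs hypervisor or array access.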
We manage both snapshots and backup, because if you can recover from your storage array snapshot, that is the best possible thing to recover from, right? But we don't stop there, so we manage both the snapshots and we convert the data into the Veeam portable data format. And here's where the super cloud comes into play, because if I can convert it into the Veeam portable data format, I can move that OS anywhere. I can move it from physical to virtual to cloud to another cloud back to virtual. I can put it back on physical if I want to. It actually abstracts the cloud layer. Now there are things that we do when we go between clouds. Some use BIOS, some use UEFI, but we have the data in backup format, not snapshot format, that's theirs, but we have it in backup format that we can move around and abstract workloads across all of the infrastructure. And your catalog is in control of that. Is that right? Am I thinking about that right? That is 100%. And you know what's interesting about our catalog, Dave? The catalog is inside the backup. And so historically, one of the problems with backup is that you had a separate catalog, and if it ever got corrupted, all of your data was meaningless. Because the catalog is inside the backup for that unique VM or that unique instance, you can move it anywhere and power it on. That's why people said we're so reliable. As long as you had the backup file, you can delete our software, you can still get the data back. So I love this fast pace. So that enables what I call super cloud, we can now call it super cloud. Because now take that to the edge. If I want to go to the edge, I presume you can extend that, and I also presume containers play a role there. Yes. So here's what's interesting about the edge. Two things. You don't want to have any state if you can help it. And so containers help with that, right? You can have a stateless environment, some persistent data storage, but we not only provide the portability in operating systems, we also do this for containers. And that's true. If you go to the cloud and you're using say EKS with Relational Database Service, RDS, for the persistent data layer, we can pick that up and move it to GKE or move it to OpenShift on-premises. And so that's why I call this the super cloud. We have all of this data actually, I think you coined the term super cloud. Yeah, but thank you for, I mean, I'm looking for confirmation from a technologist that it's technically feasible. It is technically feasible and you can do it today. And that's a, I think it's a winning strategy personally. Will there be such a thing as edge native, you know, there's cloud native. Will there be edge native new architectures, new ways of doing things, new workloads, use cases? We talk about hardware, new hardware architectures, ARM-based stuff that are gonna change everything to edge native? Yes and no. There's gonna be small tweaks that make it better for the edge. You're gonna see a lot of ARM at the edge, obviously for power consumption purposes. And you're also gonna see different constructs for networking. We're not gonna use the traditional networking, probably a lot more software-defined stuff. Same thing on the storage. They're gonna try and minimize the persistent storage to the smallest footprint possible. But ultimately I think we're gonna see containers will lead the edge. We're seeing this now.
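Danny's point that the catalog travels inside the backup is worth a small illustration. The sketch below is emphatically not Veeam's portable data format, just a toy self-describing archive whose manifest is embedded alongside the data, so the file alone is enough to enumerate, verify, and restore what it contains, with no external catalog database to lose or corrupt.

```python
# Not Veeam's format -- just an illustration of the "catalog travels with the backup" idea:
# a single self-describing archive whose manifest (catalog) is embedded, so the archive
# alone is enough to restore, no external database required. All names are hypothetical.
import json, tarfile, io, hashlib, time

def write_portable_backup(path, files):
    """files: dict of {logical_name: bytes}. Writes data blocks plus an embedded manifest."""
    manifest = {"created": time.time(), "entries": {}}
    with tarfile.open(path, "w:gz") as tar:
        for name, payload in files.items():
            manifest["entries"][name] = {
                "size": len(payload),
                "sha256": hashlib.sha256(payload).hexdigest(),
            }
            info = tarfile.TarInfo(name=f"data/{name}")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
        blob = json.dumps(manifest, indent=2).encode()
        info = tarfile.TarInfo(name="manifest.json")   # the embedded "catalog"
        info.size = len(blob)
        tar.addfile(info, io.BytesIO(blob))

def read_catalog(path):
    """Recover the catalog from the backup file itself -- no external software needed."""
    with tarfile.open(path, "r:gz") as tar:
        return json.load(tar.extractfile("manifest.json"))

write_portable_backup("demo-backup.tar.gz", {"vm-disk-0": b"\x00" * 4096})
print(read_catalog("demo-backup.tar.gz")["entries"])
```

That self-description is what makes the format portable: any engine that can read the archive can rebuild the catalog and restore, no matter where the file has been copied.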
We have a, I probably can't name them, but we have a large retail organization that is running containers in every single store with a small persistent footprint of the point of sale and, you know, local data. But what is running the actual system is containers and it's completely ephemeral. So we were at Red Hat Summit, I was saying earlier, last week, and I'd say half, 40, 50% of the conversation was edge, OpenShift obviously playing a big role there. I know you do work with Rancher and Tanzu and so there's a lot of options there. But obviously OpenShift has strong momentum in the marketplace. I've been dominating, you wanna chime in? No, I'm just, no, I know sometimes I'll sit here like a sponge, which isn't my job, absorbing stuff. I'm just fascinated by the whole concept of a portable format for data that encapsulates virtual machines and/or instances that can live in the containerized world. And once you create that common denominator, that's really, that's the secret sauce for what you're talking about as a super cloud. And what's been fascinating to watch, because I've been paying attention since the beginning, you know, you go from simply, you know, VMFS and here it is. And by the way, the pitch to EMC about buying VMware, it was all about reducing servers to files that could be stored on storage arrays. All of a sudden the light bulbs went off, we can store those things. And it just became, it became a marriage afterwards. But to watch that progression that you guys have gone from, from that fundamental to all of the other areas where now you've created this common denominator layer has been amazing. So my question is, what's the zinger? What doesn't work? Where are the holes? If you don't wanna look at it from a, from a glass-half-empty perspective, what's the next opportunity? We've talked about edge, but what are the things that you need to fill in to make this truly ubiquitous? Well, there's a lot of services out there that we're not protecting, to be fair, right? We do Microsoft 365, we now do Salesforce, but there's a dozen other PaaS services that people are moving data into. And until we add data protection for those SaaS and PaaS services, you know, you have to figure out how to protect them. Now here's the kicker about those services. Most of them have the ability to dump data out. The trick is, do they have the APIs needed to put data back into it, right? Which is a gap; as an industry, we need to address this. I actually think we need a common framework for how to manage the export of data, but also the import of data, not at a system level, but at an atomic level of the elements within those applications. So there are gaps there in the industry, but we'll fill them. If you look on the infrastructure side, we've done a lot with containers and Kubernetes. I think there's a next wave around serverless. There are still servers under these microservices, but we're making things smaller and smaller and smaller, and there's going to be an essential need to protect those services as well. So modern data protection is something that's gonna, we're gonna need modern data protection five years from now. The modern will just be different. Do you ever see the day, Danny, where governance becomes an adjacency opportunity for you guys? It's clearly an opportunity even now. If you look, we spent a lot of time talking about security.
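The common export/import framework Danny wishes existed doesn't exist yet, so the sketch below is purely hypothetical: an adapter contract in which every SaaS or PaaS service exposes atomic-level export and re-import of its elements, the import half being exactly the gap he calls out.

```python
# A sketch of the "common export/import framework" Danny wishes existed for SaaS/PaaS
# services -- a purely hypothetical interface, not an existing standard or Veeam API.
# The idea: every service adapter exposes atomic-level export and re-import, so a
# backup platform can treat a dozen different services uniformly.
from abc import ABC, abstractmethod
from typing import Iterator, Dict, Any

Record = Dict[str, Any]   # one atomic element (a ticket, a contact, a row...)

class ServiceAdapter(ABC):
    """Contract every SaaS/PaaS connector would implement."""

    @abstractmethod
    def export_records(self, since: float) -> Iterator[Record]:
        """Stream elements changed after the given timestamp."""

    @abstractmethod
    def import_record(self, record: Record) -> None:
        """Put a single element back -- the half of the contract most services lack."""

class CrmAdapter(ServiceAdapter):          # toy stand-in for a real connector
    def __init__(self):
        self._store: Dict[str, Record] = {}

    def export_records(self, since: float) -> Iterator[Record]:
        return iter([r for r in self._store.values() if r["updated"] > since])

    def import_record(self, record: Record) -> None:
        self._store[record["id"]] = record   # granular restore of one element

adapter = CrmAdapter()
adapter.import_record({"id": "001", "updated": 1.0, "name": "Acme"})
print(list(adapter.export_records(since=0.0)))
```

With a contract like this, a backup platform could treat a dozen different services uniformly instead of writing one-off connectors for each.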
And what you find is when organizations go for, for example, ransomware insurance or for compliance, they need to be able to prove that they have certifications or they have security or they have governance. We just saw the Trans-Atlantic Privacy Pact. They need to be able to prove what type of data they're collecting, where they're storing it, where they're allowed to recover it, and yes, those are very much adjacencies for our customers. They're trying to manage that data. So given that, I mean, am I correct that architecturally you are, you are location agnostic, right? We are location agnostic, and you can actually tag data to allowable locations. So the big trend that I think is happening, is going to happen in this decade, and I think we're scratching the surface, is this idea that you leave data where it is, whether it's an S3 bucket, it could be in an Oracle database, it could be in a Snowflake database, it could be in a data lake that's, you know, Databricks, whatever, and it stays where it is. And it's just a node on the data mesh, not my term, Zhamak Dehghani coined that term. The problem with that, and it puts data closer to the hands of the domain experts, the problem with that scenario is you need self-service infrastructure, which really doesn't exist today anyway, but it's coming. And the big problem is federated computational governance. How do I automate that governance so that the people who should have access to data can have access to that data? That to me seems to be an adjacency. It doesn't exist except in a proprietary platform today. There needs to be a horizontal layer that is more open that anybody can use. And I would think that's a perfect opportunity for you guys, just strategically. It is, there's no question. And I would argue, Dave, that it's actually valuable to take snapshots, and you could keep the data out at the edge wherever it happens to be collected, but then federate it centrally. It's why I get so excited by an exabyte of data this year, you know, going into the cloud, because then you're centralizing the aggregation. And that's where you're really gonna drive the insights. You're not gonna be running TensorFlow and machine learning and things on-premises unless you have a lot of money and a lot of GPUs and a lot of capacity. That's the type of thing that is actually better suited for the cloud. And I would argue better suited for not your organization. You're gonna want to delegate that to a third party who has expertise in privacy data analysis or security forensics or whatever it is that you're trying to do with the data. But today, when you think about AI, everybody talks about AI. We haven't had a ton of talk about AI here, some appropriate amount. Most of the AI today, correct me if you think this is not true, is modeling that's done in the cloud. It's dominant. Don't you think that's gonna flip when edge really starts to take off, where it's more real-time inferencing at the edge and new use cases at the edge? Now, how much of that data is gonna be persisted is a point of discussion, but what are your thoughts on that? Completely agree. So my expectation of the way that this will work is that the true machine learning will happen in the centralized location, and what it will do is similar to Veeam ONE. We'll push out to the edge the signatures that drive the inferences. So my example of this is always the Tesla driving down the road. There's no way that that car should be figuring it out by sending up to the cloud, is that a stop sign or is it not? It can't.
It has to be able to figure out what the stop sign is before it gets to it. So we'll do the inferencing at the edge, but when it doesn't know what to do with the data, then it should send it to the core to determine, to learn about it, and send signatures back out, not just to that edge location, but all the edge locations within the ecosystem. Yeah, so I get what you're saying. They might send data back when there's an anomaly, or I always use the example of a deer running in front of the car. David Floyer gave me that one. That's when I want to, I do want to send the data back to the cloud, because Tesla doesn't persist a ton of data, I presume, right? Right. Less than 5% of it. Yeah, you know, I want to, usually I'm here to dive into the weeds. I want to kind of up-level this to sort of the larger picture from an IT perspective. There's been a lot of consolidation going on. If you divide the world into vendors and customers, on the customer side, there are only, there are a finite number of seats at the table for truly strategic partners. Those get gobbled up often by hyperscale cloud providers. The challenge there, and I am part of a CTO accreditation program, so this is aimed at my students who are CIOs and CTOs. The challenge is that a lot of CIOs and CTOs on the customer side don't exhaustively drag out of their vendor partners, like a Veeam, everything that, say, Veeam can do for them. Maybe they're leveraging a point solution, but I guarantee you they don't all know that you've got Kasten in the portfolio. Not every one of them does yet. Let alone this idea of a super cloud and how much of a strategic role you can play. So I don't know if it's a blanket admonition to folks out there, but you have got to leverage the people who are building the solutions that are gonna help you solve problems in the business. And I guess, in the form of a question, do you see that as a challenge now, the limited number of seats at the table for strategic partners? Challenge and opportunity. If you look at the types of partners that we've partnered with, storage partners, because they own the storage of the data. At the end of the day, we actually just manage it. We don't actually store it. The cloud partners. So I see that as the opportunity. And my belief is that the storage doesn't matter, but I think the organization that can centralize and manage that data is the one that can rule the world. And so clearly I'm at Veeam. I think we can do amazing things. But we do have key strategic partners, HPE, Amazon. You heard them on stage yesterday, 18 different integrations with AWS. So we have very strategic partners, Azure. I go out there all the time. So there are... So you don't need to be in the room at the table because your partners are, in a sense. And they have a relationship with the customer as well. Yes, yes, fair enough. But the key to this, it's not just technology. It is these relationships and what is possible between our organizations. So I'm sorry to be so dense on this, but when you talk about centralizing that data, you're talking about physically centralizing it, or it can actually live across clouds, for instance, but you've got visibility in your catalogs, you have visibility on all that data. Is that what you mean by centralized, federated? We have understanding of all the places that it lives and we can do things with it. We can move it from one cloud to another. We can take, you know, everyone talks about data warehouses. They're actually pretty expensive.
You got to take data and stream it into this thing and there's massive computing power. On the other hand, we're not like that. You have storage on there. We can ephemerally spin up a database when you need it for five minutes and then destroy it. We can spin up an image when you need it and then destroy it. And so in some ways- Irrespective of location, sorry. Irrespective of location. It doesn't have to be in a central place, and that's been a challenge. You extract, transform, and load it, and moving the data to the central location has been a problem. We have awareness of all the data everywhere, and then we can make decisions as to what you do based on where it is and what it is. And that's a metadata innovation. I guess that comes back to the catalog, right? Is that correct? I mean, you have data about the data that informs you as to where it is and how to get to it. Yes, so metadata within the data that allows you to recover it, and then data across the federation of all that to determine where it is. And machine intelligence plays a role in all that? Not yet? Not yet in that space. Now I do think that there's opportunity in the future to be able to distribute storage across many different locations, and that's a whole conversation in itself. But our machine learning is more just on helping our customers address the problems in their infrastructures rather than determining right now where that data should be. These guys, they want me to go to break, but I'm refusing. So your Hadoop backend, that runs Veeam ONE, that's, well, that's scale. A lot of customers I talk to who run Hadoop say, hey, we got there, it was a heavy lift. We're looking at new ways now, new approaches and going into, of course it's all in the cloud anyway, but what's that look like, that future look like? We haven't reached bottlenecks yet on our Hadoop clusters and we do continuously examine them for anomalies that might happen. Not saying we won't run into a bottleneck, we always do at some point, but we haven't yet. Awesome. We've covered a lot of the, we certainly covered extensively the research that you did on cybersecurity and ransomware. Your, kind of, your vision for modern data protection. I think we hit on that pretty well. Kasten, we talked to Michael about that, and then the future product releases, the Salesforce data protection. You guys, I think you're the first there. I think you were the first for Microsoft 365. No, there are other vendors in the Salesforce space, but what I tell people, we weren't the first to do data capture at the hypervisor level. There were two other vendors, I won't tell you who they were, no one remembers them. Microsoft 365, we weren't the first ones to do that. But we're now the largest. So there are other vendors in the Salesforce space, but we're going to be aggressive. Danny Allan, thanks so much for coming to theCUBE and letting us pick your brain like that. Really great job today and congratulations on being back in semi-normal. Thank you for having me, I love being on. All right, and thank you for watching. Keep it right there. More coverage, Dave Vellante for Dave Nicholson. By the way, check out siliconangle.com for all the written coverage, all the news. theCUBE.net is where all these videos will live. Check out wikibon.com, I publish there every week. I think I'm going to dig into the cybersecurity research that you guys did this week if I can get my hands on those charts, which Dave Russell promised me. We'll be right back right after this short break.