Backup and recovery remains one of the most pressing challenges for IT practitioners. Protecting data from human error, disasters, accidental deletion, and many other unforeseen events has vexed IT pros for decades. For many years, tape was the primary method of protecting data. Tape had some advantages, most notably that it is removable, but because it is a sequential medium, recovery from tape was slow and painful for many if not most use cases. What's more, rapid data growth in the last several years has stressed backup windows considerably, causing a lot of data sets to frankly go unprotected. This is particularly problematic because practitioners have observed that most recoveries, 90% at least, are for files that are less than 24 hours old. As such, in the early 2000s, disk-based backup began to take shape, first in the form of virtual tape libraries, or VTLs. Over the next decade, we've seen this technology evolve to become the dominant platform for backup and recovery. The key inflection point came in the mid-2000s with the steep adoption of data deduplication. This trend has catalyzed a new class of devices known as purpose-built backup appliances. The name kind of says it all. IDC has this market pegged at $3.8 billion this year, in 2012. That's larger than the entire tape business, and it's growing at a whopping 26% CAGR from 2010 to 2015. During the 2000s, EMC made three acquisitions, starting in 2003 with Legato, then in 2006 with Avamar, and then the blockbuster $2.4 billion acquisition of Data Domain in 2009. This trio of assets has combined to give EMC a very dominant position in backup generally and in purpose-built backup appliances specifically. The company has more than 60% of the market and more than 40 share points over its closest competitor.
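As a rough aside on the growth figure quoted above, a compound annual growth rate can be worked through in a few lines of Python. This is only a sketch: the function names are made up for illustration, the $3.8 billion and 26% inputs are the figures as spoken, and applying the 26% rate uniformly to each year after 2012 is an assumption, not something IDC is quoted as saying.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` once per year for `years` years."""
    return start_value * (1 + rate) ** years


# Figures as quoted in the intro: $3.8B market in 2012, 26% CAGR.
# Assumption for illustration only: the 26% rate holds for every year after 2012.
MARKET_2012_BILLIONS = 3.8
QUOTED_CAGR = 0.26

if __name__ == "__main__":
    for year in range(2012, 2016):
        size = project(MARKET_2012_BILLIONS, QUOTED_CAGR, year - 2012)
        print(f"{year}: ${size:.1f}B")
```

Under that assumption the market roughly doubles in three years, since 1.26 cubed is about 2.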
Now interestingly, disk-based backup is still often a premium-priced product relative to tape, yet tape, despite its lower cost, is being largely relegated to an archive and last-resort compliance medium. So how is it that disk-based backup has exploded onto the scene so fast? How has EMC gained such a foothold in the market, with roughly twice the market share in backup that it sees in its core disk business? Is that dominant number one position sustainable, and what does the future hold for disk-based backup and recovery? Hello everyone, this is Dave Vellante of Wikibon, and welcome to Running Data, a production of theCUBE, SiliconANGLE.tv's live programming that brings you all the major developments in cloud, big data, and converged infrastructure. Today I'm here with Rob Emsley and Michael Wilkie, directors of product marketing for EMC's Backup and Recovery Systems division, BRS. Rob and Michael, welcome to theCUBE.

Hello, Dave.

Thanks very much for taking the time here. So Rob, I'm going to start with you. Did you predict this massive explosion?

I've got to be honest with you, Dave, I think the size of the market and how quickly it has evolved has been a bit of a surprise to a lot of people. We've been in the disk-based backup business for a number of years. You mentioned the early 2000s; 2003, 2004 was when virtual tape libraries came onto the market, and you may remember that was after the acquisition of Data General, which brought CLARiiON into our portfolio. One of the biggest use cases of CLARiiON was CLARiiON Disk Libraries. That was really when we started seeing people wanting to back up to disk to augment their existing tape-based backup and recovery processes. But it wasn't until the latter part of the last decade, going into 2006, that we made some decisions about getting into deduplication. We saw that as a game-changing technology.
We went after Avamar and then Data Domain to bring those assets into the portfolio because we were seeing an insatiable demand for that type of solution.

It's interesting, because I remember after the acquisition of CLARiiON many people within EMC saying, hey, we're really going to go after the tape business. It made a lot of sense, but tape stayed relatively inexpensive, and that gap was pretty wide until deduplication came to the fore.

Really, until deduplication came along, the people who were using disk-based backup were using it as an augmentation to tape. They were keeping backups on disk for days, at most weeks, and then creating tapes to retain the backup data longer. It wasn't until we started to sell Avamar and then Data Domain that we started to see people using backup to disk to retain all of their backup data on disk, and quite often replicate it to off-site locations.

So I want to stay on that for a minute, and then we'll come back to the market data. Michael, Rob just mentioned you had Avamar and Data Domain; they were actually competitors. You could say quasi-competitors, because they largely served different use cases, but both were deduplication and they were on a collision course. So now you own Avamar, you own Data Domain, you've got NetWorker, and you've got a sea of partners, Symantec for instance. How do you rationalize this portfolio? What are you telling customers?

Well, what we're telling customers is that the portfolio, when you look at it, is a collection of best-of-breed products and solutions in their own right, and that's one of the reasons EMC made those acquisitions. But more importantly, we believe those products are fundamentally the building blocks for the architecture that we're going to need for the future.
You need incredibly high-performance, highly scalable backup storage, and very fast, efficient deduplication that can be deployed everywhere along the data path. You need software that, in the case of Avamar, was designed from the ground up for disk and is ideally suited to virtual environments. Each of these products can be deployed separately, but we're tightly integrating them across the board. From our perspective, that allows us to do two things. It allows us to meet customer needs with the right solution at the right time, since not everybody is at the same place, and it provides what we believe are the foundational building blocks for the kinds of components they're going to need going forward. And we haven't seen all of the transformation yet. There's a lot more going on beyond just cloud: there are changing roles, and there are a lot of potential new users and owners of the data in the environment that we're going to need to accommodate as well.

So you mentioned tightly integrated. What does it mean to be tightly integrated?

Tightly integrated essentially means that if you're an Avamar customer today, you can deploy Data Domain or NetWorker in that environment, and if you're a NetWorker customer, you can deploy either of the other two products. The customer experience when using the collection of EMC products is relatively seamless. Even though you're using what were previously separately branded products from companies like Data Domain and Avamar, when you buy from EMC they operate as a complete end-to-end solution.

Even though they're different algorithms and speak different languages?

Absolutely. From a manageability standpoint, we're able to deliver a much more seamless experience. We're actually able to optimize them, oddly enough, much better than when they're used separately, because now we can leverage the best of each of those products.
A good example is Avamar and Data Domain. Through the Avamar user interface, we can select client-side deduplication or deduplication back within the Data Domain system, based on the workload. From a customer perspective, those are two products where one is ideally suited to VM-type environments or maybe file systems, and the other is very well suited to large, change-rate-heavy workloads, so we can let our customers use either one.

So I have actually called this Stovepipe Central. You would say that's unfair.

I totally think it's unfair. We clearly have separate products, but that's our ability to give our customers choice. We don't always have to rip and replace. Sometimes a customer problem is a very specific pain point that we can solve right there, and that can then form a linchpin, if you will, to continue to redesign and transform their environment. Other customers have the luxury of starting with a clean sheet of paper, and we can come in with a total disk-based, next-generation backup.

Since I've been in this business there's been an argument about best of breed versus integrated stacks, and what I'm hearing is you can provide both.
Yeah, and the interesting thing, Dave, is that we made some really distinct decisions after the acquisitions of Avamar and Data Domain. Let's face it, when we acquired Avamar, we could have done what some of the other vendors in the space did. We could have taken this deduplication technology, embedded it in our long-term franchise of NetWorker, and said the only way to get deduplication from EMC is to buy into the NetWorker offering. But for years Avamar has found its way into all of the backup installed bases, whether that be NetWorker, Symantec's installed base, or IBM's TSM installed base. That's really one of the marked differences. You can absolutely get an integrated end-to-end solution with EMC backup software and EMC backup appliances, but a lot of the time a customer may say, I'm looking to replace tape, I have a long-term commitment to my backup software, which isn't EMC's, what can you do for me? We can bring EMC Data Domain into that environment and immediately start to add value for that customer. And I'll tell you that a lot of those customers, after a period of time, decide, well, why not reduce my vendor count? Those are the times when you see competitive replacements take place, where a customer goes with EMC backup software to use with their EMC backup appliances.

And you're making an argument that the integration is going to deliver better performance or ease of management, which is a smart strategy. I want to come back to the market. Stu, could you bring up the market angle slide? So let's start by looking at this. This is IDC data: worldwide purpose-built backup appliance revenue from 2010 to 2015. What I like about this slide is that it compares IDC's May 2011 forecast with its December forecast, and you can see that just in that short period of
time, six or seven months, the numbers have changed dramatically. For 2012, IDC is forecasting that the market will be $3.8 billion. That's a forecast that says the 2011 actuals were $2.8 billion. Now, the entire tape business, I don't think, is $3 billion. So here's deduplication, and what evolved into purpose-built backup appliances, aimed at replacing tape, you know, the old Data Domain "tape sucks" and all that stuff, and now this market has exceeded the entire dollar value of the tape market and is forecast to grow at 25% per year. Why? What incremental business was there that we didn't see, that obviously somebody saw?

I just think that, like any fast-moving market that is evolving, it's sometimes difficult when you start to track it to get a good feeling for how rapidly it is going to move. One of the things you've seen in 2011 is that the demand for purpose-built backup appliances has clearly been greater than what was initially forecast back in May. So that's just a six-month period of talking to customers and talking to vendors about what they're seeing in the adoption of this type of technology. I think it really does show that purpose-built backup appliances are solving customer problems that exist today. A lot of that is around the inevitable growth they're having to deal with, and looking for ways to not have their backup infrastructure compound the growth in their production environment, and deduplicated backup definitely helps with that. And then I think there's the reality of the shift from physical to virtual infrastructure, and customers who are looking for better backup infrastructure to help them protect their virtual infrastructure assets.

Now, some of this is pricing too, right? If I just take IDC's total revenue
divided by the total terabytes shipped, I get, if I did my math right, I think it was $1,500 a terabyte. That's undeduplicated, right? So that's a significant premium to tape, which is going to be around, I don't know, $40 a terabyte. Now if I deduplicate it, depending on the ratio that I get, let's use 10 to 1, now I'm down to $150. It's still three to four times tape, so there's got to be some utility there that customers are seeing. Is that because of the recovery speed? Is that the real driver, or is it something else?

There are a few things. On 10-to-1 deduplication, it really does depend on the data and the amount of retention that you have, so we tend to look at deduplication rates of anywhere between 10 and 30x.

So the average is in that range. OK, so I'm being very conservative.

Yep.

With $150, you could cut that to a third, get it down to $50, and now you're at $50 versus $40.

Exactly.

Relative to tape.

Yeah, so that's certainly one of them. But acquisition cost is only one part of the equation. When you start looking at the tape management processes, and especially at what you're going to do with your tapes after you've created them, where you're going to store them, how you're going to transport them there, and the processes and procedures you have in place to do that, the reality is that when customers go with purpose-built backup appliances, probably 80% of them implement replication. When you replicate deduplicated storage for the purpose of getting your backups somewhere else, you remove all of those tape handling costs as well. That's a soft cost, actually probably more of a hard cost, which is more than just the acquisition of the actual tape
infrastructure itself. And then I think you've got the risk profile. Customers have less tolerance for risk now than they have ever had in the past, and moving to a disk-based infrastructure for your backup and recovery environment takes a lot of that risk off the table.

So Michael, the big theme of EMC World, coming up in May, is transformation. You guys have been transforming the backup process for a while. At the same time, when people hear transformation, they sometimes think of rip and replace. Talk a little bit about rip and replace versus, if I want to, using my same backup processes. What are you seeing in the context of transformation?

Well, that's a difficult question to answer, because customers are in so many different places in terms of not only their overall goals but where they are on that so-called transformational path. What we see as probably the biggest driver is, first, virtualization, being able to support those environments, and now looking at cloud deployments as the next logical step: well, now I can protect my virtual environment, how do I begin to deliver cloud-type services? So what we're starting to see from customers is really, help me evolve. The last thing a customer wants is for that backup infrastructure to become something that holds them back or slows them down. And we've seen evidence of that. We all know that virtualized environments have some unique requirements in terms of protecting them, and trying to protect those environments with backup solutions that aren't ideally suited to or designed for that will at some point begin to slow you down. It will become the anchor that keeps you from moving forward. So I think what we see is that customers want to evolve. They
understand that backup is part of that transformation, as part of the bigger IT picture. They don't want the backup process to take on an inordinate amount of importance and time, but they also don't want it to be the thing that slows them down. So anything we can do to help our customers move at the pace they intended to move is really where we think we've got great solutions, again because we can offer multiple approaches to solving a particular problem.

Now, earlier, when you were talking about Avamar, you talked about it in the context of virtualization. Is that the primary platform that you lead with in a virtualized environment, or not necessarily?

No, Avamar is our lead solution for backup and recovery of VMware, and it has been for a few years. One of the additional use cases we have in protecting virtualized environments is that we recognize customers are starting to move more of their mission-critical and larger-scale applications into VMware environments. Avamar was really designed to handle large numbers of files, as opposed to small numbers of files that are large in size. That was one of the driving factors for integrating the Avamar software capabilities and allowing you to back up to Data Domain. What we enabled with that change last year is that when you're backing up workloads that are better suited to Data Domain than to an Avamar data store, you simply make that selection within the Avamar software. So whether it's Oracle that you might be virtualizing, or Exchange, SQL, or SharePoint, those are all typical workloads where we would say, if the size of the database or the application warrants it, simply turn on the ability to back up to a Data Domain appliance and get the best of both worlds.

Yeah, OK, so you
get to have your cake and eat it too there. You get the client-side backup. We haven't really talked about this, but the challenge in virtualized environments is that you don't have as many physical resources to do backup. It's great: servers are underutilized, so we consolidate and utilization goes up. But the problem is that the one application where servers weren't underutilized was backup.

Yeah.

Because it's a pig. It's a big, giant job, and as a result a lot of customers are finding that their backup windows are really stressed in virtualized environments. Products like Avamar, which give you client-side deduplication, minimize the I/O traffic and address that problem. Now you're saying you can also select a target that's a big pipe as well.

Exactly. That's really been the biggest benefit of Avamar for VMware, and it has become the main use case for Avamar. Obviously we support other environments, and we support physical workloads, specifically things like NAS environments or remote office/branch office environments. But in the VMware environment, it's the ability to use deduplication to reduce the resource constraints that exist there. It gives you the ability to do guest-level backups, which a lot of customers like to do because they understand them; they treat their virtual machines as if they were physical machines. Of course, without deduplication running in the clients, you get these resource constraints; you just don't have enough resources available to you. Avamar takes those resource constraints off the table, and that allows you to get things like application-consistent backups by running backup agents that understand the applications. But one of the things a lot of clients want to know is, how can I back up my virtual machines without putting in an agent, especially for those virtual machines that don't have transactional applications that
need to be kept in a very consistent fashion? There, image-level backup and recovery, using the vStorage APIs that VMware introduced with vSphere 4, gives you the ability to integrate Avamar very tightly with VMware. That has become an area where we've put a lot of focus.

Yeah, and changed block tracking is really starting to catch on in the marketplace.

Yeah.

All right, I want to go back to the market data. Stu, can we bring up the competitive angle here? Let's talk about that. We love to talk about the competition, because your customers are making choices. This is just amazing. This is 2010 data; it shows the market size at $1.6 to $1.7 billion, and it has grown substantially since then. Look at EMC's share: it's over 64%, and you've held that. Excuse me, this is IDC's actual data rather than a forecast. I think they published this report in December, and it shows the first half of 2011; they did not have the full year, and they're not going to publish full-year 2011 until May. We talked to Rob Amatruda and invited Rob onto theCUBE. Rob, come on anytime, you're a good friend and we'd love to have you, but he's traveling like crazy, so we're just going to curate this study. It shows 62% market share in the first half of 2011, and your closest competitor, IBM, has a 21% share, so you've got more than 20 points of market share over the competition. Sorry, 40 points; I said 20. How is it that you've been able to achieve this? We know that a lot of it was through acquisition, but how have you been able to maintain that share, and is it sustainable?

Yeah, I think one of the things that people don't always realize about EMC is how important backup and recovery is within the company. We actually have a specialized sales force, and all they do when they come into work each day is work with customers to solve
backup and recovery problems. That specialized sales force is sales account reps, SEs, and consultants. Clearly the products have to do what they say they're going to do, but then you have to get those products in front of the customers that have the problems. One of the things I think we've been able to do, both with the EMC sales force, which includes this specialized body of individuals who focus on backup and recovery, working directly, and just as importantly through the Velocity channel program and the indirect partners that are also taking this to market, is put a lot of people to work taking our products to customers. I think that has allowed the success we've had, and the customer references we've built, to have a snowball effect, and it continues to let us get these solutions to our customers and help them transform their backups.

It's really quite astounding. As I said in my intro, your share in backup is about double what your share is in your core disk business, and EMC has great disk salespeople and is very well known. So do you think this is sustainable?

I think a lot of it is just that, at the moment, and certainly when we look at the market forecasts, customers are demanding these types of solutions. There are still a lot of customers, even in the enterprise, that haven't got deduplicated backup appliances in production. If you ask people, are you using or do you plan to use deduplicated backup, the answers are very high. But at the moment we've still got a lot of people that have a lot of tape, even, believe it or
not, as their primary backup and recovery technology. There are still enterprises we meet with on a day-to-day basis that say, yeah, we really don't back up to disk in any big way; the majority of our backup targets are tape. So until that market moves over to using deduplicated backup solutions, it is going to continue to grow. It may slow down in the out years from where we are now, but at the moment a lot of it is just being able to ship and being able to get to the customers that are knocking on the door and asking, help us transform our backup and recovery environment.

Yeah, Slootman always made that point, that this market was underpenetrated. He would cite data from Gartner, I don't know if it was Dave Russell or not, it was probably Russell, who would say that only a minority of their customers were actually doing disk-based backup, which always intrigued me. So again, there's this price premium that people are paying because of the value, and with replication there's additional value going into this marketplace, which again I personally didn't forecast. But backup has always been the number one problem for IT practitioners, and that really hasn't changed. OK, let me stay on competition for a little bit. IBM, obviously, with Tivoli has some good software and strong services capabilities. It's hard to say they're catching up, but they're at least on the chart in double digits. HP has come out and made some pretty aggressive claims with StoreOnce, basically focusing on things like high availability and restore performance. So how do you respond to those claims of higher performance, better availability, and things of that nature?

Yeah, I mean, it's
interesting. One of the reasons we went after Data Domain is that we saw the technology was going to be able to scale, change, and keep up with the performance that was needed. A lot of the reason for that is the unique way we leverage the Intel CPU architecture to handle the deduplication algorithms that we run. We don't rely on disk spindles to drive performance; this is all Intel compute horsepower built into the EMC backup appliances. Over the last couple of years, the performance and scalability increases we've delivered in the Data Domain product line specifically have been tremendous. The largest deduplication appliance today offers close to 30 terabytes an hour of performance and half a petabyte of usable capacity. What we've found with some of the competitors is that they've got very good at mathematics. A lot of what we've seen is that they take multiple appliances, rack and stack them together, add up the performance of each one, and put that out as their top-end performance number. That's really not comparing apples to apples. You have to look at the building-block performance and scalability, and at how many building blocks it takes to build a solution. One of the things we've found is that Data Domain customers in the enterprise, which invariably tends to be the environment that requires the highest levels of performance, get that through the smallest number of moving parts, and not from a combination of smaller appliances that you rack and stack together. I think that's one of the marked differences that we tend to find with, you
know, some of the competitive solutions that are out there.

Michael, one of the things we've been tracking here at Wikibon, and it's really Stu Miniman's area of expertise, I've written about this, is the whole convergence trend, converged infrastructure: VCE, servers and storage and networking coming together, and virtualization obviously. How has that affected the backup portfolio? Are you seeing drag from that? It's obviously a major trend, because a lot of people are doing it. VCE numbers are now being reported in EMC's 10-K; they kind of slipped that in there. Joe talked on the call about an $800 million backlog, or run rate rather. Is it affecting backup? Is it visible?

I think we definitely see convergence in backup. Certainly we see drag from VCE and that type of convergence. But if the question is whether we're seeing convergence on the backup side of things, we definitely are. One of the trends we're seeing, which is really an underlying shift in the ecosystem, is that today you have different data owners in the environment. You have VMware administrators, application owners, Oracle DBAs, storage administrators, and of course your traditional backup administrator. Because disk has become so readily available for backup, a lot of these people have the ability, because they have the tools within their data application, to begin to do some of their own data protection. They don't have to go talk to the backup guy to get to the tape; they can find some disk and start doing their own backups. That's good for the data owners, since they get some control, and they're looking for that, but it has the potential to begin to splinter or create new silos across the organization. So now, where you had sort
of a central backup administrator, you have people going off and doing their own things. We believe the EMC backup and recovery portfolio can help provide some of that converged infrastructure on the back end to accommodate all those new data users, who may be using different tools than a traditional backup application. At the same time, you want to be able to provide some centralized point of control and management, so that when the question is asked, are you protecting your data, are you meeting your SLAs, you can actually give that assurance, because otherwise you really don't know what's going on. So we do see an increasing importance of convergence of the backup storage, along with the software that manages that storage, and the software and the ways that you interact with the data sources, because it may not in all cases be a traditional backup agent. It may be a utility, or it may be snapshot and replication in a storage array. So we do see that convergence.

Yeah, and you're talking about convergence of the organizational roles as well.

Absolutely.

A couple of years ago, maybe even three years ago now, I saw a presentation at an EMC World where the speaker was putting forth a vision of data deduplication technology everywhere: primary, secondary, all throughout the I/O infrastructure and the storage infrastructure. The vision was, we're going to move data around without having to rehydrate it, and I thought, wow, that's a great vision. Is that still a valid vision? Is that something that you're putting forth? Is that something that could happen in our lifetimes, or is it a pipe dream?

No, it's still an area of investigation across the different storage groups that we have inside EMC. Certainly we introduced deduplication into primary storage initially with Celerra, and then
that followed through with deduplication for the VNX platforms, specifically in the VNX file-based storage. And I think one of the things that you may know is that we leverage a lot of the deduplication algorithms and the compression techniques that we'd actually acquired through the acquisition of both Avamar and RecoverPoint. So certainly the combination of primary storage with backup infrastructure, and how we can optimize the data flow between one and the other, is definitely an area of focus and investigation for us. And certainly one of the areas we want to look at is how we can bring a more seamless mechanism for protecting the primary data that is running on EMC storage, and more seamlessly protect that with EMC backup appliances, and do it in a way that you actually simplify the workflow and simplify where the data has to go from the primary environment to the backup infrastructure. So certainly all of the CTO groups, whether in the backup part of the organization or in the primary storage part of the organization, are working to look at how we can take advantage of all of the technology that we've acquired over the years and actually blend it into much more of a converged solution. Yeah, the primary discussion is interesting. So you're certainly not ruling that out; in fact you're saying that you've got your best people working on it, sort of thing. And you're seeing some of the startups, particularly in the flash area, embed deduplication into their flash as a way to do to disk what Data Domain and Avamar have done to tape, and it's sort of an interesting dynamic. I'm sure EMC's
VFCache is part of that trend as well. So I think that's an interesting vision that was put forth by that speaker at EMC World, and I for one would be very excited to see it. I know it's not trivial. Yeah. But that's good. All right, gentlemen, well listen, thanks very much for coming in and talking about the portfolio. EMC's lead is very clear, number one. It's hard to find a storage market where there's this much domination. I think Joe Tucci on the last earnings call said that the BRS business was running at a two billion dollar run rate. Is that right? Yeah, the Data Domain and Avamar parts of our backup portfolio exited 2011 with a two billion dollar run rate. Just those two assets, not including any services, not including NetWorker. So you're not including NetWorker, but the Data Domain and Avamar businesses, product and services? Product and services. Okay, but no NetWorker software in there, just the Avamar software? Correct. It's a good number. So again, at the time, 2009, the height of the downturn, where cash was king, EMC put out 2.4 billion in cash and I went, wow, that's a lot of dough at a time when cash is so precious. But clearly the company saw an ROI, I think made a good call, and it has put EMC on a growth path, particularly for this part of the business. So Rob and Michael, thanks very much for coming inside theCUBE. It was great to see you guys again, and we'll see you around the block. And thanks for watching, everybody. This is Dave Vellante from Wikibon headquarters in Marlboro, and we will see you next time.