From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation. Hi everybody, this is Dave Vellante. Welcome to this special CUBE Conversation. I'm really excited to invite in my mentor and friend, we go way back. Fred Moore is here. He's the president of Horison Information Strategies, and we're going to talk about managing data in the zettabyte era. Fred, I think when we first met, we were talking about, like, the megabyte era. Right, exactly. Back then we had maybe ten bytes on our telephone and one on the wristwatch. But now you can put a whole data center in a single cartridge of tape and take off. Things have really changed. It's pretty amazing. And of course, for those of you who don't know Fred, he was the first systems engineer at StorageTek, and as I said, somebody who taught me a lot in my early days. Of course, he's very famous for the term that everybody uses today: backup is one thing, recovery is everything. And Fred just wrote this fantastic paper. He's done this year after year after year, he's just dug in. He's a clear thinker, a strategic planner with a technical bent and a business bent, like one of those five-tool baseball players, Fred. But tell me about this paper. Why did you write it? Well, the reason I wrote it is that there's been so much focus in the last year or so on the archive component of the storage hierarchy. And the thing that's happening is we're generating data a lot faster than we're analyzing it. So it's piling up, unanalyzed, sitting basically untapped for years at a time. That has posed a big challenge for people. The other thing that got me deeper into this last year was the hyperscale market. Those people are so big in terms of footprint and infrastructure that they can no longer keep everything on disk. It's just economically not possible.
The energy consumption for disk, the infrastructure costs, the frequency of, you know, taking a disk out every three, four, or five years for replacement has made it very difficult to do that. So hyperscale has gone to tape in a big way, and that's where most of the tape business in the future is going to wind up, in these hyperscale businesses. We know tape doesn't exist in the home. It doesn't exist in a small data center. It's only a large-scale data center technology. But that whole cosmos wound up pointing at the archive space, and at the need for a new archive technology beyond tape. So I want to set up the premise here. I'm just going to pull this out of your paper. It says 60% of all data is archival and could reach 80% or more by 2024, making archival data by far the largest storage class. And given this trajectory, the traditional storage hierarchy paradigm is going to need to disrupt itself, and quickly. And we're going to talk about that. That really is the premise of your paper here, isn't it? It is, it is. You know, to do all this with traditional technologies is going to get very painful for a variety of reasons. So the stage is set for a new tier and a new technology to appear in the next five years. Fortunately, I'm actually working with somebody who is after this in a big way, in a different way than what you and I know. So I think there is some hope here that we can redefine things and really add a new tier down at the bottom. You see it kind of emerging on that picture, the deep archive tier. It's beginning to show up now. And it's, you know, infinite storage. I mean, if you look at major league sports, the World Series and the Super Bowl, you know, that data will never be deleted. It'll be here forever. It'll be used periodically based on circumstances. Yeah, well, we've got that pyramid chart up here. I mean, you invented this chart essentially; at least you were the first person that ever showed it to me.
I honestly think that you first created this concept, where you had, you know, a high-performance tier at a high cost per bit and then an archive tier. Maybe it wasn't this granular, you know, back in the 70s and 80s, but it's constantly been changing with different media types and different use cases. You know, you're right. And you well know this, because when StorageTek introduced the nearline architecture, nearline sat between online and offline storage. We called it nearline and trademarked that term. That was the tape library concept: move data from offline status to online status with a robotic library. So that brought up that third tier: online, nearline, and offline. But you're right, this pyramid has evolved and morphed into several things, and, you know, I keep it alive. Somebody said I'll have a pyramid on my tombstone instead of my name when I go down. But it's really the heart and soul of the infrastructure for data. And then out of this comes all the management and security, the deletion, the immutable storage concepts. The whole thing starts here. It's like your house: you've got to have a foundation, and then you can build everything on top of it. Well, and as you pointed out in your paper, it always comes down to economics. So I want to bring up the ten-year expected total cost of ownership, the TCO, for the three levels. You've got all-disk, you've got all-cloud, and you've got LTO tape, with the different, you know, aspects of cost. Of course, the purple is always the biggest piece of cost: the labor cost. But of course, you know, in cloud you've got the big media cost because they've done so much automation. I wonder if you could take us through this slide. What are the key takeaways there? Well, you know, the thing that hurts with all of these technologies, as you can see up on top there, is that the key issue is the staff and personnel.
So the fewer people you need to manage data, the better off you are. And, you know, that cost is pretty high for disk; there's a lot to do on disk, a lot to manage, a lot of, you know, sadly, what you and I had to deal with years ago, provisioning and so on. A lot of this stuff is just labor intensive. The farther down the pyramid you go, the less labor-intensive the storage gets. And that helps. You also get lower energy costs and a lower cost of ownership. The TCO thing is taking on a new meaning. I hate to put up the TCO chart in some regards, because it's all based on what your input variables are, so you can decide something different. But we've tried to normalize all the pricing and come up with everything. And the cloud is a big question for most people as to how it stacks up. If you don't ever touch the data in the cloud, you know, the price comes way down. If you want to start moving data in and out of the cloud, you're going to have to ante up in a big way. But, you know, we're going to see dollar-a-terabyte storage prices down at the bottom of this pyramid in the next five years. Today you can get down to four or five dollars a terabyte with drives, media, and libraries for tape. Disk is higher, and flash is certainly higher than that. But, you know, we're going to have the race to a dollar-a-terabyte total TCO here by 2025. So when AWS announced Glacier, everybody said, okay, what is that? Is that tape? Is that, you know, spun-down disk? Because it took a while to get the data back. But you're kind of seeing that tape technology, as you said, really move into the hyperscale space, and that's going to accommodate this massive, you know, lower part of the pyramid, isn't it? Exactly, yeah. And we don't have a spin-down disk solution today. I was actually on the board of a company that started that, called Copan, years ago, right up here near Boulder. Yeah, Bill Lottram.
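The TCO framing Fred walks through here can be sketched as a toy model. Every dollar figure below is a hypothetical placeholder, not an input from his paper; the point is only how acquisition, energy, labor, and periodic media refresh combine per tier.

```python
# Toy 10-year TCO model for archive tiers. All $/TB inputs are
# illustrative assumptions, not figures from the paper.

def ten_year_tco(capacity_tb, cost_per_tb, energy_per_tb_yr,
                 labor_per_tb_yr, refresh_interval_yrs, years=10):
    """Sum acquisition, energy, labor, and media-refresh cost over `years`."""
    # Re-buys after the initial purchase (disk every ~5 years, tape ~10).
    refreshes = max(0, years // refresh_interval_yrs - 1)
    acquisition = capacity_tb * cost_per_tb * (1 + refreshes)
    energy = capacity_tb * energy_per_tb_yr * years
    labor = capacity_tb * labor_per_tb_yr * years
    return acquisition + energy + labor

# Hypothetical inputs: (purchase $/TB, energy $/TB/yr, labor $/TB/yr,
# refresh interval in years) for a 1 PB (1000 TB) archive.
all_disk = ten_year_tco(1000, 25.0, 4.0, 12.0, 5)   # 210,000
lto_tape = ten_year_tco(1000, 5.0, 0.2, 3.0, 10)    #  37,000
```

Even with made-up inputs, the shape matches the chart's story: labor dominates, and tape's longer refresh cycle plus lower energy keeps its total well under disk.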
Yeah, Bill Lottram, you know him. You're absolutely right. And a few other people that you know also. But spin-down disk never made it. You know, you can spin up and down the SATA disk on your desktop computer, but doing that in a data center, on a Fibre Channel drive, never made it. So we don't have spin-down disk to do that. The archive space is dominated by very high capacity disk and then tape. And most of the archive data in the world today unfortunately sits on disk, where it spins 7x24x365 and isn't touched much. So it's a bad economic move, but customers just found that easier to handle than going back to tape. So we've got a lot of data stored in the wrong place from a total economics point of view. But the hyperscalers are solving this problem, are they not, through automation? And, you know, you reference storage tiering, really trying to take the labor cost out. How are they doing? Are they doing a good job? They've done really well taking the labor cost out. I mean, they have optimized every screw, nut, and bolt in the 42U chassis that you could imagine to make it as clean as possible. So they've done a whole lot to bring that cost down. But still, consider the magnitude of these data centers. We're going to finish the year 2020 with about 570 hyperscale data centers around the world, the way it's going right now. You know, each one of these things is 350,000 to 400,000 square feet and up of raised-floor space. And the economics just don't allow you to keep putting inactive data on spinning disk, and we don't have spin-down disk. Tape is, you know, I feel like the only guy in the industry who says this sometimes, but tape has had a renaissance that people don't appreciate, in terms of reliability and throughput. Tape's three orders of magnitude more reliable than disk right now, and most people don't know that. So tape's viable. The hyperscalers see that.
I mean, we had one hyperscaler, you know, buy over a million pieces of LTO-8 tape last year alone, just to handle this, to be the pressure valve that takes all this inactive stuff off of the gigantic disk farms they have. Well, so let's talk about that a little bit. To keep it simple, you've got, you know, flash, disk, and tape. It feels like disk is getting squeezed. We know what flash has done in terms of eating into disk, and you're seeing that in the storage market generally; it's soft right now. And I've posited that a lot of that is the headroom that data centers have with flash: they don't have to buy spindles anymore for performance reasons. And the market is soft. Only Pure is showing, you know, consistent growth, and IBM's up a little bit because of the mainframe; you've got Dell popping back and forth. But generally speaking, the primary storage market is not a great place to be right now. All the action is in secondary storage and data protection. And so disk is going to get squeezed. And you mentioned tape. You said you're the only person talking about it, but you said in your paper, you know, it's sequential, so time to first byte is sometimes problematic. But you can front-end tape with cache, you can use algorithms and, you know, smart scans to really address that problem and dramatically lower the cost. Plus you could do things like, you tell me, Fred, you're the technologist here, but you're going to have multiple heads, things that you can't necessarily do in a hermetically sealed disk drive. You can. And what you just described is called the active archive layer in the pyramid. So when you front-end a tape library with a disk array as a cache buffer, you create an active archive, and that data will sit in there for four or five days before it gets demoted based on inactivity. So, you know, for repetitive usage, you're going to get disk-like performance for tape data.
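The active-archive idea Fred describes, a disk cache in front of a tape library with inactivity-based demotion, can be sketched roughly like this. The class, method names, and the dict standing in for the tape tier are all illustrative assumptions, not any vendor's API.

```python
import time

class ActiveArchive:
    """Minimal sketch of an active-archive front end: a disk cache in
    front of a tape tier, demoting objects idle longer than `ttl`
    seconds. Real systems measure inactivity in days (Fred cites four
    or five); seconds are used here so it is easy to test."""

    def __init__(self, tape_store, ttl=5 * 24 * 3600):
        self.tape = tape_store   # slow backing tier (a dict stands in)
        self.cache = {}          # key -> (data, last_access_time)
        self.ttl = ttl

    def read(self, key, now=None):
        now = time.time() if now is None else now
        if key in self.cache:
            data, _ = self.cache[key]      # fast path: disk-like access
        else:
            data = self.tape[key]          # slow path: recall from tape
        self.cache[key] = (data, now)      # promote / refresh on access
        return data

    def demote(self, now=None):
        """Evict entries idle longer than ttl; tape already has a copy."""
        now = time.time() if now is None else now
        idle = [k for k, (_, t) in self.cache.items() if now - t > self.ttl]
        for key in idle:
            del self.cache[key]
```

Repeated reads hit the cache at disk speed, which is exactly the "disk-like performance for tape data" behavior described above.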
And that's the same caching concept that the disk subsystems had 30 years ago. So that does work, and the active archive has a lot of momentum right now. Right here near me where I live in Boulder, we have the Active Archive Alliance's headquarters, and I get to do their annual report every year. This whole active archive thing is a big way to overcome that time-to-first-byte problem that we've had in tape and will have for quite a while. In your paper, you've talked about some of the use cases and workloads. And you laid out, you know, basically taking the pyramid and saying, okay, based on the workload, a certain percentage should be up at the top of the pyramid for the high-performance stuff, and of course lower for, you know, the less important traditional workloads, et cetera. And it was striking to see the delta between the highest-performance case, where 70% I think was up in the top of the pyramid, versus, you know, the last use case. So you're talking about what it costs to store a zettabyte and service a zettabyte. We're talking about 108 million dollars at the high end versus about 11 or 12 million. So a huge delta, roughly a 10x delta between the top and the bottom, based on those allocations by workload. Yeah, I tried to get at the value of tiered storage based on your individual workload in your business. So I looked at five different workloads. The top one that you referenced, the one in there at 108 million, you know, is the HPC market. I mean, when I visit a few of the HPC people, you know, they're DOD agencies in many cases, and I throw the pyramid up, the first thing they say is, our pyramid's inverted. You know, all of our archive data is about 10%. We're all-flash as much as we can be, and we have a little bit of archive. We are in constant simulation and compute mode, producing results like crazy from the data.
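The workload-driven cost delta Dave cites can be illustrated with a simple weighted-cost calculation. The per-tier prices and allocation percentages below are hypothetical stand-ins, not the paper's actual figures; they just show how an inverted HPC pyramid versus an archive-heavy mix produces a multi-x spread per terabyte.

```python
def blended_cost_per_tb(allocation, tier_cost):
    """Weighted storage cost per TB for a workload, given the fraction
    of its data placed on each tier. Fractions must sum to 1."""
    assert abs(sum(allocation.values()) - 1.0) < 1e-9
    return sum(frac * tier_cost[tier] for tier, frac in allocation.items())

# Hypothetical $/TB per tier -- placeholders, not the paper's inputs.
tier_cost = {"flash": 100.0, "disk": 25.0, "tape": 4.0}

# Inverted pyramid (HPC-style) vs. archive-heavy profile.
hpc      = blended_cost_per_tb({"flash": 0.70, "disk": 0.20, "tape": 0.10},
                               tier_cost)   # 75.40 $/TB
archival = blended_cost_per_tb({"flash": 0.05, "disk": 0.15, "tape": 0.80},
                               tier_cost)   # 11.95 $/TB
```

Scaled to a zettabyte, that kind of spread is the order-of-magnitude gap between the 108 million and 11-to-12 million figures discussed above.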
So we do an I/O, bring in maybe a whole file at a time, and compute for minutes before we come up with an answer. So it's just the reverse. And then I got to look at all the different workloads, talking to people, and that's how we developed these profiles. So let's pull up this future of the storage hierarchy, which again kind of talks to the premise of your paper. Walk us through this slide. What changes should we be expecting? I mean, you've got air gap in here. I'm going to ask you about remastering and lifespan, but take us through this. Yeah, you know, the traditional chart that you had up in the first figure had four tiers: two disk tiers, solid state at the top, and then the big archive tier, which is kind of everything falling down into tape at this point. But, you know, again, tape has some challenges: time to first byte, and sequential access only. And then, when we use tape or disk as an archive, most of that archival data is captured as unstructured data. So we don't have tags, we don't have metadata, we don't have indices. And that has led to the movement for object storage to become, maybe in the next five years, the primary format for storing archive data, because it's got all that information inside of it. So now we have a way to search things, and we can get to objects. But in the interim, you know, it's hard to find and search out things that are unstructured. And most estimates would say at least 80% of the world's data is unstructured. So archives are hard to find. Storing the data is one thing; retrieving it is another. And that's led to the formation of another layer in the storage hierarchy here. For one, it's going to hold data that doesn't have to be remastered or, you know, converted to a new technology, which in the case of disk happens every three, four, or five years, or for a tape drive every eight, maybe ten years. That's how long tape drives last.
Tape media can go 30 years with the modern media, but unfortunately, you know, the underlying drive doesn't go back that far. You can't support that many different generations. So the media life is actually longer than it needs to be. So the stage is set for a new technology to appear down here to deal with archives. It'll have faster access and won't need to be remastered every five or ten years; it can have, you know, a 50-year life in here. And believe me, I've been looking for a long time, Dave, to find something like this. And, you know, we have a shot at this now, and I'm actually working with a technology that could pull this off. Well, it's interesting as well that you're calling out the air gap in the chart. We go back to our mainframe days; there's not a lot we haven't seen before, you know, maybe data deduplication. But, you know, the adversary has become a lot more sophisticated, and so air gaps and ransomware are on everybody's mind today. But you've highlighted three layers of the pyramid that are actually candidates for that air-gapping. Yeah: the active archive up there, of course, you know, with the disk and tape combined; then pure tape; and then this new technology, which can be removable. You know, when you have removable media, you create an air gap. And little did we know when you and I met that removability would be important for tape. We thought we were trying to get rid of the Chevy truck access method. And now, without electricity, with a terrorist attack, a pandemic, or whatever, the fastest way to move data is to put it on a truck and get it out of town. So removability has renewed life right now, much to my shock from where we started. You talked about remastering, and you said it's a costly, labor-intensive process that typically migrates previously archived data to new media every five to ten years. First of all, explain why you have to do that and how data center operators can solve that problem.
Yeah, and let's start with data where most of it is today, on disk drives. You know, a disk drive's useful life is four to five years before it either fails or is replaced. That's pretty much common now. So then they have to start replacing these things, and that means you have to copy, you know, read the data off the disk and write it somewhere else. A big data move. And as the years go by, that amount of data to remaster gets bigger and bigger. I mean, you can do the math, as you well know: if you want to move, say, 50 petabytes of data, it's going to take several weeks to do that electronically. So this gets to be a real time-consuming effort. So most data centers that I've seen will keep about one-fifth of their disk pool migrating to a new technology every year, kind of rolling forward as they go, rather than doing the whole thing every five years. So that's the renewal model in the disk world. And then for tape, the drives stay in there longer. You know, the LTO family of drives could read two generations back from the current one. They cut that off a year ago; they'll go back to something like that soon. But you can go eight to ten years on a tape drive. The media life, because of barium ferrite media, which is already oxidized, can be 30 years or more. The old metal particle media was not oxidized. So, you know, the oxide would flake, particles would fall off, and people would say, shoot, I've had this in here eight years, and it's going to flake if I put it back in. So that didn't work well. But now that we have barium ferrite media, which is fully oxidized, media life has skyrocketed. That was the whole trick with tape: get to something that was pre-oxidized before time could cause it to decay. So the remastering burden is lower on tape by two-to-one or three-to-one. But still, you know, when you've got petabytes, maybe an exabyte, sitting on tape in the future, that's going to take a long time.
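Fred's "50 petabytes takes several weeks" point is easy to check with a back-of-envelope calculation. The 100 Gb/s link speed and 70% sustained efficiency below are assumptions for illustration, not figures from the conversation.

```python
def migration_days(petabytes, link_gbps, efficiency=0.7):
    """Back-of-envelope: days to copy `petabytes` (decimal PB) over a
    `link_gbps` network link at a sustained fraction `efficiency` of
    line rate. Ignores verification passes and source-read contention,
    which only make it slower."""
    bytes_total = petabytes * 1e15
    bytes_per_sec = link_gbps * 1e9 / 8 * efficiency
    return bytes_total / bytes_per_sec / 86400

# 50 PB over an assumed 100 Gb/s link at 70% efficiency:
# on the order of two months, i.e. "several weeks" at best.
days = migration_days(50, 100)
```

This is why rolling one-fifth of the disk pool forward each year is more practical than a wholesale migration every five years.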
With remastering, you'd love a way to scale capacity without having to keep moving the data to something new every so often. So my last question. You went from a technical role into a strategic planning role, and of course, the more technical you are in that role, the better off you're going to be, because you understand the guardrails. But you've always had a sort of telescope on the industry, and you close the paper, and it's kind of where I want to end here, on, you know, what's ahead. And you talk about some of the technologies that obviously have, you know, legs, like 3D NAND and obviously magnetic storage. You've got optical in here, but then you've got all these other ones that you mention with a caveat of, you know, don't hold your breath: multi-layer photonics, synthetic DNA, quartz glass media, holographic storage, quantum storage. We hear a lot about quantum. What should we be thinking about and expecting as observers in terms of, you know, new technologies that might drive some innovation in the storage business? Well, I've listed the ones that are in the lab that have any life at all right in this paper. So, you know, you can kind of take your pick at what goes on there. I mean, optical disk, you know, has not made it in the data center. We've talked about it for 35 years. We invested in it at StorageTek; it never saw the light of day. Optical disk has remained an entertainment technology throughout the last 35 years, and its bit error rate is poor compared to data center technologies. So optical would have to take a huge step going forward. We've got a lot of legs left in the solid state business; that's really active. SSDs, the whole non-volatile memory space, have probably eaten up 45% of total disk shipments in terms of units from the high in 2010. Unbelievable that, you know, disk would ship 650 million drives a year and now it's just under 400, maybe 350 to 400 million.
So flash is taking disk business away like crazy. Tape should be taking disk away too, but the tape industry doesn't do a very effective job of marketing itself. Most people still don't know what's going on with tape. They're still looking in the rear-view mirror at tape, as opposed to through the front windshield to see all the new things that have happened. So, you know, they have bad memories of tape in the past: load failures, stretch, edge damage, tears, tape that wouldn't work. All of that has pretty well gone away now. Modern tape is a whole different ballgame, but most people don't know that. So, you know, tape's going to have to struggle with access time and sequentiality. They've done a few things to overcome access time: they order the requests to tape now and optimize them based on physical movement on the tape. That can take out 50% of your access time for multiple requests on a cartridge. The one on here that's got the most promise right now would be a version of multi-layer photonic storage, which I would say is sort of like optical, but, you know, with data-center-class characteristics: multi-layer recording capability, and random access, which tape doesn't have. And I would say that's probably the one you would want to take some look at going forward. The others are highly speculative. You know, we've been talking about DNA since we were kids, and we don't have a DNA product out here yet. I mean, its access time is eight hours. It's probably not going to work for us. That's not your deep archive anymore; that's your time capsule storage. Yeah, right. Bury it in the ground. So, I mean, I think you kind of see what's here. Chances are it's still going to be the magnetic technologies, tape and disk, and then the solid-state memory stuff. But these are the ones that I'm tracking and looking at, and I've worked with a few of the companies that are on this future list.
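The request-ordering trick Fred mentions, servicing queued tape reads by physical position instead of arrival order, can be sketched as a simple elevator-style pass. The positions here are abstract offsets along a mounted cartridge; this is an illustrative sketch of the idea, not any drive vendor's actual recommended-access-order implementation.

```python
def reorder_requests(requests, head_position=0):
    """Sketch of tape-request reordering: service pending reads in one
    forward sweep from the current head position, then wrap around to
    the requests behind it. Minimizing back-and-forth locates is what
    cuts aggregate access time for multiple requests on a cartridge."""
    ahead = sorted(r for r in requests if r >= head_position)
    behind = sorted(r for r in requests if r < head_position)
    return ahead + behind

# Four queued reads, head currently at offset 200: sweep forward
# through 400 and 900, then wrap back for 50 and 100.
order = reorder_requests([900, 100, 400, 50], head_position=200)
```

Compared with first-come-first-served, one sweep avoids repeatedly shuttling the head up and down the tape, which is where the claimed access-time savings for batched requests come from.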
And I'd love to see something break through out there. But it's like we've always said about holographic storage, for example: you know, there's been more written about it than there's ever been written on it. Well, the paper's called Reinventing Archival Storage. You can get it on your website at, I presume, horison.com, that's Horison with an S. Get it on my website, and yep, absolutely. Awesome. Fred Moore, great to see you again. Thanks so much for coming on theCUBE. My pleasure, Dave. Thanks a lot, great job. All right, and thank you for watching, everybody. This is Dave Vellante for theCUBE. We'll see you next time.