From around the globe, it's theCUBE, presenting the convergence of file and object, brought to you by Pure Storage. Okay, we're back with the convergence of file and object in a power panel. This is a special content program made possible by Pure Storage and co-created with theCUBE. Now, in this series, what we're doing is exploring the coming together of file and object storage, trying to understand the trends that are driving this convergence, the architectural considerations that users should be aware of, and which use cases make the most sense for so-called unified fast file and object storage. And with me are three great guests to unpack these issues. Garrett Belsner is a data center solutions architect with CDW. Scott Sinclair is a senior analyst at Enterprise Strategy Group; he's got deep experience in enterprise storage and brings that independent analyst perspective. Matt Burr is back with us. Gentlemen, welcome to the program. Thank you. Hey, Dave. Hey, Scott, let me start with you and get your perspective on what's going on in the market with object, the cloud, and the huge amount of unstructured data out there that lives in files. Give us your independent view of the trends that you're seeing out there. Well, Dave, you know, where to start? I mean, surprise, surprise, data's growing. We've been talking about data growth for, what, decades now? But what's really fascinating, or what's changed, is that because of the digital economy, digital business, digital transformation, whatever you call it, people are now not just storing data, they actually have to use it. And we see this in trends like analytics and artificial intelligence. What that does is increase the demand not only for consolidation of massive amounts of storage, which we've seen for a while, but also for incredibly low latency access to that storage. 
And I think that's one of the things that we're seeing that's driving this need for convergence, as you put it: having multiple protocols consolidated onto one platform, but also the need for high performance access to that data. Thank you for that, a great setup. I wrote down three topics that we're going to unpack as a result of that. So Garrett, let me go to you. Maybe you can give us the perspective of what you see with customers. Is this a push where customers are saying, hey, listen, I need to converge my file and object? Or is it more a story where they're saying, Garrett, I have this problem, and then you see unified file and object as a solution? Yeah, I think for us, it's taking that consultative approach with our customers and really hearing the pain points around some of their pipelines, the way that they're going to market with data today, and what problems they're seeing. We're also seeing a lot of the change driven by the software vendors as well. So being able to support a disaggregated design, where you're not having to upgrade and maintain everything as a single block, has really been a place where we've seen a lot of customers pivot, to where they have more flexibility: as they need to maintain larger volumes of data and higher performance data, having the ability to do that separately from compute and cache and most other layers is really critical. So Matt, I wonder if you could follow up on that. Garrett was talking about this disaggregated design, and I like it, distributed cloud, et cetera, but then we're talking about bringing things together in one place, right? So square that circle. How does this fit in with this hyper-distributed cloud edge that's getting built out? 
Yeah, I mean, I could give you the easy answer on that, but I could also pass it back to Garrett, in the sense that, Garrett, maybe it's important to talk about Elastic and Splunk and some of the things that you're seeing in that world. I think you can give a pretty qualified answer to Dave's question relative to what your customers are seeing. Oh, that'd be great, please. Yeah, absolutely, no problem at all. So, you know, I think with Splunk moving from its traditional design, classic design, whatever you want to call it, up into SmartStore, that was one of the first that we saw make that move towards separating object out. And I think a lot of that comes from their own move to the cloud and updating their code to take advantage of object in the cloud. But we're starting to see, with Vertica Eon Mode, for example, and Elastic, other folks taking that same type of approach, where in the past we were building out many 2U servers and jamming them full of SSDs and NVMe drives. That was great, but it doesn't really scale, and it gets into that same problem that we see with hyperconvergence a little bit, where you're always adding something, maybe something you didn't want to add. So I think, again, being driven by software is really where we're seeing the world open up there. But that whole idea of having that as a hub and a central place where you can then leverage that data out to other applications, whether that's out to the edge for machine learning or AI applications to take advantage of it, I think that's where that convergence really comes back in. But, like Scott mentioned earlier, folks are now doing things with the data, where before I think they were really just storing it, trying to figure out, what are we gonna actually do with it when we need to do something with it? So this is making it possible. 
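For readers who haven't seen it, the SmartStore separation Garrett describes is configured in Splunk's indexes.conf: local disk becomes a cache for hot data while warm buckets live in an S3-compatible object store. A minimal sketch, with hypothetical bucket and endpoint values:

```ini
# Minimal SmartStore sketch (bucket and endpoint are illustrative).
# The remote volume points warm buckets at an S3-compatible object
# store; local storage acts as a cache rather than primary capacity.
[volume:remote_store]
storageType = remote
path = s3://example-smartstore-bucket
remote.s3.endpoint = https://objectstore.example.com

[app_logs]
# $_index_name expands to this index's name under the volume path.
remotePath = volume:remote_store/$_index_name
```

With a layout like this, indexer compute can be scaled or replaced independently of the data, which is the disaggregation the panel keeps returning to.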
Yeah, Dave, if I could just tack on to the end of Garrett's answer there: in particular with Vertica in Eon Mode, the ability to leverage sharded subclusters gives you an advantage in terms of being able to isolate performance hotspots, and an advantage to that is being able to do it on a FlashBlade, for example. So sharded subclusters allow you to say, I'm gonna give prioritization to this particular element of my application and my dataset, but I can still share that data across those subclusters. So as you see Vertica advance with Eon Mode, or you see Splunk advance with SmartStore, these are all advancements that are, you know, a chicken-and-the-egg thing. They need faster storage, and they need a consolidated storage dataset, and that's what allows these things to drive forward. Yeah, so Vertica Eon Mode, for those who don't know, it's the ability to separate compute and storage and scale them independently. I think Vertica is one of the only ones, maybe even the only one, that does that both in the cloud and on-prem. And that plays into this distributed nature of this hyper-distributed cloud, as I sometimes call it. And I'm interested in the data pipeline. I wonder, Scott, if we could talk a little bit about that, maybe where unified object and file fits. I'm envisioning this distributed mesh, and then UFFO is sort of a node on that mesh that I can tap when I need it. But Scott, what are you seeing as the state of infrastructure as it relates to the data pipeline and the trends there? Yeah, absolutely, Dave. So when I think data pipeline, I immediately gravitate to analytics or machine learning initiatives, right? And so one of the big things we see, and this is an interesting trend. 
We continue to see increased investment in AI and increased interest, and as companies get started, they think, okay, well, what does that mean? Well, I've got to go hire a data scientist. Okay, well, that data scientist probably needs some infrastructure. And what often happens in these environments is it ends up being a bespoke environment or a one-off environment. And then over time, organizations run into challenges. One of the big challenges is the data science team, or people whose jobs are outside of IT, spend way too much time trying to get the infrastructure to keep up with their demands, predominantly around data performance. So one of the ways organizations that have artificial intelligence workloads in production, and we found this in our research, have started mitigating that is by deploying flash all across the data pipeline. We have data on this. Sorry to interrupt, but Pat, if you could bring up that chart, that would be great. So take us through this, Scott, and share with us what we're looking at here. Yeah, absolutely. So Dave, I'm glad you brought this up. We did this study, I want to say, late last year. One of the things we looked at was across artificial intelligence environments. Now, one thing that you're not seeing on this slide is that we asked about every stage of the data pipeline, and we saw flash everywhere. But I thought this was really telling, because this is around data lakes. When many people think about the idea of a data lake, they think about it as a repository, a place where you keep maybe cold data. And what we see here is, especially within production environments, a pervasive use of flash storage. So I think 69% of organizations are saying their data lake is mostly flash or all flash, and I think we have 0% that don't have any flash in that environment. 
So organizations are finding out that flash is an essential technology to allow them to harness the value of their data. So, Garrett and then Matt, I wonder if you could chime in as well. We talk about digital transformation, and I sometimes call it the COVID forced march to digital transformation. I'm curious as to your perspective on things like machine learning and the adoption, and Scott, you may have a perspective on this as well. You know, we had to pivot: you had to get laptops, we had to secure the endpoints, and VDI, those became super high priorities. What happened to injecting AI and machine learning into my applications? Did that go on the back burner, or was it accelerated along with the need to digitally transform? Garrett, I wonder if you could share with us what you saw with customers last year. Yeah, I mean, I think we definitely saw an acceleration. I think folks in my market are still figuring out how they inject that into more of a widely distributed business use case. But again, this data hub is allowing folks to now take advantage of the data that they've had in these data lakes for a long time. I agree with Scott. Many of the data lakes that we have were somewhat flash accelerated, but they were typically really made up of large capacity, slower spinning nearline drives, accelerated with some flash. But I'm really starting to see folks now look at some of those older Hadoop implementations and leverage new ways to look at how they consume data. And in many of those redesigns, customers come to us wanting to look at all-flash solutions. So we're definitely seeing it. We're seeing an acceleration towards folks trying to figure out how to actually use it in more of a business sense now, where before I feel it was a little bit more skunkworks, people dealing with it in a much smaller situation, maybe in the executive offices, trying to do some testing and things. 
Scott, you're nodding away. Anything you can add in here? Yeah, so first off, it's great to get confirmation that the stuff we're seeing in our research, Garrett's seeing out in the field and in the real world. But as it relates to the past year, it's been really fascinating. One of the things we study at ESG is IT buying intentions: what are the initiatives that companies plan to invest in? At the beginning of 2020, we saw a heavy interest in machine learning initiatives. Then you transition to the middle of 2020, in the midst of COVID. Some organizations continued on that path, but a lot of them had to pivot, right? How do we get laptops to everyone? How do we continue business in this new world? Well, now as we enter into 2021, and hopefully we're coming out of the pandemic era, we're getting into a world where organizations are pivoting back towards these strategic investments around, how do I maximize the usage of data? And they're actually accelerating those, because they've seen the importance of digital business initiatives over the past year. Yeah, Matt, I mean, when we exited 2019, we saw a narrowing of experimentation, and our premise was that organizations were going to start operationalizing all their digital transformation experiments. And then we had a 10 month Petri dish on digital. So what are you seeing in this regard? 10 month Petri dish is an interesting way to describe it. You know, there's another candidate for pivoting there around ransomware as well, right? Security entered into the mix, which took people's attention away from some of this as well. I mean, look, I'd like to bring this up just a level or two, because what we're actually talking about here is progress, right? And progress is an inevitability. Whether you believe that it's by 2025, or you think it's 2035 or 2050, it doesn't matter. 
We're on a forced march to the eradication of disk. And that is happening in many ways due to some of the things that Garrett and Scott were referring to in terms of our customers' demands for how they're going to actually leverage the data that they have. And that brings me to my final point on this, which is that we see customers in three phases. There's the first phase, where they say, hey, I have this large data store, and I know there's value in there, but I don't know how to get to it. Then there's the second: I have this large data store, and I've started a project to get value out of it, and we failed. Those could be customers that marched down the Hadoop path early on; they got some value out of it, but they realized that HDFS wasn't gonna be a modern protocol going forward, for any number of reasons. The first being, hey, if I have gold.master, how do I know that my gold.4 is consistent with my gold.master? So data consistency matters. And then you have the third group that says, I have these large datasets, I know how to extract value from them, and I'm already on to the Verticas, the Elastics, the Splunks, et cetera. That latter group are the folks that kept their projects going, because they were already extracting value from them. The first two groups, we're seeing, are saying the second half of this year is when we're gonna really begin picking up on these types of initiatives again. Well, thank you, Matt, by the way, for hitting the escape key. I think value from data really is what this is all about. And there are some real blockers there that I want to talk about. You mentioned HDFS. I mean, we were very excited, of course, in the early days of Hadoop; many of the concepts were profound. But at the end of the day, it was too complicated. 
We've got these hyper-specialized roles that are serving the business, but it still takes too long. It's too hard to get value from data. And one of the blockers is infrastructure. The complexity of that infrastructure really needs to be abstracted, taken up a level. We're starting to see this in cloud, where some of those abstraction layers are being built by the cloud vendors. But more importantly, a lot of the vendors, like Pure, are saying, hey, we can do that heavy lifting for you, and we have the expertise and engineering to do cloud native. So I'm wondering what you guys see, and maybe Garrett, you could start us off and others too, on some of the blockers to getting value from data, and how we're going to address those in the coming decade. Yeah, I mean, I think part of it we're solving here, obviously, with Pure bringing flash to a market that traditionally was utilizing much slower media. The other thing that I see that's very nice with FlashBlade, for example, is the ability to do things, once you get it set up, a blade at a time. A lot of these teams don't have big budgets, and being able to break purchases down into almost blade-sized chunks has really allowed folks to get more projects off the ground, because they don't have to buy a full expensive system to run these projects. So that's helped a lot. I think the wider use cases have helped a lot too. So Matt mentioned ransomware: using SafeMode to help with ransomware has been a really big growth spot for us. We've got a lot of customers very interested and excited about that. And the other thing that I would say is bringing DevOps into data is another thing that we're seeing. So that push towards data ops, really using automation and infrastructure as code as a way to now drive things through the system. 
Automation through DevOps is really an area where we're seeing a ton of growth from a services perspective. Hey guys, any other thoughts on that? I mean, I'll tee it up there. We are seeing some bleeding-edge, and somewhat counterintuitive, especially from a cost standpoint, organizational changes at some companies. Think of some of the internet companies that do music, for instance, and are adding podcasts, et cetera. Those are different data products, and we're seeing them actually reorganize their data architectures to make them more distributed, and actually put the domain heads, the business heads, in charge of the data and the data pipeline. That is maybe less efficient, but again, it's some of this bleeding edge. What else are you guys seeing out there that might be a harbinger of the next decade? I'll go first. I think, specific to the construct that you threw out, Dave, one of the things that we're seeing is the application owner, maybe it's the DevOps person, or maybe it's the application owner through the DevOps person, becoming more technical in their understanding of how infrastructure interfaces with their application. I think what we're seeing on the FlashBlade side is we're having a lot more conversations with application people than just IT people. It doesn't mean that the IT people aren't there; the IT people are still there, for sure, they have to deliver the service, et cetera. But the days of IT building up a catalog of services and a business owner subscribing to one of those services, picking whatever sort of fits their need, I think that's the construct that changes going forward. The application owner is becoming much more prescriptive about how they want the infrastructure to fit into their application. And that's a big change. 
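The data ops and infrastructure-as-code push Garrett mentioned, with application teams declaring what storage their pipeline needs and letting automation reconcile it, can be sketched in a few lines. This is a toy model: the client below is a stub, not any vendor's real API, and the filesystem names and sizes are made up.

```python
# Toy infrastructure-as-code sketch: desired storage state is data,
# and an idempotent "apply" drives the (stubbed) storage client to it.
desired_state = {
    "filesystems": [
        {"name": "ml-training", "size_tb": 50, "protocols": ["nfs", "s3"]},
        {"name": "log-archive", "size_tb": 200, "protocols": ["s3"]},
    ]
}

class StubStorageClient:
    """Stands in for a real storage-array API client (illustrative only)."""
    def __init__(self):
        self.filesystems = {}

    def ensure_filesystem(self, name, size_tb, protocols):
        # Idempotent: create if missing, update if already present.
        self.filesystems[name] = {"size_tb": size_tb, "protocols": protocols}

def apply(client, state):
    """Reconcile the client toward the declared state."""
    for fs in state["filesystems"]:
        client.ensure_filesystem(fs["name"], fs["size_tb"], fs["protocols"])

client = StubStorageClient()
apply(client, desired_state)
apply(client, desired_state)  # running twice converges to the same state
assert client.filesystems["ml-training"]["protocols"] == ["nfs", "s3"]
```

In practice the stub would be replaced by a provider module (Ansible, Terraform, or a REST client), but the declarative, idempotent shape is the point: storage changes flow through version-controlled code rather than tickets.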
And certainly for folks like Garrett and CDW, who do a good job with this, being able to get to the application owner and bring those two sides together, there's a tremendous amount of value there. For us it's been a little bit of a retooling. We've traditionally sold to the IT side of the house, and we've had to teach ourselves how to go talk the language of applications. So I think you pointed out a good construct there, and that application owner playing a much bigger role in what they're expecting from the performance of IT infrastructure, I think, is a key change. Interesting. I mean, that definitely is a trend that's put you guys closer to the business, where the infrastructure team is serving the business, as opposed to, sometimes I talk to data experts and they're frustrated, especially data owners or data product builders who feel like they have to beg the data pipeline team to get new data sources in or get data out. How about the edge? Maybe Scott, you can kick us off. I mean, we're seeing the emergence of edge use cases, AI inferencing at the edge, a lot of data at the edge. What are you seeing there, and how does this unified object and file, to bring us back to that, fit in? Wow, Dave, how much time do we have? Two minutes. First of all, Scott, why don't you just tell everybody what the edge is? You got it. I'll figure it out, right? How much time do you have now? At the end of the day, and that's a great question, right? If you take a step back, I think it comes back to, Dave, something you mentioned: it's about extracting value from data. And what that means is, when you extract value from data, as Matt pointed out, the influencers or the users of data, the application owners, have more power, because they're driving revenue now. And so what that means is, from an IT standpoint, it's not just, hey, here are the services you get, use them or lose them, or, you know, don't throw a fit. It is, no, I have to adapt. 
I have to follow what my application owners need. Now, when you bring that back to the edge, what it means is that data is not localized to the data center. I mean, we just went through a nearly 12 month period where the entire workforce of most of the companies in this country went distributed, and business continued. So if business is distributed, data is distributed: that means in the data center, that means at the edge, that means in the cloud, and that means in tons of other places. And what it also means is you have to be able to extract and utilize data anywhere it may be. And I think that's something that we're gonna continue to see. It comes back to key characteristics: we've talked about things like performance and scale for years, but we need to start rethinking them, because on one hand, we need to get performance everywhere. But also, in terms of scale, and this ties back to some of the other initiatives and getting value from data, there's something I call the massive success problem. One of the things we see, especially with workloads like machine learning, is businesses find success with them, and as soon as they do, they say, well, I need about 20 of these projects now. All of a sudden that overburdens IT organizations, especially across core and edge and cloud environments. And so when you look at environments, the ability to meet performance and scale demands wherever the data needs to be is something that's really important. You know, Dave, I'd like to just tie together two things that I heard from Scott and Garrett that I think are important, and it's around this concept of scale. Some of us are old enough to remember the day when a 10 terabyte blast radius was too big of a blast radius for people to take on, or a terabyte of storage was considered to be an exemplary budget environment, right? 
Now we think of terabytes kind of like we used to think of gigabytes, in some ways. You don't have to explain to anybody what a petabyte is anymore, and what's on the horizon, and it's not far off, are exabyte-type dataset workloads. And you start to think about what could be in that exabyte of data. We've talked about how you extract that value; we've talked about how you start. But even if the scale is big, not everybody's gonna start at a petabyte or an exabyte, to Garrett's point. The ability to start small and grow into these products, or excuse me, these projects, I think is a really fundamental concept here, because you're not gonna just go buy five petabytes and say, I'm gonna go kick off a five petabyte project. Whether you do that on disk or flash, it's gonna be expensive, right? But if you could start at a couple hundred terabytes, not just as a proof of concept but as something that, you know, you could get predictable value out of, then you could say, hey, this either scales linearly or non-linearly in a way that I can map my investments to as I go dig deeper into this. That's how these successful projects are gonna start, because the people that are starting with these very large, expansive greenfield projects at multi-petabyte scale are gonna find it hard to realize near-term value. Excellent. We've got to wrap, but Garrett, I wonder if you could close. When you look forward and you talk to customers, do you see this unification of file and object? Is it an evolutionary trend? Is it something that is gonna be a lever that customers use? How do you see it evolving over the next two, three years and beyond? 
Yeah, I mean, I think from our perspective, just from what we're seeing from the numbers within the market, the amount of growth that's happening with unstructured data, this data deluge or whatever you want to call it that we've been talking about for so many years, really does seem to now be coming true. As we start to see things scale out, folks really settle into, okay, I'm gonna use the cloud to start and maybe train my models, but now I'm gonna get it back on-prem, because of latency or security or whatever the decision points are there. This is something that is not gonna slow down, and I think folks like Pure giving us the tools to use and bring to market with our customers is really key and critical for us. So I see it as a huge growth area and a big focus for us moving forward. Guys, great job unpacking a topic that, you know, has been covered a little bit, but I think we covered some ground that is new. So thank you so much for those insights and that data, really appreciate your time. Thanks, Dave. Thanks. Yeah, thanks Dave. Okay, and thank you for watching the convergence of file and object. Keep it right there, we're right back after this short break.