We're running a little mini-series exploring the convergence of file and object storage. What are the key trends? Why would you want to converge file and object? What are the use cases and architectural considerations, and, importantly, what are the business drivers of UFFO, so-called unified fast file and object? In this program you'll hear from Matt Burr, who is the GM of Pure's FlashBlade business, and then we'll bring in the perspective of a solutions architect, Garrett Belsner from CDW, and the analyst angle with Scott Sinclair of the Enterprise Strategy Group (ESG). He'll share some cool data on our power panel, and then we'll wrap with a really interesting technical conversation with Chris Bond, CB Bond, who is a lead data architect at Micro Focus and has a really cool use case to share with us. So sit back and enjoy the program.

From around the globe, it's theCUBE, presenting the convergence of file and object, brought to you by Pure Storage.

We're back with the convergence of file and object, a special program made possible by Pure Storage and co-created with theCUBE. In this series we're exploring the convergence of file and object storage, digging into the trends, the architectures, and some of the use cases for unified fast file and object storage, UFFO. With me is Matt Burr, vice president and general manager of FlashBlade at Pure Storage. Hello Matt, how are you doing?

I'm doing great. Morning Dave, how are you?

Good, thank you. Hey, let's start with a little 101, kind of the basics. What is unified fast file and object?

Yeah, look, I think you have to start with first principles and talk about the rise of unstructured data. When we think about unstructured data, think about the projections: 80% of data by 2025 is going to be unstructured, whether that's machine-generated data or AI and ML type workloads.
You start to see, I don't want to say a boom, but a renaissance for unstructured data, if you will. We're moving away from what we've traditionally thought of as general-purpose NAS and file shares toward things that focus on fast object, taking advantage of S3, and cloud-native applications that need to integrate with applications on site. AI and ML workloads tend to share data across multiple data sets, and you really need a platform that can deliver both highly performant, scalable fast file and object from one system.

So talk a little bit more about some of the drivers that bring forth that need to unify file and object.

Yeah, look, there's a real challenge in managing bespoke infrastructures or architectures around general-purpose NAS, DAS, and so on. If you think about how an architect looks at an application, they might say, well, I need fast DAS storage proximal to the application, but that's going to require a tremendous amount of DAS, which is a tremendous number of drives. Hard drives are historically pretty unwieldy to manage, because you're replacing them fairly constantly at multi-petabyte scale. So you start to look at the complexity of DAS, you look at the complexity of general-purpose NAS, and you look at something that, quite frankly, a lot of people don't really want to talk about anymore: actual data center space. Consolidation matters. Something the size of a microwave, like a modern FlashBlade or a modern UFFO device, replaces something that might be the size of three or four or five refrigerators.
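The S3-style object interface Matt keeps returning to is deliberately minimal: a flat keyspace of get/put operations, with metadata carried alongside each object rather than in a directory tree. As a toy sketch of that model — all names here are hypothetical, and real systems such as S3 or an S3-compatible FlashBlade endpoint add buckets, authentication, and durability on top of this same basic interface:

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy object store: flat keys, get/put semantics, per-object metadata."""

    def __init__(self):
        self._objects = {}  # key -> StoredObject; no directories, no hierarchy

    def put(self, key: str, data: bytes, **metadata) -> None:
        self._objects[key] = StoredObject(data, metadata)

    def get(self, key: str) -> bytes:
        return self._objects[key].data

    def head(self, key: str) -> dict:
        # Metadata lookup without transferring the object body itself.
        return self._objects[key].metadata


store = ObjectStore()
store.put("2021/robot-017/scan.json", b'{"shelf": "A12"}',
          content_type="application/json")
print(store.get("2021/robot-017/scan.json"))   # b'{"shelf": "A12"}'
print(store.head("2021/robot-017/scan.json"))  # {'content_type': 'application/json'}
```

The slash-separated key only looks like a path; to the store it is one opaque string, which is what lets object systems scale out to billions of entries without the locking and tree traversal a POSIX file system requires.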
So Matt, why is now the right time for this? For years nobody really paid much attention to object. S3 obviously changed that course, but most of the world's data is still stored in file formats, and you get there with NFS or SMB. Why is now the time to think about unifying object and file?

Because we're moving to things like a contactless society. The things we're going to do are going to require tremendously more compute power, network, and, quite frankly, storage throughput. I can give you two real primary examples at opposite ends of the spectrum. Warehouses are being taken over by robots, if you will — it's not a war, it's a friendly advancement in how you store a box in a warehouse. We have a customer who focuses on large big-box distribution warehousing, and a box that carried an object two weeks ago might have a different box size two weeks later. That robot needs to know where the space is in the warehouse in order to put it away, but it also needs to be able to process: hey, I don't want to put the thing I'm going to access the most in the back of the warehouse — I'm going to put that thing in the front. All of those types of data — you can think of the robot almost as an edge device — are processed in real time, it's unstructured data, and it's object. So it's the emergence of these new types of workloads. The opposite example, the other end of the spectrum, is ransomware. Today customers will quite commonly tell us, hey, anybody can sell me a backup device; I need something that can restore quickly. If you have the ability to restore at 270 terabytes an hour, that's much faster, and when you're dealing with a ransomware attack you want to get your data back quickly.

So I want to ask you — I was going to ask about that later, but since you brought it up — what is the right, call it architecture, for ransomware? Explain how unified object and file would support me. I get the fast recovery, but how would you recommend a customer go about architecting a ransomware-proof system?

Well, with FlashBlade, and with FlashArray, there's an actual feature called SafeMode, and SafeMode protects the snapshots and the data from becoming part of the ransomware event. In a ransomware situation you can't get access to your data: the bad guy, the perpetrator, is basically saying, hey, I'm not going to give you access to your data until you pay me X in Bitcoin, or whatever it might be. With SafeMode, those snapshots are protected outside of the ransomware blast zone, and you can bring those snapshots back. Because what's your alternative? If you're not doing something like that, your alternative is either to pay and unlock your data, or to start restoring from tape or slow disk, which could take you days or weeks. So leveraging SafeMode on the FlashBlade product is a great way to go about architecting against ransomware.

I've got to put on my customer hat now. SafeMode — so that's an immutable mode, right? Can't change the data. Can an administrator go in and change that mode? Can he turn it off, or do I still need an air gap, for example? What would you recommend there?

Yeah, there are still RBAC — role-based access control — policies around who can access that SafeMode.

Okay, right. Anyway, a subject for a different day. I want to actually bring up, if you don't object,
a topic that I think used to be really front and center and is now becoming front and center again. Wikibon just produced a research note forecasting the future of flash and hard drives, and those of you who follow us know we've done this for quite some time. If you could bring up the chart here, you can see — and we see this happening again — we originally forecast the death of quote-unquote high-spin-speed disk drives, which is kind of an oxymoron. You can see here on this chart that the hard disk has had a magnificent journey, but it peaked in manufacturing volume in 2010. The reason that's so important is that volumes now are steadily dropping, and we use Wright's Law to explain why this is a problem. Wright's Law essentially says that as your cumulative manufacturing volume doubles, your cost to manufacture declines by a constant percentage. I won't go into too much detail on that, but suffice it to say that flash volumes are growing very rapidly and HDD volumes aren't. So flash, because of consumer volumes, can take advantage of Wright's Law and that constant cost reduction, and that is really important for each next generation, which is always more expensive to build. So this kind of marks the beginning of the end. What do you think the future holds for spinning disk, in your view?

Well, I can give you the answer on two levels. On a personal level, it's why I come to work every day: the eradication, or extinction, of an inefficient thing. I like to say that inefficiency is the bane of my existence, and I think hard drives are largely inefficient. I'm willing to accept the long-standing argument that we've seen this transition in block, and we're starting to see it repeat itself in unstructured data, and I'm willing to accept the argument that cost is a vector here — it most certainly is. HDDs have been considerably cheaper than flash storage, even to this day, up to this point. But we're starting to approach the point where you reach about a 3x differentiator between the cost of an HDD and an SSD, and that really is the point in time when you begin to pick up a lot of volume and velocity. That maps directly to what you're seeing here: a slow decline which I think is going to become even more rapid, probably starting around next year, where you start to see SSDs really replacing HDDs at a much more rapid clip, particularly on the unstructured data side. It's largely around cost. The workloads we talked about — robots in warehouses, or other types of advanced machine learning and artificial intelligence applications and workflows — require a degree of performance that a hard drive just can't deliver. We are seeing the creative, innovative disruption of an entire industry before our eyes. It's a fun thing to live through.

Yeah, and we would agree. The premise is that it doesn't even have to be less expensive. We think it will be by the second half, or early second half, of this decade, but even at around a 3x delta, the value of SSDs relative to spinning disk is going to overwhelm — just like with your laptop, where it got to the point where you said, why would I ever have a spinning disk in my laptop? We see the same thing happening here. And that's just raw capacity; add in compression and dedupe and everything else that you really can't do with spinning disk because of the performance issues, but can do with flash. Okay, let's come back to UFFO. Can we dig into the challenges, specifically, that this solves for customers? Maybe give us some examples.

Yeah. If we think about the examples, the robotic one, I
think, is the marker for the modern side of what we see here. But from a trend perspective — not everybody's deploying robots, right? There are many companies that aren't going to be in the robotics business, or even thinking about future-oriented things like that. What they are doing is building greenfield applications on object — generally not on file, and not on block. So the rise of object as, let's call it, the next great protocol for modern workloads: this is the modern application coming to the forefront, and that could be anything from financial institutions right down through — we've even seen it in oil and gas, and we're seeing it across healthcare. As companies and industries take this opportunity to modernize, they're not modernizing on things that leverage archaic disk technology. They're really focusing on object, but they still have file workflows that they need to be able to support. So having the ability to deliver those things from one device, in a capacity orientation or a performance orientation, while at the same time dramatically simplifying the overall administration of your environment, both physically and non-physically, is a key driver.

So the great thing about object is that it's simple — it's kind of a get/put metaphor — and it scales out, because it's got metadata associated with the data, and it's cheap. The drawback is that you don't necessarily associate it with high performance, and, as well, most applications don't speak in that language; they speak in the language of file or, as you mentioned, block. So I see real opportunities here. If I have some data that's not necessarily frequently accessed every day, but at end of quarter, or for machine learning, I want to apply some AI to that data — I want to bring it in and then apply a file format, for performance reasons. Is that right? Maybe you could unpack that a little bit.

Yeah, I think you described it well, but I don't think object necessarily has to be slow. You brought up a good point with metadata: being able to scale to billions of objects is of value. I think people do traditionally associate object with slow, but it's not necessarily slow anymore. We did an unofficial survey of our customers and our employee base, and when people described object, they thought of it as, say, law firms storing a Word doc. There's a lack of understanding, a misnomer, around what modern object has become. Performant object, particularly at scale — when we're talking about billions of objects — that's the next frontier. Is it at pace, performance-wise, with the other protocols? No, but it's making leaps and bounds.

So talk a little bit more about some of the verticals where you see this. When I think of financial services, I think transaction processing, but of course they have tons of unstructured data. Are there any patterns you're seeing by vertical market?

We're not, and that's the interesting thing. As a company with a block heritage, a block DNA, those patterns were pretty easy to spot: there were a certain number of databases you really needed to support — Oracle, SQL, some Postgres work, and so on — then kind of the
modern databases around Cassandra and things like that, and you knew there were going to be VMware environments. You could see the trends and where things were going. Unstructured data is a much broader, horizontal thing. Inside of oil and gas, for example, you have specific applications and bespoke infrastructures for those applications; inside of media and entertainment, the same thing. The commonality we're seeing is the modernization of object as a starting point for all the net-new workloads within those industry verticals. The most common request we see is: what's your object roadmap? What's your object strategy? Where do you think object is going? So there's no single path; it's really a wide-open field in front of us, with common requests across all industries.

The amazing thing about Pure — just speaking as a kind of quasi-armchair historian of the industry — is that Pure is really the only company in many, many years to achieve escape velocity and break through a billion dollars. 3PAR couldn't do it, Isilon couldn't do it, Compellent couldn't do it; I could go on. But Pure was able to achieve that as an independent company, and so you've become a leader — you look at the Gartner Magic Quadrant and you're a leader in there. To have made it this far, you've got to have some chops. Of course it's very competitive, and there are a number of other storage suppliers that have announced products that unify object and file. So I'm interested in how Pure differentiates. Why Pure?

It's a great question, and one that, having been a long-time Puritan, I take pride in answering. It's actually a really simple answer: it's business model innovation and technology — the technology that goes behind how we do what we do. And I don't just mean the product. Innovation is product, but it's also having a better support model, for example, or, on the business model side, Evergreen Storage, where we look at your relationship with us as a subscription: we're going to take the thing you've had and modernize it in place over time, such that you're not re-buying that same terabyte or petabyte of storage you've already paid for. Those are the three legs of the stool that have made Pure clearly differentiated, and I think the market has recognized that. You're right, it's hard to break through to a billion dollars, but I look forward to the day we have two billion-dollar products, and with that rise in unstructured data, growing to 80% by 2025, and the massive transition you guys have noted in your HDD slide, it's a huge opportunity for us on the unstructured data side of the house.

The other thing I'd add, Matt — and I've talked to Coz about this — is that it's simplicity first. I've asked, why don't you do this, why don't you do that, and the answer is always the same: that adds complexity. We put simplicity for the customer ahead of everything else, and I think that's served you very, very well. What about the economics of unified file and object? If you're bringing additional value, presumably there's a cost to that, but there's also got to be a business case behind it. What kind of impact have you seen with customers?

Yeah, look, I'll go back to something I mentioned earlier, which is just the reclamation of floor space, power, and cooling. People want to search for the sexier element, if you will, when it comes to looking at how you drive value from something,
but the reality is, if you're reducing your power consumption by a material percentage, power bills matter in big data centers. Customers are typically facing a paradigm of: well, I want to go to the cloud, but the cloud is turning out to be more expensive than I thought it was going to be; or, I've figured out what I can use in the cloud — I thought it was going to be everything, but it's not going to be everything — so hybrid is where we're landing, but I want to be out of the data center business, and I don't want a team of 20 storage people to administer my storage. So there's this very tangible value around: hey, if I could manage multiple petabytes with one full-time engineer, because the system — to your and Coz's point — was radically simpler to administer and didn't require someone running around swapping drives all the time, would that be a value? The answer is yes, 100% of the time. Then you look at the UFFO side from a product perspective: if I have to manage a bespoke environment for this application, and a bespoke environment for that application, and another, and another, I'm managing four different things. Can I actually share data across those four different things? There are ways to share data, but for most customers it gets too complex. How do you even know what your gold, master copy of the data is if you have it — or try to have it — in four different places, across four siloed infrastructures? So when you get to how you measure value in UFFO, it's actually being able to have all of that data concentrated in one place so you can share it from application to application.

Got it. We've got a couple minutes left, and I'm interested in the update on FlashBlade generally, but I also have a specific question. Look, getting file right is hard enough, and you just announced SMB support for FlashBlade. I'm interested in how that fits in — I think it's kind of obvious with file and object converging — but give us the update on FlashBlade, and maybe you could address that specific question.

Yeah, look, we're tremendously excited about the growth of FlashBlade. We found workloads we never expected to find. The rapid-restore workload was one — it was actually brought to us by a customer, and it's become one of our top two, three, four workloads. So we're really happy with the trend we've seen, and, mapping back to the discussion of HDDs and SSDs, we're well on a path to building a billion-dollar business here, so we're very excited about that. But to your point, you don't just snap your fingers and get there. We've learned that doing file and object is harder than block, because there are more things you have to go do. For one, you're basically focused on three protocols: SMB, NFS, and S3, not necessarily in that order. To your point about SMB, we are on the path to releasing full native SMB support in the system, and that will allow us to service customers where we have a limitation today: they'll have an SMB portion of their NFS workflow, and we do great on the NFS side, but we didn't have the ability to plug into the SMB component of their workflow. So that's going to open up a lot of opportunity for us on that front. And we continue to invest significantly across the board in areas like security, which has become more than just a hot button. Security has always been there, but it feels like it's blazing hot today, and so
over the next couple of years we'll be looking at developing some pretty material security elements of the product as well. So, well on a path to a billion dollars is the net on that, and we're fortunate to have SMB coming. We're looking forward to introducing it to those customers that have NFS workloads today with an SMB component.

Yeah, a nice tailwind and a good TAM expansion strategy. Matt, thanks so much — really appreciate you coming on the program.

We appreciate you having us. Thanks so much, Dave. Good to see you.

Okay, we're back with the convergence of file and object in a power panel. This is a special content program made possible by Pure Storage and co-created with theCUBE. In this series we're exploring the coming together of file and object storage, trying to understand the trends driving this convergence, the architectural considerations users should be aware of, and which use cases make the most sense for so-called unified fast file and object storage. With me are three great guests to unpack these issues: Garrett Belsner, data center solutions architect with CDW; Scott Sinclair, senior analyst at Enterprise Strategy Group, who has deep experience in enterprise storage and brings that independent analyst perspective; and Matt Burr, who is back with us. Gentlemen, welcome to the program.

Thank you, Dave.

Hey Scott, let me start with you and get your perspective on what's going on in the market with object, the cloud, and the huge amount of unstructured data out there that lives in files. Give us your independent view of the trends you're seeing out there.
Well, Dave, where to start? Surprise, surprise: data is growing. We've been talking about data growth for, what, decades now, but what's really fascinating, what's changed, is that because of the digital economy — digital business, digital transformation, whatever you call it — people are not just storing data; they actually have to use it. We see this in trends like analytics and artificial intelligence, and what that does is increase the demand not only for consolidation of massive amounts of storage, which we've seen for a while, but also for incredibly low-latency access to that storage. I think that's one of the things driving this need for convergence, as you put it: having multiple protocols consolidated onto one platform, but also having high-performance access to that data.

Thank you for that — a great setup. I wrote down three topics we're going to unpack as a result. Garrett, let me go to you. Maybe you can give us the perspective of what you see with customers. Is this a push, where customers are saying, hey, listen, I need to converge my file and object? Or is it more a story where they're saying, Garrett, I have this problem — and then you see unified file and object as a solution?
Yeah, I think for us it's about taking a consultative approach with our customers, and really hearing the pain around some of their pipelines, the way they're going to market with data today, and the problems they're seeing. We're also seeing a lot of the change driven by the software vendors. Being able to support a disaggregated design, where you're not having to upgrade and maintain everything as a single block, has really been a place where we've seen a lot of customers pivot. They have more flexibility as they need to maintain larger volumes of data and higher-performance data, and having the ability to do that separately from compute, cache, and most other layers is really critical.

So Matt, I wonder if you could follow up on that. Garrett was talking about this disaggregated design — I like it, distributed cloud, and so on — but then we're talking about bringing things together in one place. Square that circle: how does this fit in with the hyper-distributed cloud and edge that's getting built out?

Yeah, I could give you the easy answer on that, but I could also pass it back to Garrett, in the sense that — Garrett, maybe it's important to talk about Elastic and Splunk and some of the things you're seeing in that world. I think you can give a pretty qualified answer to Dave's question, relative to what your customers are seeing.

Oh, that'd be great, please.
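The disaggregated design Garrett describes has a very concrete form in Splunk SmartStore, where indexers keep only a local cache and the authoritative copy of warm data lives in an S3-compatible object store. A sketch of what that wiring looks like in `indexes.conf` — the bucket name, endpoint, and index name here are placeholders, and the full set of `remote.s3.*` settings is in Splunk's documentation:

```ini
# indexes.conf -- SmartStore sketch (placeholder names and endpoint)

# Define an S3-compatible remote volume; this could just as easily point
# at an on-prem object endpoint, e.g. a FlashBlade, as at AWS S3.
[volume:remote_store]
storageType = remote
path = s3://smartstore-bucket
remote.s3.endpoint = https://object.example.internal

# Point an index at the remote volume; local indexer storage becomes a
# cache, so compute and capacity can now scale independently.
[main]
remotePath = volume:remote_store/$_index_name
```

The design choice to highlight: once the object store holds the master copy, adding indexers no longer means re-buying and rebalancing storage, which is exactly the "single block" upgrade problem Garrett calls out.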
Yeah, absolutely, no problem at all. With Splunk moving from its traditional design — classic design, whatever you want to call it — up to SmartStore, that was one of the first moves we saw toward separating object out, and I think a lot of that comes from their own move to the cloud, updating their code to take advantage of object in the cloud. But we're starting to see, with Vertica Eon for example, Elastic, and other folks, the same type of approach. In the past we were building out many 2U servers and jamming them full of SSDs and NVMe drives. That was great, but it doesn't really scale, and it runs into the same problem we see with hyperconvergence a little bit: you're always adding something that maybe you didn't want to add. So again, being driven by software is really where we're seeing the world open up. But that whole idea of having a hub, a central place you can then leverage out to other applications — whether that's out to the edge for machine learning or AI applications to take advantage of — I think that's where the convergence really comes back in. And as Scott mentioned earlier, folks are now doing things with the data, where before they were really just storing it, trying to figure out what they would actually do with it when the time came. This makes it possible.

Yeah, and Dave, if I could just tack on to the end of Garrett's answer there: in particular, Vertica with Eon Mode and the ability to leverage sharded subclusters gives you an advantage in terms of being able to isolate performance hot spots, and an advantage to that is being able to do it on a FlashBlade, for example. Sharded subclusters allow you to say, I'm going to give prioritization to this particular element of my application and my data set, but I can still share that data across those subclusters. So as you see Vertica advance with Eon Mode, or Splunk advance with SmartStore, these are all advancements that are — it's a chicken-and-egg thing. They need faster storage, and they need a consolidated data set, and that's what allows these things to drive forward.

Yeah — Vertica Eon Mode, for those who don't know, is the ability to separate compute and storage and scale them independently. I think Vertica, if they're not the only one, is one of the only ones — they might even be the only one — that does that both in the cloud and on-prem, and that plays into the distributed nature of this hyper-distributed cloud, as I sometimes call it. I'm interested in the data pipeline, and I wonder, Scott, if we could talk a little bit about that. Maybe where unified object and file fits — I'm envisioning this distributed mesh, with UFFO as a node on it that I can tap when I need it. What are you seeing as the state of infrastructure as it relates to the data pipeline, and the trends there?

Yeah, absolutely, Dave. When I think data pipeline, I immediately gravitate to analytics or machine learning initiatives. One of the big things we see — and it's an interesting trend — is continued increased investment in AI, increased interest, and as companies get started, they think, okay, what does that mean? Well, I've got to go hire a data scientist. That data scientist probably needs some infrastructure, and what often happens is that it ends up being a bespoke environment, a one-off environment, and then over
time, organizations run into challenges, and one of the big ones is that the data science team, people whose jobs are outside of IT, spend way too much time trying to get the infrastructure to keep up with their demands, predominantly around data performance. So one of the ways organizations, especially those with artificial intelligence workloads in production, have started mitigating that, and we found this in our research, is by deploying flash all across the data pipeline. We have data on this. Sorry to interrupt, but Pat, if you could bring up that chart, that would be great. So take us through this, Scott, and share with us what we're looking at here. Yeah, absolutely. Dave, I'm glad you brought this up. We did this study, I want to say late last year, and one of the things we looked at was artificial intelligence environments. Now, one thing you're not seeing on this slide is that we asked all around the data pipeline and we saw flash everywhere, but I thought this was really telling because this is around data lakes. Many people think about a data lake as a repository, a place where you keep maybe cold data, and what we see here, especially within production environments, is a pervasive use of flash storage: 69% of organizations are saying their data lake is mostly flash or all flash, and we have zero percent that don't have any flash in that environment. So organizations are finding out that flash is an essential technology to allow them to harness the value of their data. So Garrett, and then Matt, I wonder if you could chime in as well. We talk about digital transformation, I sometimes call it the COVID forced march to digital transformation, and I'm curious as to your perspective on things like machine learning and its adoption, and Scott, you may have a perspective on this as well. You know, we had to pivot, you had to get laptops, we
had to secure the endpoints, and VDI, those became super high priorities. What happened to injecting AI into my applications and machine learning? Did that go on the back burner, or was it accelerated along with the need to digitally transform? Garrett, I wonder if you could share with us what you saw with customers last year. Yeah, I mean, I think we definitely saw an acceleration. I think folks in my market are still figuring out how they inject that into more of a widely distributed business use case, but again, this data hub is allowing folks to take advantage of the data they've had in these data lakes for a long time. I agree with Scott: many of the data lakes we have were somewhat flash-accelerated, but they were typically made up of large-capacity, slower, nearline spinning drives accelerated with some flash. I'm really starting to see folks now look at some of those older Hadoop implementations, leverage new ways to look at how they consume data, and many of those redesign customers are coming to us wanting to look at all-flash solutions. So we're definitely seeing it, and we're seeing an acceleration toward folks trying to figure out how to actually use it in more of a business sense now, where before it was a little more skunkworks, people dealing with it in a much smaller situation, maybe in the executive offices, trying to do some testing and things. Scott, you're nodding away; anything you can add in here? Yeah, well, first off, it's great to get confirmation that the stuff we're seeing in our research, Garrett is seeing out in the field, in the real world. As it relates to the past year, it's been really fascinating. One of the things we study at ESG is buying intentions: what are the initiatives that companies plan to invest in? At the beginning of 2020 we saw a heavy interest in machine learning initiatives; then you
transition to the middle of 2020, in the midst of COVID, and some organizations continued on that path, but a lot of them had to pivot, right? How do we get laptops to everyone? How do we continue business in this new world? Well, now, as we enter 2021 and hopefully come out of the pandemic era, we're getting into a world where organizations are pivoting back toward these strategic investments around how to maximize the usage of data, and actually accelerating them, because they've seen the importance of digital business initiatives over the past year. Yeah. Matt, when we exited 2019 we saw a narrowing of experimentation, and our premise was that organizations were going to start operationalizing all their digital transformation experiments, and then we had a ten-month Petri dish on digital. So what are you seeing in this regard? A ten-month Petri dish is an interesting way to describe it. You know, there's another candidate for a pivot in there around ransomware as well, right? Security entered into the mix, which took people's attention away from some of this too. But look, I'd like to bring this up a level or two, because what we're actually talking about here is progress, and progress is an inevitability. Whether you believe it's by 2025, or you think it's 2035 or 2050, it doesn't matter: we're on a forced march to the eradication of disk, and that is happening in many ways due to some of the things Garrett and Scott were referring to in terms of our customers' demands for how they're going to actually leverage the data they have. And that brings me to my final point on this, which is that we see customers in three phases. There's the first phase, where they say, hey, I have this large data store and I know there's value in there, but I don't know how to get to it. Or, I have this large data store
and I've started a project to get value out of it, and we failed. Those could be customers that marched down the Hadoop path early on; they got some value out of it, but they realized that HDFS wasn't going to be a modern protocol going forward, for any number of reasons. The first being, hey, if I have gold.master, how do I know that gold.4 is consistent with my gold.master? So data consistency matters. And then you have the third group that says, I have these large data sets, I know how to extract value from them, and I'm already on to the Verticas, the Elastics, the Splunks, etc. That latter group are the folks that kept their projects going, because they were already extracting value from them. The first two groups, we're seeing them say the second half of this year is when we're going to really pick up on these types of initiatives again. Well, thank you, Matt, by the way, for hitting the escape key, because I think value from data really is what this is all about, and there are some real blockers there that I want to talk about. You mentioned HDFS. We were very excited, of course, in the early days of Hadoop; many of the concepts were profound, but at the end of the day it was too complicated. We've got these hyper-specialized roles that are serving the business, but it still takes too long and it's too hard to get value from data. And one of the blockers is infrastructure: the complexity of that infrastructure really needs to be abstracted, taken up a level. We're starting to see this in the cloud, where some of those abstraction layers are being built by the cloud vendors, but more importantly, a lot of vendors like Pure are saying, hey, we can do that heavy lifting for you, and we have the expertise and engineering to do cloud native. So I'm wondering what you guys see, maybe Garrett you could
start us off and others chime in, on some of the blockers to getting value from data, and how we're going to address those in the coming decade. Yeah, I mean, I think part of it we're solving here, obviously, with Pure bringing flash to a market that traditionally was utilizing much slower media. The other thing that I see that's very nice with FlashBlade, for example, is the ability to grow a blade at a time once you get it set up. A lot of these teams don't have big budgets, and being able to break the system down into almost blade-size chunks has really allowed folks to get more projects off the ground, because they don't have to buy a full, expensive system to run these projects. So that's helped a lot, and I think the wider use cases have helped a lot. Matt mentioned ransomware: using SafeMode as a way to help with ransomware has been a really big growth spot for us, and we've got a lot of customers very interested and excited about that. And the other thing I would say is bringing DevOps into data is another thing we're seeing, that push toward DataOps, really using automation and infrastructure as code as a way to drive things through the system, the way we've seen with automation through DevOps. That's really an area where we're seeing a ton of growth from a services perspective.
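The "infrastructure as code" pattern Garrett mentions can be sketched in miniature. This is a toy, declarative reconcile loop, not Pure's API or any real DataOps tool: the resource names and fields below are hypothetical, and a real pipeline would drive a storage array's actual API (or a tool like Terraform or Ansible) instead of these in-memory dictionaries. It only illustrates the core idea of declaring desired storage state and letting automation compute the changes.

```python
# Toy sketch of declarative infrastructure-as-code for storage:
# declare the filesystems/buckets you want, diff against what exists,
# and emit the actions that reconcile the two. All names are illustrative.

desired = {
    "ml-training-data": {"protocol": "nfs", "quota_tb": 50},
    "model-artifacts":  {"protocol": "s3",  "quota_tb": 10},
}

actual = {
    "ml-training-data": {"protocol": "nfs", "quota_tb": 20},  # quota has drifted
    "old-scratch":      {"protocol": "nfs", "quota_tb": 5},   # no longer declared
}

def reconcile(desired, actual):
    """Return the actions that bring 'actual' in line with 'desired'."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("update", name, spec))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions

plan = reconcile(desired, actual)
for verb, name, spec in plan:
    print(verb, name, spec)
```

The point of the pattern is that the declaration, not a human running commands, is the source of truth, which is what lets small teams drive storage changes through the same review-and-automate flow they already use for application code.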
Guys, any other thoughts on that? I'll tee it up: we are seeing some bleeding edge, which is somewhat counterintuitive, especially from a cost standpoint, organizational changes at some companies. Think of some of the internet companies that do music, for instance, and are adding podcasts, etc.; those are different data products. We're seeing them actually reorganize their data architectures to make them more distributed and put the domain heads, the business heads, in charge of the data and the data pipeline. That is maybe less efficient, but again, it's bleeding edge. What else are you guys seeing out there that might be a harbinger of the next decade? I'll go first. You know, specific to the construct that you threw out, Dave, one of the things we're seeing is that the application owner, maybe it's the DevOps person, or maybe it's the application owner through the DevOps person, is becoming more technical in their understanding of how infrastructure interfaces with their application. What we're seeing on the FlashBlade side is we're having a lot more conversations with application people, not just IT people. It doesn't mean the IT people aren't there; they're still there for sure, they have to deliver the service, etc. But the days of IT building up a catalog of services and a business owner subscribing to one of those services, picking whatever sort of fits their need, I think that's the construct that changes going forward. The application owner is becoming much more prescriptive about how they want the infrastructure to fit into their application, and that's a big change. And for folks like Garrett and CDW, who do a good job with this, being able to sort of get to
the application owner and bring those two sides together, there's a tremendous amount of value there. For us it's been a little bit of a retooling; we've traditionally sold to the IT side of the house, and we've had to teach ourselves how to talk the language of applications. So I think you pointed out a good construct there, and that application owner playing a much bigger role in what they're expecting from the performance of IT infrastructure is a key change. Interesting. I mean, that definitely is a trend that's put you guys closer to the business, where the infrastructure team is serving the business, as opposed to, sometimes I talk to data experts who are frustrated, especially data owners or data product builders who feel like they have to beg the data pipeline team to get new data sources or get data out. How about the edge? Maybe Scott, you can kick us off. We're seeing the emergence of edge use cases, AI inferencing at the edge, a lot of data at the edge. What are you seeing there, and how does this unified object and file fit in? Wow, Dave, how much time do we have? Um, two minutes. First of all, Scott, why don't you just tell everybody what the edge is? Yeah, you've got it all figured out, right? How much time do you have? At the end of the day, that's a great question. If you take a step back, I think it comes back to something you mentioned, Dave: it's about extracting value from data. And when you extract value from data, what it does, as Matt pointed out, is the influencers or the users of data, the application owners, have more power, because they're driving revenue now. So what that means from an IT standpoint is it's not just, hey, here are the services you get, use them or lose them and don't throw a fit. It is, no, I have to adapt, I have
to follow what my application owners need. Now, when you bring that back to the edge, what it means is that data is not localized to the data center. We just went through a nearly twelve-month period where the entire workforce of most of the companies in this country went distributed, and business continued. So if business is distributed, data is distributed: in the data center, at the edge, in the cloud, and in tons of other places. And what it also means is you have to be able to extract and utilize data anywhere it may be, and I think that's something we're going to continue to see. It comes back to, if you think about key characteristics, we've talked about things like performance and scale for years, but we need to start rethinking them, because on one hand we need to get performance everywhere, but also scale. This ties back to some of the other initiatives around getting value from data; it's something I call the massive success problem. One of the things we see, especially with workloads like machine learning, is businesses find success with them, and as soon as they do, they say, well, I need about twenty of these projects now. All of a sudden that overburdens IT organizations, especially across core, edge, and cloud environments. So the ability to meet performance and scale demands wherever data needs to be is something that's really important. You know, Dave, I'd like to tie together two things that I heard from Scott and Garrett that I think are important, and it's around this concept of scale. Some of us are old enough to remember the day when a ten-terabyte blast radius was too big a blast radius for people to take on, or when a terabyte of storage was considered an exemplary budget environment. Now we sort of think of terabytes kind of
like we used to think of gigabytes. In some ways petabytes are similar: you don't have to explain to anybody what a petabyte is anymore. And what's on the horizon, and it's not far off, are exabyte-type data set workloads, and you start to think about what could be in that exabyte of data. We've talked about how you extract that value, and we've talked about sort of how you start, but if the scale is big, not everybody's going to start at a petabyte or an exabyte. To Garrett's point, the ability to start small and grow into these projects is a really fundamental concept here, because you're not going to just go kick off a five-petabyte project; whether you do that on disk or flash, it's going to be expensive. But if you can start at a couple hundred terabytes, not just as a proof of concept but as something you can get predictable value out of, then you can say, hey, this scales either linearly or non-linearly in a way that lets me map my investments to how I dig deeper into this. That's how these successful projects are going to start, because the people starting with very large, expansive greenfield projects at multi-petabyte scale are going to find it hard to realize near-term value. Excellent. We've got to wrap, but Garrett, I wonder if you could close. When you look forward and talk to customers, do you see this unification of file and object as an evolutionary trend? Is it going to be a lever that customers use? How do you see it evolving over the next two or three years and beyond?
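Matt's "start small and grow into it" point, together with Garrett's earlier "a blade at a time" comment, amounts to simple incremental-capacity arithmetic, sketched below. Every number here is an illustrative assumption, not Pure pricing or a real FlashBlade blade capacity: the only point is that rounding capacity up to small increments lets a project start at a couple hundred terabytes instead of committing to the full multi-petabyte build-out on day one.

```python
# Back-of-the-envelope sketch: grow in blade-size increments instead of
# buying the full five-petabyte target up front. Numbers are illustrative.

BLADE_TB = 17      # assumed usable capacity per blade increment (hypothetical)
START_TB = 200     # start with a couple hundred terabytes
TARGET_TB = 5000   # the eventual five-petabyte ambition

def blades_needed(capacity_tb):
    # Ceiling division: a partially used blade is still a whole blade.
    return -(-capacity_tb // BLADE_TB)

start_blades = blades_needed(START_TB)
target_blades = blades_needed(TARGET_TB)

print(f"day one: {start_blades} blades for {START_TB} TB")
print(f"full build-out: {target_blades} blades for {TARGET_TB} TB")
print(f"up-front commitment deferred: {target_blades - start_blades} blades")
```

The deferred blades are exactly the investment that can be mapped to demonstrated value as the project scales, which is the economic argument being made above.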
Yeah, I mean, I think from our perspective, just from the numbers we're seeing in the market, the amount of growth that's happening with unstructured data is really just starting to hit that data deluge, or whatever you want to call it, that we've been talking about for so many years. It really does seem to be coming true as we start to see things scale out, and as folks settle into, okay, I'm going to use the cloud to start and maybe train my models, but now I'm going to bring it back on-prem because of latency or security or whatever the decision points are. This is something that is not going to slow down, and I think folks like Pure having the tools that they give us to use and bring to market with our customers is really key and critical for us. So I see it as a huge growth area and a big focus for us moving forward. Guys, great job unpacking a topic that's been covered a little bit, but I think we covered some ground that is new. So thank you so much for those insights and that data; really appreciate your time. Thanks, Dave. Thanks. Yeah, thanks, Dave. Okay, and thank you for watching the convergence of file and object. Keep it right there; we're right back after this short break. Innovation. Impact. Influence. Welcome to the Cube: disruptors, developers, and practitioners learn from the voices of leaders who share their personal insights from the hottest digital events around the globe. Enjoy the best this community has to offer on the Cube, your global leader in high-tech digital coverage. Now we're going to get the customer perspective on object, and we'll talk about the convergence of file and object, but really focusing on the object piece. This is a content program that's being made possible by Pure Storage, and it's co-created with the Cube. Christopher C.B.
Bond is here. He's the lead architect for the Micro Focus enterprise data warehouse and a principal data engineer at Micro Focus. C.B., welcome, good to see you. Thanks, Dave, good to be here. So tell us more about your role at Micro Focus. It's a pan-Micro Focus role; of course we know the company is a multinational software firm that acquired the software assets of HPE, including Vertica. Tell us where you fit. Yeah, so Micro Focus, as you said, is a worldwide company that sells a lot of software products all over the place, to governments and so forth, and it also grows often by acquiring other companies. So there is the problem of integrating new companies and their data, and what's happened over the years is that they've had a number of different discrete data systems. You've got this data spread all over the place, and they've never been able to get a full, complete introspection on the entire business because of that. So my role was to come in and design a central data repository, an enterprise data warehouse, that all reporting could be generated against. That's what we're doing, and we selected Vertica as the EDW system and Pure Storage FlashBlade as the communal repository. Okay, so you obviously had experience with Vertica in your previous role, so it's not like you were starting from scratch, but paint a picture of what life was like before you embarked on this consolidated approach to your data warehouse. Was it just disparate data all over the place, a lot of M&A going on? Where did the data live?
Right, so again, the data was all over the place, including under people's desks on their own private SQL Servers. A lot of Micro Focus runs on SQL Server, which has pros and cons, because it's a great transactional database but not really good for analytics, in my opinion. But a lot of stuff was running on that. They had one Vertica instance that was doing some select reporting; it wasn't a very powerful system, and it was what they call Vertica Enterprise mode, where we had dedicated nodes with compute and storage co-located on each server. Okay, so Vertica Eon mode is a whole new world, because it separates compute from storage. You mentioned Eon mode and the ability to scale storage and compute independently. We wanted to have the analytics, the OLAP stuff, close to the OLTP stuff, right? So that's why they're co-located very close to each other. And what's nice about this situation is that these S3 objects, it's an S3 object store on the Pure FlashBlade, we could copy over to AWS if we needed to, spin up a version of Vertica there, and keep going. It's like a tertiary DR strategy, because we're actually setting up a second FlashBlade-Vertica system, geo-located elsewhere, for backup. And we can get into it if you want to talk about how the latest version of the Pure software for the FlashBlade allows synchronization of those FlashBlades across network boundaries, which is really nice, because if a giant sinkhole opens up under our colo facility and we lose that thing, then we just have to switch the DNS and we're back in business off the DR site. And if that one were to go, we could copy those objects over to AWS and be up and running there. So we're feeling pretty confident about being able to weather whatever comes along. So you're using the Pure FlashBlade as an object store. Most people think object: simple but slow. Not the case for you, is that right? Not the case at
all. Why is that? It's ripping. Well, you have to understand how Vertica stores data. It stores data in what they call storage containers, and those are immutable on disk, whether it's on AWS or in Vertica Enterprise mode. If you do an update or delete, it actually has to go retrieve that storage container from disk, destroy it, and rebuild it, which is why you want to avoid updates and deletes with Vertica. The way it gets its speed is by sorting, ordering, and encoding the data on disk so it can read it really fast, but if you do an operation where you're deleting or updating a record in the middle of that, then you've got to rebuild that entire container. So that actually matches up really well with S3 object storage, because it works kind of the same way: objects get destroyed and rebuilt too. That matches up very well with Vertica, and we were able to design the system so that it's append-only. Now, we had some reports running in SQL Server that were taking seven days, so we moved them to Vertica and rewrote the queries, which had been written in T-SQL with a bunch of loops and so forth, and, this is amazing, it went from seven days to two seconds to generate this report. That has tremendous value to the company, because it used to have this long seven-day cycle to get a new introspection into what they call their knowledge base, and now all of a sudden it's almost on demand: two seconds to generate it. That's great, and that's because of the way the data is stored. On the S3 question, you asked whether it's slow: well, not in that context, because what happens with Vertica Eon mode is that when you set up your compute nodes, they have local storage also, which is called the depot. It's kind of a cache, so the data will be drawn from the FlashBlade and cached locally, and the thought when they designed that was
that'll cut down on the latency. But it turns out that if you have your compute nodes close, meaning minimal hops to the FlashBlade, you can actually tell Vertica, don't even bother caching that stuff, just read it directly on the fly from the FlashBlade, and the performance is still really good. It depends on your situation, but I know, for example, a major telecom company that uses the same topology we're talking about here did the same thing: they just dropped the cache, because the FlashBlade was able to deliver the data fast enough. So you're talking about speed-of-light issues, and the overhead of switching infrastructure gets eliminated, and as a result you can go directly to the storage array? That's correct, yeah. It's fast enough that it's almost as if it's local to the compute node. But every situation is different depending on your needs. If you've got a few tables that are heavily used, then yeah, put them in the cache, because that'll probably be a little bit faster. But if you have a lot of ad hoc queries going on, you may exceed the storage of the local cache, and then you're better off having it read directly from the FlashBlade. How do you look at Pure as a fit? I mean, I sound like a fanboy, but Pure is all about simplicity, and so is object, so that means you don't have to worry about wrangling storage and worrying about LUNs and all that other nonsense. And I've been burned by hardware in the past, where it's built to a price and they cheap out on stuff like fans or other components, and those components fail and the whole thing goes down. But this hardware is super good quality, and I'm happy with the quality we're getting. So, C.B., last question: what's next for you? Where do you want to take this initiative? Well, we are in the process now
of, well, I designed the system to combine the best of the Kimball approach to data warehousing and the Inmon approach. What we do is bring over all the data we've got and put it into a pristine staging layer. Like I said, because it's append-only, it's essentially a log of all the transactions happening in this company as they appear. And then, from the Kimball side of things, we're designing the data marts now, which are what the end users actually interact with. So we're examining the transactional systems to say, how are these business objects created, what's the logic there, and we're recreating those logical models in Vertica. We've done a handful of them so far, and it's working out really well. So going forward, we've got a lot of work to do to create just about every object that the company needs. C.B., you're an awesome guest; it's really always a pleasure talking to you. Thank you, congratulations, and good luck going forward. Stay safe. Thank you. Okay, let's summarize the convergence of file and object. First, I want to thank our guests, Matt Burr, Scott Sinclair, Garrett Belsner, and C.B. Bond. I'm your host, Dave Vellante, and please allow me to briefly share some of the key takeaways from today's program. First, as Scott Sinclair of ESG stated, surprise, surprise, data is growing. And Matt Burr helped us understand the growth of unstructured data: estimates indicate that the vast majority of data, 80% or so, will be considered unstructured by mid-decade, and unstructured data is growing very rapidly. Now, of course, your definition of unstructured data may vary across a wide spectrum. There's video, audio, documents, spreadsheets, chat; these are generally considered unstructured data, but of course they all have some type of structure to them. Perhaps it's not as strict as a
relational database, but there's certainly metadata and a certain structure to the types of use cases I just mentioned. Now, the key to what Pure is promoting is this idea of unified fast file and object, UFFO. Look, object is great: it's inexpensive, it's simple, but historically it's been less performant, so it's good for archiving or cheap-and-deep types of use cases. Organizations often use file for higher-performance workloads, and the fact is most of the world's data lives in file formats. What Pure is doing is bringing together file and object by, for example, supporting multiple protocols, i.e., NFS, SMB, and S3; S3, of course, has really given new life to object over the past decade. Now, the key here is to enable customers to have the best of both worlds, not having to trade off performance for object simplicity. And a key discussion point we've had on the program has been the impact of flash on the long, slow death of spinning disk. Look, hard disk drives had a great run, but HDD volumes peaked in 2010, and flash, as you well know, has seen tremendous volume growth, thanks to the consumption of flash in mobile devices and then, of course, its application in the enterprise. As that volume keeps growing and growing, the prices of flash are coming down faster than those of HDDs, so the writing is on the wall; it's just a matter of time. Flash is riding down that cost curve very aggressively, and HDDs have essentially become a managed-decline business. Now, by bringing flash to object as part of the FlashBlade portfolio and allowing for multiple protocols, Pure hopes to eliminate the dissonance between file and object and simplify the choice. In other words, let the workload decide: if you have data in a file format, no problem, Pure can still bring the benefits of object simplicity at scale to the table. So again, let the workload inform what the right strategy is, not the technical infrastructure. Now, Pure, of course, is not
alone; there are others supporting this multi-protocol strategy. So we asked Matt Burr: why Pure, what's so special about you? Not surprisingly, in addition to the product innovation, he went right to Pure's business model advantages, for example its Evergreen support model, which was very disruptive in the marketplace. Frankly, Pure's entire business disrupted the traditional disk array model, which was fundamentally flawed. Pure forced the industry to respond, and when it achieved escape velocity and went public, the entire industry had to react. And a big part of the Pure value prop, in addition to the business model innovation we just discussed, is simplicity. Pure's keep-it-simple approach coincided perfectly with the ascendancy of cloud, where technology organizations needed cloud-like simplicity for certain workloads that were never going to move into the cloud; they're going to stay on-prem. Now, I'm going to come back to this, but allow me to bring in another concept that Garrett and C.B. really highlighted, and that is the complexity of the data pipeline. What do I mean by that, and why is it important? Scott Sinclair implied that the big challenge is that organizations are data-full but insights are scarce: a lot of data, not as many insights, and it takes too much time to get to those insights. We heard from our guests that the complexity of the data pipeline was a barrier to getting to faster insights. C.B. Bond shared how he streamlined his data architecture using Vertica's Eon mode, which allowed him to scale compute independently of storage; that brought critical flexibility and improved economics at scale, and FlashBlade, of course, was the back-end storage for his data warehouse efforts. Now, the reason I think this is so important is that organizations are struggling to get insights from data, and the complexity associated with the data pipeline and data life cycles, let's face it, is
overwhelming organizations. The answer to this problem is a much longer and different discussion than unifying object and file; I could spend all day talking about that, but let's focus narrowly on the part of the issue that is related to file and object. The situation here is that the technology has not been serving the business the way it should; rather, the formula is twisted. In the world of data, big data, and data architectures, the data team is mired in complex technical issues that impact the time to insights. Now, part of the answer is to abstract the underlying infrastructure complexity and create a layer with which the business can interact, one that accelerates instead of impedes innovation. Unifying file and object is a simple example of this, where the business team is not blocked by infrastructure nuance: does this data reside in file or object format? Can I get to it quickly and inexpensively in a logical way, or is the infrastructure stovepiped and blocking me? So if you think about the prevailing sentiment of how the cloud is evolving to incorporate on-premises workloads, hybrid configurations working across clouds, and now out to the edge, this idea of an abstraction layer that essentially hides the underlying infrastructure is a trend we're going to see evolve this decade.

Now, is UFFO the be-all end-all answer to solving all of our data pipeline challenges? No, of course not. But by bringing the simplicity and economics of object together with the ubiquity and performance of file, UFFO makes life a lot easier for organizations that are evolving into digital businesses, which, by the way, is every business. So we see this as an evolutionary trend that further simplifies the underlying technology infrastructure and does a better job supporting the data flows for organizations, so they don't have to spend so much time worrying about technology details that add little value to the business. Okay.
So thanks for watching The Convergence of File and Object, and thanks to Pure Storage for making this program possible. This is Dave Vellante for the Cube. We'll see you next time.