From around the globe, it's theCUBE, with digital coverage of Data Automated, an event series brought to you by Io-Tahoe.

Welcome, everyone, to the second episode in our Data Automated series, made possible with support from Io-Tahoe. Today we're going to drill into the data lifecycle, meaning the sequence of stages that data travels through from creation to consumption to archive. The problem, as we discussed in our last episode, is that data pipelines are complicated, cumbersome, and disjointed, and they involve highly manual processes. A smart data lifecycle uses automation and metadata to improve agility, performance, data quality, and governance, and ultimately reduce costs and time to outcomes. In today's session, we'll define the data lifecycle in detail and provide perspectives on what makes a data lifecycle smart and, importantly, how to build smarts into your processes. In a moment, we'll be back with Adam Worthington from Ethos to kick things off, and then we'll go into an expert power panel to dig into the tech behind smart data lifecycles. Then we'll hop into the crowd chat and give you a chance to ask questions. So stay right there, you're watching theCUBE.

Innovation, impact, influence. Welcome to theCUBE, disruptors, developers, and practitioners. Learn from the voices of leaders who share their personal insights from the hottest digital events around the globe. Enjoy the best this community has to offer on theCUBE, your global leader in high-tech digital coverage.

Okay, we're back with Adam Worthington. Adam, good to see you. How are things across the pond?

Good, thank you. I'm sure our weather's a little bit worse than yours on the other side, but good.

Hey, so let's set it up. Tell us about yourself, what your role is as CTO.

Yeah, Adam Worthington, as you said, CTO and co-founder of Ethos. We're a pretty young company, in our sixth year, and we specialize in emerging, disruptive technologies.
So within the infrastructure, data center, and cloud space. My role is the technical lead, so it's my job to be an expert in all of the technologies that we work with, which can be a bit of a challenge if you have a huge portfolio. That's one of the reasons we keep ours deliberately focused. I'm also key to the technical validation and evaluation of new technologies.

So you guys are really technology experts, data experts, and probably also experts in process and delivering customer outcomes, right?

That's a great word there, Dave, outcomes. That's a lot of what I like to speak to customers about.

Let's talk about smart data. When you throw out terms like this, it can feel buzzwordy. What are the critical aspects of so-called smart data?

Cool. Well, it would probably help to step back a little bit and set the scene in terms of where I came from and the types of problems I saw in the field. I'm really an infrastructure and solution architect by trade, and what I developed, relatively organically over time, was a personal framework and approach built on three core design principles: simplicity, flexibility, and efficiency. Whatever I was designing, those principles need different things depending on the technology area we're working with, but they've been a pretty good steer. And they're the kinds of areas that a smart approach to data directly addresses. Reducing silos comes from simplifying: moving away from complexity of infrastructure, reducing the number of copies of data that we have across the infrastructure, and reducing the number of application environments that we need for different areas. So the smarter we get with data, in my eyes anyway, the further we move away from those traditional legacies.

But how does it work? I mean, what's involved in injecting smarts into your data lifecycle?
I actually didn't have this quite ready, but genuinely one of my favorite quotes is from the French philosopher and mathematician Blaise Pascal. He said, if I get this right, "I would have written you a shorter letter, but I didn't have the time." I love that quote for lots of reasons, and it has direct application to what we're talking about. It is actually really complicated to develop a technology capability that makes things simple, that more directly meets the needs of the business, that provides self-service capability. And I don't just mean self-service in the narrow sense; I mean making data and infrastructure make sense to the business users that are using them.

Your job, correct me if I'm wrong, is to put all that together in a solution and then help the customer realize what we talked about earlier, that business outcome.

Yeah, and it's fitting together both sides and understanding both sides. Key to our ability to deliver on exactly what you just said is being experts in the capabilities and the newer, better ways of doing things, but also having the business understanding to be able to ask the right questions and identify how newer, better approaches could help solve these issues. Another area I'd highlight is that with these platforms, you can do more with less. And that's not just about reducing data redundancy; it's about creating application environments, and the infrastructure to service them, that can meet different requirements, that can handle random I/O as well as sequential, without getting too low-level on the tech.
So what that means is that you don't necessarily have to move data to application environment A to do one thing, manipulate it, then move it to application environment B, then to application environment C, in a left-to-right analytics workflow. You can keep the data where it is and use it for different requirements within the infrastructure, and again, do more with less. And what that does is not just about simplicity and efficiency: it significantly reduces the time to value of that data as well.

Do you have examples you can share with us, even if they're anonymized, of customers that you've worked with that are maybe a little further down the journey, or maybe not?

You mentioned data protection earlier. So another organization, and this is a project which is just nearing completion at the moment, is a huge organization with literally petabytes of data servicing their backup and archive. And what they had, alongside these reams of data, was, I think I'm right in saying, five different backup applications, depending on the area of infrastructure they were backing up. Virtualization was backed up one way, their database environments another, other areas differently again, and they were using something else in the cloud. With the consolidated approach that we recommended and worked with them on, they were able to significantly reduce complexity and reduce the amount of time it took them to achieve what they needed to. And this addressed one of their key requirements: they had grown past the threshold of being able to back it all up.

Adam, give us the final thoughts. Bring us home in this segment.
Well, the final thought is something that we didn't particularly touch on, but I think it's slightly hidden and isn't spoken about as much as it could be: traditional approaches to infrastructure, and we've already touched on how they can be complicated and inefficient, impact a user's ability to be agile. What you find with traditional approaches, and you've already touched on some of the benefits of newer approaches there, is that they're often very prescriptive. They're designed for a particular purpose, and the way the infrastructure environment is served up to users, in a packaged kind of way, means they need to use it in whatever way has been dictated. So that self-service aspect comes in from a flexibility standpoint. These platforms, and this platform approach, which is the right way to address technology in my eyes, enable the infrastructure to be used flexibly. What we find is that if you put this capability into the hands of the business users and the data users, they start innovating in the way that they use that data and the way that they bring benefits to the business. If a platform is too prescriptive, they aren't able to do that. So with these new approaches, you get all of the metrics that we've touched on, which is fantastic from a cost standpoint and a utility standpoint, but what it also means is that the innovators in the business, the ones that really understand what they're looking to achieve, now have the tools to innovate without those constraints. And I've started to see with projects that we've completed that if you do it in the right way, if you articulate the capability and empower the business users in the right way, then these businesses are in a significantly better position to take advantage of this and really match, and significantly pull ahead of, their competition in whatever space it is.
Super, Adam. I mean, a really exciting space. We spent the last 10 years gathering all this data and trying to slog through it and figure it out. And now, with the tools that we have and the automation capabilities, it really is a new era of innovation and insight. So Adam Worthington, thanks so much for coming on theCUBE and participating in this program.

Excellent, thank you very much for inviting me. It's been a pleasure.

Now we're going to go into the power panel and go deeper into the technologies that enable smart data lifecycles. Stay right there, you're watching theCUBE.

Are you interested in test-driving the Io-Tahoe platform? Kickstart the benefits of data automation for your business through the IoLabs program: a flexible, scalable sandbox environment on the cloud of your choice, with setup, service, and support provided by Io-Tahoe. Click on the link and connect with a data engineer to learn more and see Io-Tahoe in action.

Welcome back, everybody, to the power panel: driving business performance with smart data lifecycles. Lester Waters is here, he's the chief technology officer of Io-Tahoe. He's joined by Patrick Smith, who is field CTO at Pure Storage, and Ezat Dayeh, who's a systems engineering manager at Cohesity. Gentlemen, good to see you, thanks so much for coming on this panel.

Thank you, Dave.

Let's start with Lester. I wonder if each of you could just give us a quick overview of your role, and what's the number one problem that you're focused on solving for your customers?

Yes, I'm Lester Waters, chief technology officer of Io-Tahoe. And really the number one problem that we're trying to solve for our customers is to help them understand what they have. Because if they don't understand what they have in terms of their data, they can't manage it, they can't control it, they can't monitor it, they can't ensure compliance.
So really, finding out all that you can about your data and building a catalog that can be readily consumed by the entire business is what we do.

Patrick, field CTO, your title says to me you're talking to customers all the time, so you've got a good perspective on it. Give us your take on things here.

Yeah, absolutely. My patch is exactly that: I'm out talking to customers and prospects in lots of different verticals across the region. As they look at their environments and their data landscape, they're faced with massive growth in the data that they're trying to analyze, and demands to be able to get insight from it and deliver business value faster than they've ever had to in the past.

Got it. And then Ezat, of course, Cohesity, you're like the new kid on the block. You guys are growing rapidly, and you created this whole notion of data management beyond backup. As a systems engineering manager, what are you seeing from customers, and what's the number one problem that you're solving?

Yeah, sure. The number one problem I see, time and again, speaking with customers, is all around data fragmentation. Due to things like organic growth, and maybe even budgetary limitations, infrastructure has grown piecemeal over time, and it's highly distributed internally. And just to be clear, when I say internally, that could mean it's on multiple platforms or silos within an on-prem infrastructure, but it also extends to the cloud as well.

Right. Hey, cloud is cool, everybody wants to be in the cloud, right? So you're right, it creates maybe unintended consequences. So let's start with the business outcome and try to work backwards. People want to get more insights from data, and they want a more efficient data lifecycle. But Lester, let me start with you. Thinking about the North Star for creating data-driven cultures, what is the North Star for customers here?
I think the North Star, in a nutshell, is driving value from your data. Without question, we differentiate ourselves these days by even nuances in our data. Now, underpinning that, there are a lot of things that have to happen to make that work out well. For example, are you adequately protecting your data? Do you have a good storage subsystem? Do you have good backup, with recovery point objectives and recovery time objectives? Are you fully compliant? Are you ticking all the boxes? There are a lot of regulations these days with respect to compliance, data retention, and data privacy. Are you ticking those boxes? Are you being efficient with your data? There's a statistic someone mentioned to me the other day that 53% of all businesses have between three and 15 copies of the same data. Finding and eliminating those is part of the problem you need to chase.

You're right, Lester, no doubt, business value. And a lot of that comes from reducing the end-to-end cycle times. But anything you guys would add to that? Patrick and Ezat, maybe start with Patrick.

Yeah, I think getting value from data really hits on what everyone wants to achieve. But there are a couple of key steps in doing that. First of all is getting access to the data, and that hits three big problems. Firstly, working out what you've got. Secondly, after working out what you've got, how to get access to it, because it's all very well knowing that you've got some data, but if you can't get access to it, whether for privacy reasons or security reasons, that's a big challenge. And then finally, once you've got access to the data, making sure that you can process it in a timely manner.

For me, it would be that an organization has a really good global view of all of its data.
It understands the data flows and dependencies within its infrastructure, understands the precise legal and compliance requirements, and has the ability to action changes or initiatives within its environment, excuse the pun, with cloud-like agility. And that's no easy feat, right? That is hard work.

Okay, so we've talked about the challenges and some of the objectives, but there are a lot of blockers out there, and I want to understand how you guys are helping remove them. So Lester, what do you see as some of the big blockers in terms of people really leaning in to this smart data lifecycle?

Yeah, silos are probably the biggest one I see in businesses. "It's my data, not your data." Lots of compartmentalization, and breaking that down is one of the challenges. Having the right tools to help you do that is only part of the solution; there are obviously a lot of cultural things that need to happen to break down those silos and work together. If you can identify where you have redundant data across your enterprise, you might be able to consolidate it.

Yeah, so I want to go to Patrick. One of the blockers that I see is legacy infrastructure, technical debt sucking up all the budget, too many people having to look after it.

As you look at the infrastructure that supports people's data landscapes today, for primarily legacy reasons the infrastructure itself is siloed. You have different technologies with different underlying hardware and different management methodologies, and they're there for good reason, because historically you had to have specific fitness for purpose for different data requirements. That's one of the challenges we tackled head-on at Pure with the FlashBlade technology and the concept of the data hub: a platform that can deliver different characteristics for different workloads, but from a consistent data platform.
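One way to picture the redundant-copy consolidation Lester describes is content hashing: group files by a digest of their contents and flag groups with more than one member. This is a minimal illustrative sketch, not Io-Tahoe's actual discovery engine, and the file paths are hypothetical.

```python
# Sketch: find redundant copies by hashing file contents and grouping
# identical digests. Illustrative only; not a production dedupe engine.
import hashlib
from collections import defaultdict

def find_duplicates(files: dict) -> list:
    """files maps path -> bytes; returns groups of paths with identical content."""
    by_digest = defaultdict(list)
    for path, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]

dupes = find_duplicates({
    "/finance/q1.csv": b"rows",   # original
    "/backup/q1.csv":  b"rows",   # redundant copy, same bytes
    "/hr/staff.csv":   b"names",  # unique
})
print(dupes)  # [['/finance/q1.csv', '/backup/q1.csv']]
```

Real catalog tools work over metadata and samples rather than full contents, but the grouping idea is the same.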
Now, Ezat, I want to go to you, because your world, to me, goes beyond backup. One of the challenges is, as they say, backup is one thing, recovery is everything. But as well, the CFO doesn't want to pay for just protection. And one of the things I like about what you guys have done is you've broadened the perspective to get more value out of what was once seen as an insurance policy.

I do see one of the biggest blockers as the fact that the task at hand can be overwhelming for customers. But the key here is to remember that it's not an overnight change, it's not a flick of a switch. It's something that can be tackled in a very piecemeal manner. And absolutely, like you said, reduction in TCO and being able to leverage the data for other purposes is a key driver for this. So this can be resolved; it can be pretty straightforward, and it can be quite painless as well. The same goes for unstructured data, which is very complex to manage. We've all heard the stats from the analysts: data is obviously growing at an extremely rapid rate, but when you look at how it's actually growing, 80% of that growth is in unstructured data and only 20% is in structured data. So these are quick-win areas where customers can realize immediate TCO improvement and increased agility as well.

Let's paint a picture of this. Guys, if you could bring up the lifecycle slide. What you can see here is this cycle, the data lifecycle, and what we want to do is inject intelligence, or smarts, into this lifecycle. You start with ingestion or creation of data. You store it; you've got to put it somewhere, right? You've got to classify it. You've got to protect it. Then of course you want to reduce the copies and make it efficient, and you want to prepare it so the business can actually consume it. And then you've got compliance and governance and privacy issues.
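The stages just walked through, ingest, store, classify, protect, reduce copies, prepare, govern, form an ordered pipeline that each data asset moves along. A minimal sketch of that idea, with hypothetical stage names rather than any vendor's actual model:

```python
# Sketch: the data lifecycle as an ordered pipeline of stages.
# Stage names are illustrative placeholders.
from dataclasses import dataclass, field

STAGES = ["ingest", "store", "classify", "protect", "dedupe", "prepare", "govern"]

@dataclass
class DataAsset:
    name: str
    completed: list = field(default_factory=list)

    def advance(self, stage: str) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.completed.append(stage)

    def next_stage(self):
        """First lifecycle stage this asset has not yet passed through."""
        for stage in STAGES:
            if stage not in self.completed:
                return stage
        return None

asset = DataAsset("customer_orders")
asset.advance("ingest")
asset.advance("store")
print(asset.next_stage())  # classify
```

The point the panel makes is that automation, rather than manual process, should drive each of these transitions.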
And I wonder if we could start with you, Lester. This is the picture of the lifecycle. What role does automation play in terms of injecting smarts into the lifecycle?

Automation is key here, especially from the discover, catalog, and classify perspective. I've seen companies where they'll take and dump all of their database schemas into a spreadsheet so that they can sit down and manually figure out what "attribute 37" means as a column name. And that's only the tip of the iceberg. So being able to automatically detect what you have, automatically deduce what's consuming the data upstream and downstream, and understand all of the things related to the lifecycle of your data (backup, archive, deletion) is key. And so having good tools there is very important.

So Patrick, obviously you participate in the store piece of this picture. I wonder if you could talk more specifically about that, but I'm also interested in how you affect the whole-system view, the end-to-end cycle time.

Yeah, I think Lester hit the nail on the head in terms of the importance of automation, because the data volumes are just so massive now that you can't effectively manage, understand, or catalog your data without it. Once you understand the data and the value of the data, that's where you can work out where the data needs to be at any point in time.

Right, so Pure and Cohesity obviously partner to do that. And of course, you guys are part of the protect, you're certainly part of the retain, but you also provide data management capabilities and analytics. I'm wondering if you could add some color there.

Yeah, absolutely. Like you said, we focus pretty heavily on data protection as just one of our areas. And that legacy infrastructure is just sitting there, really, consuming power, space, and cooling, and it's pretty inefficient. Automating that process is a key part of this.
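The "attribute 37" problem Lester mentions, opaque column names that someone would otherwise decode by hand in a spreadsheet, is exactly what automated classification attacks: sample a column's values and infer a tag from their shape. A toy rule-based sketch, with made-up patterns and thresholds (real catalog tools use far richer rules and machine learning):

```python
# Sketch: tag an opaque column by pattern-matching a sample of its values.
# Rules and threshold are illustrative, not any product's actual logic.
import re

RULES = [
    ("email", re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$")),
    ("date",  re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("phone", re.compile(r"^\+?[\d\s\-()]{7,20}$")),
]

def classify_column(samples, threshold=0.8):
    """Return the first tag whose pattern matches >= threshold of the samples."""
    for tag, pattern in RULES:
        hits = sum(1 for value in samples if pattern.match(value))
        if samples and hits / len(samples) >= threshold:
            return tag
    return "unknown"

# A column called "attribute37" turns out to hold email addresses:
print(classify_column(["a@b.com", "c@d.org", "e@f.net"]))  # email
```

Tagging columns this way is what lets downstream policy (protection, retention, privacy) be applied automatically rather than by manual review.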
If I have a modern-day platform, such as the Cohesity data platform, I can actually do a lot of analytics on that through applications; we have a marketplace for apps.

I wonder if we could talk about metadata. It's increasingly important. Metadata is data about the data. But Lester, maybe explain why it's so important and what role it plays in terms of creating a smart data lifecycle.

A lot of people think it's just about the data itself, but there are a lot of extended characteristics about your data. So imagine if, for my data lifecycle, I can communicate with the backup system from Cohesity and find out the last time that data was backed up, or where it's backed up to. I can exchange data with Pure Storage and find out what tier it's on: is the data at the right tier, commensurate with its use level? Being able to share that metadata across systems is the direction we're going in. Right now we're at the stage of identifying the metadata and trying to bring it together and catalog it. The next stage will be, okay, using the APIs that we have between our systems, can we communicate, share that data, and build good solutions for customers to use?

I think that's a huge point you just made. Ten years ago, automating classification was the big problem, and with machine intelligence we're obviously attacking that. But your point about machines starting to communicate with each other, cloud to cloud, means there's all kinds of new metadata being created. I often joke that someday there's going to be more metadata than data. So that brings us to cloud, and Ezat, I'd like to start with you.

I do think having the cloud is a great thing, and it has got its role to play, and you can have many different permutations and iterations of how you use it.
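Before moving to cloud, it is worth making Lester's metadata-exchange idea concrete: a catalog entry folds together attributes reported by different systems (backup status, storage tier, classification). The field names and the merge helper below are hypothetical placeholders, not real Cohesity, Pure Storage, or Io-Tahoe API calls.

```python
# Sketch: one catalog entry enriched with metadata from several systems.
# Field names and values are illustrative placeholders.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CatalogEntry:
    dataset: str
    storage_tier: Optional[str] = None    # e.g. as reported by the storage array
    last_backup: Optional[str] = None     # e.g. as reported by the backup platform
    classification: Optional[str] = None  # e.g. as derived by the data catalog

def merge_metadata(entry: CatalogEntry, **sources) -> CatalogEntry:
    """Fold metadata fields reported by different systems into one entry."""
    for key, value in sources.items():
        if hasattr(entry, key) and value is not None:
            setattr(entry, key, value)
    return entry

entry = merge_metadata(
    CatalogEntry("customer_orders"),
    storage_tier="hot",
    last_backup="2020-07-01T02:00:00Z",
    classification="pii",
)
print(entry.storage_tier)  # hot
```

With a shared record like this, questions such as "is this data on the right tier for its use level?" become queries over the catalog rather than manual cross-checks.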
As I may have mentioned previously, I've seen customers go into the cloud very, very quickly, and actually, recently, they're starting to remove workloads from the cloud. The reason this happens is that cloud has got its role to play, but it's not right for absolutely everything, especially in its current form. A good analogy I like to use, and this may sound a little bit cliched, is that when you compare clouds versus on-premises data centers, you can use the analogy of houses and hotels. To give you an idea: a hotel is the equivalent of a cloud. I can get everything I need from there: my food, my water, my outdoor facilities. If I need to accommodate more people, I can rent some more rooms. I don't have to maintain the hotel; it's all done for me. A house is the equivalent of on-premises infrastructure. I pretty much have to do everything myself: purchase the house, maintain it, buy my own food and water, make improvements myself. But then why do we all live in houses and not in hotels? The only answer I can think of is that it's cheaper, right? It's cheaper to do it myself. But that's not to say that hotels haven't got a role to play. For example, if I've got loads of visitors coming over for the weekend, I'm not going to build an extension to my house just for them; I'll burst into the hotel, into the cloud, and use it for things like that. So what I'm really saying is that the cloud is great for many things, but it can work out costlier for certain applications, while others are a perfect fit.

It's an interesting analogy. I hadn't thought of that before, but you're right.
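The house-versus-hotel economics reduce to a breakeven-utilization calculation: a fixed ownership cost against a pay-per-use rate. The numbers below are purely illustrative, not from the discussion.

```python
# Sketch: at what utilization does owning (on-prem) beat renting (cloud)?
# All figures are made-up illustrations.
def breakeven_hours(fixed_monthly_cost: float, hourly_cloud_rate: float) -> float:
    """Hours of use per month above which the fixed cost is the cheaper option."""
    return fixed_monthly_cost / hourly_cloud_rate

# If on-prem capacity costs $1,460/month all-in and the equivalent cloud
# instance is $2/hour, renting wins only below 730 hours of use per month,
# i.e. for bursty workloads, not for something running 24x7 (~730 h/month).
print(breakeven_hours(1460, 2.0))  # 730.0
```

Which is exactly the "burst into the hotel for the weekend, but live in the house" conclusion.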
Because I was going to say, well, part of it is you want the cloud experience everywhere, but you don't always want the cloud experience, especially when you're with your family and want certain privacy. I'd not heard that one before, Ezat, so that's a new perspective, thank you. But Patrick, I do want to come back to that cloud experience, because in fact that's what's happening in a lot of cases: organizations are extending the cloud properties of automation on-prem.

Yeah, I thought Ezat brought up a really interesting point and a great analogy for the use of the public cloud, and it really reinforces the importance of the hybrid and multi-cloud environment, because it gives you the flexibility to choose the optimal environment to run your business workloads, and that's what it's all about. And the flexibility to change which environment you're running in, either from one month to the next or from one year to the next, because workloads change and the characteristics that are available in the cloud change. The hybrid cloud is something we've lived with ourselves at Pure: our Pure management technology actually sits in a hybrid cloud environment. We started off entirely cloud-native, but now we use the public cloud for compute, and we use our own technology at the end of a high-performance network link to support our data platform. So we get the best of both worlds, and I think that's where a lot of our customers are trying to get to.

All right, I want to come back to that in a moment, but before we do, Lester, I wonder if we could talk a little bit about compliance, governance, and privacy. I think the Brits on this panel are still in the EU for now, but the EU is looking at new rules and regulations going beyond GDPR. Where do privacy, governance, and compliance fit into the data lifecycle? Lester, I want your thoughts on this.
Yeah, this is a very important point, because the landscape for compliance around data privacy and data retention is changing very rapidly, and being able to keep up with those changing regulations in an automated fashion is the only way you're going to manage it. I think there's even some sort of ruling coming out today or tomorrow with a change to GDPR. So these are all very key points, and being able to codify those rules into software, whether it's Io-Tahoe or your storage system or Cohesity, that will help you be compliant is crucial.

Yeah. Ezat, anything you can add there? I mean, this really is your wheelhouse.

Yeah, absolutely. I think anybody who's watching this has probably gotten the message that fewer silos are better, and that absolutely applies to data in the cloud as well. By aiming to consolidate onto fewer platforms, customers can realize a lot better control over their data, and the natural effect of this is that it makes meeting compliance and governance a lot easier. When it's consolidated, you can start to confidently understand who's accessing your data and how frequently they're accessing it. You can also do things like detecting anomalous file access activities and quickly identifying potential threats.

Okay, Patrick, you talked earlier about storage optimization. We talked to Adam Worthington about the business case: you've got the numerator, which is the business value, and then the denominator, which is the cost. What's unique about Pure in this regard?

Yeah, and I think there are multiple dimensions to that. Firstly, if you look at the difference between the legacy storage platforms that used to take up racks or aisles of space in a data center and the flash technology that underpins FlashBlade, we effectively switch out racks for rack units.
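The anomalous-access detection Ezat mentions can be as simple as comparing a user's activity today against their own historical baseline. A toy statistical sketch, with made-up numbers and a made-up threshold; production systems use much richer behavioral models:

```python
# Sketch: flag a user whose file-access count jumps far above their own
# historical baseline. Numbers and threshold are illustrative only.
from statistics import mean, pstdev

def is_anomalous(history: list, today: int, sigmas: float = 3.0) -> bool:
    """True if today's access count exceeds mean + sigmas * stddev of history."""
    mu, sd = mean(history), pstdev(history)
    # Floor the stddev so a perfectly flat history doesn't flag tiny changes.
    return today > mu + sigmas * max(sd, 1.0)

baseline = [20, 25, 22, 18, 24]  # daily file accesses for one user
print(is_anomalous(baseline, 23))   # False: within normal range
print(is_anomalous(baseline, 400))  # True: possible exfiltration or ransomware
```

Consolidation helps precisely because a baseline like this is only meaningful when access events from all platforms land in one place.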
And it has a big play in terms of data center footprint and the environmental costs associated with the data center. If you look at extending out storage efficiencies and the benefits they bring, performance has a direct effect on staff, whether that's the simplicity of the platform, so that it's easy and efficient to manage, or the efficiency you get from your data scientists who are using the outcomes from the platform. If you look at some of our customers in the financial space, their time to results improved by 10 or 20x by switching to our technology from legacy technologies for their analytics platforms.

So guys, we've been running CUBE interviews remotely from our studios for the last 120 days, and this is probably the first interview I've done where I haven't started off talking about COVID. Lester, I wonder if you could talk about the smart data lifecycle and how it fits into this isolation economy, and hopefully what will soon be a post-isolation economy.

You know, COVID has dramatically accelerated the data economy. First and foremost, we've all learned to work at home. We've all had that experience where people would hem and haw about being able to work at home just a couple of days a week, and here we are working five days a week. That's had a knock-on impact on infrastructure, to be able to support it. But going further than that, the data economy is all about how a business can leverage its data to compete in this new world order that we're now in.

COVID has really been a forcing function. It's probably one of the few good things to come out of COVID that we have been forced to adapt, and it's been an interesting journey and continues to be so. Like Lester said, we're seeing a huge impact here. Working from home has pretty much become the norm now.
Companies have been forced into basically making it work. If you look at online retail, that's accelerated dramatically as well, along with unified communications and video conferencing. So really, the point here is that, yes, absolutely, we've compressed into the past four months what probably would have taken five years, maybe even ten.

We've got to wrap, but Lester, let me ask you to paint a picture of the journey, the maturity model that people have to take. If they want to get into it, where do they start, and where are they going? Give us that view.

I think first is knowing what you have. If you don't know what you have, you can't manage it, you can't control it, you can't secure it, you can't ensure it's compliant. So that's first and foremost. The second is ensuring that you're compliant: once you know what you have, are you securing it? Are you following the applicable regulations? Are you able to evidence that? Then, how are you storing your data? Are you archiving it? Are you storing it effectively and efficiently? Nirvana, from my perspective, is getting to a point where you've consolidated your data, you've broken down the silos, and you have a virtually self-service environment by which the business can consume and build upon its data. At the end of the day, as we said at the beginning, it's all about driving value out of your data, and automation is key to this journey.

That's awesome, and you've just described a winning data culture. Lester, Patrick, Ezat, thanks so much for participating in this power panel.

Thank you, Dave.

All right, so that was a great overview of the steps in the data lifecycle and how to inject smarts into the processes, really to drive business outcomes. Now it's your turn.
Hop into the crowd chat, please log in with Twitter or LinkedIn or Facebook, ask questions, answer questions and engage with the community. Let's crowd chat.