Data by its very nature is distributed and siloed, but most data architectures today are highly centralized. Organizations are increasingly challenged to organize and manage data and turn that data into insights. The idea of a single monolithic platform for data is giving way to new thinking, where a decentralized approach with open, cloud-native principles and federated governance will become an underpinning of digital transformations. Hi everybody, this is Dave Vellante. Welcome back to HPE Discover 2021, the virtual version. You're watching theCUBE's continuous coverage of the event, and we're here with Matt Maccaux, the field CTO for Ezmeral software at HPE. We're going to talk about HPE's software strategy and Ezmeral, and specifically how to take AI and analytics to scale and ensure the productivity of data teams. Welcome to theCUBE, good to see you.

Good to see you again, Dave. Thanks for having me today.

You're welcome. So talk a little bit about your role as a CTO. Where do you spend your time?

Yeah, so I spend about half of my time talking to customers and partners about where they are on their digital transformation journeys and where they struggle with this last phase, where we start talking about bringing those cloud principles and practices into the data world. How do I take those data warehouses, those data lakes, those distributed data systems into the enterprise and deploy them in a cloud-like manner? And then the other half of my time is working with our product teams to feed that information back so that we can continually innovate toward the next generation of our software platform.

So, I've been following HP and HPE for a long, long time. theCUBE has documented it; we go back to when the company was breaking into two parts. And at the time, a lot of people were saying, oh, HPE is getting rid of the software business. They're getting out of software. I said, no, no, no, no, hold on.
They're really focusing, and the whole focus is around hybrid cloud and now as-a-service, really retooling that business and sharpening the focus. So, tell us more about Ezmeral. It's a cool name, but what exactly is Ezmeral software?

I get this question all the time. So, what is Ezmeral? Ezmeral is a software platform for modern data and analytics workloads using open source software components. And we came from some inorganic growth. We acquired a company called Scytale that brought us a zero trust approach to doing security with containers. We bought BlueData, who came to us with an orchestrator before Kubernetes even existed in the mainstream. They were orchestrating workloads using containers for some of these more difficult workloads: clustered applications, distributed applications like Hadoop. And then finally we acquired MapR, which gave us this scale-out distributed file system and additional analytical capabilities. And so what we've done is we've taken those components, and we've also gone out into the marketplace to see what open source projects exist to allow us to bring those cloud principles and practices to these types of workloads, so that we can take things like Hadoop and Spark and Presto and deploy and orchestrate them using open source Kubernetes, leveraging GPUs, while providing that zero trust approach to security. That's what Ezmeral is all about: taking those cloud practices and principles, but without locking you in. Again, using those open source components where they exist, and then committing and contributing back to the open source community where those projects don't exist.

You know, it's interesting, thank you for that history. When I go back, and I was there in the early days of big data and Hadoop and so forth, MapR always had the best product. But back then it was like Kumbaya open source, and they had this kind of proprietary system, but it worked, and that's why it was the best product.
And so at the same time, they participated in open source projects because everybody did. That's where the innovation is going. So you're making that really hard-to-use stuff easier to use with Kubernetes orchestration. And then obviously, I'm presuming, with the open source chops, you're leaning in to the big trends that you're seeing in the marketplace. So my question is, what are those big trends that you're seeing when you speak to technology executives, which is a big part of what you do?

Yeah, so the trends, I think, there are a couple, and it's funny about Hadoop. I think the final nails in the coffin have been hammered in on the Hadoop space now. And so the leading trend of where organizations are going: we're seeing organizations wanting to go cloud first, but they really struggle with these data-intensive workloads. Do I have to store my data in every cloud? Am I gonna pay egress in every cloud? Well, what if my data scientists are most comfortable in AWS but my data analysts are more comfortable in Azure? How do I provide that multicloud experience for these data workloads? That's the number one question I get asked. And that's probably the biggest struggle for these chief data officers, chief digital officers: how do I allow that innovation while maintaining control over my data and compliance? Especially when we talk about international standards like GDPR, to restrict access to data, the ability to be forgotten. In these multinational organizations, how do I sort of square all of those components? And then how do I do that in a way that doesn't just lock me into another appliance or software vendor's stack? I want to be able to work within the confines of the ecosystem and use the tools that are out there, but allow my organization to innovate in a very structured, compliant way.

I mean, I love this conversation, and to me, you hit on the key word, which is organization. I want to talk about what some of the barriers are, and again, you heard my rap up front.
I really do think that we've created barriers, not only from a technology standpoint, and yes, the tooling is important, but so is the organization. And as you said, an analyst might want to work in one environment, a data scientist might want to work in another environment. The data may be very distributed. You might have situations where they're supporting the line of business, and the line of business is trying to build new products. And if I have to go through this monolithic, centralized organization, that's a barrier for me. And so we're seeing that change. I kind of alluded to it up front, but what do you see as the big barriers that are blocking this vision from becoming a reality?

It very much is organization, Dave. The technology is actually no longer the inhibitor here. We have enough technology, enough choices out there, that technology is no longer the issue. It's the organization's willingness to embrace some of those technologies and put just the right level of control around accessing that data. Because if you don't allow your data scientists and data analysts to innovate, they're going to do one of two things. They're either going to leave, and then you have a huge problem keeping up with your competitors, or they're going to do it anyway, and they're going to do it in a way that probably doesn't comply with the organizational standards. The more progressive enterprises that I speak with have realized that they need to allow these various analytical users to choose the tools they want, to self-provision those as they need to, and to get access to data in a secure and compliant way. And that means we generally need to bring the cloud to where the data is, because it's a heck of a lot easier than trying to bring the data to where the cloud is while conforming to those data principles. And that's HPE's strategy. You've heard it from our CEO for years now: everything needs to be delivered as a service.
It's Ezmeral software that enables that capability, such as self-service and secure data provisioning, et cetera.

I love this conversation, because if you go back to the early days of Hadoop, that was what was profound about Hadoop: bring five megabytes of code to a petabyte of data. And it didn't happen. We shoved it all into a data lake and it became a data swamp. And that's okay. It's a 1.0. Maybe in data this is like data warehouses, data hubs, data lakes, and maybe this is now a 4.0, but we're getting there. So, on open source, one thing's for sure: it continues to gain momentum. It's where the innovation is. I wonder if you could comment on the role that open source software plays for large enterprises, and maybe some of the hurdles that are there, whether they're legal or licensing or just fears. How important is open source software today?

I think cloud-native development, following the 12-factor application approach and microservices, paved the way over the last decade to make using open source technology, tools and libraries mainstream. We have to tip our hats to Red Hat for allowing organizations to embrace something so core as an operating system within the enterprise. But what everyone realized is that it's support. That's what has to come with that. So we can allow our data scientists to use open source libraries, packages and notebooks, but are we gonna allow those to run in production? And the answer is no: if we can't get support, we're not gonna allow that. So where HPE Ezmeral is taking the lead here is, again, embracing those open source capabilities, but if we deploy it, we're gonna support it, or we're gonna work with the organization that has the committers to support it. You call HPE, the same phone number you've been calling for years for tier one 24x7 support, and we will support your Kubernetes, your Spark, your Presto, your Hadoop ecosystem of components.
We're that throat to choke, and we'll provide all the way up to break-fix support for some of these components and packages, giving these large enterprises the confidence to move forward with open source, knowing that they have a trusted partner with which to do so.

And that's why we've seen such success with, say, managed services in the cloud, versus throwing out all the animals in the zoo and saying, okay, figure it out yourself. But then of course, what we saw, which was kind of ironic, was people finally saying, hey, we could do this in the cloud more easily. So that's where you're seeing a lot of data land. However, the definition of cloud, or the notion of cloud, is changing. No longer is it just this remote set of services somewhere out there in the cloud, some data center somewhere. No, it's moving to on-prem. On-prem is creating hybrid connections. You're seeing co-location facilities very proximate to the cloud. We're talking now about the edge, the near edge and the far edge, deeply embedded. And so that whole notion of cloud is changing. But I want to ask you, there's still a big push to cloud. Everybody has a cloud-first mantra. How do you see HPE competing in this new landscape?

I think collaborating is probably a better word, although you could certainly argue that if we're just leasing or renting hardware, then it would be competition. But I think, again, the workload is gonna flow to where the data exists. So if the data is being generated at the edge and being pumped into the cloud, then cloud is prod. That's the production system. If the data is generated via on-premises systems, then that's where it's gonna be executed. That's production. And so HPE's approach is very much coexist. It's a coexist model: if you need to do dev/test in the cloud and bring it back on-premises, fine, or vice versa. The key here is not locking our customers and our prospective clients into any sort of proprietary stack.
As we were talking about earlier, giving people the flexibility to move those workloads to where the data exists is what's gonna allow us to continue to get share of wallet and mind share, and continue to deploy those workloads. And yes, there's going to be competition that comes along. Do you run this on GCP, or do you run it on GreenLake on-premises? Sure, we'll have those conversations, but again, if we're using open source software as the foundation for that, then where you run it is actually less relevant.

So there are a lot of choices out there when it comes to containers generally and Kubernetes specifically. You may have answered this: you've got the zero trust component, you've got the orchestrator, you've got the scale-out piece. But I'm interested in hearing, in your words, why an enterprise would or should consider Ezmeral instead of alternative Kubernetes solutions.

It's a fair question, and it comes up in almost every conversation. Oh, we already do Kubernetes. We have a Kubernetes standard. And that's largely true in most of the enterprises I speak to. They're using one of the many on-premises distributions or cloud distributions, and they're all fine. They're all fine for what they were built for. Ezmeral was generally built for something a little different. Yes, everybody can run microservices-based applications, DevOps-based workloads, but where Ezmeral is different is for those data-intensive and clustered applications. Those sorts of applications require a certain degree of network awareness, persistent storage, et cetera, which requires a significant amount of intelligence: either you have to write in Golang and write your own operators, or Ezmeral can be that easy button. We deploy those stateful applications because we bring a persistent storage layer that came from MapR. We're really good at deploying those stateful clustered applications, and in fact, we've open sourced that as a project:
KubeDirector, which came from BlueData. And we're really good at securing these using SPIFFE and SPIRE, to ensure that there's that zero trust approach, which came from Scytale. And we've wrapped all of that in Kubernetes. So now you can take the most difficult, gnarly, complex data-intensive applications in your enterprise and deploy them using open source. And if that means we have to coexist with an existing Kubernetes distribution, that's fine. That's actually the most common scenario that I walk into. I start asking, what about these other applications you haven't done yet? The answer is usually, we haven't gotten to them yet, or we're thinking about it. And that's when we talk about the capabilities of Ezmeral, and I usually get the response: oh, A, we didn't know you existed, and B, well, let's talk about how exactly you do that. So again, it's more of a coexist model rather than a compete-with model, Dave.

Well, that makes sense. I mean, I think a lot of people think, oh yeah, Kubernetes is no big deal. It's everywhere. But you're talking about a solution, kind of taking a platform approach with capabilities. You've got to protect the data. A lot of times these microservices aren't so micro, and things are happening really fast. You've got to be secure, you've got to be protected. And like you said, you've got a single phone number. People say one throat to choke; somebody the other day said, no, no, a single hand to shake. It's more of a partnership. And I think that's apropos for HPE, Matt, with your heritage. So, thinking about this whole space: we've gone through the pre-big data days, and then big data was the hot buzzword. People maybe don't necessarily use that term anymore, although the data is bigger and getting bigger, which is kind of ironic. Where do you see this whole space going?
We've talked about the trend toward breaking down the silos, decentralization, maybe these hyper-specialized roles that we've created getting more embedded or aligned with the line of business. It feels like the next 10 years are going to be different than the last 10 years. How do you see it, Matt?

I completely agree. I think we are entering this next era. And I don't know if it's well-defined. I don't know if I would go out on a limb to say exactly what the trend is going to be, but as you said earlier, data lakes really turned into data swamps. We ended up with lots of them in the enterprise. And enterprises had to allow that to happen. They had to let each business unit or each group of users collect the data that they needed, and IT sort of had to deal with that down the road. And so I think the more progressive organizations are leading the way. They are, again, taking those lessons from cloud and application development, microservices. And they're allowing a freedom of choice. They're allowing data to move to where those applications are. And I think this decentralized approach is really going to be king. You're going to see traditional software packages. You're going to see open source. You're going to see a mix of those. But what I think will probably be common throughout all of that is this sense of automation, this sense that we can't just build an algorithm once, release it, and then wish it luck. We've got to treat these analytics and these data systems as living things, with life cycles that we have to support, which means we need to have DevOps for our data science. We need CI/CD for our data analytics. We need to provide engineering at scale like we do for software engineering. That's going to require automation and an organizational thinking process to allow that to actually occur.
And so I think all of those things, people, process, products, it's all three of those that are going to have to come into play. But stealing those best ideas from cloud and application development, I think we're going to end up with probably something new over the next decade or so.

Again, I'm loving this conversation, so I'm going to stick with it for a second. It's hard to predict, but I'll share some takeaways that I have, Matt, from our conversation, and I wonder if you could comment. I think the future is more open source. You mentioned automation. Devs are going to be key. I think governance as code, security designed in at the point of code creation, is going to be critical. It's no longer going to be a bolt-on. And I don't think we're going to throw away the data warehouse or the data hubs or the data lakes. I think they become a node. I like this idea, and I don't know if you know Zhamak Dehghani, but she has this idea of a global data mesh where these tools, lakes, whatever, are nodes on the mesh. They're discoverable. They're shareable. They're governed in a way. And I think the mistake a lot of people made early on in the big data movement is, oh, we've got data, we have to monetize our data, as opposed to thinking about what products I can build that are based on data and that can then lead to monetization. And I think the other thing I would say is the business has gotten way too technical, and it's alienated a lot of the business lines. I think we're seeing that change, and I think things like Ezmeral that simplify that are critical. So I'll give you the final thoughts based on my rant.

No, your rant is spot on, Dave. I think we are in agreement about a lot of things. Governance is absolutely key. If you don't know where your data is and what it's used for, and can't apply policies to it, it doesn't matter what technology you throw at it, you're gonna end up in the same state that you're essentially in today, with lots of swamps.
I did like that concept of a node on a data mesh. It kind of goes back to the similar thing with a service mesh, or a set of APIs that you can use. I think we're gonna have something similar with data. The trick is always how heavy is it, and how easy is it to move about? And so I think there's always going to be that latency issue, maybe not within the data center, but across the WAN. Latency is still going to be key, which means we need to have really good processes to be able to move data around. As you said, governance: determine who has access to what, when, and under what conditions, and then allow it to be free. Allow people to bring their choice of tools and provision them how they need to, while providing that audit, compliance and control. And then again, as you need to provision data across those nodes for those use cases, do so in a well-measured and governed way. I think that's sort of where things are going, but we keep using that term, governance. I think that's so key. And there's nothing better than using open source software, because that provides the traceability, the auditability and, frankly, the openness that allows you to say, I don't like where this project's going, I want to go in a different direction. And it gives those enterprises a control over these platforms that they've never had before.

Matt, thanks so much for the discussion. I really enjoyed it. Awesome perspectives.

Yeah, well, thank you for having me, Dave. Excellent conversation, as always. Thanks for having me again.

All right, you're very welcome. And thank you for watching, everybody. This is theCUBE's continuous coverage of HPE Discover 2021. Of course, the virtual version. Next year, we're going to be back live. My name is Dave Vellante. Keep it right there.