Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Peter Smails. He is the Vice President of Marketing at Imanis Data. Thanks so much for coming on theCUBE. Thanks for having me, glad to be here. So you've been in the data storage solution industry for a long time, but you're new to Imanis. What made you jump? What was it about Imanis? Yep, very easy to answer that: it's a hot market. Essentially, what Imanis is all about is we're an enterprise data management company. The reason I jumped here, if I take a small step back and put it in market context, here's what's happening. You've got your traditional application world, right? On-prem, typically RDBMS-based applications. That's the old world. The new world is everybody moving to microservices-based applications. For IoT, for customer 360, for customer analytics, whatever you want, they're building these new modern applications. They're building those applications not on traditional RDBMSs; they're building them on microservices-based architectures on top of Hadoop or on NoSQL databases. Those applications, as they go mainstream and move into production environments, require data management. They require backup and recovery. They require disaster recovery. They require archiving, et cetera. They require the whole plethora of data management capabilities. Nobody's touching that market. It's a blue ocean. So that's why I'm here. And Imanis, you were saying, is sort of the greatest little company no one's ever heard of. You've been around five years. Yeah, no, the company is not new.
So the thing that's exciting, as a marketer, is that we're not just out there pitching untested wares. We're getting into customers that people would die to get into, big blue chip companies, because we're addressing a problem that's material. They roll out these new applications, and they've got to have data management solutions for them. The company's been around five years, and I've only been on board about a month. What that's resulted in is that over the last five years they've had the opportunity to build an enterprise product, and you don't build an enterprise product overnight. So they've had the last five years to really gestate the platform and the technology and prove it in real-world scenarios. Now the opportunity for us as a company is that we're doubling down from a marketing standpoint, and we're doubling down from a sales infrastructure standpoint. The timing's right to essentially put this thing on the map and make sure everybody knows exactly what we do, because we're solving a real-world problem. You're backup and restore, but much more. Can you lay out the broad set of enterprise data management capabilities that Imanis Data currently supports in your product portfolio, and how you're evolving what you offer? Yeah, that's great, I love that question. So think of the platform itself as a highly scalable distributed architecture. We scale in a number of different ways, and I'll come directly to your question. One is we're infinitely scalable in terms of computational power, so we're built for big data by definition. Number two is we scale very well from a storage efficiency standpoint, so we can support very large volumes of storage, which is a requirement. We also scale from a use case standpoint, so we support use cases throughout the lifecycle.
The one that gets all the attention is obviously backup and recovery, because you have to protect your data. But if I look at it from a lifecycle standpoint, our number two use case is test/dev. A lot of these organizations building these new apps want to spin up subsets of their data because they're supporting things like CI/CD, okay? They want to be able to do rapid testing and such. DevOps and stuff like that. Yeah, DevOps and so forth. So they need test/dev, and we help them automate and orchestrate the test/dev process, supporting things like sampling. I may have a one-petabyte data set; I'm not going to do test/dev against that. I want to take 10% of that and spin it up, and I want to do some masking of private PII data. So we can do masking and sampling to support test/dev. We do backup and recovery. We do disaster recovery. Some customers, particularly in the big data space, may for now say, well, I have replicas, and some of this data is not permanent, it's transient, but I do care about DR. So DR is a key use case. We also do archiving. So if you just think of data through the lifecycle, we support all of those. The piece in terms of where we're going, what's truly unique in addition to everything I just mentioned, is that we're the only data management platform that's machine learning based. Machine learning gets a lot of attention and all that type of stuff, but we're actually delivering machine-learning-enabled capabilities. We discussed this before the interview. It's a bit like anomaly detection. Correct. How exactly are you using machine learning, and what value does it provide to an enterprise data administrator? You have ML inside your tool. Inside our platform. Great question. Very specifically, in the product we're delivering today there's a capability called ThreatSense.
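To make the sampling-plus-masking idea concrete, here is a minimal sketch of what that workflow might look like. This is not Imanis Data's actual implementation; the record fields, hash-based masking scheme, and 10% sampling fraction are illustrative assumptions.

```python
import hashlib
import random

def mask_pii(record, pii_fields=("name", "email", "ssn")):
    """Replace PII values with a deterministic one-way hash token, so test
    data stays referentially consistent without exposing real values."""
    masked = dict(record)
    for field in pii_fields:
        if field in masked:
            digest = hashlib.sha256(str(masked[field]).encode()).hexdigest()
            masked[field] = digest[:12]  # shortened token stands in for the value
    return masked

def sample_for_test_dev(records, fraction=0.10, seed=42):
    """Take a reproducible ~10% sample of a data set and mask it for test/dev."""
    rng = random.Random(seed)
    return [mask_pii(r) for r in records if rng.random() < fraction]

# Example: spin up a masked ~10% subset of a toy "production" data set.
prod = [{"id": i, "email": f"user{i}@example.com", "balance": i * 10}
        for i in range(1000)]
test_set = sample_for_test_dev(prod)
print(len(test_set))         # roughly 100 of the 1000 records
print(test_set[0]["email"])  # a hashed token, not a real address
```

The fixed seed makes the sample reproducible, which matters for CI/CD: repeated test runs see the same subset.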
Okay, so the number one use case, as I mentioned, is backup and recovery. Within backup and recovery, what ThreatSense will do, with no user intervention whatsoever, is analyze your backups as they go forward. It will learn what a normal pattern looks like across roughly 50 different metrics, the details of which I couldn't give you right now, but essentially a whole bunch of different metrics that we look at to establish what a normal baseline looks like for you specifically. Great, that's number one. Number two is that we then constantly analyze whether anything is occurring that knocks things outside of that baseline, creating an anomaly. Does something fall outside of it? When it does, we notify the administrators: you might want to look at this, something could have happened. The value, very specifically, is around ransomware. One of the ways you'll detect ransomware is that you will see an anomaly in your backup set, because your data set will change materially. So we will be able to tell you. Because somebody's holding it for ransom is what you're saying. Correct, something's happened to your data pattern. You lost data that should be there. Correct, or whatever it might be. It could be that you lost data, or your change rate went way up, or something else; there's any number of things that could trigger it. And then we let the administrator know it happened here. Today we don't then turn around and automatically solve it, but to your point about where we're going, we've already broken the ice on delivering machine-learning-enabled data management, in this case around anomaly detection. That might indicate you want to checkpoint your backups to, say, a few days before this was detected, so at least you know what data is most likely missing. So yeah, I understand.
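The baseline-then-flag pattern described above can be sketched in a few lines. This is a toy illustration only: the two metrics, the z-score rule, and the threshold are invented for the example, since ThreatSense's actual ~50 metrics and model are not public.

```python
import statistics

def build_baseline(history):
    """Learn per-metric mean and standard deviation from past backup runs."""
    baseline = {}
    for metric in history[0]:
        values = [run[metric] for run in history]
        baseline[metric] = (statistics.mean(values), statistics.pstdev(values))
    return baseline

def detect_anomalies(baseline, run, z_threshold=3.0):
    """Flag metrics in the latest backup run that deviate sharply from baseline."""
    flagged = []
    for metric, value in run.items():
        mean, stdev = baseline[metric]
        if stdev > 0 and abs(value - mean) / stdev > z_threshold:
            flagged.append(metric)
    return flagged

# Ten "normal" backup runs, then one that looks like ransomware activity:
history = [{"changed_files": 100 + i, "compression_ratio": 3.0 + i * 0.01}
           for i in range(10)]
baseline = build_baseline(history)
# Encrypted data barely compresses and the change rate spikes:
suspect = {"changed_files": 90000, "compression_ratio": 1.01}
print(detect_anomalies(baseline, suspect))  # both metrics flagged
```

The key property Smails describes is that the baseline is learned per customer with no user intervention, so "normal" is defined by each environment's own history rather than a fixed rule.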
Bingo, that's exactly right. Now, where we're going with that: you can imagine, having a machine-learning-powered data management platform at our core, how many different ways we can take it. When do I back up? What data do I back up? How do I achieve the optimal RTO and RPO? From a storage management standpoint, when do I put what data where? There's the whole library science of data management. The future of data management is machine learning based. There's too much data and too much complexity for humans alone; you need to bring machine learning into the equation to help you harness the power of your data. We've broken the ice, we've got a long way to go, but we've got the platform to start with, and we've already introduced a first use case around this. You can imagine all the places we can take it going forward. So as a company that's using machine learning right now, in your opinion, what will separate the winners from the losers? In terms of vendors or in terms of customers? Well, both. Yeah, let me answer both, inward versus outward, starting with how we are unique. We are very unique in the sense that we're infinitely scalable. We are a single pane of glass for all of your distributed systems. We are very unique in terms of our multi-stage data reduction, and from a technology differentiation standpoint, we're the only vendor doing machine-learning-based data management. Multi-stage data reduction, break that out for us. What does that actually mean in practice? Is that compression and deduplication, or is there something else in there? There's erasure coding; there are a couple of different things, actually. So why does that matter?
So a lot of customers will ask the question: well, by definition, NoSQL and Hadoop environments are all based on replicas, so I don't need to back things up. First of all, replication is not backup. That's lesson number one. Point-in-time backup is very different from replication; replication replicates bad data just as quickly as it replicates good data. When you back up these very large data sets, you have to be incredibly efficient in how you do that. What we do with multi-stage data reduction is, one, deduplication. We do variable-length deduplication, we do compression, we do erasure coding, but the other thing we do is what we call a global deduplication pool. When we're deduping your data, we're actually deduping it against a very large data set. This is where size matters. Your data is all secured, but the larger the pool of data I'm storing against, the higher the deduplication percentage I can get, because I've got a larger pool to reduce against. The net result is that at petabyte-scale data management we're incredibly efficient, to the tune of easily 10x over traditional deduplication and multiple-x over technologies that are more current, if you will. So back to your question: we are confident that we have a very strong head start. Our opportunity now, and why we're here, is that we've got to drive awareness. We've got to make sure everybody knows who we are, how we're unique, and how we're different. And you guys are great; I love being on theCUBE. From a customer standpoint, this is sort of a cliché, but it's true: the customers that best harness their data are the ones that are going to win.
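A crude sketch can show why a larger global pool raises the deduplication percentage. This uses fixed-size chunks and a content-hash index for simplicity; Imanis Data's variable-length deduplication is a more sophisticated scheme, and the chunk size here is an arbitrary assumption.

```python
import hashlib

class GlobalDedupPool:
    """Store each unique chunk once, keyed by its content hash. The bigger
    the pool grows, the more likely an incoming chunk already exists."""

    def __init__(self, chunk_size=8):
        self.chunk_size = chunk_size
        self.chunks = {}  # content hash -> chunk bytes

    def ingest(self, data: bytes):
        """Return (chunks_seen, chunks_actually_stored) for this data set."""
        seen = stored = 0
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            seen += 1
            if key not in self.chunks:  # only new content costs storage
                self.chunks[key] = chunk
                stored += 1
        return seen, stored

pool = GlobalDedupPool()
# First data set: 100 identical chunks, only 1 actually stored.
seen1, stored1 = pool.ingest(b"ABCDEFGH" * 100)
# Second data set dedupes against the existing pool: only the new chunk is stored.
seen2, stored2 = pool.ingest(b"ABCDEFGH" * 50 + b"NEWCHUNK")
print(stored1, stored2)
```

The second ingest illustrates the "size matters" point: because it runs against a pool already populated by the first, almost everything it sees is a duplicate and storage cost collapses.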
They're going to be more competitive, they're going to find ways of being differentiated, and the only way they'll do that is by making the appropriate investments in their data infrastructure, in their data lakes, and in their data management tools, so that they can harness all that data. Where do you see the future of your Hortonworks partnership going? So we support a broad ecosystem, and Hortonworks is just as important as any of our other data source partners. Where we see that unfolding, to put it that way, is that we play an important part in helping Hortonworks as more and more organizations go mainstream with these applications. These are not corner cases anymore. This is not sort of in the lab. This is the real deal: mainstream enterprises running business-critical applications. The value we bring is that you're not going to rely on those platforms without an enterprise data management solution that delivers what we deliver. There are all kinds of ways we can go to market together, but net-net, our value is that we provide a very important enterprise data management capability for customers deploying in these business-critical environments. Great, very good. As more data gets persisted out at the edge devices and the Internet of Things and so forth, what are the challenges in terms of protecting that data: backup and restore, deduplication, and so forth? And to what extent is Imanis Data addressing those kinds of more distributed data management requirements going forward? Do you see that on the horizon? Are you hearing that from customers, that they want to do more of that, more of an edge-cloud environment? Or is that way too far in the future?
I don't think it's way too far in the future, but I do think there's an inside-out progression. My position is not that there isn't edge work going on. What I would contend is that the big problem right now, from an enterprise mainstream standpoint, is getting your core house in order. You've moved from a traditional four-wall data center to a hybrid cloud environment, maybe not quite to the edge: a combination of how do I leverage on-prem and the cloud, so to speak, and how do I get the core data lake, in the case of Hortonworks, that big data lake, sorted out? You're touching on, I think, a longer discussion, which is: where is the analysis going on? Where is the data going to persist? Where do you do some of that computational work? Do you get all this information out at the edge? Does all of it end up going into the data lake? Do you move the storage to where the lake is, or do you start pushing some of the lake function out to the edge? It's a much more complicated discussion. I know we had this discussion over lunch. This may be outside your wheelhouse, but let me ask it anyway. At Wikibon, I cover AI, distributed training, and distributed inferencing. The edge devices are capturing the data, and more and more there's a trend toward them performing local training of their embedded models on the data they capture. But quite often edge devices don't have a ton of storage, and they're not going to retain that data long term. Some of that data will need to be archived, will need to be persisted and managed as a core resource.
So we see that kind of requirement, maybe not now, but in a few years' time: distributed training, and persistence and protection of that data, becoming a mainstream enterprise requirement where AI and machine learning, the whole pipeline, are concerned. Like I said, that's probably outside you guys' wheelhouse. That's probably outside the... The only thing I would add is that kind of thing is coming as the likes of Hortonworks and IBM and everybody else start to look at and implement containerization of analytics and data management out to all these micro devices. Yes, and I think you're right there. To your point, we're kind of going where the data is, if you will, in volume. It's going in that direction, and frankly, where we see that happening, the cloud plays a big role as well, because there's the edge, but how do you get to the edge? You can get to the edge via the cloud. So again, we run on AWS, we run on GCP, we run on Azure. To be clear, in terms of the data we can protect, we've got a broad ecosystem of Hadoop-based big data sources that we support, as well as NoSQL, whether they're running on AWS or GCP or Azure. We support ADLS, Azure Data Lake Store, and HDInsight, and we support a whole bunch of different things, both in the cloud and on-prem, which is where we're seeing some of that edge work happening. Well, Peter, thank you so much for coming on theCUBE. It's always a pleasure to have you on. Yes, thanks for having me, and I look forward to being back sometime soon. We'll have you, when the time is right. Thank you both. Cool. Indeed. We will have more from theCUBE's live coverage of DataWorks just after this.