From the SiliconANGLE Media Office in Boston, Massachusetts, it's theCUBE. Now, here's your host, Dave Vellante.
Hi, buddy, welcome to this CUBE Conversation here in our studios outside of Boston. My name is Dave Vellante. I'm here with Matt Carroll, who's the CEO of Immuta. Matt, good to see you.
All right, nice to be here.
So we're going to talk about governance, how to automate governance, data privacy. But let me start with Immuta. What is Immuta? Why did you guys start this company?
Yeah, Immuta is an automated data governance platform. We started this company back in 2014 because we saw a gap in the market to be able to control data. What's changed in the market is that every enterprise wants to leverage their data. Data is the new app. But governments want to regulate it and consumers want to protect it. And these were at odds with one another. So we saw a need to create a platform that could meet the needs of everyone: to democratize access to data in the enterprise, but at the same time provide the necessary controls on the data to enforce any regulation and ensure that there was transparency as to who's using it and why.
So let's unpack that a little bit, just trying to dig into the problem here. So we all know about the data explosion, of course. And I often say data used to be a liability or an asset, sorry, a liability. Now it's turned into an asset, right? People used to get rid of the data after seven years. Now everybody wants to mine it. They want to take advantage of it. But that causes privacy concerns for individuals. We've seen this with Facebook and many others. Regulations are now coming into play: GDPR, different states applying different regulations. So you have all these competing forces. The business guys just want to go and get out to the market, but then there are the lawyers and the compliance officers and others. So are you attacking that problem?
Maybe you can describe that problem a little further and talk about how you guys-
Yeah, absolutely. I mean, as you described, there's over 150 privacy regulations being proposed over 25 states just in 2019 alone. GDPR has opened the floodgates, if you will, for people to start thinking about how do we want to insert our values into data? How should people use it? And so the challenge now is, right, your most sensitive data in an enterprise is most likely going to give you the most insight into driving your business forward, creating new revenue channels and being able to optimize your operational expenses. But the challenge is that consumers have awoken to, well, we're not exactly sure we're okay with that, right? We signed the EULA with you to just use our data for marketing, but now you're using it for other revenue channels? Why? And so where Immuta is trying to play in there is, how do we give the line of business the ability to access that instantaneously, but also give the CISO, the Chief Information Security Officer, and the governance teams the ability to take control back, right? So it's a delicate balance between speed and safety. And I think what's really happening in the market is, we used to think about security from building firewalls, right? We invested in physical security controls around managing external adversaries from stealing our data. But now it's not necessarily someone trying to steal it, it's just potentially misusing it by accident in the enterprise. And the CISO is having to step in and provide that level of control. And it's also the collision of the cloud and these privacy regulations. Because now we have data everywhere, it's not just in our firewalls. And that's the big challenge. The opportunity at hand is democratization of data in the enterprise. The problem is that it is not all in the enterprise. It is in the cloud, it is in SaaS, it is in the infrastructure. It's distributed by its very nature.
So there's a lot of things I want to follow up on that. So first is GDPR. When GDPR came out, of course, you know, it was May of 2018, I think it went into effect. Actually, you know, it came out in 2017, but the penalties didn't take effect till '18. And I thought, okay, well, maybe this can be a framework for, you know, governments around the world and states. It sounds like yes, sort of, but not really. Maybe there's elements of GDPR that people are adopting, but then it sounds like they're putting in their own twist, which is going to be a nightmare for companies. So are you not seeing this sort of GDPR becoming this global standard? It sounds like no.
I don't think it's going to be necessarily a global standard, but I do think the spirit of the GDPR, and at the core of it, is: why are you using my data? What was the purpose, right? So traditionally, when we think about using data, we think about, all right, who's the user and what authorizations do they have, right? But now there's a third question. Sure, you're authorized to see this data, depending on your role and organization, right? But why are you using it? Are you using it for, you know, your certain business use? Are you using it for personal use? Why are you using this? That's the spirit of the GDPR that everyone is adopting across the board. And then of course, each state or each federal organization is thinking about their unique lens on it, right? And so you're right, this is going to be incredibly complex, and the amount of policies being enforced at query time, like let's just say I'm in Tableau or Looker, right? I'm just some simple analyst, I'm a young kid, I'm 22, my first job, right? And I'm running these queries. I don't know where the data is, right? I don't know what I'm combining. And what we found is on average in these large enterprises, any query at any moment in time might have over 500,000 policies that need to be enforced in real time.
And it's only getting worse. We have to automate it. No human can handle all those edge cases. We have to automate it.
So, and I want to get into how you guys actually do that. Before I do, there seems to be, you know, there's a lot of confusion in the marketplace. Take the words data management, data protection. So all the backup guys are using that term, the database guys use that term, GRC folks use that term. So there's a lot of confusion there. You have all these adjacent markets coming together. You've got the whole governance, risk and compliance space. You've got cybersecurity. There's privacy concerns, which is kind of two sides of the same coin. How do you see these adjacencies coming together? It seems like you sit in the middle of all that.
Yeah, welcome to why my marketing budget is getting bigger and bigger. No, you know, the challenge we're facing now is, I think, who owns the problem, right? The chief data officer is taking on a much larger role in these organizations. The CISO is taking a much larger role reporting up to the board. You have the line of business who now is almost self-sustaining. They don't have to depend on IT as much any longer because of the cloud and because of the new compute layers that make it easier. So who owns it? At the end of the day, where we see it is we think there's a next generation of cyber tools that are coming out. We think that the CISO has to own this. And the reason is that the CISO's job is to protect the enterprise from cyber risk. And at the core of cyber risk is data, and they must own the data problem. The CDO must find the data and explain what that data is and make sure it's quality. But it is the CISO that must protect the enterprise from these threats. And so I see us as part of this next wave of cyber tools that are coming out. There's other companies that are equally, you know, in our stratosphere, like BigID. We're seeing, like, AWS with Macie doing sensitive data discovery.
Google has their data loss prevention service. So the cloud players are starting to see, hey, we got to identify sensitive data. There's other startups that are saying, hey, we got to identify and catalog sensitive data. And for us, we're saying, hey, we need to be able to consume all that cataloging, understand what's sensitive and automatically apply policies to ensure that any regulation in that environment is met.
I want to ask you about the cloud, too. So much to talk to you about here, Matt. But I also wanted to get your perspective on variances within industries. So you mentioned chief data officers. The ascendancy of the chief data officer started in financial services, healthcare and government, where you had highly regulated industries. And now it's sort of seeped into, you know, more commercial. But in terms of those regulated industries, I mean, take healthcare, for example, there are specific nuances. Can you talk about what you're seeing in terms of industry variance?
Yeah, it's a great point, starting with healthcare. What does it mean to be HIPAA compliant anymore? There are different types of devices now where I can point it at your heartbeat from a distance away and I can have 99% accuracy of identifying you, right? It takes three data points in any data set to identify 87% of U.S. citizens. If I have your age, sex and location, I can identify you. So what does it mean anymore to be HIPAA compliant? So the challenge is, how do we build guarantees of trust that we've de-identified these data sets? Because we have to use it, right? No one's going to go into a hospital and say, you know what, I don't want you to save my life, because I want my data protected, right? No one's ever going to say that. So the challenge we face now across these regulated industries is that the most sensitive data sets are critical for those businesses to operate. So there has to be a compromise.
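[Editor's sketch: the re-identification point above can be illustrated with a toy check of how many records are unique on a handful of attributes. This is a generic k-anonymity-style illustration, not Immuta's method; the records, the field choice (age, sex, zip code standing in for "location") and the function names are all invented for the example, and it does not reproduce the 87% figure.]

```python
from collections import Counter

# Hypothetical toy records: (age, sex, zip_code). Invented data, for illustration only.
RECORDS = [
    (34, "F", "02139"),
    (36, "F", "02138"),
    (51, "M", "02139"),
    (58, "M", "02142"),
    (29, "F", "01701"),
    (23, "F", "01702"),
    (67, "M", "02467"),
    (62, "M", "02468"),
]


def unique_fraction(records):
    """Fraction of records whose attribute combination appears exactly once."""
    counts = Counter(records)
    unique = sum(1 for r in records if counts[r] == 1)
    return unique / len(records)


def smallest_group(records):
    """The k in k-anonymity: size of the smallest group sharing identical values."""
    return min(Counter(records).values())


def generalize(record):
    """Coarsen a record: bucket age into decades, keep a 3-digit zip prefix."""
    age, sex, zip_code = record
    return (age // 10 * 10, sex, zip_code[:3])


if __name__ == "__main__":
    coarse = [generalize(r) for r in RECORDS]
    print("raw:    unique fraction", unique_fraction(RECORDS), "k =", smallest_group(RECORDS))
    print("coarse: unique fraction", unique_fraction(coarse), "k =", smallest_group(coarse))
```

[In this toy data every raw record is unique on the three attributes, while the coarsened records each share their values with one other record. That is the privacy-versus-utility trade-off in miniature: the coarser view identifies no one individually but also answers fewer questions.]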
And so what we're trying to do in these organizations is help them leverage their data and build levels of proportionality to access data, right? So the key isn't to stop people from using data. The key is to build the controls necessary to leverage a small bit of the data that, let's just say, we've made indistinguishable. You can only ask aggregate statistics for the question. Well, you say, you know what, we actually found some really interesting things there. We need it to be a little bit more useful. It's this trade-off between privacy and utility. And it's a pendulum that swings back and forth. As someone proves I need more of this, you can swing: all right, we'll just mask it. All right, I need more of it? All right, we'll just redact some of the certain things. Nope, this is really important. It's gonna save someone's life. Okay, completely unmasked, you have the raw data. But it's that control that's necessary in these environments. That's what's missing. We came out of the US intelligence community. We understood this better than anyone because it's highly regulated, very sensitive data, but we knew we needed the ability to rapidly control, well, is this just a hunch or is this a 9/11 event, right? And you need the ability to switch like that. That's the difference. And so healthcare is going through a change of, we have all these new algorithms. Like Facebook the other day said, hey, we have machine learning algorithms that can look at MRI scans, and we're gonna be better than anyone in the world at identifying these. Do you feel good about giving your data to Facebook? I don't know. But we can maybe provide guaranteed anonymization to them to prove to the world that they're going to do right. That's where we have to get to.
Well, this is huge, especially for the, I see this especially for the consumer because you just gave several examples. Facebook's going to know a lot about me, a mobile device, a Fitbit.
And yet if I want to get access to my own medical records, it's like a Fort Knox to try to get, hey, please give this to my insurance company.
Yeah.
You've got to go through all these forms. And so you've got those diverging objectives. And so as a consumer, I want to be able to trust that when I say yes, you can use it, go, and I can get access to it and others can get access to it. So I want to understand exactly what it is that you guys do, what you sell. Is it software? Is it SaaS? And then let's get into how it works. So what is it?
Yeah, so we're a software platform. So we deploy into any infrastructure, but it is not multi-tenant. So we could deploy on any cloud or on-premises for any customer. And we do that with customers across the world. But if you think about, like, at the core, what is Immuta? Think of Immuta as like a system of record for the CISO or the line of business where I can connect to any data on any infrastructure, any compute layer. And we connect into over 61 different storage and compute platforms. We then have built a UI where lawyers can come in. We actually have three lawyers as employees that act as product managers to help any lawyer of any stature take what's on paper, these regulations, these rules and policies, and turn it into, they digitize it essentially, into active code. And so they can build any policy they want on any data in the ecosystem and the enterprise and enforce it globally without having to write any code. And then because we're this plane where you can connect any tool to this data and enforce any regulation, because we're the man in the middle, we can audit who is using what data and why, and every action and any change in policy. And so if you think about it, it's connect any tool to any data, control it with any regulation, and prove compliance in a court of law.
So you can set the policy at the data set level?
Correct.
And so how does one do that?
Can you automate that on the creation of that data set? I mean, you've got dependencies. So how does that all work?
Yeah, what's really interesting, part of our secret sauce is that, one, we could do that at the column level, we could do it at the row level, we could do it at the cell level.
So very granular.
Very, very granular. This is something, again, we learned from the US intelligence community: we have to have very fine-grained access to every little bit of the data. And the reason is that, especially in the age of data, people are gonna combine many data sets together. The challenge isn't enforcing the policy on a static data set. The challenge is enforcing the policy across three data sets, where you merge three pieces of data together that have conflicting policies. What do you do then? That's the beauty of our system: we deal with that policy inheritance, we manage that lineage of the policy and can tell you, all right, here's what the policy will be.
In other words, you can manage to the highest common denominator.
Or the lowest common denominator. Or we can automate it to the lowest common denominator where you can work in projects together recognizing, hey, we're gonna bring someone into the project that's not gonna have the level of access everyone else has, and we'll automatically change it to the lowest common denominator. But then you share that work with another team and it'll automatically be brought to the highest common denominator. And we've built all these workflows in. That was what was missing. That's why I call it a system of record. It's really a symbiotic relationship between IT, the data owner, governance and the CISO, who are trying to protect the data, and the consumer. And all they wanna do is just access the data as fast as possible to make better, more informed decisions.
So the other mega trend you have is obviously the superpower of machine intelligence or artificial intelligence.
And then you've got edge devices and machine-to-machine communication where it's just an explosion of IP addresses and data. And so you guys, it sounds like, can attack that problem as well.
Any of this data coming in on any system, the idea is that eventually it's gonna land somewhere.
And you're gonna protect it.
And we call that, like, rogue data, right? This is why I said earlier that when we talk about data, what we have to start thinking about is it's not in some building anymore. It is everywhere. It's gonna be on a cloud infrastructure. It's gonna be on premises. And it's likely in the future going to be in many distributed data centers around the world, because business is global. And so yeah, what's interesting to us is no matter where the data is sitting, we can protect it. We can connect to it and we can protect it and we allow people to access it. And that's the key thing: it's not worrying about how to lock down your physical infrastructure. It's about logically separating it. And what differentiates us from other people is, one, we don't copy the data, right? That's always the barrier for these types of platforms. We leave the data where it is. The second is we take all those regulations and we can actually, at query time, push them down to where that data is. So rather than bring it to us, we push the policy to the data. And that's what differentiates us from everyone else: it allows us to guarantee that protection no matter where the data is living.
So you're essentially virtualizing the data, right?
Yeah, so it's virtual views of data, but it's not all the data. What people have to realize is in the day of apps, we cared about storage. We put all the data into a database. We built some services on top of it and a UI, and it was controlled that way, right? You had all the nice business logic to control it. And in the age of data, right? Data is the new app, right?
We have all these automation tools like DataRobot and H2O and Domino, and Tableau's building all these automation workflows.
Robotic process automation.
Yeah, RPA, UiPath and WorkFusion, right? So they're making it easier and easier for any user to connect to any data and then automate the process around it. They don't need an app to build unique workflows. These new tools do that for them. The key is getting to the data. And the challenge with the supply chain of data is that time to data is the most critical aspect, because the time to insight is perishable. And so what I always tell people, a little story, is, like, I came from the government. I worked in Baghdad. We had 42 minutes to know whether or not, a bad guy in the environment, we could go after him. After that, that data was perishable, right? We didn't know where he was. It's the same thing in the real world. It's like imagine if Google told you, well, in 42 minutes it might be a good time to go on 495. It's not very useful. I need to know the information now. That's the key. So what we see is policy enforcement and regulations are the key barrier to entry. So our ability to rapidly, with no latency, be able to connect anyone to that data and enforce those policies where the data lives, that's the critical nature.
Okay, so you can apply the policies and you can do it quickly. And so now you can help solve the problem. You mentioned cloud before, or on-prem. What is the strategy there with regard to various clouds and how do you approach multi-cloud?
I think cloud, what used to be an infrastructure as a service game, is now becoming a compute game. I think large regulated enterprises, government, healthcare, financial services and insurance are all moving to cloud now in a different way.
What do you mean by that? Because people take infrastructure as a service, they'll say, oh, that's compute and storage.
Yeah, sure, sure.
What do you mean by that?
But I think there's a whole new age of software that's being laid on top of the availability of compute and availability of storage. That's companies like Databricks. It's companies like Snowflake. And what they're doing is dramatically changing how people interact with data. The availability zones, the different types of features, the ability to rip and replace legacy warehouses and mainframes, it's changing the ability to not just access, but also the types of users that could even come on to leverage this data. And so these enterprises are now thinking through how do I move my entire infrastructure of data to them? And what are these new capabilities that I could get out of that? And that is just happening now. A lot of people have been thinking, oh, this has been happening over the past five years. No, the compute game is now the new war. I see, I used to think of big data, right? Big data created, everyone started to understand, ah, if we've got our data assets together we can get value. Now they're thinking, all right, let's move beyond that. The new Cloudera/Hortonworks is Snowflake and Databricks. And what they're thinking about is, how do I take all your metadata and allow anyone to connect any BI tool, any data science tool, and provide highly performant and highly dependable compute services to process petabytes of data. It's pretty fantastic.
And very cost efficient in being able to scale compute independent of storage from an architectural perspective. A lot of people claim they can do that, but it doesn't scale the same way.
Yeah, and when you're talking about, because that's the thing, you've got to remember these financial systems especially, they depend on these transactions. They cannot go down and they're processing petabytes of data. That's what the new war is over: that data in the compute layer.
And the opportunity for you is that data can come from anywhere.
It's not sitting in a God box where you can enforce policies on that corpus. You don't know where it's coming from.
We want to be invisible to that, right? You're using Snowflake, it's just automatically enforced. You're using Databricks, it's automatically enforced. All these policies are enforced in flight. No one should even truly care about us. We just want to allow you to use the data the way you're used to using it.
And you do this, the secret sauce you talked about is math, it's artificial intelligence.
It's math. I wish I could say it was, like, super fancy, like unsupervised neural nets or whatnot. It's 15 years of working in the most regulated, sticky environments. We learned about very simple, novel ways of pushing it down. Great engineering is always simple. But what we've done is at query time, what's really neat is we figured out a way to take the user attributes from the identity management system and combine that with a purpose. And then what we do is we've built all these libraries to connect into all these disparate storage and compute systems to push it in there. And the nice thing about that is, prior to this, what people were doing is just making copies. They go to the data engineering team and they say, hey, I need to ETL this and get a copy and it'll be anonymized. Think about that for a second. One, the load on your production systems of all these copies all the time, right? The second is the CISO's surface area. Now you've got all this data that, at a snapshot in time, is legal and ethical, but that might change tomorrow. And so now you've got an increased surface area of risk. So the second is that no-copy aspect. So the pushing it down and then the no-copy aspect really change the game for the enterprise.
And you've got provenance issues. Like you say, you've got governance and compliance.
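[Editor's sketch: a minimal illustration of the idea described above, combining a user attribute with a declared purpose at query time and pushing the resulting policy down to where the data lives instead of copying it. This is an assumption-laden toy, not Immuta's implementation: the `patients` table, the `COLUMN_POLICY` mapping, the `region` attribute and the `AUDIT` list are all hypothetical, and SQLite stands in for whatever storage or compute system holds the data.]

```python
import sqlite3

# Hypothetical policy: for each declared purpose, how each column is exposed.
# "raw" returns the column, "mask" replaces it with a constant, "hide" omits it.
COLUMN_POLICY = {
    "marketing": {"name": "mask", "zip": "raw", "diagnosis": "hide"},
    "treatment": {"name": "raw", "zip": "raw", "diagnosis": "raw"},
}

AUDIT = []  # who ran what, and for which purpose


def build_query(user, purpose):
    """Rewrite a SELECT on `patients` so the database itself enforces the policy."""
    rules = COLUMN_POLICY[purpose]  # unknown purpose raises KeyError: deny by default
    select = []
    for col, rule in rules.items():
        if rule == "raw":
            select.append(col)
        elif rule == "mask":
            select.append(f"'***' AS {col}")
        # "hide": the column never appears in the result
    # Column names come from the fixed policy above, never from user input.
    # Row-level rule: a user only sees rows for their own region (a user attribute).
    sql = f"SELECT {', '.join(select)} FROM patients WHERE region = ?"
    return sql, (user["region"],)


def run(conn, user, purpose):
    """Execute the rewritten query in place and record an audit entry."""
    sql, params = build_query(user, purpose)
    AUDIT.append((user["name"], purpose, sql))
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (name TEXT, zip TEXT, diagnosis TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?)",
    [("Ana", "02139", "asthma", "east"), ("Bo", "94105", "flu", "west")],
)
analyst = {"name": "analyst1", "region": "east"}

if __name__ == "__main__":
    print(run(conn, analyst, "marketing"))
    print(run(conn, analyst, "treatment"))
```

[The same user gets a masked, narrower result for the marketing purpose and the full row for the treatment purpose, and in both cases no anonymized copy of the table is ever created. The audit list is the toy counterpart of being able to say who used which data and why.]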
And imagine trying, if someone said to you, imagine Congress said, hey, any data source that you've processed over the past five years, I want to know if there were these three people in any of these data sources. And if there were, who touched that data and why did they touch it?
Yeah, and storage is cheap, but there's unintended consequences and people aren't, management isn't.
We just don't have a unified way to look at all of the logs crosswise.
So we started talking about cloud and then we kind of took it down a different path, but you offer your software on any cloud, is that right?
Yeah, so right now we are in production on the AWS Marketplace, so you can procure us through the AWS Marketplace. And that is a managed service. So you can go deploy in there, it'll go into your VPC and we can manage the updates for you. We have no insight into your infrastructure, but we can push those updates and it'll automatically update. So you're getting our quarterly releases, we release every season. But yeah, we started with AWS and then we will grow out. We see cloud as just ubiquitous. So currently we do support BigQuery, Dataproc, we support Azure Data Lake Storage version two, as well as Azure Databricks, but you can get us through the AWS Marketplace. And we're also investing in re:Invent, we'll be out there in Vegas in a couple of weeks and it's a big event for us, just because obviously the government has a very big stake in AWS, but also commercial customers. It's been a massive endeavor to move, we've seen lots of infrastructure. Most of our deals now are on cloud infrastructure.
So tell us, great. So tell us about the company. You've raised, I think in a series B, about 28 million to date. Maybe you could give us the staff headcount and whatever you can share about momentum or maybe customer examples.
So we've raised 32 million to date, from some great investors, and the company's about 70 people now, so not too big, but not small anymore. And yeah, this year at this point, I haven't closed my fiscal year, so I don't want to give too much, but we've doubled our ARR and we've tripled our logo count this year alone and we've still got one more quarter here, just started our fourth quarter. Yeah, and some customer cases. The way I think about our business is, I love healthcare, I love government, I love finance. And to give you some examples, Cognoa is a really great example. What Cognoa is trying to solve is, can they predict where a child is on the autism spectrum? And they're trying to use machine learning to be able to narrow these children down so that they can see patterns as to how a provider, a therapist, is helping these families give these kids the skills to operate in the real world. And so it's like this symbiotic relationship, utilizing software, surveys and video and whatnot to help connect these kids that are in similar areas of the spectrum, to help say, hey, this is a successful treatment, right? The problem with that is you need lots of training data, and one, this is children; two, this is healthcare. And so how do you guarantee HIPAA compliance? How do you get through FDA trials through third-party blind testing and still continue to validate and retrain your models while protecting the identity of these children? So we provide a platform where we can anonymize all the data for them. We can guarantee that there's blind studies where the company doesn't have access to certain subsets of the data. And we can also then connect providers to gain access to the HIPAA data as needed. We can automate the whole thing for them. Imagine, and this is just a start, I mean, they're a startup too, they're 100 people.
But imagine if you were a startup in this health tech industry and you had to invest in the backend infrastructure to handle all of that. It's too expensive. So, what we're unlocking for them, I mean, yes, it's great that they're HIPAA compliant and all that, that's what we want, right? But the more important thing is we're providing a value add to innovate in areas utilizing machine learning that regulations would have stymied, right? We're allowing startups in that ecosystem to really push us forward and help those families.
Yeah, because HIPAA compliance is table stakes, compulsory. But now you're talking about enabling new business models.
Yeah, yeah, exactly.
How did you get into all this? You're a CEO, you've got business savvy, but it sounds like, you know, you're pretty technical as well. What's your background?
Yeah, I mean, so I worked in the intelligence community before this. And most of my focus was on how do we take data and be able to leverage it, from counterterrorism missions to different non-kinetic operations. And so what I kind of grew up in is this age, if you think about billions of dollars in Baghdad. What I learned is that through the computing infrastructure there, everything changed. Remember 2006, Baghdad created this, like, boom of technology. We had drones, right? We had all these devices on our trucks that were collecting information in real time and telling us things. And then we started building computing infrastructure in Burst Hadoop. So I kind of grew up in this era of big data. We were collecting it all, we had no clue what to do with it, we had nowhere to process it. And so I kind of saw, like, there's a problem here. If we can find the unique little, you know, nuggets of information out of that, we can make some really smart decisions to save lives. And so once I left that community, I kind of dedicated myself to that.
And the birth of this company, again, it was spun out of the US intelligence community and it was really a simple problem. It was, they had a bunch of data scientists that couldn't access data fast enough. So they couldn't solve problems at the speed they needed to. They took four to six months to get to data, the mission said they needed it in less than 72 hours. So those were orthogonal to one another and so it was very clear we had to solve that problem fast. And so kind of that weird world of very secure, really sensitive, but also the success that we saw of using data, it was so obvious that we need to democratize access to data, but we need to do it securely and we need to be able to prove it. We worked with more lawyers in the intelligence community than you could ever imagine. So the goal was always, how do we make a lawyer happy? If you figure that problem out, you have some success, and I think we've done it.
It was awesome in applying that, you know, use case, that example, to the commercial business world. Scott McNealy's famous for saying there is no privacy on the internet, get over it. Well, guess what? People aren't going to get over it. You're seeing individuals are much more concerned with it after the whole Facebook and fake news debacle. And as well, organizations putting data in the cloud, they need to govern their data, they need that privacy. So Matt, thanks very much for sharing with us your perspectives on the market and best of luck with Immuta.
Thanks so much, I appreciate it. Thanks for having me out.
You're welcome.
Cheers.
All right, and thank you everybody for watching this CUBE Conversation. This is Dave Vellante, we'll see you next time.