From around the globe, it's theCUBE, with digital coverage of AWS re:Invent 2020, sponsored by Intel, AWS, and our community partners.

All right, we're continuing our around-the-clock, around-the-world coverage of the AWS re:Invent 2020 virtual conference. I'm guessing hundreds of thousands of folks are tuning in for coverage, and on the other end of the country we have a CUBE alum: Steven Touw, co-founder and CTO of Immuta. Steven, welcome back to the show.

Great to be here. Thanks for having me again. I hope to match your enthusiasm.

You're a co-founder, I'm sure you can match the enthusiasm. Plus, we're talking about data governance. You've been on theCUBE before, and last year you laid the foundation for us, talking about challenges around data access and data access control. I wanted to extend this conversation. I had a conversation with a CDO, a chief data officer, a couple of years ago. He shared how his data analysts, the people who actually take the data and create outcomes to drive business decisions, spent 80% of their time wrangling the data, just doing transformations. How is Immuta helping solve that problem?

Yeah, great question. It's actually interesting: we're seeing a division of roles in these organizations, where data engineering teams are managing a lot of the prep work that goes into exposing and releasing data to analysts. Part of their day-to-day job is to ensure that the data they're releasing to analysts is only what those analysts are allowed to see. So we see this problem of compliance getting in the way of analysts doing their own transformations. It would be great if releasing data weren't limited to just this small data engineering team, but we believe one of the real issues behind that is that they're the ones who are trusted.
They're the only ones who can see all the data in the clear. So it has to be a very small subset of humans, so to speak, who can do this transformation work and release it. That means the data analysts downstream are hamstrung to a certain extent, bottlenecked by requesting that these data engineers do some of the transformation work for them. And because, as you said, transformation is so critical to being able to analyze data, that bottleneck can be a backbreaker for organizations. So we really think you need to tie transformation to compliance in order to streamline analytics in your organization.

That has me curious. What does that actually look like? Because when I think of a data analyst, they're not always thinking about who should have this data; they're trying to get the answer to the question. What does that relationship of collaboration functionally look like?

Yeah. I think the beauty of Immuta, and of governance solutions done right, is that they should be largely invisible to the downstream analysts. The data engineering team will take on requirements from their legal and compliance teams, such as "you need to mask PII" or "you need to hide these kinds of rows from these kinds of analysts, depending on what the user is doing." We've seen an explosion of different ways you need to slice up your data, and who's allowed to see what is not just about who they are, but what they're doing. You can bake all of these policies onto your data up front in a tool like Immuta, and it will dynamically react, based on who the analyst is and what they're doing, to ensure the right policies are enforced. But what we also see is that setting your policies on your data once, up front, is not the end of the story.
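The masking and row-level rules described here can be sketched as a small dynamic-enforcement function: the decision depends on who the user is (their attributes) and what they're doing (their purpose). This is an illustrative sketch, not Immuta's actual API; the policy shape, attribute names, and purpose labels are all assumptions for the example.

```python
# Hypothetical sketch of dynamic, attribute-aware policy enforcement.
# The policy structure and attribute names are invented for illustration.

def mask(value):
    """Masking placeholder for PII columns."""
    return "***MASKED***"

def apply_policy(rows, user, purpose, policy):
    """Apply row filters and column masking based on the user and their purpose."""
    visible = []
    for row in rows:
        # Row-level rule: hide rows the stated purpose doesn't permit.
        if row["region"] not in policy["allowed_regions"].get(purpose, []):
            continue
        out = dict(row)
        # Column-level rule: mask PII unless the user holds the right attribute.
        for col in policy["pii_columns"]:
            if "can_see_pii" not in user["attributes"]:
                out[col] = mask(out[col])
        visible.append(out)
    return visible

policy = {
    "pii_columns": ["email"],
    "allowed_regions": {"fraud_analysis": ["US", "EU"], "marketing": ["US"]},
}
rows = [
    {"region": "US", "email": "a@example.com", "amount": 10},
    {"region": "EU", "email": "b@example.com", "amount": 20},
]
analyst = {"name": "kim", "attributes": []}

# A marketing analyst without the PII attribute sees only US rows, emails masked.
print(apply_policy(rows, analyst, "marketing", policy))
```

The key point the sketch illustrates: the same table yields different results per query, because the rule is evaluated at access time rather than baked into separate per-audience copies of the data.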
A lot of people will pat themselves on the back and say, hey, look, we've got all our data protected appropriately, job done. But that's not really the case, because the analysts will start creating their own data products and wanting to share them with other analysts. When you think about it, this becomes a very complex problem: before someone can share their data with anyone else, we need to understand what they themselves were allowed to see. Being able to control that downstream flow of transformations and feature engineering, so that only the right people see the things they're allowed to see while still enabling analytics, is really the challenge we saw. Immuta helps the data teams create those initial policies at scale, but also helps the analytical teams build derivative data products in a way that doesn't introduce data leaks.

As I think about the traditional way we do this, we take a data set, let's say a database, and we set security rules on it. What you're describing is more dynamic. How is Immuta approaching this problem from an architectural direction?

Yeah, great question. I'm sure you've heard the term role-based access control. It's been around forever: you aggregate your users into roles and then build rules around those roles, and pretty much every legacy RDBMS manages data access this way. But in what I call the private data era, which we've been embarking on for the past three years or so, consumers are more aware of their data privacy, and data regulations are coming fast and furious with no end in sight. We believe this role-based access control paradigm is just broken. We've got customers with thousands of roles that they're trying to manage, just to slice up the data all the different ways they need to.
So instead, we offer an attribute-based access control solution, and also a policy-based access control solution, where it's really about dynamically enforcing policy: you separate who the user is from the policy that needs to be enforced, and have that execute at runtime. A good analogy is that role-based access control is like writing code without being able to use variables: you're writing the same block of code over and over again with slight changes based on the role. With attribute-based access control, you're able to use variables, and the policy gets decided at runtime based on who the user is and what they're doing.

So that dynamic nature lends itself to the public cloud. Where are you seeing this applied in the world of AWS? We're here at re:Invent, so how are customers using this with AWS?

It all comes down to scalability. For the same reasons you separate storage from compute — you keep your storage in one place, you can ephemerally spin up compute like EMR if you want, or use Athena against your storage in a serverless way, that freedom to choose whatever compute you want — the same concepts apply to policy enforcement. You want to separate your policy from your platform, and this private data era has created that need, just like you had to separate compute from storage in the big data era. This gives you a single pane of glass to enforce policy consistently, no matter what compute or AWS resources you're using. So it gives our customers the power not only to build the rules they need to build, without doing it uniquely per AWS service, but also to prove to their legal and compliance teams that they're doing it correctly. When you do it this way, it really simplifies everything: you have one place to go to understand how policy is being enforced.
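The "code without variables" analogy can be made concrete. In the role-based version below, every new slice of the data needs a new hard-coded branch (or a new role); in the attribute-based version, a single parameterized rule is evaluated at runtime against the user's attributes. This is a hypothetical sketch of the general RBAC/ABAC pattern, not Immuta's implementation; all names are invented.

```python
# Hypothetical contrast of RBAC vs ABAC, per the "variables" analogy.

# RBAC: the same block of logic repeated with slight changes per role.
def can_read_rbac(role, table):
    if role == "us_marketing_analyst":
        return table in {"us_customers"}
    if role == "eu_marketing_analyst":
        return table in {"eu_customers"}
    if role == "us_fraud_analyst":
        return table in {"us_customers", "us_transactions"}
    # ...one new branch (i.e., a new role) for every new slice of the data
    return False

# ABAC: one rule with "variables" -- user and table attributes -- resolved at runtime.
def can_read_abac(user_attrs, table_attrs):
    return (table_attrs["region"] in user_attrs["regions"]
            and table_attrs["domain"] in user_attrs["domains"])

# An analyst cleared for EU marketing data, checked against an EU marketing table.
assert can_read_rbac("eu_marketing_analyst", "eu_customers")
assert can_read_abac(
    {"regions": {"eu"}, "domains": {"marketing"}},
    {"region": "eu", "domain": "marketing"},
)
```

This is why role counts explode under RBAC: each combination of region, domain, and purpose needs its own role, while the attribute-based rule stays one line as new combinations appear.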
And it gives you auditing and reporting around the enforcement you've been doing, to put everyone at ease that everything's being done correctly, so that your data consumers can understand how their data is being protected, and you can actually answer those questions when they come at you.

So let's put this idea to the test a little bit. I have the data engineer, who designs the security policy around the data, or implements that policy using Immuta as dictated by the security team and chief data officer of the organization. Then I have the analyst, who is just using the tools at their disposal. Let's say one analyst wants to use AWS Lambda and another wants to use R-based analysis tools. You're telling me that Immuta gives the analyst the flexibility to use either tool within AWS?

That's right, because we enforce policy at the data layer. If you think about Immuta, it's really three layers. First, policy authoring, which you touched on: those requirements get turned into real policies. Second, policy decisioning: at query time, we see who the user is, what they're doing, and what policy has been defined, to dynamically build that policy at runtime. And third, enforcement, which is what you're getting at: the enforcement happens at the data layer. For example, we can enforce policies natively in Spark, so no matter how you're connecting to Spark, that policy is going to get enforced appropriately. We don't really care what the client tool is, because the enforcement happens at the data layer — or the compute layer is a more accurate way to say it.

So a practical reality of collaboration, especially around large data sets, is the ability to share data across organizations.
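The three layers just described — authoring, decisioning, enforcement — can be sketched as a pipeline that resolves policy at query time and enforces it before the query reaches the data. This is only a rough sketch under assumed names: real enforcement of this kind (e.g., a native Spark plugin) works inside the engine's query plan, not by the naive SQL string rewriting shown here.

```python
# Hypothetical three-layer pipeline: author once, decide per query, enforce at
# the data layer by wrapping the query in a secured subselect.

# 1. Policy authoring: compliance requirements captured once, up front.
AUTHORED_POLICIES = {
    "transactions": {"row_filter": "region = '{region}'", "masked": ["email"]},
}

def decide(table, user):
    """2. Policy decisioning: resolve the authored policy for this user, now."""
    p = AUTHORED_POLICIES.get(table, {})
    row_filter = p.get("row_filter", "").format(**user["attributes"]) or "1=1"
    return row_filter, p.get("masked", [])

def enforce_query(sql, table, columns, user):
    """3. Enforcement: rewrite the query so filters and masking always apply."""
    row_filter, masked = decide(table, user)
    select = ", ".join(f"'***' AS {c}" if c in masked else c for c in columns)
    secured = f"SELECT {select} FROM {table} WHERE {row_filter}"
    return sql.replace(table, f"({secured}) AS {table}")

user = {"attributes": {"region": "US"}}
print(enforce_query("SELECT * FROM transactions", "transactions",
                    ["region", "email", "amount"], user))
```

Because the rewrite happens below the client, it doesn't matter whether the query arrives from Lambda, an R session, or a notebook: the same decisioning step runs for every path into the data.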
How is Immuta helping to lower that barrier while ensuring security, so that when I'm sharing data with analysts at another firm, they see only the data they need to see, but we can effectively collaborate on those pieces of content?

Yeah, I'm glad you asked this. This is the big finale, right? This is what you get when you have this granularity on your own data ecosystem: it enables you to have that same granularity when you want to share outside your internal ecosystem. An important part of this is that when you think about governance, you can't have one God user, so to speak, with control over all tables and all policies. You really need segmentation of duty, where different parts of the org can hook in their own data and build their own policies in a way where people can't step on each other. Then you can expand this out to third-party data sharing, where you set different anonymization levels on your data depending on whether you're sharing externally or with internal users, and someone else in your org can share their data with you, and also with that third party. It really frees these organizations to share with each other in ways that weren't possible before. Because it happens at the data layer, these organizations can choose their own compute and still have the same policies being enforced. And going back to that consistency piece: think of it as an authoritative way to share data in your organization. It doesn't have to be ad hoc — "oh, I have to share with this group over here, how should I do it, what policies should I enforce?" There's a single authoritative way to set policy and share your data.

So the first thing that comes to my mind, especially when we give more power to the users, is when the auditor comes and says, you know what, Keith? I understand this is the policy, but prove it.
How do we provide auditors with evidence that we're implementing the policy we designed, and then two, that we're able to audit that policy?

Yeah, good question. I briefly spoke about this a little bit, but when you author and define policies in Immuta, they're immediately being enforced. When you write something in our platform, it's not a glorified wiki, right? It's actually turning those policies on and enforcing them at the data layer. And because of that, any query coming through Immuta is going to be audited. But even more importantly, to be honest, we keep a history of how policy changes happen over time, too. So you can understand that so-and-so changed the policy on this table, that other table got newly added, these people got dropped from it. You get a rich history not only of who's touching what data and what data is important, but also of how you've been treating that data from a policy perspective over time. What were my risk levels over the past year with these six tables? You can answer those kinds of questions as well.

And then, we're in the era of cloud. We expect to be able to consume these services via API, pay-as-you-go. What is your relationship with AWS and, ultimately, the customer? How do I consume Immuta?

Yeah, so Immuta can pretty much be deployed anywhere. Obviously we're talking AWS here: we have a SaaS offering where you can spin up an Immuta free trial and be off and running, building policies and hooking our policy enforcement engine into your compute. That runs in our infrastructure. There's also a deployment model where you deploy Immuta into your VPC, so it runs on your infrastructure behind your firewalls, and we do not require any public internet access at all for that to run.
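The policy-history capability described above can be sketched as an append-only log: every change is recorded with who made it and when, so questions like "what was the policy on this table last March?" stay answerable. This is a minimal, hypothetical sketch of the general pattern, not how Immuta stores its audit trail.

```python
# Hypothetical append-only audit log for policy changes over time.
from datetime import datetime, timezone

class PolicyAudit:
    def __init__(self):
        self.events = []  # append-only: entries are never updated or deleted

    def record(self, table, actor, change, when=None):
        """Log who changed which table's policy, what changed, and when."""
        self.events.append({
            "ts": when or datetime.now(timezone.utc),
            "table": table,
            "actor": actor,
            "change": change,
        })

    def history(self, table):
        """Full change history for one table, in the order it happened."""
        return [e for e in self.events if e["table"] == table]

    def policy_as_of(self, table, ts):
        """Reconstruct what the policy looked like at a point in time."""
        state = {}
        for e in self.history(table):
            if e["ts"] <= ts:
                state.update(e["change"])
        return state

audit = PolicyAudit()
t1 = datetime(2020, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2020, 6, 1, tzinfo=timezone.utc)
audit.record("claims", "alice", {"mask_pii": True}, when=t1)
audit.record("claims", "bob", {"row_filter": "region"}, when=t2)
print(audit.policy_as_of("claims", t1))  # {'mask_pii': True}
```

The point-in-time reconstruction is what lets you answer an auditor's "prove it": the log shows both the current rule and every state the rule passed through to get there.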
We don't do any kind of phoning home, because obviously, as a privacy company, we take this very seriously internally as well. We also have on-premises deployments, again with zero connectivity, for air-gapped environments. So we offer that kind of flexibility to our customers, wherever they want Immuta to be deployed. An important thing to remember, too, is that Immuta does not actually store any data. We only store metadata and policy information. That also gives the customer some flexibility: if they want to use our SaaS, they can simply build policy there, and the data still lives in their account. We're just pushing policy down into it dynamically.

So, Steven Touw, co-founder and CTO of Immuta. I don't think you had to worry about matching my energy level. I threw some pretty tough questions at you, and you were ready with all the answers. If you want to see more interesting conversations from around the world with founders and builders — AWS re:Invent is all about builders, and we're talking to the builders throughout this show — visit us on the web at theCUBE, and you can engage with us on Twitter. Talk to you next episode of theCUBE, from AWS re:Invent 2020.