Hi, everyone. Welcome to theCUBE's presentation of the AWS Startup Showcase: Cybersecurity. This is season three, episode three of the ongoing series that features exciting startups from the AWS ecosystem. I'm your host, Lisa Martin. I'm pleased to welcome back one of our CUBE alumni, Ameesh Divatia, co-founder and CEO of Baffle. He's here to talk about easy data protection for applications, analytics, and AI. Ameesh, great to have you back on the program.

It's great to be here, Lisa.

As we talk about cloud, as we often do, we know so many customers are migrating data to the cloud. This is customers in every industry worldwide. But there are some problems with moving data to the cloud with existing approaches. What are some of those problems?

Sure. First, let's talk about all the personas involved, right? When it comes to data, the data scientists and the data analysts are the ones initiating these requests. They have to make sure they can get security to sign off on a project before they actually move the data around. And then the last piece of it is the operations aspect: the data store managers and so on, anybody who's administering the database. What has made this even more complicated is that those three groups are not always in the same organization anymore, because you are moving data to the cloud. So when data scientists start this process, they have to first identify sensitive data, get security to approve the move, and then move that data to a different infrastructure, typically the cloud, where it can be analyzed. And therein lies the challenge: sensitive data, if it's exposed in the cloud, can create significant liabilities, thanks to privacy regulations that fine organizations that lose their data.

And we're going to be talking about some of those privacy regulations. One of the things I also found in my research on Baffle and what you're doing is that historically, data-centric protection has been challenging to implement. Why is that?

Because you're changing the actual workflow. You're changing the way the data looks when it is transformed, and existing applications will not be able to process the data if it is in encrypted form. There are different approaches to solving this. One of them is known as format-preserving encryption, or tokenization, where the transformed data does look like the original data. Those are some of the controls that have to be put in place: when you transform the data, you want to make sure that existing applications continue to work, and when you compute whatever query the business logic requires, you want to restore the result to its original form. Those are the challenges that have made these controls difficult to adopt. Everybody understands this is the last stand, right? If you can protect the data at the record level, you don't have to worry about anything else. But it's been historically very difficult to implement.
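To make the format-preserving idea concrete, here is a minimal vault-style tokenization sketch in Python. It is illustrative only, assuming a simple in-memory store rather than Baffle's actual implementation: the token has the same length and character class as the original value, so an application that expects a 16-digit card number keeps working.

```python
# Minimal vault-style tokenization sketch (illustrative, not Baffle's code).
# The token preserves the format of the original value: same length, all
# digits, so downstream applications and schemas keep working unchanged.
import secrets

class TokenVault:
    def __init__(self):
        self._forward = {}  # plaintext -> token
        self._reverse = {}  # token -> plaintext

    def tokenize(self, value: str) -> str:
        """Replace an all-digit value with a random token of identical format."""
        if value in self._forward:
            return self._forward[value]
        while True:
            token = "".join(str(secrets.randbelow(10)) for _ in value)
            if token not in self._reverse:  # avoid token collisions
                break
        self._forward[value] = token
        self._reverse[token] = value
        return token

    def detokenize(self, token: str) -> str:
        """Restore the original value for an authorized consumer."""
        return self._reverse[token]

vault = TokenVault()
token = vault.tokenize("4111111111111111")
assert len(token) == 16 and token.isdigit()            # format preserved
assert vault.detokenize(token) == "4111111111111111"   # result restored
```

A production system would use keyed format-preserving encryption (for example FF1/FF3-1) or a hardened token vault; the sketch only shows why transformed data can keep flowing through unmodified applications.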
Let's talk about the situation that many customers find themselves in these days, and that's hybrid cloud and multi-cloud. When we're talking about customers moving their data to the cloud, where are hybrid cloud and multi-cloud still relevant?

Well, as I like to say, data is always created outside of the cloud, right? You have transactions from point-of-sale terminals. If you're a critical infrastructure player, you have a lot of IoT data. The data is created in the field, and it needs to be ingested into a digital infrastructure, typically in the cloud. So you want to make sure that process is very clearly understood and very clearly engineered. That's where hybrid comes in: not everything is in the cloud only. It originates outside and then gets ingested. Multi-cloud is what every enterprise follows, because they don't want cloud lock-in. So that's the extent of the challenge. The infrastructure is multifaceted, but the controls need to be uniform.

Right, uniform control is really key there. Data is the lifeblood of every organization, but data sprawl is a real thing. A lot of customers are in multi-cloud by default or, to your point, because they don't want lock-in. Since sprawl is real, how is Baffle helping customers protect data that is so spread out?

A couple of different ways. First of all, we work very closely with data discovery vendors who can find data that is sensitive, and we have ways of integrating with those vendors so that the sensitive data is clearly identified. We can protect the data as soon as it's created, number one. And number two, we protect it in a way where the existing application workflows are not impacted. We are merely a bump in the wire, as we like to say. We are a network proxy that sits between the place where the data originates and where it's being migrated to. The migration tool thinks it's writing to the cloud database, but it's actually going through our transformation engine, which makes sure that no sensitive data ever leaves the firewall.

And that's critical: organizations getting their hands around the data that's sensitive, making sure that it is safe and secure and doesn't go past that firewall.
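As a sketch of that bump-in-the-wire model, assuming hypothetical host names rather than Baffle's documented configuration, the application keeps its driver and its SQL; only the endpoint in the connection string changes:

```python
# Illustrative "repoint the connection string" sketch; host names are
# hypothetical. The application code and SQL are untouched; only the
# endpoint now points at the protection proxy, which tokenizes sensitive
# columns in flight before they reach the cloud database.
import psycopg2

# Before: the app wrote directly to the cloud database, e.g.
#   host="orders-db.xxxx.us-east-1.rds.amazonaws.com"
# After: same driver, same SQL, different endpoint.
conn = psycopg2.connect(
    host="data-proxy.internal.example.com",  # hypothetical proxy endpoint
    port=5432,
    dbname="orders",
    user="app_user",
    password="change-me",
)
with conn, conn.cursor() as cur:
    # The proxy transforms card_number in flight; the app is unaware.
    cur.execute(
        "INSERT INTO payments (customer_id, card_number) VALUES (%s, %s)",
        (42, "4111111111111111"),
    )
```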
Let's talk about encryption. We talked about this last time you were here, along with compliance. Remind the audience why storage-level encryption isn't enough for most organizations to meet compliance regulations.

Yes. First of all, let's understand what storage-level encryption does and why it was invented. There used to be a time when data centers were not secure. Everybody had their own data center, usually in the basement, and the disk itself would get stolen or misplaced. That was a massive problem, because sensitive data might be lost. So the industry came up with a way to protect the media itself. The database vendors built on that by creating what is known as transparent data encryption, but the definition of transparent data encryption is that as soon as the data is accessed, it is decrypted and delivered to the database in the clear. Every day we come across hacks where attackers get into enterprise environments and compromise the credentials of the administrators of those databases. What we are doing is protecting against exactly that threat, and the regulators are starting to wake up to it. They are now saying that if you protect the data at the storage tier, you also have to adopt supplementary controls so that the data is protected all the way up to the application tier, or anywhere else in the infrastructure; you cannot expose that data until it needs to be visualized. That's what is really happening. PCI DSS version 4, which is just about to go into effect, has already been published, and it's going to be enforced starting January 1st, 2025. That is causing the urgency to adopt these controls today.

Tell us a little bit about some of those controls. Let's double-click. You mentioned that the latest version of the Payment Card Industry Data Security Standard, PCI DSS, goes into effect January 1st, 2025. What are some of the controls that customers are going to need to be aware of and start putting in place?

Yes. First of all, they'll have to make sure that credit cards are never exposed, even in memory; at no point should the credit card numbers be in the clear. The other part is that at no point are the keys used to encrypt those credit card numbers to be in the clear either. So it requires sophisticated key management. Key management is a well-understood but still complex task, because there are multiple levels of keys involved. AWS provides KMS capabilities, as well as what is known as a hardware security module to back it up. So you have a data encryption key, and then you have a key encryption key. The regulations require that the data encryption keys are never stored in the clear, so you need the key encryption key to encrypt the data encryption key and make sure it is never in the clear. The second requirement is that these keys have to be rotated; you cannot have the same key forever, because what if the key encryption key itself is compromised? Now you have the keys to the kingdom. So you rotate the key encryption key, which in turn re-encrypts the data encryption key. This layering, known as envelope encryption, has a huge advantage: you don't have to re-encrypt your data. This complex capability is something that would otherwise require app-dev resources to go and develop. Baffle completely absolves them of it. We do it in our product; that's our capability, and we make sure there's no app-dev burden at all. They just change the connection string, which means they write to our abstraction layer before they move data to the cloud, and we take care of all of these complex transformations as well as the key management.
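The data encryption key and key encryption key flow he describes maps directly onto AWS KMS. Here is a minimal envelope-encryption sketch using the boto3 and cryptography libraries, with hypothetical key aliases; it illustrates the general pattern, not Baffle's product code.

```python
# Envelope encryption sketch with AWS KMS (illustrative; key aliases are
# hypothetical). Data is encrypted locally with a data encryption key (DEK);
# only the KMS-wrapped DEK is stored, never its plaintext.
import os
import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client("kms")

# 1. Ask KMS for a DEK: a plaintext copy (use, then discard) and a
#    wrapped copy (store alongside the data).
dek = kms.generate_data_key(KeyId="alias/app-kek", KeySpec="AES_256")
nonce = os.urandom(12)
record = {
    "wrapped_dek": dek["CiphertextBlob"],
    "nonce": nonce,
    "data": AESGCM(dek["Plaintext"]).encrypt(nonce, b"4111111111111111", None),
}
del dek  # the plaintext DEK never persists

# 2. To read, unwrap the DEK via KMS, then decrypt locally.
plain_dek = kms.decrypt(CiphertextBlob=record["wrapped_dek"])["Plaintext"]
assert AESGCM(plain_dek).decrypt(record["nonce"], record["data"], None) \
    == b"4111111111111111"

# 3. Rotate the KEK without re-encrypting the data: re-wrap only the DEK.
record["wrapped_dek"] = kms.re_encrypt(
    CiphertextBlob=record["wrapped_dek"],
    DestinationKeyId="alias/app-kek-v2",  # hypothetical replacement KEK
)["CiphertextBlob"]
```

Step 3 is the advantage he calls out: rotating the key encryption key touches only the small wrapped DEK, never the data itself.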
Let's talk a little bit about key management in particular: bring your own key, BYOK. What regulations does bring your own key allow customers to meet?

Specifically, GDPR Article 17 calls out the right to be forgotten; they call it the right to erasure. Let's say you have shared your data with somebody who has collected it. You have the right to tell them to erase your data. The best way to implement something like this is to assign a key to each tenant that has data in these databases stored somewhere outside of your control, and when that tenant asks to be erased, the key is revoked. At that point, the data is opaque. CCPA and its successor, CPRA, have similar regulations as well, which require data collectors to disclose how much data they have and, on request, be able to delete it. So that's a specific example of what these regulations require and the controls that map to them. At Baffle, we provide our customers with a whole list of regulations and the controls they can adopt to meet each of them.
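The key-revocation mechanism he describes is often called crypto-shredding. Here is a minimal sketch, assuming one key per tenant held outside the data store; the names are hypothetical and this is not Baffle's implementation.

```python
# Crypto-shredding sketch of the right to erasure (illustrative). Each
# tenant's records are encrypted under that tenant's own key; deleting the
# key makes every copy of the ciphertext permanently opaque, even copies
# sitting in backups or replicas you do not control.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

tenant_keys = {"tenant-42": AESGCM.generate_key(bit_length=256)}

def encrypt_for_tenant(tenant_id: str, plaintext: bytes):
    """Encrypt a record under the tenant's own key."""
    nonce = os.urandom(12)
    return nonce, AESGCM(tenant_keys[tenant_id]).encrypt(nonce, plaintext, None)

nonce, blob = encrypt_for_tenant("tenant-42", b"personal data goes here")

# The tenant exercises the right to be forgotten: revoke (delete) the key.
del tenant_keys["tenant-42"]
# `blob` may still exist in the cloud, but without the key it is unreadable,
# so the data is effectively erased.
```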
Got it. Let's talk about private data now; we talk about that a lot. How do you protect private data when it's used with things like generative AI? Obviously a hot topic. What's Baffle's role there?

This is a very challenging technical problem that the industry has come upon. As all of us know, generative AI is the most exciting thing that has happened in a long time, so we're all very excited about the outcomes. But we have to remember that every single outcome we see from something like ChatGPT has benefited from all the data it was able to ingest over the years. When it comes to enterprises, they don't want anything to do with that, so they're taking the big-hammer approach and banning ChatGPT access outright. Well, that only goes so far, because nobody wants to send private data over there. So what enterprises are starting to adopt is a blended model where you have, for lack of a better term, something called a private GPT: your own model, in your own environment, completely controlled. The cloud service providers are all providing infrastructure to do that, and you can use those models to train on data that is sensitive. Baffle's role is that we have figured out how you can run analytics on data that is not the original data. The data is tokenized, and you can train your model on that data so that you can still extract value from it without actually exposing the data itself. The next step, once you've done the private GPT part, is that you still want to benefit from everything a public GPT does. So there's another layer of intelligence where the query is optimized and engineered so that you can combine the private GPT outcome with the public GPT outcome to come up with the best result from a business perspective, and that helps data scientists solve very difficult data problems.

And it's all about those business outcomes. Let's double-click on gen AI for a second, along with data-centric protection. We talked about how data-centric protection is challenging to implement. Does gen AI's utility get impacted there, or is it enhanced?

It absolutely gets impacted if it's not done right, because one of the things about gen AI that is really scary is that it lies, and it lies with authority. So you want to make sure that you protect the data as it is ingested. Baffle plays a role in two different aspects of this pipeline. It plays a role as the data is ingested: like I mentioned before, we've figured out how to tokenize data in flight as it is ingested into these cloud environments, and then that data is used to train the model within that particular environment. And then there is the query. The query can actually be the source of leakage; if you allow uncontrolled queries, you can get pretty much any secret revealed. So on the consumption side we have another control that makes sure no sensitive data is leaked. Finally, the most important aspect is that our privacy-enhanced computation techniques ensure that the ingested data is analyzed accurately. We have a way of making sure the data is processed without distorting it in any way; we're not adding random noise to it or anything like that. It is the original data, except that when it's being processed, it is not revealed. So those are the three steps: ingestion, consumption, and the actual processing. And at all times, we enforce data-centric protection with data-centric policies, so that sensitive data is never stored or processed in its original form.
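Here is a minimal sketch of the first two control points, ingestion-side tokenization and consumption-side redaction, with hypothetical field names; a real pipeline would use a keyed tokenization scheme rather than a bare hash.

```python
# Illustrative ingestion/consumption controls (hypothetical field names).
import hashlib
import re

def tokenize_record(record: dict, sensitive: set) -> dict:
    """Ingestion control: replace sensitive values with stable tokens.
    Deterministic tokens preserve equality, so joins, group-bys, and model
    training on the tokenized data still work. A real system would use a
    keyed scheme, not a bare hash."""
    return {
        k: ("tok_" + hashlib.sha256(v.encode()).hexdigest()[:12]
            if k in sensitive else v)
        for k, v in record.items()
    }

def redact_response(text: str) -> str:
    """Consumption control: mask anything shaped like a card number before
    a query result or model answer leaves the environment."""
    return re.sub(r"\b\d{13,16}\b", "[REDACTED]", text)

safe = tokenize_record(
    {"customer": "Ada Lovelace", "card_number": "4111111111111111",
     "amount": "19.99"},
    sensitive={"customer", "card_number"},
)
print(safe["card_number"])  # tok_... ; usable for training, not sensitive
print(redact_response("Charge 4111111111111111 was approved."))
```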
Got it. So important for customers across industries and across regions. Last question for you, Ameesh. The cybersecurity landscape changes daily; you talked about breaches and hackers getting in, and nobody wants to be the next headline. As that landscape changes, what are some of the things we can look to Baffle for to help customers combat cybersecurity threats and challenges?

Well, what we like to say is that it's time to change the model, right? We're always looking at security from the outside in; we're always watching for things that are happening, and when bad things happen, we want to alert somebody. What we want to do is turn that model on its head and build security into the data pipeline itself, so that the data is never exposed even when it's analyzed. Then, when bad things happen, that data is still protected: when it's leaked, when it's stolen, you're only losing encrypted or tokenized data. That's where we see the next level of controls coming in. That's why we like to call it data-centric protection, and the industry is starting to wake up to it. We're seeing a lot of awareness among the cloud service providers. They have always evangelized the shared responsibility model, which is very clear about who is responsible for the data itself, and now they're providing controls. They're providing the ability for data owners to use tools like Baffle's in the cloud, so that they can always maintain control of their data on infrastructure that they don't control.

That's a great point: maintaining control of the data on infrastructure that they don't control. Where can organizations and interested prospects go, Ameesh, to get their hands on Baffle, a test drive, or anything like that?

Absolutely. Just go to AWS Marketplace and search for Baffle. We have multiple listings depending on your environment, whether it's Postgres databases or a Redshift data warehouse. We have built integrations with the AWS Database Migration Service as well as the Key Management Service, so it's all out of the box. There is a 30-day free trial. You are welcome to try that out and then contact us; there is a way to reach us and to get to our documentation. And of course, AWS salespeople as well as solution architects are very well versed; we have authored multiple blogs with them, and they would be very happy to help you migrate securely to the cloud and continue to maintain the utility of that data, so that you are not losing anything when you adopt controls like encryption and tokenization.

Right, not losing control, but being able to really mine that data for the rich insights that help organizations make data-driven decisions. Ameesh, we appreciate you coming back on theCUBE, sharing what Baffle is doing from a data-centric protection perspective, what you're enabling customers to do, the challenges you're helping them erase, and where prospects can go to get their hands on it. We appreciate you taking the time today.

Well, thank you for the opportunity, Lisa. It was a very enjoyable discussion.

Likewise, my pleasure. We want to thank you for watching theCUBE, and to remind you that you can find all of our on-demand content right here on theCUBE.net, and editorial content on siliconangle.com. Thank you again for watching. You're watching theCUBE, the leader in high-tech event coverage.