Welcome back, friends. It's theCUBE, live at Snowflake Summit 23 at Caesars Forum in Las Vegas. Lisa Martin and Dave Vellante here, continuing day two of theCUBE's coverage of this great event. There are about 12,000 people here, and I was looking at some of our content from last year; this is the fifth annual Snowflake Summit. Dave, we're going to be talking about some really important things next with one of our alumni: data privacy, data residency. All important topics. I mean, that's one of the big value propositions of Snowflake, right? You get that promise of governance, security, privacy, the same experience, as long as the data's in the Snowflake cloud. That's the trade-off and that's the promise. A huge value prop there. We've got Anshu Sharma back with us, co-founder and CEO of Skyflow. Welcome back, Anshu. Great to have you on the program. Well, it's awesome to be back again. What a great show, huh? Great show. Bigger than last year. Yes. Feels like we're back, baby. Talk to us a little bit, for anyone that isn't familiar with Skyflow, about what it is that you guys do. Your mission, your vision, the catalyst to launch the company. Give us that backstory. Well, I mean, you know, all of us have gone to an online pharmacy or tried to get vaccinated or gone to open a bank account, and some random person says, give me everything you can about yourself: your passport number, social security number, date of birth, email address, second email address, work email address. And we all wonder where it goes. And then we find out every three days, on the front page of the New York Times, that it's gone to the dark web. And the problem is, if you try to protect all of the data at all times in all places, you're not really doing a good job. Our insight, working at companies like Salesforce, was that this data is super sensitive. You don't take your gold and just buy more cameras for your house; you put it in a bank.
So similarly, we think there's a value proposition here for companies to protect their customers' data and do it by using a vault. So the vault architecture essentially helps you do privacy by engineering rather than what I call privacy by design, which is like, let's hope and pray that things happen the right way. So the privacy vault doesn't seem like a nice-to-have for organizations. It seems like these days, with privacy, residency, sovereignty rules, compliance issues, the dark web, it's really an essential. Yeah, I mean, it's kind of amazing that this thing didn't exist. You know, I waited seven or eight years, just kept thinking, why hasn't somebody built this product? And it turns out there are two reasons. One is that some of the encryption and tokenization technology just didn't exist. So if you put data in a vault and I can't use it, then I can't actually do a prior authorization on your new user. You can't start using an app. So to use the data, the idea was you have to sort of keep it in its raw format, which is no longer true. And the second breakthrough, I think, is right here. We didn't have cloud platforms. So if you try to build a data privacy vault in our last-generation database architecture, how is it going to all work together? Because of the power of the Snowflake platform, now we can do it in a manner that's seamless to the customer, so they can have their cake and eat it too. And that's what led to the creation of Skyflow. So how does it work, that you can keep it in the vault and still actually use the data when necessary? Yes, the secret sauce is something in the polymorphic encryption and tokenization engine. It essentially means that when I take your social security number and phone number and keep them in the vault in an encrypted format, I can still search for you. So imagine you're calling up your pharmacy or airline and they say, hey, we can't see your data, so how do we find your account number?
Well, because of the polymorphic encryption and tokenization, in real time, they can actually search for you. They can actually send you a text to confirm you. They can do all of those things without the call center guy ever seeing your data. And that's how it works. Okay, one double click. Is that a metadata solution? Perfect question. So if you think about data, right? It ranges from log data to structured and unstructured data. And then what I call metadata-ish data. So think about your usernames and passwords. Is that really data? Well, from Okta's mindset, it is data, but it's really very narrow data. It's just really your passwords and usernames. Similarly, if you're a company with 30 million customers, you may have trillions of rows of data about your customers, but you only have 30 million credit card numbers, only 30 million phone numbers, and only 30 million social security numbers. So that's a small amount of data. Back in the day, we used to call it master data. Remember master data management? Yes, MDM. So master data was built for other systems, but if you think about it, when I am entrusting my bank or my pharmacy with these four pieces of information, I want them to think about it holistically. From the moment I enter the data on a mobile app or a website, to when they perform the background checks, they create an account, and then they obviously move it into databases, then data warehouses, then machine learning, and now generative AI, you're not going to be able to support those use cases if you let the data float around everywhere. So in every keynote you've seen today and yesterday, Christian Kleinerman was talking about it, and at some other conferences going on, the same topic comes up again and again, which is: how can we do generative AI while keeping the privacy of the customer?
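The "search without seeing" idea described above can be illustrated with a keyed blind index. The conversation does not detail how Skyflow's polymorphic encryption works, so this is not that scheme; it is a minimal standard-library sketch that assumes HMAC-SHA256 as a deterministic index, and `ToyVault`, `store`, and `find` are hypothetical names. Encrypting the stored value itself is left out for brevity.

```python
import hashlib
import hmac
import secrets


class ToyVault:
    """Hypothetical vault: keeps a keyed hash (blind index), never the raw value."""

    def __init__(self):
        self._key = secrets.token_bytes(32)  # index key stays inside the vault
        self._index = {}                     # blind index -> account id

    def _blind(self, value: str) -> str:
        # Normalize so "702-555-0100" and "7025550100" hit the same entry.
        normalized = "".join(ch for ch in value if ch.isalnum()).lower()
        return hmac.new(self._key, normalized.encode(), hashlib.sha256).hexdigest()

    def store(self, account_id: str, phone: str) -> None:
        self._index[self._blind(phone)] = account_id

    def find(self, phone: str):
        # Exact-match lookup; returns None when nothing matches.
        return self._index.get(self._blind(phone))


vault = ToyVault()
vault.store("acct-001", "702-555-0100")
match = vault.find("7025550100")      # caller gets the account id, not the stored phone
missing = vault.find("702-555-0199")  # unknown number
```

A deterministic index like this only supports exact-match search, which is enough for the call center lookup in the example but is an assumption about the use case rather than a description of the product.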
And people talk about how maybe if I can secure it in a private cloud or something. Well, that just protects you from the obvious problem of data just sitting in the open, but in reality, you need boundaries. The call center worker in a bank is allowed to see a different amount of data than my banking client, right? I'm a wealth manager, I may know something about you, maybe you're buying 4,000 shares of Snowflake. That information can't go to Dave because he may then bid on that and make, you know, that would be illegal. But how do you create these boundaries? And this problem's always been there. I used to work at Salesforce and Oracle before this, so I saw this again and again, and the tools people were building were point solutions, right? Can I just encrypt it here? Can I tokenize it there? Can I buy some data security posture management tool? Can I do discovery? And it's just a hodgepodge of tools and it doesn't really solve the fundamental problem. So just as Okta said, look, if you just treated, in their case identity, in our case PII, as metadata, and you isolated it, protected it, and governed it, you can then go on and build all the applications and use cases you want on Snowflake without having to worry about these PII fields. Yesterday at our booth, the CIO of Fidelity was here, right? This is a company that's been adopting Snowflake across the entire company. They were inventors, yeah. Exactly, Mihir Shah. And the first thing they did when they were moving to the cloud architecture was they put in an organization encryption strategy, because the thinking was, to make the democratization of data happen, you need to give access to as many people as you can. Well, you need a hard and fast security and privacy guarantee that that's not going to result in something awful happening. So a lot of leading companies like Fidelity, Netflix, Google have built these in-house data privacy vaults over the years.
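The boundary problem raised above, where a call center worker and a wealth manager are allowed to see different amounts of the same field, can be sketched as a per-role reveal policy. The role names, the `REVEAL_POLICY` table, and `reveal_phone` are all hypothetical and chosen only for illustration; nothing here reflects a real product's policy format.

```python
# Hypothetical policy: which form of a phone number each role may see.
REVEAL_POLICY = {
    "wealth_manager": "plain_text",
    "call_center": "masked",
}


def reveal_phone(phone: str, role: str) -> str:
    """Return the phone number in the form the caller's role is allowed to see."""
    level = REVEAL_POLICY.get(role, "redacted")  # unknown roles see nothing
    if level == "plain_text":
        return phone
    if level == "masked":
        return "***-***-" + phone[-4:]
    return "[REDACTED]"


full = reveal_phone("702-555-0100", "wealth_manager")
partial = reveal_phone("702-555-0100", "call_center")
hidden = reveal_phone("702-555-0100", "marketing")
```

Defaulting unknown roles to the most restrictive level is a design choice made for this sketch, so that a missing policy entry fails closed.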
And it's just not a tech that's available to the mid-market or to other enterprises. Not everybody has 60 engineers who can just build this piece of technology. So that's what we did, you know? Okay, so you've got to have some level of granularity, a lot of granularity, obviously. And then the ability to determine who has access to what. How do you assure that second piece? Do you work with identity players? Do you have your own sort of identity player? So, you know, we integrate with Snowflake through their external functions and tokenization API. So it's all seamlessly integrated. You just use Snowflake the way you would. And under the hood it figures out, do you get to see the real phone number or the fake phone number? And then the role information comes from your identity provider. So it could be Okta, Ping, whatever you're using. But it doesn't stop there. Because you're not just using PII with your Snowflake, you may also be using it with Salesforce or Zendesk and HubSpot in the workflow context. And then you may be using it with generative AI, whether it's inside the Snowflake cloud with NeMo and other models, or you may be building something that's purpose-specific inside your own environments. Do you have a great customer story that really shines a light on the value that Skyflow and Snowflake are delivering to customers, where data privacy and residency are both concerned? I imagine you do, based on your... Yes. So we have a lot of amazing customers, ranging from IBM to Panasonic to Lenovo. We've been in the market for only two years, but we have some of the largest companies using us. I'll give you a couple of examples. One company, they're in the clinical trials business. So think of a large pharma company. You know, GLP-1 drugs are popular, everybody's trying to lose weight. Well, Eli Lilly has about a year's worth of lead against Pfizer right now.
Most analysts will tell you that's about $5 to $10 billion worth of lead, because in about three years, GLP-1 drugs will be like $50 a month. Right now, they're $1,500 a month. So to a pharma company, running clinical trials six months faster can literally be worth billions of dollars. This is why all of these pharma companies are building these clinical trial data lakes with the likes of Snowflake. But if you want to enroll patients in a clinical trial, how do you go about doing that? You may go to a pharmacy and say, hey, can you give me some people who are on insulin so I can enroll them in this trial? Today all of that happens essentially by manual FTP, trust-based, BAAs, and stuff. But using Skyflow technology, Science 37, a clinical trials platform, is able to do that with what I call anonymous reach. So we assume that if I want to send you an email, I should have access to your email address. That's not really true. What if I can call an API that sends you the email, and only if you opt in do I see you? I can send you a text message. All of us have used Uber, and the Uber driver no longer knows our phone number. So imagine you're running a clinical trial. Instead of doing these very heavy data partnerships, you can have these what I call anonymous reach partnerships. So you can say, hey, help me find 40,000 people who are taking insulin, who'd be willing to participate in a clinical trial, using Skyflow plus Snowflake. You can do all of that analysis anonymously. You can run the workflows, reach out. The 400 people that say yes, you can then enroll them. And frankly, you can even run the clinical trial pseudo-anonymously. And to do that, sometimes the data needs to be global, right? So Snowflake has instances globally, but there are additional requirements around data residency and sovereignty. So if the clinical trial participant happens to be in China, that data literally can't leave China. Skyflow runs a global network of data privacy vaults.
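The anonymous reach and residency ideas above can be sketched together: the caller holds only a token, the vault in the participant's region does the sending, and the address is revealed only after opt-in. This is a toy illustration, not Skyflow's API; `RegionalVault`, `vault_for`, the region codes, and the `outbox` list standing in for a mail gateway are all assumptions.

```python
import uuid


class RegionalVault:
    """Hypothetical per-region vault; raw emails never leave this object."""

    def __init__(self, region: str):
        self.region = region
        self._emails = {}       # token -> email, held only in this region
        self._opted_in = set()
        self.outbox = []        # stands in for a real mail gateway

    def enroll(self, email: str) -> str:
        token = f"{self.region}:{uuid.uuid4().hex}"  # region prefix drives routing
        self._emails[token] = email
        return token

    def send(self, token: str, message: str) -> None:
        # The vault resolves the address; the caller never sees it.
        self.outbox.append((self._emails[token], message))

    def opt_in(self, token: str) -> None:
        self._opted_in.add(token)

    def reveal(self, token: str):
        return self._emails[token] if token in self._opted_in else None


VAULTS = {region: RegionalVault(region) for region in ("us", "cn")}


def vault_for(token: str) -> RegionalVault:
    # Route every operation to the vault in the token's home region.
    return VAULTS[token.split(":", 1)[0]]


token = VAULTS["cn"].enroll("patient@example.com")
vault_for(token).send(token, "Would you consider joining a clinical trial?")
before = vault_for(token).reveal(token)   # no opt-in yet
vault_for(token).opt_in(token)
after = vault_for(token).reveal(token)    # revealed only after consent
```

Encoding the region in the token is one simple way to keep routing stateless; a real deployment would need its own answer to how tokens map to regions.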
So now you can imagine this end-to-end scenario where we can run a clinical trial, enrollment, study data, and then collect real-world evidence. That's how they price the drugs, by the way. So then you can convince Medicare to pay for these drugs. So all of that requires handling a lot of sensitive data. And that's what companies like Science 37 are doing today with Skyflow. So I'm sure we've been to a lot of these conferences, and the big theme with the enterprise is, yeah, ChatGPT, that's nice, but enterprises need something different. But the reality is there are like a hundred million users on ChatGPT, everybody's using it, right? Like a billion a month or something, visitors, it's crazy. So it seems like there's a lot of risk there. Is that something that you've looked into? Do you have solutions for it, or thoughts on it? Yes, so my initial thoughts actually were dismissive. Three to six months ago, people were like, hey, you guys are in privacy, what about GPT? I was like, you know, you're just paranoid. Like, I've used ChatGPT. I'm not entering my social security number, what's the problem? And then I started talking to enterprises. I can't name some of them, but the largest financial services retirement company, the largest manufacturer of phones in the world: their customer service people are cutting and pasting emails, which may include your IMEI, which is how you get your password reset, account number, credit card number, and they're not doing it maliciously. They simply want to be able to respond to the customer. Now there are multiple problems. So some of the naive initial solutions were like, well, we can just run the model in our data cloud or our cloud. That's very helpful. However, even within the organization, there's one customer support agent who enters the data about Dave, and another agent who's talking about your credit card number; that information can't go across an organization boundary.
The finance team may know the numbers for this quarter and the marketing team is writing the press release. That information cannot leak before the end of the quarter. So what I discovered was there's a lot of nuance in exactly what this LLM is allowed to say in what context. So now we've released a solution to handle this. It works in two places, broadly speaking. One is model training, right? So whether you're taking OpenAI and further enhancing it with vector databases or you're building your own model, irrespective of that, you're feeding data to these models. And that data has to be cleaned up and devoid of certain PII. But you can't just throw it away either, because I need to be able to say, hey, this guy who showed up at the hospital three days ago later got a heart attack and died, that was a bad thing to happen, so my LLM can now predict it. You can't completely remove identifiers either. So that's the modeling part. So we can remove those identifiers in a way that you can still build models that are 100% accurate, because we use polymorphic tokenization. And then there is inference time. This is literally what you were doing earlier. Hey, I'm talking to this person. What should be the answer? Same thing. The question may include PII. The answer also needs PII. So if I say, hey, is the health insurance for my mother, Sudha Sharma, born on this date, still viable? You can't just throw away the data. The LLM has to find the right information, and then somehow we have to protect her date of birth while giving her age-appropriate information. So people are dealing with this question at a very, very high level, like, hey, my cloud or your cloud? It's not a my cloud, your cloud question. Just like with Snowflake, it's not a question of whether you're running it in AWS or Azure. The question is, for each specific use case, how are you protecting the right amount of data without throwing it away? If you throw away the data, you've basically thrown away all your proprietary edge.
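The point above, that identifiers must be removed from model inputs without losing the ability to link records for the same person, can be sketched with deterministic tokens. This is not Skyflow's polymorphic tokenization; it assumes a keyed HMAC over a US social security number pattern, and `DEMO_KEY`, `tokenize`, and `detokenize` are hypothetical names.

```python
import hashlib
import hmac
import re

DEMO_KEY = b"demo-key-not-for-production"   # assumption: a real key lives in the vault
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def tokenize(text: str, vault: dict) -> str:
    """Swap each SSN for a stable token so records about one person still link."""

    def swap(match):
        raw = match.group(0)
        digest = hmac.new(DEMO_KEY, raw.encode(), hashlib.sha256).hexdigest()
        token = "SSN_" + digest[:8]
        vault[token] = raw          # mapping stays outside the model
        return token

    return SSN_PATTERN.sub(swap, text)


def detokenize(text: str, vault: dict) -> str:
    """Restore raw values in a model's answer for a caller allowed to see them."""
    for token, raw in vault.items():
        text = text.replace(token, raw)
    return text


vault = {}
visit = tokenize("Patient 123-45-6789 admitted Monday.", vault)
event = tokenize("Patient 123-45-6789 had a heart attack Thursday.", vault)
restored = detokenize(visit, vault)
```

Because the same input always yields the same token, the two records still refer to one patient after tokenization, which is the property the training example in the conversation depends on. Truncating the digest to eight hex characters is only for readability here and would raise collision risk at scale.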
Wow. What a fascinating challenge that you guys are solving. Thank you so much for coming on the program, giving us the update on what's new and what Skyflow is up to. It's great to have you, and to hear how you're helping customers really tackle the data privacy and data residency problems with Snowflake. Awesome stuff, we'll have to have you back. Thank you. I feel like we're just scratching the surface. For our guest and for Dave Vellante, I'm Lisa Martin. Up next, we're going to be talking about why digital transformation projects fail, how you can be part of that 30%, and how that balance can shift. You can catch all of our Snowflake content and all of our theCUBE content on thecube.net, and all of our analysis and editorial on siliconangle.com. You're watching theCUBE, the leader in live tech coverage.