…of AWS Public Sector Summit, here in person in Washington, D.C. for two days, live. Finally, a real event. I'm John Furrier, your host of theCUBE. We've got a great guest: Howard Levenson from Databricks, regional vice president and general manager of the federal team. Super unicorn. Is it a decacorn yet? It's not yet public, but welcome to theCUBE.

Howard: I don't know what the next stage after unicorn is, but we're growing rapidly. Thank you, John.

John: Our audience knows Databricks extremely well. Ali's been on theCUBE many times. We were covering them back when it was "big data"; now it's all data, everything. So we've watched your success. Congratulations.

Howard: Thank you.

John: So it's not a big bridge for us to cross to see you here at AWS Public Sector Summit. Tell us what's going on inside the Databricks-Amazon relationship.

Howard: Yeah, it's been a great relationship. You know, when the company got started some number of years ago, we got a contract with the government to deliver the Databricks capability in Amazon's classified cloud. So that was the start of a great federal relationship. Today, virtually all of our business is in AWS, and we run in every single AWS environment, from commercial cloud to GovCloud to secret to top secret environments. And we've got customers doing great things and experiencing great results from Databricks and Amazon.

John: The federal government's the classic, what I call, migration opportunity, right? Because let's face it: before the pandemic, even five years ago, even ten years ago, things moved at glacial speed, slow, slow. Then the pandemic really forced them to modernize. But you guys had already cleared the runway with your value proposition. You've got the Lakehouse now; you're really optimized for the cloud. Okay, hardcore.

Howard: Yeah, we are. We only run in the cloud, and we take advantage of every single go-fast feature that Amazon gives us.
Howard: But you know, John, the Office of Management and Budget did a study a couple of years ago. I think there were 28,000 federal data centers. 28,000 federal data centers! Think about that for a minute, and just think: in each one of those data centers you've got a handful of operational data stores, of databases. The federal government is trying to take all of that data and make sense out of it. The first step to making sense out of it is bringing it all together, normalizing it, federating it. And that's exactly what we do. That's been a real win for our federal clients, and it's been a really exciting opportunity to watch people succeed in that endeavor.

John: We had another guest on, and she said "data center huggers," AKA tree huggers. Data center huggers: people who won't let go.

Howard: Yeah.

John: But that's slowly dying away, and they're moving on to the cloud. So migration is huge. How are you guys migrating with your customers? Give us an example of how it's working. What are some of the use cases?

Howard: Yeah, so before I do that, I want to tell you a quick story. I've had the luxury of working with the Air Force Chief Data Officer, Eileen Vidrine. And she is commonly quoted as saying, "Just remember, airmen, it's not your data. It's the Air Force's data." So people were data center huggers; now they're data huggers. But all of that data belongs to the government at the end of the day. So how do we help with that? Well, think about all this data sitting in all these operational data stores. It's getting updated all the time, but you want to be able to federate this data together and make some sense out of it. So for an organization like US Citizenship and Immigration Services, they had, I think, 28 different data sources. And they want to be able to pull that data in basically real time and bring it into a data lake.
Well, that means doing change data capture off of those operational data stores, transforming that data and normalizing it so that you can then join it. And we've done that. I think they're now up to 70 data sources that are continually ingested into their data lake. And from there, they support thousands of users doing analysis and reports for the whole visa processing system for the United States, the whole naturalization environment. And their efficiency has gone up, I think by their metrics, by 24x.

John: Yeah, Sandy Carter was just on theCUBE earlier; she's the vice president of the partner ecosystem here at Public Sector. And I was commenting to her that the federal game has changed. It used to be hard to get into. You had to know everybody, navigate the tripwires and all the subtle hints and the people who were friends; it was like cloak and dagger. So people were locked in on certain things. Databases and data now have to be freely available. I know one of the things you guys are passionate about, and this is kind of a hardcore architectural thing, is that you need horizontally scalable data to really make AI work, right? Machine learning works when you have data. How far along are these guys in their thinking? When you talk about customers, because we're seeing progress, how far along are we?

Howard: Yeah, we still have a long way to go in the federal government. I tell everybody I think the federal government is probably four or five years behind what Databricks' top clients are doing, but there are clearly people in the federal government that have really ramped it up and are on par with, or even exceeding, some of the commercial clients. USCIS, CBP, and the FBI are some of the clients we work with that are pretty far ahead. And I mentioned a lot about the operational data stores, but there's all kinds of data coming in. At USCIS they do these naturalization interviews. Those are captured in real text.
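As an aside, the ingestion pattern Howard describes, change data capture plus normalization into one shared table, can be sketched in miniature. This is a toy illustration in plain Python, not the actual USCIS pipeline; a real Databricks implementation would use Spark Structured Streaming and Delta Lake MERGE, and every source name and field here is invented:

```python
# Toy change-data-capture (CDC) upsert: normalize records from two
# hypothetical source systems into one shared table. Illustration only;
# a real pipeline would use Spark and Delta Lake MERGE.

def normalize(source: str, record: dict) -> dict:
    """Map each source's private schema onto one shared schema."""
    if source == "benefits_db":           # hypothetical source A
        return {"case_id": record["id"], "status": record["st"].upper()}
    if source == "interview_db":          # hypothetical source B
        return {"case_id": record["case"], "status": record["status"].upper()}
    raise ValueError(f"unknown source: {source}")

def apply_cdc(table: dict, source: str, changes: list) -> dict:
    """Apply a batch of CDC events; op is 'I'nsert, 'U'pdate, or 'D'elete."""
    for op, record in changes:
        row = normalize(source, record)
        if op == "D":
            table.pop(row["case_id"], None)
        else:                             # insert or update: last write wins
            table[row["case_id"]] = row
    return table

lake = {}
apply_cdc(lake, "benefits_db", [("I", {"id": 1, "st": "open"})])
apply_cdc(lake, "interview_db", [("U", {"case": 1, "status": "approved"})])
```

The normalize step is the "federating" Howard mentions: each operational store keeps its own schema, but the lake holds one shared shape that downstream analysts can join against.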
So now you want to do natural language processing against them, to make sure these interviews are of the highest quality. We want to be able to predict which people are going to show up for interviews based on their geospatial location, the day of the week, and other factors, the weather perhaps. So they're using all of these data types, imagery, text, and structured data, all in the Lakehouse concept, to make predictions about how they should run their business.

John: That's a really good point. I was talking with Keith Brooks earlier; he's director of business development and go-to-market strategy for AWS Public Sector. He's been there from the beginning. This is the 10th year of GovCloud, right? So we were kind of riffing on how NASA JPL did production workloads out of the gate. Full mission. So now fast-forward to today: cloud native really is available. So how do you see the agencies and the government handling replatforming? I get that. But now to do the refactoring, where you guys have the Lakehouse, new things can happen with cloud native technologies. What's the crossover point?

Howard: Yeah, I think our Lakehouse architecture is really a big breakthrough architecture. It used to be that people would take all of this data and put it in a Hadoop data lake, and they'd end up with a data swamp, with really no good control or data quality. And then they would take the data from the data swamp, or the data lake, curate it, run it through an ETL process, and put a second copy into their data warehouse. So now you had two copies of the data, two governance models, maybe two versions of the data. A lot to manage, a lot to control. With our Lakehouse architecture, you can put all of that data in the data lake. With our Delta format, it comes in in a curated way. There's a catalog associated with the data, so you know what you've got.
And now you can literally build an ephemeral data warehouse directly on top of that data, and it exists only for the period of time that people need it. So it's cloud native, it's elastically scalable, and it terminates when nobody's using it. The whole Centers for Medicare and Medicaid Services, the whole Medicaid repository for the United States, runs in an ephemeral data warehouse built on Amazon S3.

John: You know, that is a huge callout. I want to just unpack that for a second, because what you just said, to me, puts an exclamation point on cloud value. It's not your grandfather's data warehouse. It's, okay, we do data warehouse capability, but we're using higher-level cloud services, whether it's governance or AI, to actually make it work at scale for those environments. That, to me, is refactoring. That's not just replatforming; that's replatforming in the cloud and then refactoring the capability for new advantages.

Howard: It's really true. And now, you know, at CMS they have one copy of the data. So they do all of their reporting, they've got a lot of congressional reports that they need to do, but now they're leveraging that same data, not making a copy of it, for the Center for Program Integrity, for fraud. And we know how many billions of dollars' worth of fraud exist in the Medicaid system, and now we're applying artificial intelligence and machine learning on entity analytics to really get to the root of those problems. It's a game changer.

John: Yeah, and this is where the efficiency comes in at scale, because you start to see it. I mean, we always talk on theCUBE about how software's changed. In the old days, you put it on the shelf, "shelfware" they called it; that's our generation. And now you've got the cloud. You didn't know if something was hot or not until the inventory came back: we didn't sell through. In the cloud, if you're not performing, it's "you suck," basically. It's not working.

Howard: It's really true. It's an instant report card.
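The "ephemeral data warehouse" Howard describes can be caricatured in a few lines: the warehouse is just compute and views built over the single governed copy of the data, and it disappears when the session ends. A sketch in plain Python follows (all names are invented; a real deployment would be SQL warehouses over Delta tables on S3, not an in-memory dict):

```python
# Toy "ephemeral warehouse": aggregates are built over the one governed
# copy of the data for the duration of a session, then torn down,
# mimicking compute that terminates when nobody is using it.
from contextlib import contextmanager

LAKE = [  # the single governed copy of the data (invented sample rows)
    {"state": "VA", "claims": 120},
    {"state": "MD", "claims": 80},
    {"state": "VA", "claims": 40},
]

@contextmanager
def ephemeral_warehouse(lake):
    views = {"claims_by_state": {}}
    for row in lake:                       # "spin up": build the aggregates
        views["claims_by_state"][row["state"]] = (
            views["claims_by_state"].get(row["state"], 0) + row["claims"]
        )
    try:
        yield views                        # queries run while it exists
    finally:
        views.clear()                      # "terminate": nothing persists

with ephemeral_warehouse(LAKE) as wh:
    result = wh["claims_by_state"]["VA"]   # a query during the session
```

The design point is the one Howard makes about CMS: reporting and fraud analytics both read the same copy, so there is no second governance model to keep in sync.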
John: So now, when you go to the cloud, you think of the data lake and the Lakehouse, what you guys do, and others like Snowflake who are optimized in the cloud; you can't deny it. And then when you compare it, like, okay, I'm saving you millions and millions on just one thing, never mind the top-line opportunities.

Howard: So John, years ago people didn't believe the cloud was going to be what it is. Today the cloud's pretty much inevitable. It's everywhere. I'm going to make you another prediction.

John: Go ahead.

Howard: And you can say you heard it here first: the data warehouse is going away. The Lakehouse is clearly going to replace it. There's no need anymore for two separate copies. There's no need for a proprietary storage copy of your data. And people want to be able to apply more than SQL to the data, and a data warehouse is just restrictive.

John: But what about an ocean house? The lake is kind of small. Or anything but, unless it's Lake Michigan; that's pretty big.

Howard: No, I think it's going to go bigger than that. I think we're talking about sky computing. We've been at cloud computing, and we're going to get there because people aren't going to put all of their data in one place. They're going to have it spread across different Amazon regions or Amazon availability zones, and you're going to want to share data. And we just introduced this Delta Sharing capability. I don't know if you're familiar with it, but it allows you to share data without a sharing server, directly, basically by picking up the Amazon URLs and sharing them with different organizations. So you're sharing in place; the data actually isn't moving. You've got great governance and great granularity over the data that you choose to share. And data sharing is going to be the next breakthrough.

John: I really love the Lakehouse, and data sharing, I totally see that. So I would totally align with that and bet with you on that one. Skynet, Skynet, the sky computing.
Howard: See, you're taking it away, man.

John: I know, Skynet, I got it. Anything to do with computing in the sky is Skynet. That's Terminator. But that's real. I mean, I think that's a concept where, you know, what serverless and functions do for servers, you need for data. Nobody lives on an island. You've got to be able to connect data. And more data, we all know, produces better results. So how do you get more data? You connect to more data sources. Well, Howard, great to have you on. Talk about the relationship with Amazon real quick as we wrap up here. What are you guys doing together? How's the partnership?

Howard: Yeah, the partnership with Amazon is amazing. I think probably 95% of our federal business is running in Amazon's cloud today. As I mentioned, John, we run across AWS commercial, AWS GovCloud, and the secret environment, C2S. And, you know, we have better integration with Amazon services than, I'll say, some of the Amazon services. If people want to integrate with Glue or Kinesis or SageMaker or Redshift, we have complete integration with all of those. And it's not just a partnership at the sales level; it's a partnership and integration at the engineering level.

John: Well, I'm really impressed with you guys as a company. I think you're an example of the kind of business model that people might have been afraid of: being in the cloud, you can have a moat, you can have competitive advantage, you can build intellectual property.

Howard: And John, don't forget, it's all based on open source and open data. Almost everything that we've done we've made available to people; we get 30 million downloads of the Databricks technology from people that want to use it for free. So, no vendor lock-in. I think that's really important to most of our federal clients, and to everybody.

John: Yeah, I've always said competitive advantage is scale and choice, right? That's what Databricks is.
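For readers curious about the Delta Sharing capability Howard mentioned, the core idea, sharing data in place through short-lived signed references instead of copying it, can be sketched very roughly. The real Delta Sharing protocol is an open REST protocol whose server returns short-lived pre-signed cloud-storage URLs; this toy uses an invented token scheme and in-memory "storage" purely to show the share-in-place shape:

```python
# Toy share-in-place: the provider issues a signed, expiring reference to
# data it already stores; the recipient reads through the grant, and no
# copy of the data moves. Token scheme and names are invented.
import hashlib
import time

STORE = {"s3://lake/visas/part-000.parquet": b"row1,row2,row3"}

def grant(path: str, secret: str, ttl: int = 3600) -> dict:
    """Provider side: issue a signed, expiring reference to one object."""
    expires = int(time.time()) + ttl
    sig = hashlib.sha256(f"{path}|{expires}|{secret}".encode()).hexdigest()
    return {"path": path, "expires": expires, "sig": sig}

def read_shared(token: dict, secret: str) -> bytes:
    """Recipient side: the provider's storage validates and serves bytes."""
    expected = hashlib.sha256(
        f"{token['path']}|{token['expires']}|{secret}".encode()
    ).hexdigest()
    if token["sig"] != expected or token["expires"] < time.time():
        raise PermissionError("grant invalid or expired")
    return STORE[token["path"]]

token = grant("s3://lake/visas/part-000.parquet", secret="provider-key")
data = read_shared(token, secret="provider-key")
```

The governance and granularity Howard mentions fall out of this shape: the provider decides exactly which objects get a grant and for how long, while keeping the single copy of the data.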
John: Howard, thanks for coming on theCUBE. Appreciate it.

Howard: Thanks again, John.

John: theCUBE coverage here in Washington for the face-to-face physical event. We're on the ground, of course, and we're also streaming digitally for the hybrid event. This is theCUBE's coverage of AWS Public Sector Summit. We'll be right back after this short break.