Hello, everyone, and welcome to this special exclusive CUBE conversation where we continue our coverage of the trends of the database market. With me is Nipun Agarwal, who's the Vice President of MySQL HeatWave and Advanced Development at Oracle. Nipun, welcome. Thank you, Dave. I love to have technical people on theCUBE to educate, debate, inform, and we've extensively covered this market. We were all over the Snowflake IPO, and at that time, I remember I challenged organizations: bring your best people, because I want to better understand what's happening in database. After Oracle kind of won the database wars 20 years ago, database kind of got boring, and then it got really exciting with the big data movement, all the NoSQL stuff coming out, and Hadoop, and now it's just exploding. You're seeing huge investments from many of your competitors. VCs are trying to get into the action, and meanwhile, as I've said many, many times, your chairman and CTO, Larry Ellison, continues to invest to keep Oracle relevant, so it's really been fun to watch, and I really appreciate your coming on. Sure thing. We have written extensively, and we've talked to a lot of Oracle customers. You've got the leading mission-critical database in the world, everybody from the Fortune 100, and we've evaluated what Gartner said about operational databases, so I think there's not a lot of question there. And we've written about that on Wikibon, about your converged database and the strategy there, and we're going to get into that. We've covered Autonomous Data Warehouse and Exadata Cloud@Customer, and then we just want to really get into your area, which has kind of caught our attention recently. And I'm talking about the MySQL database service with HeatWave. I love the name, I laugh. It was unveiled, I don't know, a few months ago. So Nipun, let's start the discussion today.
Maybe you can update our viewers on what is HeatWave, what's the overall focus with Oracle, and how does it fit into the cloud database service? Sure, Dave. So HeatWave is an in-memory query accelerator for the MySQL database service, for speeding up analytic queries as well as long-running complex OLTP queries. And this is all done in the context of a single database, which is the MySQL database service. Also, all existing MySQL applications or MySQL-compatible tools and applications continue to work as is, so there is no change. And with HeatWave, Oracle is delivering the only MySQL service which provides customers with a single unified platform for both analytic as well as transaction processing workloads. Okay, so we've seen open-source databases in the cloud growing very rapidly. I mentioned Snowflake; I think Google BigQuery, you know, gets some mention. We'll maybe talk more about Redshift later on, but I'm wondering, well, let's talk about it now. How does the MySQL HeatWave service compare to MySQL-based services from other cloud vendors? So I can get MySQL from others. In fact, I think we do; I think we run Wikibon on the LAMP stack, I guess we run it on Amazon. But so how does your service compare? No other vendor offers this differentiated solution with an open-source database, namely having a single database which is optimized both for transaction processing and analytics, right? So, for example, a lot of other cloud vendors provide a MySQL service, but MySQL has been optimized for transaction processing. So when customers need to run analytics, they need to move the data out of MySQL into some other database for running analytics, right? So we are the only vendor which is now offering this unified solution for both transaction processing and analytics. That's the first point.
Second thing is most of the vendors out there have taken open-source databases and they're basically hosting them in the cloud, whereas HeatWave has been designed from the ground up for the cloud. And it is 100% compatible with MySQL applications, right? And the fact that we have designed it from the ground up for the cloud, where we spent hundreds of person-years of research and engineering, means that we have a solution which is very, very scalable, very optimized in terms of performance, and very inexpensive in terms of the cost. But wait, are you saying that you essentially rewrote MySQL to create HeatWave, but at the same time maintained compatibility with existing applications? Right, so we enhanced MySQL significantly, and we wrote a whole bunch of brand-new code optimized for the cloud, in such a manner that, yes, it is 100% compatible with all existing MySQL applications. What does it mean, Nipun, to optimize for the cloud? I mean, I hear that and I say, okay, it's taking advantage of cloud native. I hear kind of buzzwords, cloud first, cloud native. What does it specifically mean from a technical standpoint? Right, so first let's talk about performance. What we have done is that we have looked at two aspects. We have worked with the compute shapes which provide the best performance per dollar. So I'll give you a couple of examples. We have optimized for certain shapes. HeatWave is an in-memory query accelerator, so the cost of the system is dominated by the cost of memory. So we are working with shapes which provide the cheapest cost per terabyte of memory. Secondly, we are using commodity cloud services in such a manner that it's optimized for both performance as well as performance per dollar. So an example is we're not using any locally attached SSDs; we use the object store because it's very inexpensive. And then I guess at some point we'll get into the details of the architecture.
The system has been really, really designed for massive scalability. So as you add more compute, as you add more servers, the system continues to scale almost perfectly linearly. So this is what I mean in terms of being optimized for the cloud. And furthermore, over the next few months, you will see a bunch of other announcements where we are adding a whole bunch of machine-learning and data-driven automation, which we believe is critical for the cloud. So: optimized for performance, optimized for the cloud, and machine-learning-based automation, which we believe is critical for any good cloud-based service. All right, I want to come back and ask you more about the architecture, but you mentioned some of the others taking open-source databases and shoving them into the cloud. Let's take the example of AWS. They have a series of specialized data stores for different workloads. Aurora is for OLTP; I actually think it's based on MySQL. Redshift is based on ParAccel. And I have asked Amazon about this, and their response actually kind of made sense to me: look, we want the right tool for the right job. We want access to the primitives, because when the market changes, we can change faster, as opposed to if we start building bigger and bigger databases with more functionality, we're not as agile. So that kind of made sense to me. Again, we use a lot of Amazon; I think I said we use MySQL, and we're using DynamoDB, you know, it works, that's cool. We're not huge, and we fully admit, and we've researched this, that when you start to get big, that starts to get maybe expensive. But what do you think about that approach, and why is your approach better? Right, we believe that there are multiple drawbacks of having different databases or different services, one optimized for transaction processing and one for analytics, and having to ETL between these different services.
First of all, it's expensive because you have to manage different databases. Secondly, it's complex from an application standpoint: applications now need to understand the semantics of two different databases. It's inefficient because you have to transfer data at some periodicity from one database to the other. It's not secure, because there are security aspects involved in transferring data, and also the identity of users in the two different databases is different. So the approach which has been taken by Amazon and others, we believe, is more costly, complex, inefficient and not secure. Whereas with HeatWave, all the data resides in one database, which is MySQL, and you can run both transaction processing and analytics. So in addition to all the benefits I talked about, customers can also make their decisions in real time because there is no need to move the data. All the data resides in a single database, so as soon as you make any changes, those changes are visible to customers for queries right away, which is not the case when you have different siloed specialized databases. Okay, I mean, a lot of ways to skin the cat, and what you just said makes sense. By the way, we were saying before that companies have taken off-the-shelf or open-source databases and shoved them into the cloud. I have to give Amazon some props: they actually have done engineering on Aurora and Redshift, and they've got the engineering capabilities to do that. But you can see, for example, in Redshift the way they handle separating compute from storage. It's maybe not as elegant as some of the other players, like a Snowflake, for example, but they get there, and maybe it's a little bit more brute force. So I don't want to just make it sound like they're just hosting off-the-shelf in the cloud. But is it fair to say that there's a crossover point? So in other words, if I'm smaller and I'm not doing a bunch of big stuff, like us, I mean, it's fine. It's easy. I spin it up.
It's cheaper than having to host my own servers. So presumably there's a sweet spot for that approach and a sweet spot for your approach. Is that fair, or do you feel like you can cover a wider spectrum? We feel we can cover the entire spectrum, not wider, the entire spectrum. And we have benchmarks published which are actually available on GitHub for anyone to try. You will see that with this approach which we have taken with the MySQL database service and HeatWave, we are faster, we are cheaper, without having to move the data. And the mileage, or the amount of improvement you will get, surely varies. So if you have less data, the amount of improvement you will get may be like, say, 100 times or 500 times with smaller data sizes. If you get to larger data sizes, this improvement amplifies to 1,000 times or 10,000 times. And similarly for the cost: if the data size is smaller, the cost advantage you will have is less. Probably MySQL HeatWave is maybe one-third the cost. If the data size is larger, the cost advantage amplifies. So to your point, the MySQL database service and HeatWave is going to be better for all sizes, but the amount of mileage, the amount of benefit you will get, increases as the size of the data increases. Okay, so you're saying you've got better performance, better cost, better price-performance. Let me just push back a little bit on this, because having been around for a while, I often see these performance and price comparisons, and what often happens is a vendor will take the latest and greatest, the one they just announced, and they'll compare it to an N-minus-one or an N-minus-two running on old hardware. So are you normalizing for that? Is that the game you're playing here? I mean, how can you give us confidence that these are legitimate benchmarks in your GitHub? Absolutely.
I'll give you a bunch of information, but let me preface this by saying that all of our scripts are available in the open source in the GitHub repo for anyone to try, and we would welcome feedback otherwise. So we have taken, yes, the latest version of the MySQL database service and HeatWave, we have optimized it, and we have run multiple benchmarks, for instance TPC-H and TPC-DS, because the amount of improvement a query will get depends upon the specific query, it depends upon the predicates, it depends upon the selectivity. So we just wanted to use standard benchmarks, so it's not the case that we are using certain classes of queries which may benefit HeatWave more, right? So, standard benchmarks. Similarly, for the other vendors or other services like Redshift, we have run benchmarks on the latest shapes of Redshift, the most optimized configuration which they recommend, running their scripts. So this is not something where, hey, we're just running out of the box. We have optimized Aurora, we have optimized Redshift to the best possible extent we can, based on their guidelines, based on their latest release. And that's what we're talking about in terms of the numbers. All right, good, please continue. Now, for some other vendors, if you get to the benchmark sections where we are comparing with other services, let's say Snowflake, well, there are issues in terms of we can't legally run Snowflake numbers, right? So there we have looked at a report published by GigaOm, and we are taking the numbers published in the GigaOm report for the Snowflake, Google BigQuery and Azure Synapse numbers, right? So those we have not run ourselves, but for AWS Redshift as well as AWS Aurora, we have run the numbers, and I believe these are the best numbers anyone can get. I saw that GigaOm report. And I've got to say, GigaOm, sometimes I'm like, hmm, but I've got to say that, I forget the guy's name, he knew what he was talking about. He did a good job, I thought.
I was curious as to the workload. I always say, well, what's the workload? But I thought that report was pretty detailed. And Snowflake did not look great in that report. Oftentimes, and they'd been marketing the heck out of it, I forget who sponsored it, it was sponsored content, but I remember seeing that and thinking, hmm, so I think maybe for Snowflake, that sweet spot is maybe not performance. Maybe it's the simplicity, and I think that's where they're making their mark. And most of their databases are small, right? And a lot of read-only stuff. And so they've found a market there. But I want to come back to the architecture and really sort of understand how you've been able to get this range of both performance and cost. You talked about, I thought I heard, optimizing for the chips, and you're using the object store. You've got an architecture that's not using SSDs; it's using the object store. So is there caching there? I wonder if you could just give us some details of the architecture and tell us how you got to where you are. Right. So let me start off saying, like, what are the kind of numbers we are talking about, right? Just to kind of be clear what the improvements are. So if you take the MySQL database service and HeatWave in Oracle Cloud and compare it with a MySQL service in any other cloud, and if you look at smaller data sizes, say data sizes which are about half a terabyte or so, okay, HeatWave is 400 times faster. 400 times faster. And as you get to... I'm sorry, sorry to interrupt. What are you measuring there? Faster in terms of what? Latency. So we take the 22 TPC-H queries, we run them on HeatWave, and we run the same queries on a MySQL service on any other cloud, at half a terabyte. And the performance in terms of latency is 400 times faster with HeatWave. Thank you, okay. If you go to a larger data size, right, then the other data point we were looking at is something like, you know, four terabytes. There, we did two comparisons.
One is with AWS Aurora, which is, as you said, they have taken MySQL, they have done a bunch of innovations over there, and they're offering it as a premier service. So on four terabytes of TPC-H, the MySQL database service with HeatWave is 1,100 times faster than Aurora. It is three times faster than the fastest shape of Redshift. So Redshift comes in different flavors; I'm talking about dense compute 2, right? And again, looking at the most recommended configuration from Redshift. So 1,100 times faster than Aurora, three times faster than Redshift, and at one-third the cost, right? So I just really want to point out that it is much faster and much cheaper, one-third the cost. And then going back to the GigaOm report, there was a comparison there with Snowflake, Google BigQuery, Redshift, and Azure Synapse. I won't go into the numbers here, but HeatWave was faster on both TPC-H as well as TPC-DS across all these products, and cheaper compared to any of these products, right? So faster, cheaper on both the benchmarks across all these products. Now let's come to, like, what is the technology underneath? Great. So basically there are three parts you're going to see: one is improved performance, one is very good scale, and one is lower cost. So the first thing is that HeatWave has been optimized for the cloud. And when I say that, and we talked about this a bit earlier, one is we are using the cheapest shapes which are available, and we are using the cheapest services which are available, without having to compromise the performance. And then there is this machine-learning-based automation. Now underneath, in terms of the architecture of HeatWave, there are basically, I would say, four key things. First, HeatWave is an in-memory engine. The representation which we have in memory is a hybrid columnar representation which is optimized for vector processing. That's the first thing. And that's pretty table stakes these days for anyone who wants to do in-memory analytics, right?
Except that it's hybrid columnar which is optimized for vector processing. So that's the first thing. The second thing, which starts getting to be novel, is that HeatWave has a massively parallel architecture which is enabled by a massively partitioned architecture. So we take the data, we read the data from MySQL into the memory of HeatWave, and we massively partition this data. So as we're reading the data, we are partitioning the data based on the workload. The sizes of these partitions are such that they fit in the cache of the underlying processor, and then we are able to consume these partitions really, really fast. So that's the second bit, which is the massively parallel architecture enabled by a massively partitioned architecture. Then the third thing is that we have developed new state-of-the-art algorithms for distributed query processing. For many of the workloads, we find that joins are often the long pole in terms of the amount of time they take. So we at Oracle have developed new algorithms for distributed join processing, and similarly for many other operators. And this is how we are able to process this data, which is in memory, really, really fast. And finally, we have designed for scalability: we have designed algorithms such that there's a lot of overlap between compute and communication, which means that as you're sending data across various nodes, and there it could be, you know, dozens of nodes or hundreds of nodes, we are able to overlap the computation time with the communication time, and this is what gives us massive scalability in the cloud. Yeah, so some hardcore database techniques that you've brought to HeatWave. That's impressive. Thank you for that description. Let me ask you, just to go to a quick aside: so MySQL is open source; HeatWave is what? Is it like open core? Is it open source? No, HeatWave is something which has been designed and optimized for the cloud, right?
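The cache-sized partitioning and vectorized scanning Nipun describes can be sketched in a few lines of Python. This is purely illustrative and not HeatWave's implementation: the 1 MiB cache budget, the function names, and the use of NumPy are all assumptions made for the example. The idea shown is that a column is split into chunks small enough to stay cache-resident, and each chunk is then filtered and reduced with vectorized operations.

```python
import numpy as np

CACHE_BYTES = 1 << 20  # assumed per-core cache budget of 1 MiB (illustrative only)

def partition_column(col: np.ndarray, cache_bytes: int = CACHE_BYTES):
    """Split a column into partitions small enough to fit in a CPU cache."""
    rows_per_part = max(1, cache_bytes // col.itemsize)
    return [col[i:i + rows_per_part] for i in range(0, len(col), rows_per_part)]

def vectorized_sum_where(values: np.ndarray, filter_col: np.ndarray,
                         threshold: float) -> float:
    """Compute SUM(values) WHERE filter_col > threshold, one partition at a time.

    Each partition is processed with vectorized (SIMD-friendly) NumPy ops,
    standing in for the columnar vector processing described in the interview.
    """
    total = 0.0
    for vals, preds in zip(partition_column(values), partition_column(filter_col)):
        total += vals[preds > threshold].sum()  # vectorized filter + reduce
    return total
```

In a real distributed engine these partitions would also be spread across nodes and scanned in parallel; the sketch only shows the single-node, cache-sized-chunk aspect.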
So it can't be open source, and it is not open source. It's a service. It is a service, that's correct. So it's a managed service that I pay Oracle to host for me. Okay, got it. That's right. Okay, I wonder if you could talk about some of the use cases that you're seeing for HeatWave. Any patterns that you're seeing with customers? Sure. So we had this HeatWave service in limited availability for almost 15 months, and it's been about five months since we have gone GA. And there's a very interesting trend of customers we are seeing. The first one is we are seeing many, many migrations from AWS, specifically from Aurora. Similarly, we are seeing many migrations from Azure MySQL, and we are seeing migrations from Google. And the number one reason customers are coming is because of ease of use, because they have their databases currently siloed, as you were talking about, some optimized for transaction processing, some for analytics. Here, what customers find is that in a single database they're able to get very good performance. They don't need to move the data around; they don't need to manage multiple databases. So we are seeing many migrations from these services, and the number one reason is reduced complexity and ease of use. And the second one is much better performance and reduced cost. So that's the first thing: we are very excited and delighted to see the number of migrations we are getting. The second thing we are seeing is that initially, when we launched the service, we were targeting it really towards analytics. What we are now finding is that many of these customers, for instance those who have been running on Aurora, when they move to MySQL and HeatWave, are finding that many of their OLTP queries as well see significant acceleration with HeatWave. So now customers are moving their entire applications over to HeatWave. So that's a second trend we are seeing.
The third thing, and I think I kind of missed mentioning this earlier, one of the very key and unique value propositions we provide with the MySQL database service and HeatWave is that we provide a mechanism where, if customers have their data stored on premise, they can still leverage the HeatWave service by enabling MySQL replication. So they can have their data on premise, they can replicate this data into the Oracle Cloud, and then they can run analytics. So this deployment, which we are calling the hybrid deployment, is turning out to be very, very popular, because there are some customers who, for various reasons, compliance or regulatory reasons, cannot move or migrate the entire data to the cloud. So this provides them a very good setup where they can continue to run their existing database, and when it comes to getting the benefits of HeatWave for query acceleration, they can set up this replication. And I can run that on, what, any available server capacity, or is there an appliance to facilitate that? No, this is just standard MySQL replication. So if a customer is running MySQL on premise, they can just turn on this replication. We have obviously enhanced it to support this inbound replication between on premise and Oracle Cloud, but it's something that can be enabled as long as source and destination are both MySQL. Okay, so I want to come back to this sort of idea of the architecture a little bit. I mean, it's hard for me to go toe to toe with you, you know, I'm not an engineer, but I'm going to try anyway. So you've talked about OLTP queries. I always thought HeatWave was optimized for analytics. So, you know, I want to push on this notion, because, you know, people think of the converged database, and what you're talking about here with HeatWave is sort of the Swiss Army knife, which is great because you've got a screwdriver, and you've got, you know, a Phillips and a flat head and some scissors, and maybe they're not as good.
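Since this hybrid setup uses standard MySQL replication, the cloud-side replica is pointed at the on-premise source with MySQL's ordinary replication statements. The sketch below builds the MySQL 8.0+ `CHANGE REPLICATION SOURCE TO` statement a replica might run; the host name and user are placeholders for illustration, and this is generic MySQL replication syntax, not HeatWave-specific configuration.

```python
def change_replication_source_sql(host: str, user: str, port: int = 3306,
                                  use_gtid: bool = True) -> str:
    """Build the MySQL 8.0+ statement a replica runs to point at a source.

    The password is deliberately omitted; it would be supplied at runtime.
    """
    options = [
        f"SOURCE_HOST='{host}'",
        f"SOURCE_PORT={port}",
        f"SOURCE_USER='{user}'",
        "SOURCE_SSL=1",  # encrypt the on-premise -> cloud replication traffic
    ]
    if use_gtid:
        options.append("SOURCE_AUTO_POSITION=1")  # GTID-based positioning
    return "CHANGE REPLICATION SOURCE TO " + ", ".join(options) + ";"

# The replica would execute the generated statement, then run START REPLICA;
```

With GTID auto-positioning enabled, the replica tracks its place in the source's binary log automatically, which is what makes this kind of ongoing on-premise-to-cloud feed low-maintenance.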
They're not necessarily as good as the purpose-built tool. But you're arguing that this is best of breed for OLTP and best of breed for analytics, both in terms of performance and cost. Am I getting that right? Or is this really a Swiss Army knife where, you know, that flat head is really not as good as the big long screwdriver that I have in my bag? Yes, you're getting it right, but I want to make a clarification: HeatWave is definitely the accelerator for all your queries, all analytic queries and also the long-running complex transaction processing queries, right? So yes, HeatWave is the uber query accelerator engine. However, when it comes to transaction processing in terms of your insert statements and delete statements, right, those are still all done and served by the MySQL database. So all of the transactions are still sent to the MySQL database and they're persisted there. It's the queries for which HeatWave is the accelerator. So what you said is correct: for all query acceleration, HeatWave is the engine. Makes sense. Okay, so if I'm a MySQL customer and I want to use HeatWave, what do I have to do? Do I have to make changes to my existing applications? You implied earlier that no, it just sort of, you know, plugs right in, but can you clarify that? Yes, there are absolutely no changes which any MySQL or MySQL-compatible application needs to make to take advantage of HeatWave. HeatWave is an in-memory accelerator, and it's completely transparent to the application. So we have, like, dozens and dozens of applications which have migrated to HeatWave, and they're seeing the same thing, and similarly for tools. So if you look at various tools which work for analytics, like Tableau, Looker, and Oracle Analytics Cloud, all of them will work just seamlessly. And this is one of the reasons we had to do a lot of heavy lifting in the MySQL database itself, right? So the MySQL database engineering team has been very actively working on this.
And one of the reasons is because we did the heavy lifting and we made enhancements to the MySQL optimizer and the MySQL storage layer to do the integration of HeatWave in such a seamless manner. So there is absolutely no change which an application needs to make in order to leverage or benefit from HeatWave. You said earlier, Nipun, that you're seeing migrations from, I think you said, Aurora and Google BigQuery, and you might have said Redshift as well. What kind of tooling do you have to facilitate migrations? Right, there are multiple ways in which customers may want to do this. So the first tooling which we have is, as I was talking about with the inbound replication mechanism, customers can set up HeatWave in the Oracle Cloud and they can set up replication between their instances in their cloud and HeatWave. Second thing is we have various kinds of tools to facilitate the data migration, in terms of fast ingestion and such. So there are a lot of such customers we are seeing who are migrating, and we have a plethora of tools and applications, in addition to setting up this inbound replication, which is the most seamless way of getting customers started with HeatWave. So, I think you mentioned before machine intelligence and machine learning; we've seen that with Autonomous Database, and it's a big deal obviously. How does HeatWave take advantage of machine intelligence and machine learning? Yeah, and I'm probably going to be talking more about this in the future, but what we have already is that HeatWave uses machine learning to intelligently automate many operations. So we know that when there's a service being offered in the cloud, our customers expect automation, and there are a lot of vendors and a lot of services which do a good job with automation.
One of the places where we are going to be very unique is that HeatWave uses machine learning to automate many of these operations. And I'll give you one such example, which is provisioning. Right now with HeatWave, when a customer wants to determine how many nodes are needed for running their workload, they don't need to make a guess. They invoke a provisioning advisor, and this advisor uses machine learning to sample a very small percentage of the data, we are talking about like 0.1% sampling, and it's able to predict, with 95% accuracy, the amount of memory this data is going to take. And based on that, it's able to make a prediction of how many servers are needed. So just this simple operation, the first step of provisioning, is something which is done manually on any other service, whereas with HeatWave, we have a machine-learning-based advisor. So this is an example of what we're doing, and in the future, we'll be offering many such innovations as a part of the MySQL database and the HeatWave service. Well, I've got to say, I was a skeptic, but I really appreciate your answering my questions. A lot of people, when you made the acquisition and inherited MySQL, thought you were going to kill it because they thought it would be competitive with Oracle Database. I'm happy to see that you've invested and figured out a way to say, hey, we can serve our community and continue to be the steward of MySQL. So Nipun, thanks very much for coming on theCUBE. Appreciate your time. Sure. Thank you so much for the time, Dave. Appreciate it. And thank you for watching, everybody. This is Dave Vellante with another CUBE Conversation. We'll see you next time.
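The sampling idea behind that provisioning advisor can be illustrated with a back-of-the-envelope sketch: sample a small fraction of rows, measure their in-memory footprint, extrapolate to the full table, and divide by per-node memory. This is not Oracle's advisor; the function, the use of `repr` as a stand-in for the engine's encoded row size, and the 0.1% default are all assumptions for illustration, and a real advisor would model the engine's actual columnar encoding.

```python
import math
import random

def estimate_nodes(rows, total_row_count: int, node_memory_bytes: int,
                   sample_fraction: float = 0.001, seed: int = 0) -> int:
    """Estimate a node count from a small (~0.1%) sample of the data.

    `rows` is an in-memory list standing in for the table. We size each
    sampled row via its repr(), extrapolate to the full row count, and
    divide by the memory available per node, rounding up.
    """
    rng = random.Random(seed)
    sample_size = max(1, int(len(rows) * sample_fraction))
    sample = rng.sample(rows, sample_size)
    avg_row_bytes = sum(len(repr(r).encode()) for r in sample) / sample_size
    estimated_total_bytes = avg_row_bytes * total_row_count
    return max(1, math.ceil(estimated_total_bytes / node_memory_bytes))
```

The engineering difficulty the interview alludes to is in the accuracy of the extrapolation (95% from a 0.1% sample), which is where the machine-learned model, rather than this naive average, comes in.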