Surprising many people, including myself, Oracle last year began investing pretty heavily in the MySQL space, and those investments continue today. Let me give you a brief history. Last December, Oracle made its first HeatWave announcement, converging OLTP and OLAP in a single MySQL database. Now, what wasn't surprising was the approach Oracle took: it leveraged hardware to improve performance and lower cost. You see, when Oracle acquired Sun more than a decade ago, rather than rely on loosely coupled partnerships with hardware vendors to speed up its databases, Oracle set out on a path to tightly integrate hardware and software innovations using its own in-house engineering. So with its first MySQL HeatWave announcement, Oracle leaned heavily on in-memory database technology to create an embedded OLAP capability that eliminates the need to ETL data from a transaction system into a separate analytics database. In doing so, Oracle is taking a similar approach with MySQL as it has long taken with its mainstream Oracle database: converging capabilities into a single platform. The argument is that this simplifies and accelerates analytics, lowers cost, and allows analytics to be run on fresher data. Now, as many of you know, this is a different strategy than how, for example, AWS approaches databases, where it creates purpose-built database services targeted at specific workloads. These are philosophical design decisions made for a variety of reasons, but it's very clear which direction Oracle is headed. And today, Oracle continues its HeatWave announcement cadence with a focus on increased automation. As well, the company is continuing the trend of using clustering technology to scale out for both performance and capacity.
And again, there's that theme of marrying hardware with software. Oracle is also making announcements that focus on security. Hello everyone, and welcome to this video exclusive. This is Dave Vellante, and we're going to dig into these capabilities. Nipun Agarwal is here. He's VP of MySQL HeatWave and Advanced Development at Oracle. Nipun has been leading the MySQL HeatWave development effort for nearly a decade. He's got 180 patents to his name, about half of which are associated with HeatWave. Nipun, welcome back to the show. Great to have you. Thank you, Dave. So before we get into the new news, maybe you could give us a quick overview of HeatWave again, and what problems you originally set out to solve with it. Sure. HeatWave is an in-memory query accelerator for MySQL. Now, as most people are aware, MySQL was originally designed and optimized for transactional processing, so when customers had the need to run analytics, they would need to extract data from the MySQL database into another database and run analytics there. With MySQL HeatWave, customers get a single database which can be used both for transactional processing and for analytics. There is no need to move the data from one database to another, and all existing tools and applications which are compatible with MySQL continue to work as is. So it's an in-memory query accelerator for MySQL, and it is significantly faster than any version of the MySQL database, and also much faster than specialized databases for analytics. Yeah, we're going to talk about that. So obviously, when you made the announcement last December, you had a small core group of early customers and beta customers, but then you opened it up to the world. What was the reaction once you exposed that to customers? The reaction has been very positive, Dave. Initially we were thinking that there were going to be a lot of customers who are on-premises users of MySQL who would migrate to the service.
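To make the "single database, no ETL" point concrete, here is a small sketch in Python. The table name is hypothetical, but the two ALTER TABLE statements are the DDL the MySQL HeatWave documentation describes for offloading a table to the in-memory (RAPID) engine; once loaded, the optimizer transparently routes analytic queries to HeatWave while OLTP keeps running in InnoDB on the same endpoint.

```python
# Sketch of the "single database, no ETL" idea. The table name "orders"
# is hypothetical. The two ALTER TABLE statements are the documented way
# to offload a MySQL table to the HeatWave in-memory engine (RAPID);
# after that, analytics and OLTP share one MySQL endpoint.

def heatwave_offload_ddl(table: str) -> list[str]:
    """Return the DDL that marks a table for HeatWave and loads it."""
    return [
        f"ALTER TABLE {table} SECONDARY_ENGINE = RAPID",  # mark for HeatWave
        f"ALTER TABLE {table} SECONDARY_LOAD",            # load into cluster memory
    ]

for stmt in heatwave_offload_ddl("orders"):
    print(stmt)
```

You would execute these statements over any MySQL-compatible connector; existing tools keep working because the endpoint and the SQL dialect are unchanged.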
And surely that was the case. But the part which was very interesting and surprising is that we see many customers who are migrating from other cloud vendors, from other cloud services, to MySQL HeatWave. And most notably, the biggest number of migrations we are seeing are from AWS Aurora and AWS RDS. Interesting. Okay, I wonder if you got other feedback. You're obviously responding on a pretty fast cadence here, a seven- or eight-month cadence. What was the feedback you got? Were there gaps that customers wanted you to close? Sure, yes. As customers started moving to HeatWave, they found that HeatWave is much faster and much cheaper. And because it's so much faster, they told us that there are some classes of queries which just could not run earlier, which they can now with HeatWave. So it makes their applications richer, because they can write new classes of queries which they could not in the past. But in terms of the feedback or enhancement requests we got, I would say they fell into a few categories. Number one was automation: when customers move their database from on-premises to the cloud, they expect more automation. So that was the number one thing. The second thing was that people wanted the ability to run analytics on larger sizes of data with MySQL HeatWave, because they liked what they saw and they wanted us to increase the data size limit which can be processed by HeatWave. The third was that they wanted more classes of queries to be accelerated by HeatWave. Initially, when we went out, HeatWave was designed to be an accelerator for analytics queries, but more and more customers started seeing the benefit of HeatWave beyond just analytics, for mixed workloads. So that was the third request. And then finally, they wanted us to scale to a larger cluster size. And that's what we have done over the last several months: incorporated this feedback which we've gotten from customers.
All right, so you're addressing those gaps, and thank you for sharing that with us. I've got the press release here; I wonder if we could go through these. Let's start with Autopilot. What's that all about? What's different about Autopilot? That's right. So MySQL Autopilot provides machine-learning-based automation. The first difference is that not only is it automating things (and as a cloud provider, as a service provider, we feel there are a lot of opportunities for us to automate), but the big difference about the approach we have taken with MySQL Autopilot is that it's all driven by the data and the queries; it's machine-learning-based automation. That's the first aspect. The second thing is that this is all done natively in the server. We are enhancing the MySQL engine, we are enhancing the HeatWave engine, and that's where all the logic and all the processing resides. In order to do this, we have had to collect new kinds of data. For instance, in the past, people would collect statistics based on just the data. Now we also collect statistics based on queries. For instance, what is the compilation time? What is the execution time? And we have augmented this with new machine learning models. Finally, we have made a lot of innovations, a lot of inventions, in the process: we collect data in a smart way, we process data in a smart way, and the machine learning models we are talking about also embody a lot of innovation. And that's what gives us an edge over what other vendors may try to do. Yeah, I mean, I'm looking at the press release: auto provisioning, auto parallel load, auto data placement, auto encoding, auto error recovery, auto scheduling, using a lot of well-known computer science techniques like first in, first out, and auto change propagation. So really focusing on driving that automation for customers.
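To illustrate the kind of query-level statistics being described (compile time and execution time, on top of traditional data statistics), here is a toy sketch. The class and the mean-based predictor are my own simplification for illustration only; Oracle's implementation uses trained machine learning models inside the server, not anything this naive.

```python
# Toy illustration (not Oracle's implementation) of keeping per-query
# statistics beyond data statistics: compile time and execution time,
# which a system like Autopilot could feed into learned models.

from dataclasses import dataclass, field
from statistics import mean

@dataclass
class QueryStats:
    compile_times: list[float] = field(default_factory=list)
    exec_times: list[float] = field(default_factory=list)

    def record(self, compile_s: float, exec_s: float) -> None:
        """Log one observed run of the query."""
        self.compile_times.append(compile_s)
        self.exec_times.append(exec_s)

    def predicted_runtime(self) -> float:
        """Naive predictor: mean of past total runtimes. A real system
        would use a model trained over query-plan features instead."""
        return mean(c + e for c, e in zip(self.compile_times, self.exec_times))

stats = QueryStats()
stats.record(0.02, 1.30)
stats.record(0.02, 1.10)
print(round(stats.predicted_runtime(), 2))  # 1.22
```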
The other piece that struck me, and I said this in my intro, is the use of clustering technology. Clustering technology has been around for a long time, as has in-memory database technology, but you're applying and integrating them, and my sense is that's really about scale and performance, taking advantage of the cloud's ability to drive that scale instantaneously. But talk about scale a little bit, and your philosophy there. Why so much emphasis on scalability? Right. So what we want to do is provide the fastest engine for running analytics, and that's why we do the processing in memory. Now, one of the issues with in-memory processing is that the amount of data you're processing has to reside in memory. When we went out with version one, given the footprint of the MySQL customers we spoke to, we thought 12 terabytes of processing at any given point in time would be adequate. In the very first month, we got feedback that customers wanted us to process larger amounts of data with HeatWave, because they really liked what they saw. So we have increased that limit from 12 terabytes to 32 terabytes, and in order to do so, we now have a HeatWave cluster which can be up to 64 nodes. That's one aspect, on the query processing side. Now, to answer your question as to why so much emphasis: it's because this is something which is extremely difficult to do in query processing. As you scale the size of the cluster, the kinds of algorithms and techniques you have to use to achieve very high efficiency with a very large cluster are not easy to get right. What we want to make sure is that as customers have the need to process larger amounts of data, they can. One of the big benefits customers get by using the cloud as opposed to on-premises is that they don't need to worry about provisioning gear ahead of time.
So if they have more data, with the cloud they should be able to process more data easily. But when they process more data, they should expect the same kind of performance, the same kind of efficiency, on a larger data size as on a smaller data size. And this is something other database vendors have traditionally struggled to provide. So this is an important problem, a tough engineering problem, and that's why there's a lot of emphasis on making sure we provide our customers with very high processing efficiency as they increase the size of the data. You're saying that traditionally you get diminishing returns as you scale; as the volume grows, you're not able to take as much advantage, you're less efficient. And you're saying you've largely solved that problem. People always talk about scaling linearly, and I'm always skeptical, but you're saying that, especially in databases, that's been a challenge, and you've largely solved it? What I would say is that we have a system which is very efficient, more efficient than any of the databases we are aware of. As you said, perfect scaling is hard to achieve; that's the theoretical limit, a scale factor of one, and that's very hard to achieve. We are now close to 90% efficiency for end-to-end queries. This is not for primitives; this is for end-to-end queries, both on industry benchmarks as well as real-world customer workloads. So this 90% efficiency, we believe, is very good, and higher than what many vendors provide. Yeah, right. So, you know, not just primitives, the whole end-to-end cycle. I think 0.89 was the number I saw, just to be technically correct, but that's pretty good. Now, let's talk about the benchmarks. It wouldn't be an Oracle announcement without some benchmarks. So you laid out today in your announcement some pretty outstanding performance and price-performance numbers.
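The efficiency figure being discussed is standard parallel-scaling arithmetic: efficiency is speedup divided by node count. A quick sketch, where the 640 s and 11.2 s timings are invented purely to match the quoted ~0.89:

```python
# Parallel efficiency = (time on 1 node / time on N nodes) / N.
# At perfect (linear) scaling this is 1.0; the interview quotes ~0.89
# for 64-node HeatWave clusters on end-to-end queries.

def parallel_efficiency(t_single: float, t_cluster: float, nodes: int) -> float:
    return (t_single / t_cluster) / nodes

# Hypothetical timings chosen to illustrate the quoted figure:
eff = parallel_efficiency(640.0, 11.2, 64)  # (640 / 11.2) / 64
print(round(eff, 2))                        # 0.89

# Equivalently, 0.89 efficiency on 64 nodes means an effective speedup
# of roughly 0.89 * 64, i.e. about 57x over a single node.
print(round(0.89 * 64))                     # 57
```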
In particular, you called out, and I feel like it's a badge of honor, if Oracle calls me out, I feel like I'm doing well, you called out Snowflake and Amazon. So maybe you could go over those benchmark results and we could peel the onion on that a little bit. Right. So the first thing to realize is that we want to have benchmarks which are credible. It's not the case that we have taken some specific, unique workloads where HeatWave shines. What we did was take an industry-standard benchmark, TPC-H, and furthermore, we had a third-party independent firm do the comparison. So let's first compare with Snowflake. On a 10-terabyte TPC-H benchmark, HeatWave is seven times faster at one-fifth the cost. Seven times faster than Snowflake, at one-fifth the cost, so HeatWave is 35 times better price performance compared to Snowflake. And not just that: Snowflake only does analytics, whereas MySQL HeatWave does both transactional processing and analytics. It's not a specialized database. MySQL HeatWave is a general-purpose database which can do both OLTP and analytics, whereas Snowflake can only do analytics. So to be 35 times more efficient than a database service which is specialized for only one case, analytics, we think is pretty good. So that's the comparison with Snowflake. And for that, I presume you've got to be using list prices, obviously. That is correct. Let's put that into context. I mean, at 35x better, you're not going to get that kind of discount, I wouldn't think. That is correct, yes. Okay, what about Redshift? AQUA for Redshift has gained a lot of momentum in the marketplace. How do you compare against that? Right. So we did a comparison with Redshift AQUA, same benchmark, 10-terabyte TPC-H, and again, this was done by a third party. Here, HeatWave is six and a half times faster at half the cost.
So HeatWave is 13 times better price performance compared to Redshift AQUA. And the same thing holds for Redshift: it's a specialized database, only for analytics. So with Redshift, customers need to have two databases, one for transaction processing and one for analytics, whereas with MySQL HeatWave, it's a single database for both. And it is so much faster than Redshift that, again, we feel it's pretty remarkable. Now, as you mentioned earlier, I presume you're not cheating here. You're not including the cost of the transaction processing data store, right? We're ignoring that for a minute, and ignoring that you've got to move the data, ETL it in. We're just talking about like for like. Is that correct? Right. This is an extremely fair, even generous, comparison. Not only are we not including the cost of the source OLTP database, but the Redshift cost I'm talking about is the cost for one year, paid in full upfront. So this is the best pricing a customer can get for a one-year subscription with Redshift, whereas when I'm talking about HeatWave, this is the pay-as-you-go price. And the third aspect is that this is Redshift when it is completely, fully optimized. I don't think anyone else can get much better numbers on Redshift than we have. So: a fully optimized configuration of Redshift, the one-year prepaid cost of Redshift, and not including the source database. Okay. And then, speaking of transaction processing databases, what about Aurora? You mentioned earlier that you're seeing a lot of migration from Aurora. Can you add some color to that? Right. And this was a very interesting observation for us as well. When we did the launch back in December, we had numbers on a four-terabyte TPC-H with Aurora. If you look at that benchmark, four-terabyte TPC-H, HeatWave is 1,400 times faster than Aurora at half the cost, which makes it 2,800 times better price performance compared to Aurora.
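The price-performance multiples quoted for Snowflake, Redshift, and Aurora all follow the same arithmetic: divide the speedup by the relative cost. A quick check of the numbers from the interview:

```python
# Price performance relative to a competitor: how much faster per dollar.
def price_performance(speedup: float, relative_cost: float) -> float:
    """speedup: HeatWave speed vs. the competitor.
    relative_cost: HeatWave cost as a fraction of the competitor's."""
    return speedup / relative_cost

# Snowflake: 7x faster at one-fifth the cost -> 35x price performance
print(price_performance(7.0, 1 / 5))
# Redshift AQUA: 6.5x faster at half the cost -> 13x
print(price_performance(6.5, 1 / 2))
# Aurora, 4 TB TPC-H: 1,400x faster at half the cost -> 2,800x
print(price_performance(1400.0, 1 / 2))
```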
So, a very good number. What we have found is that many customers who were running on Aurora started migrating to HeatWave, and these customers had a mix of transaction processing and analytics, and their data sizes were much smaller. Even those customers found that there was a significant improvement in performance and a reduction in cost when they migrated to HeatWave. In the announcement today, many of the references are from that class of customers. So for that, we decided to choose another benchmark, called the CH benchmark, on a much smaller data size. And even for that, even for mixed workloads, we find that HeatWave is 18 times faster and provides over 100 times higher throughput than Aurora, at 42% of the cost. So in terms of price performance, again, it is much, much better than Aurora, even for mixed workloads. And then if you consider pure OLTP, if you have an application which has only OLTP, which by the way is a very, very uncommon scenario, but even if that were the case, for pure OLTP only, MySQL HeatWave is at par with Aurora with respect to performance, but MySQL HeatWave costs 42% of Aurora. So the point is that across the whole spectrum, pure OLTP, mixed workloads, or analytics, MySQL HeatWave is going to be a fraction of the cost of Aurora, and depending upon your query workload, your acceleration can be anywhere from 18 times to 1,400 times faster. That's interesting. I mean, you've been at this for the better part of a decade, and my sense is that HeatWave is all about OLAP; that's really where you've put the majority, if not all, of the innovation. But you're saying that, coming into December's announcement, you were at par even on a rare but hypothetical pure-OLTP workload. That is correct. Yeah. Well, you know, I've got to push you still on this, because a lot of times these benchmarks are a function of the skills of the individuals performing the tests, right?
So can I run them myself? Have you published these benchmarks? What if a customer wants to replicate these tests and try to see if they can tune Redshift better than you guys did? Sure. So I'll say a couple of things. One is that the numbers I'm talking about, both for Redshift and Snowflake, were done by a third-party firm. But all the numbers we are talking about, TPC-H as well as the CH benchmark, all the scripts are published on GitHub. So anyone is very welcome; in fact, we encourage customers to go and try it for themselves, and they will find that the numbers are absolutely as advertised. In fact, in the last several months we had a couple of companies who went to GitHub, downloaded our TPC-H scripts, and reported that the performance numbers they were seeing with HeatWave were actually better than what we had published back in December. And the reason was that since December we had new code running, so our numbers were actually better than advertised. So all the benchmarks are published, they are all available on GitHub; you can go to the HeatWave page on Oracle.com and get the link. And we welcome anyone to come and try these numbers for themselves. All right, good. Great, thank you for that. Now, you mentioned earlier that you were somewhat surprised. You saw, not surprised that you got customers migrating from on-premises databases, but you also saw migration from other clouds. What do you expect the trend to be with this new announcement? Do you have any sense as to how that's going to go? Right. So one of the big changes from December to now is that we have focused quite a bit on mixed workloads. In the past, in December, when we first went out, HeatWave was designed primarily for analytics. Now what we have found is that there's a very large class of customers who have mixed workloads and who also have smaller data sizes.
We have now introduced a lot of technology, including things like auto scheduling and significant improvements in performance, such that MySQL HeatWave is a very superior solution compared to Aurora or other databases out there, both in terms of performance and price for these mixed workloads: better latency, better throughput, lower cost. So we expect this trend of migration to MySQL HeatWave to accelerate. We are seeing customers migrate from Azure, we are seeing customers migrate from GCP, and by far the number one source of migrations we are seeing is AWS. So I think, based on the new features and technologies we have announced today, this migration is going to accelerate. All right, last question. I said earlier that it seems like you're applying generally well-understood, proven technologies, like in-memory and clustering, to solve these problems. And I think about the things you're doing and I wonder: these things have been around for a while, so why has this type of approach not been introduced by others previously? Right, so the main thing is that it takes time. We designed HeatWave from the ground up for the cloud, and as part of that, we had to invent new algorithms for distributed query processing for the cloud, and we put in the hooks for machine learning processing right from the ground up. So this has taken us close to a decade. It's been hundreds of person-years of investment, and dozens of patents have gone into it. The other aspect is that it takes talent from different areas. We have people working in distributed query processing, we have people who have a lot of background in machine learning, and then, given that we are also the custodians of the MySQL database, we have a very rich set of customers we can reach out to for feedback on what the pain points are.
So it's a culmination of the strengths we have: this talent, the customer base, and the time. We spent close to a decade to make this work. That's what it takes: time, patience, and talent. A lot of software innovation, bringing together, as I said, that hardware and software strategy. Very interesting. Nipun, thanks so much. Appreciate your insights, and thanks for coming on this video exclusive. Thank you, Dave. Thank you for the opportunity. My pleasure. And thank you for watching, everybody. This is Dave Vellante for theCUBE. We'll see you next time.