When Oracle acquired Sun in 2009, it paid $5.6 billion net of Sun's cash and debt. Now I argued at the time that Oracle got one of the best deals in the history of enterprise tech. And I got a lot of grief for saying that, because Sun had a declining business that was losing money, and its revenue was under serious pressure as it tried to hang on for dear life. But Safra Catz understood that Oracle could pare Sun's lower-profit and lagging businesses, like its low-end x86 product lines. And even if Sun's revenue was cut in half, because Oracle has such a high revenue multiple as a software company, it could almost instantly generate $25 to $30 billion in shareholder value on paper. In addition, it was a catalyst for Oracle to initiate its highly differentiated engineered systems business, and was actually the precursor to Oracle's cloud. Oracle saw that it could capture high-margin dollars that used to go to partners like HP, its original Exadata partner, and get paid for the full stack across infrastructure, middleware, database and application software when it eventually got really serious about cloud. Now there was also a major technology angle to this story. Remember Sun's tagline, "the network is the computer"? They should have just called it cloud. Through the Sun acquisition, Oracle also got a couple of key technologies: Java, the number one programming language in the world, and MySQL, a key ingredient of the LAMP stack. That's Linux, Apache, MySQL and PHP, Perl or Python, on which the internet is basically built, and which is used by many cloud services like Facebook, Twitter, WordPress, Flickr, Amazon Aurora and many other examples, including, by the way, MariaDB, which is a fork of MySQL created by MySQL's creator, basically in protest of Oracle's acquisition. The drama is Oscar-worthy. It gets even better. In 2020, Oracle began introducing a new version of MySQL called MySQL HeatWave.
And since late 2020 it's been in sort of a supercycle, rolling out three new releases in less than a year and a half in an attempt to expand its TAM and compete in new markets. Now we covered the release of MySQL Autopilot, which uses machine learning to automate management functions. And we also covered the benchmarks that Oracle produced against Snowflake, AWS, Azure and Google, and Oracle's at it again with HeatWave, adding machine learning to its database capabilities along with the previously available integration of OLAP and OLTP. This of course is in line with Oracle's converged database philosophy, which, as we've reported, is different from other cloud database providers, most notably Amazon, which takes the right-tool-for-the-right-job approach and chooses database specialization over a one-size-fits-all strategy. Now we've asked Oracle to come on theCUBE and explain these moves. And I'm pleased to welcome back Nipun Agarwal, who's the Senior Vice President for MySQL Database and HeatWave at Oracle, and today in this video exclusive we'll discuss machine learning, other new capabilities around elasticity and compression, and then any benchmark data that Nipun wants to share. Nipun's been a leading advocate of the HeatWave program. He's led engineering and that team for over 10 years, and he has over 185 patents in database technologies. Welcome back to the show, Nipun, great to see you again. Thanks for coming on. Thank you, Dave, very happy to be back. Yeah, now for those who may not have kept up with the news, maybe to kick things off you could give us an overview of what MySQL HeatWave actually is, so that we're all on the same page. Sure, Dave. MySQL HeatWave is a fully managed MySQL database service from Oracle, and it has a built-in query accelerator called HeatWave, and that's the part which is unique.
So with MySQL HeatWave, customers of MySQL get a single database which they can use for transaction processing, for analytics and for mixed workloads, because traditionally MySQL has been designed and optimized for transaction processing. So in the past, when customers had to run analytics with a MySQL-based service, they would need to move the data out of MySQL into some other database for running analytics. So they would end up with two different databases, and it would take some time to move the data out of MySQL into this other system. With MySQL HeatWave we have solved this problem, and customers now have a single MySQL database for all their applications, and they can get good performance on analytics without any changes to their MySQL application. Now it's no secret that a lot of times queries are not most efficiently written, and critics of MySQL HeatWave will claim that this product is very memory- and cluster-intensive, that it has a heavy footprint that adds to cost. How do you answer that, Nipun? Right, so for offering any database service in the cloud, there are two dimensions, performance and cost, and we have been very cognizant of both of them. So it is indeed the case that HeatWave is an in-memory query accelerator, which is why we get very good performance, but it is also the case that we have optimized HeatWave for commodity cloud services. So for instance, we use the least expensive compute, we use the least expensive storage. So what I would suggest, for the customers who would like to know the price-performance advantage of HeatWave compared to any database: we have benchmarked against Redshift, Snowflake, Google BigQuery, Azure Synapse. HeatWave is significantly faster and significantly lower price on a multitude of workloads. So not only is it an in-memory database optimized for that, but we have also optimized it for commodity cloud services, which makes it much lower price than the competition.
Well, at the end of the day it's customers that sort of decide what the truth is. So to date, what's been the customer reaction? Are they moving from other clouds, from on-prem environments, both? Why, what are you seeing? Right, so we are definitely seeing a whole bunch of migrations of customers who are running MySQL on-premise to the cloud, to MySQL HeatWave. That's definitely happening. What is also very interesting is we are seeing that a very large percentage of customers, more than half the customers who are coming to MySQL HeatWave, are migrating from other clouds. We have a lot of migrations coming from AWS Aurora, migrations from Redshift, migrations from RDS MySQL, Teradata, SAP HANA. So we are seeing migrations from a whole bunch of other databases and other cloud services to MySQL HeatWave. And the main reasons we are told why customers are migrating from other databases to MySQL HeatWave are lower cost, better performance and no change to the application, because many of these services, like AWS Aurora, are API-compatible with MySQL. So when customers try MySQL HeatWave, not only do they get better performance at a lower cost, but they find that they can migrate the application without any changes. And that's a big incentive for them. Great, thank you, Nipun. So can you give us some names? Are there some real-world examples of these customers that have migrated to MySQL HeatWave that you can share? Oh, absolutely, I'll give you a few names. Estuda.com, this is an educational SaaS provider based out of Brazil. They were using Google BigQuery, and when they migrated to MySQL HeatWave they found a 300X, 300 times, improvement in performance, and it lowered their cost by 85 times. Another example is Neovera. They offer cybersecurity solutions, and they were running their application on an on-premise version of MySQL. When they migrated to MySQL HeatWave, their application improved in performance by 300 times and their cost reduced by 80%.
So by going from on-premise to MySQL HeatWave, they reduced the cost by 80% and improved performance by 300 times. VR Glass, another customer based out of Brazil, they were running on AWS EC2, and when they migrated to MySQL HeatWave, they found that there was a significant improvement, like an over 5X improvement in database performance, and they were able to accommodate a very large virtual event which had more than a million visitors. Another example, Genius Sonority. They are a game designer and operator in Japan, and when they moved to MySQL HeatWave they found a 90X improvement in performance, and many, many more. Like a lot of migrations again from, you know, Aurora and many of the other databases as well, and consistently what we hear is customers getting much better performance at a much lower cost without any change to the application. Great, thank you. You know, when I ask that question, a lot of times I get, well, I can't name the customer name, but I got to give Oracle credit. A lot of times you guys have customer names at your fingertips. So you're not the only one, but it's somewhat rare in this industry. So okay, so you got some good feedback from those customers that did migrate to MySQL HeatWave. What else did they tell you that they wanted? Did they kind of share a wish list and some of the white space that you guys should be working on? What did they tell you? Right. So as customers are moving more data into MySQL HeatWave, as they're consolidating more data into MySQL HeatWave, customers want to run other kinds of processing with this data. A very popular one is machine learning. So we have had multiple customers who told us that they wanted to run machine learning with data which is stored in MySQL HeatWave, and for that, they had to extract the data out of MySQL HeatWave. So that was the first feedback we got. Second thing is, MySQL HeatWave is a highly scalable system.
What that means is that as you add more nodes to a HeatWave cluster, the performance of the system improves almost linearly. But currently customers need to perform some manual steps to add nodes to a cluster or to reduce the cluster size. So that was other feedback we got, that people wanted this to be automated. Third thing is that we have shown in previous results that HeatWave is significantly faster and significantly lower price compared to competitive services. So we got feedback from customers asking, can we trade off some performance to get even lower cost? And that's what we have looked at. And then finally, we have some results on various data sizes with TPC-H. Customers wanted to see if we could offer some more data points as to how HeatWave performs on other kinds of workloads. And that's what we've been working on for the last several months. Okay, again, we're going to get into some of that. But so how did you go about addressing these requirements? So the first thing is, we are announcing support for in-database machine learning, meaning that customers who have their data inside MySQL HeatWave can now run training, inference and prediction all inside the database, without the data or the model ever having to leave the database. So that's how we address the first one. Second thing is, we are offering support for real-time elasticity, meaning that customers can scale up or scale down to any number of nodes. This requires no manual intervention on the part of the user, and for the entire duration of the resize operation, the system is fully available. The third, in terms of cost, we have doubled the amount of data that can be processed per node. So if you look at a HeatWave cluster, the size of the cluster determines the cost. So by doubling the amount of data that can be processed per node, we have effectively reduced the cluster size which is required for running a given workload to half, which means it reduces the cost to the customer by half.
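As a rough illustration of the sizing math Nipun describes, doubling the data processed per node roughly halves the node count, and therefore the cluster cost. The figures below are hypothetical, not Oracle's actual capacities or prices:

```python
import math

def cluster_cost(data_gb, gb_per_node, cost_per_node_hour):
    """Nodes needed to hold a workload in memory, and the resulting hourly cost."""
    nodes = math.ceil(data_gb / gb_per_node)
    return nodes, nodes * cost_per_node_hour

# Hypothetical numbers: a 10 TB workload at $1 per node-hour.
before = cluster_cost(10_000, 400, 1.0)   # 400 GB processed per node
after = cluster_cost(10_000, 800, 1.0)    # per-node capacity doubled

print(before)  # (25, 25.0)
print(after)   # (13, 13.0) -- roughly half the nodes, half the cost
```

The halving isn't always exact because node counts round up to whole nodes, but at scale the cost tracks the per-node capacity almost linearly.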
And finally, we have also run the TPC-DS workload on HeatWave and compared it with other vendors. So now customers can have another data point in terms of the performance and the cost comparison of HeatWave with other services. All right, and I promise I'm going to ask about the benchmarks, but I want to come back and drill into these a bit. How is HeatWave ML different from competitive offerings? Take, for instance, Redshift ML, for example. Sure. Okay, so this is a good comparison. Let's start with, say, Redshift ML. There are some systems, like Snowflake, which don't even offer any processing of machine learning inside the database, and they expect customers to write a whole bunch of code in, say, Python or Java to do the machine learning. Redshift ML does have integration with SQL. That's a good start. However, when customers of Redshift need to run machine learning and they invoke Redshift ML, it makes a call to another service, SageMaker, right? So the data needs to be exported to a different service. The model is generated, and the model is also outside Redshift. With HeatWave ML, the data always resides inside the MySQL database service. We are able to generate models, we're able to train the models, run inference, run explanations, all inside the MySQL HeatWave service. So the data or the model never have to leave the database, which means that both the data and the models can now be secured by the same access control mechanisms as the rest of the data. So that's the first part, that there is no need for any ETL. The second aspect is the automation of training. Training is a very important part of machine learning, right? And it impacts the quality of the predictions and such. So traditionally, customers would employ data scientists to influence the training process so that it's done right. And even in the case of Redshift ML, the users are expected to provide a lot of parameters to the training process.
So the second thing which we have worked on with HeatWave ML is that it is fully automated. There is absolutely no user intervention required for training. Third is in terms of performance. So one of the things we are very, very sensitive to is performance, because performance determines the eventual cost to the customer. So again, in some benchmarks which we have published, and these are all available on GitHub, we are showing how HeatWave ML is 25 times faster than Redshift ML. And here's the kicker: at 1% of the cost. So four benefits: the data also remains secure inside the database service, it's fully automated, much faster, and much lower cost than the competition. All right, thank you, Nipun. Now, so there's a lot of talk these days about explainability in AI. You know, the system can very accurately tell you that it's a cat, or, for you Silicon Valley fans, it's a hot dog or not a hot dog, but it can't tell you how it got there. So what is explainability, and why should people care about it? Right, so when we were talking to customers about what they would like from a machine learning based solution, one of the pieces of feedback we got is that enterprises are a little slow or averse to taking up machine learning because it seems to be, you know, like magic, right? And enterprises have the obligation to be able to explain, or to provide an answer to their customers, as to why the system made a certain choice. With a rule-based solution it's simple, it's a rule-based thing and you know what the logic was. So the reason explanations are important is because customers want to know why the system made a certain prediction. One of the important characteristics of HeatWave ML is that any model which is generated by HeatWave ML can be explained, and we can do both global explanations, or model explanations, as well as local explanations.
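Nipun doesn't describe HeatWave ML's internals here, but a common way to produce the kind of local explanation he mentions is perturbation-based attribution: replace one feature at a time with a baseline value and measure how much the prediction moves. A toy sketch of that general idea, with a made-up linear scoring model standing in for a trained one:

```python
def loan_model(features):
    """Toy stand-in for a trained model: a weighted score over features.
    (Hypothetical weights for illustration, not HeatWave ML's model.)"""
    weights = {"income": 0.4, "credit_history": 0.4, "existing_debt": -0.3}
    return sum(weights[name] * value for name, value in features.items())

def local_explanation(model, features, baseline=0.0):
    """Attribute one prediction to each feature by zeroing it out and
    measuring how far the score moves (a perturbation-style explanation)."""
    full_score = model(features)
    attributions = {}
    for name in features:
        perturbed = dict(features, **{name: baseline})
        attributions[name] = full_score - model(perturbed)
    return attributions

# An applicant denied a loan can see which features drove the decision.
applicant = {"income": 0.2, "credit_history": 0.1, "existing_debt": 0.9}
attributions = local_explanation(loan_model, applicant)
# existing_debt has the largest negative pull on this applicant's score
print(min(attributions, key=attributions.get))  # existing_debt
```

Production systems use more principled variants (Shapley values, for instance), but the shape of the answer is the same: a per-feature contribution for one specific prediction.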
So when the system makes a specific prediction using HeatWave ML, the user can find out why the system made such a prediction. So for instance, if someone is being denied a loan, the user can figure out what were the attributes, what were the features, which led to that decision. So this ensures, like, you know, fairness, and many of the times there is also a need for regulatory compliance, where users have a right to know. So we feel that explanations are very important for enterprise workloads, and that's why every model which is generated by HeatWave ML can be explained. Now I got to give Snowflake some props, you know, this whole idea of separating compute from storage, but also bringing the database to the cloud and driving elasticity. So that's been a key enabler and it's solved a lot of problems, particularly the snake swallowing the basketball problem, as I often say. But what about elasticity, and elasticity in real time? There are a lot of companies chasing this. How is your approach to an elastic cloud database service different from what others are promoting these days? Right. So a couple of characteristics. One is that we have now fully automated the process of elasticity, meaning that if a user wants to scale up or scale down, the only thing they need to specify is the eventual size of the cluster, and the system completely takes care of it transparently. But then there are a few characteristics which are very unique. So for instance, we can scale up or scale down to any number of nodes, whereas in the case of Snowflake, the number of nodes someone can scale up or scale down to is only powers of two. So if a user needs 70 CPUs, well, their choice is either 64 or 128. So by providing this flexibility with MySQL HeatWave, customers get a custom fit. So they can get a cluster which is optimized for their specific workload. So that's the first flexibility, of scaling up or down to any number of nodes.
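To make the 70-CPU example concrete (illustrative arithmetic only, not vendor pricing), rounding a requirement up to the next power of two can leave a lot of paid-for capacity idle, while a custom-fit cluster pays only for what's needed:

```python
def power_of_two_nodes(needed):
    """Smallest power of two >= needed, i.e. the rounding Nipun attributes
    to power-of-two-only sizing."""
    n = 1
    while n < needed:
        n *= 2
    return n

def overprovision_pct(needed):
    """Percent of extra capacity you pay for when forced to round up."""
    allocated = power_of_two_nodes(needed)
    return 100.0 * (allocated - needed) / needed

print(power_of_two_nodes(70))        # 128 -- can't scale down to 64 if 70 are needed
print(round(overprovision_pct(70)))  # 83 -- about 83% more capacity than requested
```

The worst case approaches 2x overprovisioning (needing one node more than a power of two); any-node sizing removes that rounding cost entirely.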
The second thing is that after the operation is completed, the system is fully balanced, meaning the data across the various nodes is fully balanced. That is not the case with many solutions. So for instance, in the case of Redshift, after the resize operation is done, the user is expected to manually balance the data, which can be very cumbersome. And the third aspect is that while the resize operation is going on, the HeatWave cluster is completely available for queries, for DMLs, for loading more data. That is, again, not the case with Redshift. With Redshift, suppose the operation takes 10 to 15 minutes. During that window of time, the system is not available for writes, and for a big chunk of that time, the system is not even available for queries, which is very limiting. So the advantages we have are: it's fully flexible, the system is in a balanced state, and the system is completely available for the entire duration of the operation. Yeah, I guess you got that hyper-granularity. Sometimes I say, well, t-shirt sizes are good enough, but then I'm thinking to myself, some t-shirts fit me better than others. So, okay, you noted, I saw in the announcement, that you have this lower price point for customers. How did you actually achieve this? Can you give us some details around that, please? Sure. So there are two things we are announcing with this service that lower the cost for the customers. The first thing is that we have doubled the amount of data that can be processed per HeatWave node. So if you double the amount of data which can be processed by a node, the cluster size which is required by customers reduces to half, and that's why the cost drops to half. The way we have managed to do this is by two things. One is support for bloom filters, which reduces the amount of intermediate memory. And second is we compress the base data. So these are the two techniques we have used to process more data per node.
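Nipun doesn't detail how HeatWave applies bloom filters, but the general technique is well known in query engines: before materializing join intermediates, probe a compact bit array that can cheaply answer "definitely not present," so non-matching rows are discarded early instead of occupying intermediate memory. A minimal sketch, with toy hash choices that are not HeatWave's implementation:

```python
import hashlib

class BloomFilter:
    """Toy bloom filter: set membership with possible false positives,
    but never false negatives."""
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the whole bit array packed into one int

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all(self.bits & (1 << pos) for pos in self._positions(item))

# In a join, build the filter from the small side's keys, then use it to
# drop probe-side rows early, shrinking the intermediate result in memory.
bf = BloomFilter()
for key in ["alice", "bob"]:
    bf.add(key)
print(bf.might_contain("alice"))   # True
print(bf.might_contain("mallory")) # almost certainly False (tiny false-positive rate)
```

The memory saving comes from the filter being a few bits per key, versus the full-width rows it prevents from ever entering the join's intermediate state.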
The second way by which we are lowering the cost for the customers is by supporting pause and resume of HeatWave. And many times you find that customers of HeatWave and other analytics services want to run some analytic queries or some analytic workloads for some duration of time, but then they don't need the cluster for a few hours. Now with the support for pause and resume, customers can pause the cluster, and the HeatWave cluster instantaneously stops. And when they resume, not only do we fetch the data at a very, like, you know, quick pace from the object store, but we also preserve all the statistics which are used by Autopilot. So both the data and the metadata are fetched extremely fast from the object store. So with these two capabilities, we feel that it will drive down the cost to our customers even more. Got it. Thank you. Okay, I promised I was going to get to the benchmarks. Let's have it. How do you compare with others, specifically cloud databases? I mean, and how do we know these benchmarks are real? My friends at EMC, back in the day, they were brilliant at doing benchmarks. They would produce these beautiful PowerPoint charts, but it was kind of opaque. What do you say to that? Right, so there are multiple things I would say. The first thing is that this time we have published two benchmarks, one for machine learning and the other for SQL analytics. All the benchmarks, including the scripts which we have used, are available on GitHub. So we have full transparency, and we invite and encourage customers or other service providers to download the scripts, to download the benchmarks, and see if they get any different results. So whatever we are seeing, we have published it for other people to try and validate. That's the first part. Now for machine learning, there hasn't been a precedent for enterprise benchmarks. So we took about 18 open data sets and we have published benchmarks for those, right?
So both for classification as well as for regression, we have run the training times, and that's where we find that HeatWave ML is 25 times faster than Redshift ML at 1% of the cost. So fully transparent, available. For SQL analytics, in the past we have shown comparisons with TPC-H. So we would show TPC-H across various databases, across various data sizes. This time we decided to use TPC-DS. The advantage of TPC-DS over TPC-H is that it has more queries, the queries are more complex, the schema is more complex, and there is a lot more data skew. So it represents a different class of workloads, which is very interesting. So these are queries derived from the TPC-DS benchmark. So the numbers we have published this time are for 10 terabyte TPC-DS, and we are comparing with all four major services: Redshift, Snowflake, Google BigQuery, Azure Synapse. And in all the cases, HeatWave is significantly faster and significantly lower price. Now, one of the things I want to point out is that when we're doing the cost comparison with other vendors, we are being overly fair. For instance, the cost of HeatWave includes the cost of both the MySQL node as well as the HeatWave nodes. And with this setup, customers can run transaction processing, analytics, as well as machine learning. So the price captures all of it. Whereas with the other vendors, the comparison is only for the analytic queries. So if customers wanted to run OLTP, you would need to add the cost of another database, or if customers wanted to run machine learning, you would need to add the cost of another service. Furthermore, in the case of HeatWave, we are quoting pay-as-you-go pricing, whereas for other vendors like Redshift, where applicable, we are quoting one-year fully-paid-upfront cost. So it's a very fair comparison. So in terms of the numbers, for price performance on TPC-DS, we are about 4.8 times better price performance compared to Redshift.
We are 14.4 times better price performance compared to Snowflake, 13 times better than Google BigQuery and 15 times better than Synapse. So across the board, we are significantly faster and significantly lower price. And as I said, all of these scripts are available on GitHub for people to try for themselves. Okay, all right, I get it. So I think what you're saying is, you could have said, this is what it's gonna cost for you to do both analytics and transaction processing on a competitive platform versus what it takes to do that on Oracle MySQL HeatWave. But you're not doing that. You're saying, let's take them head-on in their sweet spot of analytics or OLTP separately, and you're saying you still beat them. So, okay. So you got this one database service in your cloud that supports transactions and analytics and machine learning. How much do you estimate you're saving companies with this integrated approach versus the alternative of, kind of what I called up front, the right tool for the right job, and admittedly having to ETL between tools? How can you quantify that? So, okay. The numbers I quoted, right? At the end of the day, in a cloud service, price performance is the metric which gives a sense as to how much the customers are going to save. So for instance, for a TPC-DS-like workload, if we are 14 times better price performance than Snowflake, it means that our cost is going to be one-fourteenth of what customers would pay for Snowflake. Now in addition, other costs, in terms of migrating the data, having to manage two different databases, having to pay for another service for, like, you know, machine learning, that's all extra. And that depends upon what tools customers are using, or what other services they're using for transaction processing or for machine learning. But these numbers themselves, right, are very, very compelling. If you are one-fifth the cost of Redshift, right, or one-fourteenth of Snowflake, these numbers by themselves are very, very compelling.
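Price performance, as used throughout this conversation, folds cost rate and runtime into a single number: a 14x ratio means the same workload costs roughly one-fourteenth as much to complete. A quick illustration with made-up inputs, not the actual benchmark figures:

```python
def price_performance_ratio(their_cost_per_hour, their_runtime_hours,
                            our_cost_per_hour, our_runtime_hours):
    """How many times cheaper the same workload is to complete on 'our' service:
    the ratio of total cost (rate x runtime) on each side."""
    their_total = their_cost_per_hour * their_runtime_hours
    our_total = our_cost_per_hour * our_runtime_hours
    return their_total / our_total

# Hypothetical: a competitor at $40/hr taking 7 hours vs. $10/hr taking 2 hours.
ratio = price_performance_ratio(40, 7, 10, 2)
print(ratio)  # 14.0 -- the workload costs 1/14th as much to run
```

Note that being merely faster or merely cheaper per hour isn't enough on its own; the metric rewards the combination, which is why both runtime and rate appear in the ratio.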
And that's the reason we are seeing so many of these migrations from these databases to MySQL HeatWave. Okay, great. Thank you. Last question. In the Q3 earnings call for fiscal '22, Larry Ellison said that MySQL HeatWave is coming soon on AWS, and that caught a lot of people's attention. That's not like Oracle. I mean, people might say, maybe that's an indication that you're not having success moving customers to OCI, so you got to go to other clouds, which by the way I applaud, but any comments on that? Yeah, this is very much like Oracle. So if you look at one of the big reasons for the success of the Oracle database, and why Oracle database is the most popular database, it's because Oracle database runs on all the platforms, and that has been the case from day one. So very akin to that, the idea is that there's a lot of value in MySQL HeatWave, and we want to make sure that we can offer the same value to the customers of MySQL running on any cloud, whether it's OCI, whether it's AWS, or any other cloud. So this shows how confident we are in our offering, and we believe that in other clouds as well, customers will find significant advantage by having a single database which is much faster and much lower price than the alternatives they currently have. So this shows how confident we are about our products and services. Well, that's great. I mean, obviously for you, you're in the MySQL group. You love that, right? The more places you can run, the better it is for you, of course, and your customers. Okay, Nipun, we've got to leave it there. As always, it's great to have you on theCUBE. Really appreciate your time. Thanks for coming on and sharing the new innovations. Congratulations on all the progress you're making here. You're doing a great job. Thank you, Dave, and thank you for the opportunity. All right, and thank you for watching this CUBE Conversation with Dave Vellante. For theCUBE, your leader in enterprise tech coverage. We'll see you next time.