Hello everyone, this is Dave Vellante. We're diving into the deep end with AMD and Oracle on the topic of MySQL HeatWave performance, and we want to explore the important issues around machine learning. As applications become more data-intensive and machine intelligence continues to evolve, workloads are seeing a major shift where data and AI are being infused into applications. Having a database that simplifies the convergence of transactional and analytics data, without the need to context-switch and move data out of and into different data stores, and that eliminates the need to perform extensive ETL operations, is becoming an industry trend that customers are demanding. At the same time, workloads are becoming more automated and intelligent. To explore these issues further, we're happy to have back on theCUBE Nipun Agarwal, who's the Senior Vice President of MySQL HeatWave, and Kumaran Siva, who's the Corporate Vice President, Strategic Business Development at AMD. Gents, hello again, welcome back.

Hello, hi, Dave.

Thank you, Dave.

Okay, Nipun, obviously machine learning has become a must-have for analytics offerings, and it's integrated into MySQL HeatWave. Why did you take this approach and not the specialized-database approach, as many competitors do, the right tool for the right job?

Right, so there are a lot of MySQL customers who need to run machine learning on the data which is stored in the MySQL database. In the past, customers would need to extract the data out of MySQL and take it to a specialized service for running machine learning. Now, there are multiple reasons we decided to incorporate machine learning inside the database. One, customers don't need to move the data, and if they don't need to move the data, it is more secure, because it's protected by the same access control mechanisms as the rest of the data. There is no need for customers to manage multiple services.
But in addition to that, when we run machine learning inside the database, customers are able to leverage the same service, the same hardware which has been provisioned for OLTP and analytics, and use the machine learning capabilities at no additional charge. So from a customer's perspective, they get the benefit that it is a single database, they don't need to manage multiple services, and it is offered at no additional charge. And then there is another aspect, which is kind of orthogonal: based on the IP and the work we have done, it is also significantly faster than what customers would get by having a separate service.

Just to follow up on that, how are you seeing customers use HeatWave's machine learning capabilities today? How is that evolving?

Right. So one of the things which customers very often want to do is to train their models based on the data. Now, data in a database, or in a transactional database, changes quite rapidly. So we have introduced support for auto machine learning as a part of HeatWave ML, and what it does is that it fully automates the process of training. And this is something which is very important to database users, very important to MySQL users: they don't really want to hire data scientists or specialists for doing training. So that's the first part, that training in HeatWave ML is fully automated. It doesn't require the user to provide any specific parameters, just the source data and the task for which they want to train the model. The second aspect is that training is really fast. The benefit is that customers can retrain quite often. They can make sure that the model is up to date with any changes which have been made to their transactional database, and as a result of the models being up to date, the accuracy of the predictions is high. So that's the first aspect, which is training. The second aspect is inference, which customers run once they have the models trained.
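The automated training and inference workflow Nipun describes maps onto HeatWave AutoML's SQL stored procedures. A hedged sketch follows; the sys.ML_TRAIN, sys.ML_MODEL_LOAD, sys.ML_PREDICT_TABLE, and sys.ML_EXPLAIN_TABLE routine names and signatures are based on Oracle's HeatWave AutoML documentation and should be verified against the current docs, and the schema, table, and column names are hypothetical:

```python
def automl_workflow(schema: str, table: str, target: str) -> list:
    """Return SQL statements for a fully automated train/predict/explain run."""
    return [
        # Train: only the source data, target column, and task are supplied;
        # algorithm, feature, and hyperparameter selection are automatic.
        f"CALL sys.ML_TRAIN('{schema}.{table}_train', '{target}', "
        f"JSON_OBJECT('task', 'classification'), @model);",
        # Load the trained model into HeatWave memory before scoring.
        "CALL sys.ML_MODEL_LOAD(@model, NULL);",
        # Inference over a table of new rows.
        f"CALL sys.ML_PREDICT_TABLE('{schema}.{table}_test', @model, "
        f"'{schema}.predictions');",
        # Explanations for the predictions.
        f"CALL sys.ML_EXPLAIN_TABLE('{schema}.{table}_test', @model, "
        f"'{schema}.explanations');",
    ]

for stmt in automl_workflow("bench", "airlines", "delayed"):
    print(stmt)
```

In practice these statements would run over a standard MySQL connection; note that the user supplies only the source table, the target column, and the task, which is the "no specific parameters" property described above.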
And the third thing, which has perhaps been the most sought-after request from MySQL customers, is the ability to provide explanations. So HeatWave ML provides explanations for any model which has been generated or trained by HeatWave ML. So these are the three capabilities: training, inference, and explanations. And the whole process is completely automated; it doesn't require a specialist or a data scientist.

Now that's nice. I mean, training is obviously very popular today. I've said inference, I think, is going to explode in the coming decade. And then of course, explainable AI is a very important issue. Kumaran, what are the relevant capabilities of the AMD chips that are used in OCI to support HeatWave ML? Are they different from, say, the specs for HeatWave in general?

So actually they aren't, and this is one of the key features of this architecture, of this implementation, that is really exciting. With HeatWave ML, you're using the same CPU. And by the way, it's not a GPU, it's a CPU, for all three of the functions that Nipun just talked about: inference, training, and explanation, all done on CPU. Bigger picture, with the capabilities we bring here, we're really providing a balance between the CPU cores, memory, and the networking, and what that allows you to do is feed the CPU cores appropriately. And within the cores, we have AVX instructions: with the Zen 2 and Zen 3 cores we had AVX2, and then with the Zen 4 core coming out, we're going to have AVX-512. With that balance of being able to bring in the data, utilize the high memory bandwidth, and then use the computation to its maximum, we're able to provide enough AI processing to get the job done, and to fit into that larger pipeline that we build out here with HeatWave.

Got it. Nipun, you and I, every time we have a conversation, we've got to talk benchmarks.
So you've done machine learning benchmarks with HeatWave. You might even be the first in the industry to publish transparent, open ML benchmarks on GitHub. I mean, I wouldn't know for sure, but I had not seen that as common. Can you describe the benchmarks and the datasets that you used here?

Sure. So what we did was we took a bunch of open datasets for two categories of tasks, classification and regression. We took about a dozen datasets for classification and about six for regression. To give an example, the kinds of datasets we used for classification were the Airlines dataset, Higgs, Census, Bank, right? So these are open datasets. And what we did on these datasets was a comparison of what it would take to train using HeatWave ML versus the other service we compared with, Redshift ML. So there were two observations. One is that with HeatWave ML, the user does not need to provide any tuning parameters, right? HeatWave ML, using AutoML, fully generates a trained model, and figures out the right algorithms, the right features, the right hyperparameters, and such, right? So no need for any manual intervention; not so the case with Redshift ML. The second thing is the performance, right? So the performance of HeatWave ML, aggregated over these 12 datasets for classification and the six datasets for regression: on average, it is 25 times faster than Redshift ML. And note that Redshift ML in turn invokes SageMaker, right? So on average, HeatWave ML provides 25 times better performance for training, and the other point to note is that there is no need for any human intervention; it's fully automated. But in the case of Redshift ML, many of these datasets did not even complete in the set duration.
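The "no tuning parameters" point comes down to the service choosing algorithms and settings by validation rather than by user input. A minimal stdlib-only illustration of that idea, with two toy candidate "algorithms" and synthetic data, not HeatWave's actual method:

```python
import random
from collections import Counter

def majority_class(train, _x):
    # Candidate "algorithm" 1: always predict the most common training label.
    return Counter(y for _, y in train).most_common(1)[0][0]

def nearest_neighbor(train, x):
    # Candidate "algorithm" 2: 1-nearest neighbor on the single feature.
    return min(train, key=lambda row: abs(row[0] - x))[1]

def auto_train(data, algorithms):
    """Pick the best algorithm by accuracy on a 30% holdout; no user tuning."""
    random.shuffle(data)
    cut = int(len(data) * 0.7)
    train, holdout = data[:cut], data[cut:]
    def accuracy(algo):
        return sum(algo(train, x) == y for x, y in holdout) / len(holdout)
    return max(algorithms, key=accuracy)

random.seed(0)
# Synthetic task: the label is 1 exactly when the feature exceeds 0.5.
data = [(x, int(x > 0.5)) for x in [random.random() for _ in range(200)]]
best = auto_train(data, [majority_class, nearest_neighbor])
print(best.__name__)  # nearest_neighbor should win on this separable task
```

The user hands over only the data; the selection loop does the rest, which is the shape of the automation being benchmarked here.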
If you look at price performance, one of the things, again, I want to highlight is that because AMD does pretty well on all kinds of workloads, users can use the same cluster for analytics, for OLTP, or for machine learning. So there is no additional cost for customers to run HeatWave ML if they have provisioned HeatWave. But assume a user is provisioning a HeatWave cluster only to run HeatWave ML, right? Even in that case, the price-performance advantage of HeatWave ML over Redshift ML is 97 times, right? So 25 times faster at 1% of the cost compared to Redshift ML. And all these scripts and all this information are available on GitHub for customers to try, to modify, and to see what advantages they would get on their workloads.

Every time I hear these numbers, I shake my head. I mean, it's just so overwhelming, and we'll see how the competition responds when and if they respond. But thank you for sharing those results. Kumaran, can you elaborate on how the specs that you talked about earlier contribute to HeatWave ML's benchmark results? I'm particularly interested in scalability. Typically things degrade as you push the system harder. What are you seeing?

No, I think it's good. Look, those numbers just blow my mind too. That's crazy good performance. So from an AMD perspective, we have really built an architecture. If you think about the chiplet architecture to begin with, it fundamentally has scaling by design, right? And one of the things we've done here is work with the HeatWave team and the ML team to be able, within the CPU package itself, to scale up and make very efficient use of all of the cores, and then of course work with them on how you go between nodes, so you can have these very large systems that can run ML very, very efficiently.
So it's really building on the building blocks of the chiplet architecture and how scaling happens there.

Yeah, so what you're saying is near-linear scaling, essentially?

I'll let Nipun comment on that.

Yeah, and also, how about as cluster sizes grow, what happens there?

So one of the design points for HeatWave is a scale-out architecture, right? So as you said, as we add more data, or increase the size of the data, or add more nodes to the cluster, we want the performance to scale. Just as we have shown nearly linear scalability for SQL workloads, in the case of HeatWave ML as well, as users add more nodes to the cluster, as the size of the cluster grows, the performance of HeatWave ML improves. So I was giving you this example that HeatWave ML is 25 times faster compared to Redshift ML. Well, that was on a cluster size of two. If you increase the cluster size of HeatWave ML to a larger number, I think the number is 16, the performance advantage over Redshift ML increases from 25 times faster to 45 times faster. So what that means is that on a cluster size of 16 nodes, HeatWave ML is 45 times faster for training, again on the same dozen datasets. So this shows that HeatWave ML scales better than the competition.

So you're saying adding nodes offsets any management complexity that you would think of as getting in the way, is that right?

Right. So one part is the management complexity, which is why, with features like elasticity, customers can scale up or scale down very easily. The second aspect is, okay, what gives us this advantage of scalability, or how are we able to scale? Now, the techniques which we use for HeatWave ML scalability are a bit different from what we use for SQL processing. In the case of HeatWave ML, there are really two trade-offs which we have to be careful about. One is the accuracy, because we want to provide better performance for machine learning without compromising on the accuracy.
So accuracy would require more synchronization if you have multiple threads, but if you have too much synchronization, that can slow down the degree of parallelism we get, right? So we have to strike a fine balance. What we do in HeatWave ML is that there are different phases of training, like algorithm selection, feature selection, and hyperparameter tuning, and each of these phases is parallelized. For instance, one of the techniques we use is that if you're trying to figure out the optimal hyperparameters, we start with the search space, and then each of the VMs gets a part of the search space, and we synchronize only when needed, right? So these are some of the techniques which we have developed over the years, and there are actually research publications on this, and this is what we do to achieve good scalability. And what that means for the customer is that if they have some amount of training time and they want to make it better, they can just provision a larger cluster and they will get better performance.

Got it, thank you. Kumaran, when I think of machine learning, machine intelligence, AI, I think GPU, but you're not using GPUs. So how are you able to get this type of performance, this price performance, without using GPUs?

Yeah, definitely, so that's a good point. Think about what is going on here, and consider the whole pipeline that Nipun has just described in terms of how you get your training and your algorithms, and using the MySQL pieces of it, to get to the point where the AI can be effective. In that process, you have to have a lot of memory transactions; a lot of memory bandwidth comes into play in bringing all that data together and feeding the actual complex that does the AI calculations. That in itself could be the bottleneck, right? And you can have multiple bottlenecks along the way.
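The search-space partitioning Nipun outlined, where each VM searches its shard independently and synchronization happens only at the end, can be sketched with threads standing in for VMs; the validation-error function here is a hypothetical stand-in for a real training run:

```python
from concurrent.futures import ThreadPoolExecutor

def validation_error(lr):
    # Hypothetical error surface; in reality this comes from a training run.
    return (lr - 0.1) ** 2

def search_shard(shard):
    # Each worker (VM) scans its part of the space with no synchronization.
    return min(shard, key=validation_error)

def parallel_search(space, n_workers=4):
    shards = [space[i::n_workers] for i in range(n_workers)]
    shards = [s for s in shards if s]  # drop empty shards for tiny spaces
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        local_bests = list(pool.map(search_shard, shards))
    # The only synchronization point: reduce the per-worker winners.
    return min(local_bests, key=validation_error)

space = [i / 1000 for i in range(1, 1001)]  # 1000 candidate learning rates
best = parallel_search(space)
print(best)  # → 0.1
```

Because workers only meet at the final reduce, adding workers shrinks each shard roughly linearly, which is the scaling behavior described above.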
And I think what you see in the AMD EPYC architecture for this use case is the balance, and the fact that you are able to do the pre-processing, the AI, and then the post-processing all kind of seamlessly together. That has a huge value, and it goes back to what Nipun was saying about how using the same infrastructure gets you the better TCO but also gets you better performance. And that's because you're bringing the data to the computation. So the computation in this case is not strictly the bottleneck; it's really about how you pull together what you need to do the AI computation. And that's probably the more common case. So I think we'll start to see this, especially for inference applications, but in this case we're doing inference, explanation, and training all using the CPU and the same OCI infrastructure.

Interesting. Nipun, is the secret sauce for HeatWave ML's performance different than what we've discussed before, you and I, with HeatWave generally? Is there some additive that you're putting into the engine?

Yes, the secret sauce is indeed different. Just as I was saying that for SQL processing the reason we get very good performance and price performance is that we have come up with new algorithms which help the SQL processing scale out, similarly for HeatWave ML we have come up with new IP, new algorithms. One example is that we use meta-learned proxy models. That's the technique we use for automating the training process. So think of these meta-learned proxy models as using machine learning for machine learning training. And this is IP which we developed. And again, we have published the results and the techniques, and having these kinds of techniques is what gives us better performance.
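The meta-learned proxy model idea, using a cheap learned predictor to decide which candidate configurations deserve a full training run, can be illustrated with a toy two-stage search; both scoring functions here are hypothetical stand-ins:

```python
def proxy_score(config):
    # Cheap meta-learned estimate (imagine a model fit on past experiments).
    return -abs(config["depth"] - 6) + 0.1 * config["trees"] / 100

def full_train_score(config):
    # Expensive ground truth (imagine a complete training run).
    return -abs(config["depth"] - 5) + 0.1 * config["trees"] / 100

candidates = [{"depth": d, "trees": t} for d in range(1, 11) for t in (100, 300)]
# Stage 1: rank every candidate with the cheap proxy.
shortlist = sorted(candidates, key=proxy_score, reverse=True)[:3]
# Stage 2: run full training only on the shortlist, then pick the winner.
best = max(shortlist, key=full_train_score)
print(best, f"- ran {len(shortlist)} full trainings instead of {len(candidates)}")
```

The proxy need not be perfect, only good enough that the true winner survives into the shortlist; here 3 expensive runs replace 20.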
Similarly, another thing which we use is adaptive sampling: you can have a large dataset, but we intelligently sample to figure out how we can train on a small subset without compromising on the accuracy. So yes, there are many techniques which we have developed specifically for machine learning, which is what gives us the better performance, better price performance, and also better scalability.

What about MySQL Autopilot? Is there anything that differs from HeatWave ML that is relevant?

Okay, interesting you should ask. So think of MySQL Autopilot as an application using machine learning. MySQL Autopilot uses machine learning to automate various aspects of the database service. For instance, if you want to figure out the right scheme for partitioning the data in memory, we use machine learning techniques to figure out the best column, based on the user's workload, on which to partition the data. Or given a workload, if you want to figure out the right cluster size to provision, that's something we use MySQL Autopilot for. And I want to highlight that there is no other database service which provides this level of machine-learning-based automation which customers get with MySQL Autopilot.

Interesting. Okay, last question for both of you. What are you guys working on next? What can customers expect from this collaboration, specifically in this space? Maybe Nipun, you can start, and then Kumaran can bring us home.

Sure. So there are two things we are working on. One is, based on the feedback we've gotten from customers, we're going to keep making the machine learning capabilities richer in HeatWave ML. That's one dimension. And the second thing, which Kumaran was alluding to earlier: we are looking at the next generation of processors coming from AMD.
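Going back to the adaptive sampling technique Nipun mentioned, training on an intelligently chosen subset without hurting accuracy: a toy version of the idea is a stratified sample that preserves the label distribution, a simple stand-in for HeatWave's actual, more sophisticated strategy:

```python
import random
from collections import Counter, defaultdict

def stratified_sample(rows, fraction, seed=0):
    """Draw a small sample that preserves the label distribution."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)
    sample = []
    for group in by_label.values():
        # Keep each class's share of the data (at least one row per class).
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# 1000 rows with a 10% / 90% class imbalance.
rows = [{"x": i, "label": "pos" if i % 10 == 0 else "neg"} for i in range(1000)]
sample = stratified_sample(rows, fraction=0.05)
print(len(sample), Counter(r["label"] for r in sample))
```

Training then runs on 50 rows instead of 1000 while both classes keep their original proportions, which is why a well-chosen subsample need not cost accuracy.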
And we will be seeing how we can benefit more from these processors, whether it's the size of the L3 cache, the memory bandwidth, the network bandwidth, the NUMA effects, and such, and make sure that we leverage all the greatness the new generation of processors will offer.

It's like an engineering playground. Kumaran, let's give you the final word.

No, that's great. Look, with the Zen 4 CPU cores we're also bringing in AVX-512 instruction capability. Now, our implementation is a little different than it was in Rome and Milan, in that we use a double-pump implementation. What that means is we take two cycles to do these instructions, but the key thing is we don't lower the speed of the CPUs, so there are no noisy-neighbor effects. And that's something that OCI and the HeatWave team have taken full advantage of. So as we go out in time and we see the Zen 4 core, with up to 96 CPU cores, that's going to work really well, and we're collaborating closely with OCI and with the HeatWave team here to make sure that we can take advantage of that. We're also going to upgrade the memory subsystem to get to 12 channels of DDR5. So this should be a fairly significant boost in absolute performance but, just as importantly, in TCO value for the end customers who are going to adopt this great service.

I love the relentless innovation, guys. Thanks so much for your time. We're going to have to leave it there. Appreciate it.

Thank you.

Thank you, Dave.

Okay, thank you for watching this special presentation on theCUBE, your leader in enterprise and emerging tech coverage.