Hi everybody, we're back at GTC 2024. We're here in downtown San Jose at the San Jose Convention Center. They're still rocking here at GTC. This is Dave Vellante, John Furrier's here, and Martin Yip is in the house. He looks after EC2 and networking product marketing at AWS. Martin, thanks for coming on theCUBE.

It's great to be here. Thank you for having me.

Yeah, we've got the little pop-up cube. We're running and gunning, and it's kind of like, you know, unconferences, this is like the uncube. So we just sort of wing it a little bit. But of course we know AWS. We've had you guys on and been following you for years and years and years. What do you make of this conference? I mean, it feels like we're entering a new era.

We absolutely are. It's an amazing conference. Love to see all the innovation, love hearing how Jensen sees the future in his keynote, and just loving talking with everyone and networking. This is the first GTC that's been in person since the pandemic, so it's great to meet and see everyone again.

There's a lot of, you know, talk, I would even say confusion: NVIDIA has GPUs, but AWS has GPUs too. So what's going on there? You guys have been making silicon for a long, long time. It gives you a competitive advantage. You started with Annapurna, and since you bought Annapurna we've seen Graviton, we've seen Trainium, we see Inferentia. So you've been at this for a while. What are you doing specifically with NVIDIA? And then let's talk about what you're doing in silicon.

Yeah, we are deepening our partnership with NVIDIA. We have a great partnership with NVIDIA. We were the first to bring GPUs to the cloud quite a while ago, 13 years, I believe. And we are bringing a lot of their new GPUs to the cloud.
One of the big announcements is that we are working with them on something called Project Ceiba, building a supercomputer for them in the cloud using their Grace Blackwell architecture, bringing that platform to AWS for the use of NVIDIA's R&D.

So Blackwell was the big star of the Monday keynote. Of course, Jensen was sort of tongue in cheek when he said, sorry, Grace Hopper, we love you, but this guy is big. So it's quite amazing to me to see the pace of innovation here. I mean, it feels like just last week everyone was trying to get their hands on the previous generation. You guys were shouted out, along with some other cloud providers, as in line for the first orders. So how should we think about the progression? For years we watched Moore's Law and saw the innovation there; now we're on a much steeper innovation curve in terms of performance. It's insane. And of course, energy consumption is a really big consideration. How are you guys thinking about all the dynamics and the complexity of this new era?

Well, it's absolutely great what NVIDIA is doing. If you saw some of the performance numbers that Jensen shared, he was showing more than 2x gains in performance over the Hopper architecture, while also improving energy efficiency, lowering the energy usage of the GPUs while increasing performance. So that's just amazing. I think it's going to unlock a lot of new use cases. We're going to see an explosion in new applications as customers figure out how generative AI is going to fit into customer experiences, whether it's building new models, making it easier for customers to do something like create an image or summarize text, or something else that we haven't thought of yet. It's going to be an interesting next couple of years.

We wrote a piece several years ago talking about AWS's secret weapon, Nitro, and then Graviton and these other silicon chips that we talked about.
I wonder if I could ask you about Nitro. It was really groundbreaking; it gave you such an advantage in terms of control, optionality, security, on and on and on. We've had a number of deep dives, and it's very impressive. It really is a competitive weapon in that it allows you to lower your costs, bring in other silicon suppliers, bring in your own silicon, and continue to drive costs down. Now we enter the AI era. Is Nitro ready?

Nitro absolutely is ready. In fact, Project Ceiba is going to be in part about bringing the best of NVIDIA together with the best of AWS, including things like Nitro, EFA (Elastic Fabric Adapter), and our key management system. So it is absolutely going to be part of that Project Ceiba supercomputer that I talked about. In addition to lowering costs and bringing more choices to customers, it also helps with performance quite a bit in terms of our ability to offload virtualization functions onto dedicated hardware, ultimately freeing up resources as well as increasing performance for things like networking and storage. So it's going to be tremendous.

Security is one aspect that you didn't talk about. One of the things that Nitro enables is a capability called Nitro Enclaves that allows you to have trusted execution environments. And that's going to be huge: when we think of GenAI, we're going to think about security as well and making sure that the customer's data is secure. You can obviously have security with encryption at rest and in motion, but when you're computing on the data, you can't encrypt it. So therefore you need this trusted execution environment, and that's where Nitro is going to play in all this. With classic virtualization, too many people can get to the data; in this situation, even AWS can't get to the data. So that's definitely a huge breakthrough.

So what's different?
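For readers who want to see what the Nitro Enclaves capability discussed above looks like in practice: enclaves are switched on when an EC2 instance is launched, via the `EnclaveOptions` parameter of the `RunInstances` API. Below is a minimal sketch that just builds the request parameters; the AMI ID and instance type are placeholders, and the commented-out boto3 call at the end shows how the request would actually be sent.

```python
# Sketch: building an EC2 RunInstances request with Nitro Enclaves enabled.
# EnclaveOptions is the real EC2 parameter for this feature; the AMI ID and
# instance type used below are placeholders for illustration only.

def build_enclave_run_request(ami_id: str, instance_type: str) -> dict:
    """Return RunInstances parameters for an enclave-enabled instance."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,  # must be a Nitro-based instance type
        "MinCount": 1,
        "MaxCount": 1,
        "EnclaveOptions": {"Enabled": True},  # turn on Nitro Enclaves
    }

# With AWS credentials configured, this would be passed to boto3, e.g.:
#   import boto3
#   ec2 = boto3.client("ec2")
#   ec2.run_instances(**build_enclave_run_request("ami-0123456789abcdef0",
#                                                 "c6a.xlarge"))
```

The enclave itself is then created from inside the parent instance; the flag above only makes the instance eligible to host one.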
I mean, the cloud obviously changed the way we thought about IT; it took away all that undifferentiated heavy lifting, as the famous phrase goes. How much of that applies to the AI era, the GPU era? What's different about this era in terms of the workload and its characteristics, and how are you guys and your customers adjusting?

Yeah, the benefits of the cloud, the main reasons people moved from on-premises to cloud, are very much the same reasons customers want to access GPUs in the cloud as well: the scalability, the reliability, the security, and the global availability. The way we build our data centers is very different from other cloud providers, but that leads to the reliability and availability I talked about. What is different is that you can see very quickly what the impact of generative AI is going to be, whether it's building and training your own model, deploying it, or using applications that have generative AI capabilities built in. It's very easy to see what some of the benefits are, very quickly.

How would you characterize AWS's compute strategy? I would say it's obviously a lot of optionality. There's not really a vendor that you don't work with: Nitro and Graviton, ARM-based designs, Intel, AMD, NVIDIA, et cetera. But help people understand your strategy. How would you characterize it?

Yeah, it's really about offering the broadest and deepest choice for customers. It's really about what customers need for their application workloads. Some workloads are compute intensive, others are memory or storage intensive, and others require an accelerator like a GPU. So it's really about giving customers the choice to get the level of performance, at the cost and with the security, that they need.

Do you think AI can help us choose the right processor? Absolutely.
I mean, it's kind of a tongue-in-cheek question, but I'm actually kind of serious.

Absolutely. One of the things that we have built recently is Amazon Q, and we have an instance selection tool that will absolutely help customers pick the right instance for them. We have over 750 instance types now, which is a lot. But based on a customer's workload and the characteristics of that workload, we can absolutely help them pick the right instance to fit their needs. And then we have Compute Optimizer, which, as your workload is running, will assess and analyze how you're using the infrastructure and make recommendations based on what your workload needs, to help you optimize the infrastructure you need.

So Amazon's got a very easy-to-understand, even though it's very rich, capable, and detailed, AI stack: starting with the infrastructure at the bottom, which is where the silicon is; then the middle layer, the AI tools, which is where Bedrock would be; and then the top of the stack is the applications, which would be Q.

As well as CodeWhisperer and many other AI tools.

CodeWhisperer is the coding assistant. Yes.

Okay, and so thinking about where all that goes, and thinking about compute and all the different optionality of silicon you have, I'm curious what you're seeing from customers in terms of training and inference. I asked Adam Selipsky, actually John asked him, I was listening to the recording, about training versus inference, and he said, you know, we've got a lot of training to do and that's happening in the cloud, but more and more people are talking about inference. Is it different infrastructure or the same infrastructure? How are your customers deciding between training and inference, and what types of chips or compute are they choosing?
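The Compute Optimizer recommendations Martin mentions are also available programmatically. As a small, hedged sketch: the record shape below mirrors boto3's `get_ec2_instance_recommendations` response (`instanceArn` and `finding` fields), but the sample data itself is invented for illustration.

```python
# Sketch: sifting Compute Optimizer EC2 recommendations for instances
# flagged as over-provisioned. The dict shape follows the
# get_ec2_instance_recommendations response; the sample records are made up.

def overprovisioned(recommendations: list[dict]) -> list[str]:
    """Return ARNs of instances Compute Optimizer flags as over-provisioned."""
    return [
        rec["instanceArn"]
        for rec in recommendations
        if rec.get("finding") == "Overprovisioned"
    ]

sample = [
    {"instanceArn": "arn:aws:ec2:us-east-1:111122223333:instance/i-aaa",
     "finding": "Overprovisioned"},
    {"instanceArn": "arn:aws:ec2:us-east-1:111122223333:instance/i-bbb",
     "finding": "Optimized"},
]

print(overprovisioned(sample))  # only i-aaa is flagged
```

In a real workflow, `recommendations` would come from `boto3.client("compute-optimizer").get_ec2_instance_recommendations()`, with credentials and the service enabled on the account.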
Yeah, the characteristics and the needs of training and inference are quite different. Our P-series instances are mostly targeted toward training workloads, and then we have our G-series as well as Inferentia, which are more for inference. So P-series for training, G-series and Inferentia for inference, and the characteristics are quite different. With training, you need powerful processors and powerful compute, as well as a very fast network; you need large clusters of aggregate compute. With inference, it's more about near real-time responses, so you need really low latency. It would really not be a good experience if someone asked a question and it took minutes or even hours to get a response; you want it in real time, so you need that low latency.

And Bedrock is all about LLM optionality, and I presume it's horses for courses from the EC2 standpoint, right? The right tool for the right job. But are you discerning any patterns now that you've GA'd Bedrock? In terms of matching LLM characteristics with the EC2 instance and the choices you have there, is there clarity among customers or is there still a lot of experimentation?

I think there's still a lot of experimentation. As Adam always says, there's no one model to rule them all, so you really have to figure out what you want to do and then try out different models to see which one works best for your needs. If you're creating an image, an image generation model is very different from a translation model, versus something that will write an essay for you.

All right, give us the final word here. You guys are customer obsessed, so give me the customer angle, thinking about the customers that you talk to. What do they need in this AI era, and how is AWS helping them?

Yeah, you know, I think it's still really, really early in this AI era.
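Martin's rule of thumb, P-series or Trainium for training and G-series or Inferentia for inference, can be sketched as a simple lookup. To be clear, this is not AWS's instance selection tool, just the heuristic from the conversation expressed as code, with illustrative family labels.

```python
# Illustrative sketch of the training-vs-inference heuristic from the
# conversation. Not an official AWS selection tool; the family labels
# are descriptive strings, not exhaustive instance lists.

FAMILY_BY_WORKLOAD = {
    "training": ["P-series (NVIDIA GPUs)", "Trn1 (AWS Trainium)"],
    "inference": ["G-series (NVIDIA GPUs)", "Inf2 (AWS Inferentia)"],
}

def suggest_families(workload: str) -> list[str]:
    """Suggest instance families for a 'training' or 'inference' workload."""
    try:
        return FAMILY_BY_WORKLOAD[workload]
    except KeyError:
        raise ValueError(f"unknown workload type: {workload!r}")

print(suggest_families("inference"))
```

A real selection would also weigh model size, latency targets, and cost, which is exactly the experimentation Martin describes customers doing.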
Customers are still trying to figure out what they need, honestly, and they're coming to us to ask the questions. The biggest thing is to try things out, and make sure to not just trial the infrastructure but also talk to the experts who can help you deploy it. GenAI is conceptually simple, but under the hood it's quite complex, and you need to work with the right experts to help you. The big thing is that it's still really early on, customers are still trying to figure things out, and we're here to help.

Yeah, so the data from our partners at ETR shows that two thirds of customers say they want to get ROI inside of 12 months. That's very aggressive. I know I said that was the last question, but do you think 2024 is going to be the year of AI ROI, or do you think maybe it's going to get pushed into the latter part of this year or into next year?

I think it's going to be very much like the internet boom that we saw around the early 2000s, where there was a mad rush of companies trying to figure out what the internet was about. Some of them won't make it, but for those who do it'll be an amazing thing, and regardless, the industry is going to change because of it. We're going to see a lot of great applications, a lot of new use cases, and a lot of new companies pop up to support it.

So you want to experiment, and the cloud's a great place to do that. Martin, thanks so much for coming on theCUBE. Appreciate it. Great to have you.

All right, keep it right there everybody. This is Dave Vellante for John Furrier. We're not live, actually, but we will be on demand. GTC 2024, you're watching theCUBE.