John Furrier: Hello and welcome to theCUBE's exclusive coverage of AWS re:Invent, part of our SuperCloud 5 special edition broadcast out of the Palo Alto Studios. We're here on location, 11th year of theCUBE, and we'll be here Monday through Thursday with wall-to-wall coverage. We're going to have a ton of content, and of course we've got a special report on SiliconANGLE, check it out. Our next guest is Prasad Kalyanaraman, Vice President of Infrastructure Services at AWS. Welcome to theCUBE, thanks for coming.

Prasad Kalyanaraman: Thank you very much, John. It's a pleasure meeting you.

John Furrier: So we're getting set up, we're in the press area here, a lot of action, a lot of briefings. The keynote's tomorrow; this is day one, so there's a lot of news we don't know about. I know a little bit of it, you know all of it. Infrastructure.

Prasad Kalyanaraman: Well, there's one keynote this evening and one tomorrow, so stay tuned. It's going to be a lot of infrastructure and chips.

John Furrier: I wrote a post, had an exclusive with Adam Selipsky that kind of laid it out. He didn't reveal the news, but it was pretty much a preview of what's to come. The business of AWS is strong, although most people don't understand how that business works. It's lumpy, you do a lot of upfront discounts, but overall net new business is growing fast, cost optimization is almost done, and there's a surge of generative AI spending coming because we're in an experimental phase. The hottest conversation here at re:Invent is the generative AI services that are coming, and the three-layer stack: the infrastructure piece, a huge enabler for you guys; the foundation model layer, where the action is, that's the new middleware, right? A lot of activity there: you've got Anthropic, you've got Bedrock, you've got SageMaker, Hugging Face and so on. Everything's in there, but the infrastructure is going to be key. There's a huge conundrum around GPUs: people can't get their hands on them, they're sold out, there are supply chain problems. You guys have a differentiation with the chips coming, and the relationship with, say, Anthropic highlights where this is going. So share the vision of what you're working on right now, because the infrastructure layer, just like in Cloud 1.0, is a huge enabler for developers, startups, and companies to get value quickly.

Prasad Kalyanaraman: Yeah, well, John, you talked about generative AI, and in fact I often want to talk about how many things have not changed at the infrastructure layer. It's important to take stock of that, because this is not something you can enable overnight; it's years of innovation that we've actually done. I'll talk about a few things here. The most obvious one is our chip design and the investments behind it. We've been working all the way from Graviton to Inferentia to Trainium, and over time we keep innovating on the chip design and on the price performance of these chips. But there's a lot more in the infrastructure that's important, and I'll start with security, right? Security is job number one, and that remains true in the generative AI space as well. Think about our innovations on Nitro, and on making sure we have secure communications between our data centers and encrypt the traffic: all of that is fundamental to generative AI. It's also important that customers' data is secure. So that's the security angle.
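A side note for readers: Prasad's layered-security point is mostly about machinery customers never touch directly (Nitro, inter-data-center encryption), but one small customer-visible knob in the same spirit is the account-level encryption default. Here's a minimal, hedged boto3 sketch; the region is an arbitrary assumption:

```python
# Sketch: verifying account-level EBS encryption defaults with boto3.
# Assumes AWS credentials are already configured; the region is illustrative.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Check whether new EBS volumes in this region are encrypted by default.
status = ec2.get_ebs_encryption_by_default()
print("EBS encryption by default:", status["EbsEncryptionByDefault"])

# Opt in if it isn't already enabled.
if not status["EbsEncryptionByDefault"]:
    ec2.enable_ebs_encryption_by_default()
```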
Prasad Kalyanaraman: Now the next part of it is our networking angle: how do we innovate on the networking side? As these models become larger, inter-process and inter-node communication becomes important, and that's why in 2018 we launched EFA, the Elastic Fabric Adapter. Through the course of this week we'll talk about the network innovations we've done in the form of UltraClusters, which let us increase the bandwidth between the different nodes on the network and reduce the latency that these large language models need. And the last part is our supply chain innovation. As you probably know, we design and build our own servers, and we've gone down to the details of making sure the hardware infrastructure is efficient from a power and a sustainability perspective. Adam Selipsky always talks about it; we love the word "and" at Amazon.

John Furrier: We have customers, and we have partners with Nvidia and others, so I want to get your reaction to a quote published just yesterday in my exclusive with Adam Selipsky. The premise was that the real challenge in generative AI is integrating the chips with critical infrastructure like networking, storage, and scalable clusters, and that the integration is vital for the future of generative AI as workloads grow in diversity and complexity. That was my prompt to Adam. His quote, which I want to get your reaction to: he says customers recognize it's not just about having chips, but about having highly performant servers around the chips, such as the networking inside the clusters. Quote: "We've seen customers go and investigate their own GPU clusters" (I put that in there) "and then come running back to us saying, you know, having chips is great, but it doesn't actually work." What's the scenario? Because a lot of people right now are thinking, oh, I'll just get some GPUs, I'll put them on premises, and I'm good to go. You now have a paradigm similar to the old cloud days: hey, I can do some stuff on premises in my data center, but that's not the same data center. This is a huge nuance point. For people considering standing up their own infrastructure to support these workloads, what's so complex about it? What's different about AI workloads that makes the infrastructure better in the cloud?

Prasad Kalyanaraman: Yeah, well, the first thing is that to run any kind of generative AI workload, whether it's training or inference and so on, you need a corpus of data services, right? So it's not just about putting the chips in there. Even if you've figured out a way of connecting them, you also have to figure out where your data is going to be stored, and you need a highly scalable, performant data source. So you look at services like S3, you look at EBS, zero-ETL and so on. You need the data layer that comes with it, right? Now, beyond the data layer, if you're running it just for your own application, you can probably put a certain level of perimeter security on top of it, but you need security at every layer of the infrastructure stack, all the way from the Nitro layer, which we build into the servers, to the network layer, and then to the perimeter as well. So that's the second part. You need the data services to provide you the data, then you need the network.
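To make the EFA reference concrete: attaching an Elastic Fabric Adapter is a launch-time option on supported instance types. The sketch below is a minimal, hedged boto3 example; the AMI, subnet, and security group IDs are placeholders, the instance type is just one EFA-capable example, and real training clusters add placement groups and multiple interfaces:

```python
# Sketch: requesting an EFA-attached instance with boto3.
# IDs below are placeholders; EFA works only on specific instance types
# (for example trn1, p4d, or c5n.18xlarge).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical deep learning AMI
    InstanceType="trn1.32xlarge",      # one EFA-capable instance type
    MinCount=1,
    MaxCount=1,
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "InterfaceType": "efa",        # attach an Elastic Fabric Adapter
        "SubnetId": "subnet-0123456789abcdef0",
        "Groups": ["sg-0123456789abcdef0"],
    }],
)
print(response["Instances"][0]["InstanceId"])
```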
Prasad Kalyanaraman: Now, the network itself: if you were connecting a few machines together, a few hundred machines, you could do that with a fairly flat network, right? There are traditional network topologies out there, like the Clos fabric, a very common fabric that many customers and many other providers use as well. But that fabric is not sufficient for the low latency you need, so we had to go and innovate on UltraClusters to reduce that network latency. We've been building our own network devices and our own operating systems for this network, and because of that we can change all of it through control planes in our software, to optimize the network to be performant for generative AI. You can do these things at small scale, but when you have to do it at larger scale and build an end-to-end application, you need the data services, you need the network, you need the security, you need the supply chain.

John Furrier: You know, I remember when you guys came out with Nitro and the hypervisor. Dave Vellante and I were like, oh my God, we can see where this is going. Of course, we had interviews with James Hamilton, and he laid out the dots to connect. Now you're on multiple generations of chips. As you look at this next wave: training, which I see as more of a setup, like a sandbox for the data, will still be around, and right now everyone's talking about training costs. Training is pretty big, but the inference is where the action is. Adam confirmed that on my post as well, in person, and we've been seeing in the industry that inference is where the ongoing iteration with the data happens.

Prasad Kalyanaraman: Right.

John Furrier: What needs to be in place to make that inference work at scale? Because inference is going to be everywhere: edge, core, on premises, everywhere.

Prasad Kalyanaraman: That's right. And this is where the statement that it is not just about the chips is so critical, right? Because when you actually have an application... John, you and I were talking about an application that you all built, a pretty cool application on Q. You liked that, didn't you?

John Furrier: Yes, I did.

Prasad Kalyanaraman: In your video application, think about all the other data that's there and all the other services required to make that happen. It's hard to do that if you just had chips. You need the other underlying services, right? And that's where the 240-plus services we've built over many years, and the 3,300-plus features you get every year, are critical to actually build an application. So I believe that in the fullness of time there will still be foundation models. People will build smarter and smarter foundation models; they'll be larger, and some will be more specific as well. But then you need inference to build applications, and to do that you need all the other AWS services. You need the infrastructure to be there, you need high levels of availability, and don't forget power, which is actually important, right? You need to procure sustainable power, and as you know, we're very committed to being 100% renewable by 2025. We're 90% there, and we expect to get there.

John Furrier: Actually, I'll be interviewing former Amazonian Adrian Cockcroft later this week. He's been doing a big thing on sustainability.
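Since inference keeps coming up as where applications actually live, here's a hedged sketch of what the application side of a hosted-inference call can look like. The endpoint name and payload shape are pure assumptions; a real payload depends on the model container behind the endpoint, and Bedrock's invoke_model API is the analogous call for managed foundation models:

```python
# Sketch: calling a deployed model endpoint for inference with boto3.
# "my-llm-endpoint" is a hypothetical SageMaker endpoint name; the payload
# format depends entirely on the model served behind it.
import json

import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

response = runtime.invoke_endpoint(
    EndpointName="my-llm-endpoint",
    ContentType="application/json",
    Body=json.dumps({"inputs": "Summarize this video transcript..."}),
)
print(json.loads(response["Body"].read()))
```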
John Furrier: The energy required is huge. Give us some insight into the cost and complexity of the energy involved in these new generative AI apps; they're pretty much off the charts. Scope it out for us. What are we going to do?

Prasad Kalyanaraman: So let me give you a little bit of an idea of what a chip really takes in terms of power. Typically, the current generation of chips runs about 700 to 800 watts a chip, and you're going to reach 1.1 kilowatts. At that point you're going to require liquid cooling on these chips, right? Compared to a traditional compute server, these ML servers, whether it's Trainium or GPUs, consume about 2x the power. So the number of servers you can stack in a rack is very limited. An AWS rack typically has about 20 to 30 servers, which is common in the industry, whereas a rack of ML servers is going to have about two to three, not more than that. Then you need to be able to cool those servers as well. Today we've been fairly efficient at air-cooling these servers, but liquid cooling is something we've been working on in the lab for many years, because we knew this day was going to come, and that's something we've innovated on as well.

John Furrier: I've got to give you guys props: you're skating to where the puck is going, as the saying goes. The question I want to ask is, what's different about the generative AI workloads? You guys talk about the three layers of the stack, but you have the traditional cloud services, and I never thought I'd call AWS traditional services, the standard stuff from before. The non-gen-AI workloads still require storage and networking. What are the infrastructure requirements to power the gen AI workloads?

Prasad Kalyanaraman: Yeah, a few things will be different. As I said, a lot will be similar, but a couple of things will be different. One is that the server design requires us to be careful about how many servers we can pack in a particular rack, because a rack has a fixed power feed coming in, so there's only a certain amount of power you can deliver. That's one part of the infrastructure that has to be different, because these servers consume more power. The second thing is that you have to be careful about how you land these servers on a particular lineup, because you want to maximize the usage of utility power, any power that comes into a particular lineup, right? You don't want to leave any stranded power. Power is so expensive in a data center that you want to maximize the use of it. So how you land these servers, and what kind of servers you land on a particular lineup, requires a certain amount of optimization, and we have systems that decide which servers to land on every single lineup to maximize the use of power. Think of it as a bin packing problem, but you have to bin pack knowing you don't have perfect visibility into future supply, so you have to use some predictive capability to decide what to land on a particular lineup. That's the power side of it.
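The bin packing framing invites a toy illustration. The sketch below is emphatically not AWS's placement system (their real one folds in supply chain forecasts, as Prasad notes); it's just a first-fit-decreasing heuristic over made-up power figures, showing how mixing a few power-hungry ML servers with many standard servers keeps stranded power on each lineup low:

```python
# Sketch: lineup placement as a toy bin packing problem.
# All power figures are illustrative, loosely echoing the interview:
# ML servers draw several times the power of standard servers, and each
# lineup (a row of racks on one power feed) has a fixed budget.
LINEUP_BUDGET_KW = 150.0

servers = (
    [("ml", 10.0)] * 12      # ML servers, assumed ~10 kW each
    + [("std", 0.8)] * 100   # standard servers, assumed ~0.8 kW each
)

def first_fit_decreasing(servers, budget_kw):
    """Place servers on lineups, opening a new lineup only when needed."""
    lineups = []  # each lineup: [remaining_kw, list_of_servers]
    for kind, kw in sorted(servers, key=lambda s: -s[1]):
        for lineup in lineups:
            if lineup[0] >= kw:          # fits on an existing lineup
                lineup[0] -= kw
                lineup[1].append(kind)
                break
        else:                            # no room anywhere: open a new lineup
            lineups.append([budget_kw - kw, [kind]])
    return lineups

for i, (remaining, placed) in enumerate(first_fit_decreasing(servers, LINEUP_BUDGET_KW)):
    print(f"lineup {i}: {len(placed)} servers, "
          f"{remaining / LINEUP_BUDGET_KW:.1%} of power stranded")
```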
Prasad Kalyanaraman: We already talked a little bit about the network, how the network has to be low latency, and how we've been building it over a period of time. The third part is all the other supply chain components required to make the server actually come to fruition, right? Now, you could build servers by just taking stock servers that are commercially available, but that won't be highly efficient over a long period of time. So we innovate fairly deep into the supply chain, building our own manufacturing capabilities and optimizing the supply chain for the entire server as well. That's also going to be different.

John Furrier: You guys certainly have a great advantage. I've got to ask you a question, because as these new paradigms come, things always change. Both Adam and Matt Garman told me on camera that they recognize there will be some use cases where you want to have data on location or at the edge, obviously, where you can't get to it. So how should companies think about their on-premises cloud operations from a design standpoint? Because with the LLMs and the models coming out, there's proprietary data that's going to be their crown jewels, and that's becoming well understood: wait a minute, I'm not going to throw everything into a closed large model like OpenAI or Anthropic, I want to keep this protected. So maybe I keep it on premises for compliance reasons, or I put it in the cloud; you guys have VPCs and all that, no need to go into that. But for the designers out there thinking, oh, I'll just stand up my own infrastructure and connect to the cloud via API: what's your recommendation on how to craft that data center or on-premises strategy?

Prasad Kalyanaraman: If you think about it, this is where our years of innovation actually pay for themselves. We started off with our AWS Regions, and then we built a fairly large edge network with a fairly redundant backbone as well. Our backbone is all 100 gig, and it's our own custom backbone that we built. It's not one of those networks that was just lying around that we happened to use for the cloud; we had to build it from scratch, right? And we manage all the routing on that layer as well. So you have our AWS Regions, where 99-plus percent of workloads are going to run. As you said, there will be some use cases where customers might require on-prem locations for sensitive data, although the cloud is a lot more secure today than it ever was, and our innovation on Outposts is going to be super critical for that set of customers. Beyond that, in the middle, we also have Local Zones for latency-sensitive workloads. So if you think about the layers of our infrastructure: you have our AWS Regions, where we expect the large majority of workloads to run; you have Local Zones, where latency-sensitive workloads will run; and you have Outposts, where some of the workloads that customers want to keep on-prem will run. And recently we announced Dedicated Local Zones with the Singapore government; some of the more sensitive workloads, like government workloads, may require their own Local Zones, and we've been able to build that for them as well. So that entire stack, from our Regions to Local Zones to Dedicated Local Zones to Outposts, pretty much covers the gamut of all the workloads customers may want to run.
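For architects weighing those placement tiers, the zones themselves are discoverable programmatically. A small hedged sketch; the region is arbitrary, and since Local Zones are opt-in, the AllAvailabilityZones flag is needed to see zones the account hasn't enabled yet:

```python
# Sketch: listing the Local Zones a region exposes, with boto3.
# Local Zones are opt-in, so AllAvailabilityZones=True also returns zones
# this account hasn't enabled. Region name is illustrative.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

zones = ec2.describe_availability_zones(
    AllAvailabilityZones=True,
    Filters=[{"Name": "zone-type", "Values": ["local-zone"]}],
)
for zone in zones["AvailabilityZones"]:
    print(zone["ZoneName"], "-", zone["OptInStatus"])
```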
John Furrier: So Inferentia, Graviton, Trainium, big chips, Local Zones, you've got FSx in there. What's the chip scene going to be like for the folks coming out of re:Invent? What's going to be the big takeaway from an infrastructure standpoint?

Prasad Kalyanaraman: Well, the thing is that when we get into these things, we get into them for the long term, right? So if you think about Graviton, we started with the first version of Graviton, then released Graviton2 and Graviton3, and every single version improves: Graviton3 is 25% more efficient than Graviton2. We expect customers will keep pushing us on price performance, and we're very happy to innovate on that. You'll see the same on our Inferentia and Trainium chips as well: continuous innovation. Stay tuned for a bunch of announcements today, and again, we're going to have announcements at all layers of the stack, from the infrastructure layer to the model layer to the application layer.

John Furrier: It's funny, I asked Adam about the competition, and he said not everyone is innovating at all three layers. Microsoft announced a chipset, but it's only for internal use, apparently not really ready for prime time. You guys are on your third generation, a big advantage. And then integrating into the models is very interesting as you learn what those workloads are going to be like. I'm sure there will be more chips. As VP of Infrastructure Services, how does that impact your job every day?

Prasad Kalyanaraman: Yeah, well, I'll tell you: busy. The thing is that we're in an era right now where I think we're seeing the next wave of cloud growth, and some of the challenges we're going to face, and that our customers will push us on, are going to be super exciting. Just think about every layer of the stack, from networking to supply chain systems. The innovation in the supply chain is unheard of; what does a cloud provider have to do with supply chains? All the automation systems we're building for how we land racks and how we design lineups; our data center design is going through a bunch of innovation, and we've been on that journey for a long time. Sustainability is such a core area for us, right? Just look at the 400-plus projects we've enabled as the largest procurer of renewable power in the world today. That doesn't just happen. No one would have imagined that a cloud provider would become the largest procurer of renewable energy, and here we are. So, you know, not a day goes by where I don't think about all these things. And as you rightly pointed out, for us it's an "and," not an "or."

John Furrier: Yeah. Prasad, it's been great, great to have you on, great to have a conversation. You know, three years ago, Dave and I were on theCUBE and everyone was like, we don't want to talk about speeds and feeds, let's talk about solutions and value to the customer. Then a year or two ago we said hardware matters, and people were booing us: what do you mean, hardware matters? But no, the world's coming back to speeds and feeds. I think at the end of the day, for your customers, all the hype aside, it comes down to cost, energy, and time to value.

Prasad Kalyanaraman: Price, performance, and capabilities, right? And security has to remain number one.

John Furrier: And data too, it's all kind of coming together. A whole new operating system is coming. I congratulate you; thanks for coming on.
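One closing footnote on the 25% figure Prasad cited: efficiency gains translate directly into lower cost per unit of work. A back-of-the-envelope sketch, where only the 25% comes from the interview and the normalization is an assumption for illustration:

```python
# Sketch: what "25% more efficient" means for cost per unit of work.
# Only the 25% figure comes from the interview; everything else is
# normalized for illustration.
g2_work_per_dollar = 1.00
g3_work_per_dollar = g2_work_per_dollar * 1.25  # 25% more efficient

savings = 1 - (1 / g3_work_per_dollar) / (1 / g2_work_per_dollar)
print(f"Cost per unit of work drops by {savings:.0%}")  # -> 20%
```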
Prasad Kalyanaraman: Thank you very much, John, appreciate it.

John Furrier: Okay, theCUBE coverage here on location at SuperCloud 5, part of our re:Invent coverage from theCUBE in Silicon Valley. I'm John Furrier. We'll be back with more after this short break.