Hello, and welcome back to SuperCloud 4, our in-studio live event covering generative AI. This is the fourth episode of our quarterly digital series where we unpack next generation cloud technologies and how they're going to impact customers, the industry, and the developers building those next generation apps. Generative AI is all the buzz and hype, and it's matching with reality. This is the first time in my career I've seen hype and reality coming together at such amazing speed. And here is Bratin Saha, VP of AI and ML services at AWS, who couldn't make it into the studio, so we're bringing him in remote. Bratin, great to see you, and thanks for coming on our SuperCloud to do one of the keynotes. I really appreciate your time.

Thank you, John. Nice to be here. Thank you for having me.

So we're kicking off with what this all means for the next gen cloud with AI. I was talking to Swami when he announced the expanded relationship with Hugging Face back in the day, a couple of months ago; it feels like two years ago. You keynoted our Amazon startup showcase, which featured Cohere, Stability AI and a handful of other startups in the area. That was in February, okay? So much has happened since then, and coming up on re:Invent in a few months we're going to see probably an explosion of even more ecosystem activity. The customers here are enterprises and the startups that are emerging, private companies as well: two sets of customers, very interested in leveraging cloud for generative AI. Give us the update since February, the past couple of months, when the tsunami of announcements saw Bedrock go GA. Give us the quick update, high level: what's changed in just the past few months from an AWS perspective around generative AI and your customers?

Yeah. So Bedrock went GA just some time back, and we've added more models to Bedrock. Since the announcement earlier this year, we also added Cohere. And we are really excited at how customers are starting to use the models in Bedrock. In addition, CodeWhisperer, which I think we talked about earlier in the year, just launched the customization capability. That allows customers to customize it for their own code bases, their own styles of coding and so on. We are also very excited by what customers can do with Trainium and Inferentia, and the performance improvements and the performance-per-dollar improvements they get from those. And a lot of customers are continuing to build generative AI models on top of SageMaker. So, all in all, it goes from Bedrock, the easiest way to build generative AI apps with its gen AI capabilities and its foundation models, to CodeWhisperer, to the infrastructure. We have the hardware infrastructure, which is purpose-built for generative AI so that it gives you the best cost and performance. And then we're providing the ML infrastructure, the software infrastructure, for customers to build their own FMs, their own foundation models.

So Bedrock's the curated set of models you guys are putting out there, and then you have SageMaker for everything else. You've got open source models people can play with in there, and that's kind of the interface for your customers. Is that right?

Yes. And, you know, Bedrock provides state of the art models.
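As a concrete aside, here is a minimal sketch of what calling one of those Bedrock models looks like with boto3. The model ID and request body are assumptions; each provider on Bedrock defines its own request schema:

```python
import json
import boto3

# Bedrock's runtime client handles inference; model hosting, scaling,
# and traffic management happen behind the scenes.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Model ID and body shape are assumptions here; each provider on
# Bedrock (Anthropic, Cohere, Stability AI, etc.) defines its own.
response = client.invoke_model(
    modelId="cohere.command-text-v14",  # hypothetical pick from the curated set
    body=json.dumps({
        "prompt": "Summarize the benefits of managed foundation models.",
        "max_tokens": 200,
    }),
)

result = json.loads(response["body"].read())
print(result)
```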
So it's, as you said, John, curated state of the art models, but then there are a lot of other models that are available in SageMaker JumpStart, open source models and others. And we think that a single model isn't necessarily going to work for all use cases. Customers want this choice, and that is why we are excited about it. And we actually see the industry moving in this direction. If you look at the way it's going, there's one other application we can talk about later on, and that is HealthScribe. We actually released a gen AI-based product, HealthScribe, that helps with patient-physician interaction. So we can go into that as well later on.

Yeah, this teases out what I think has changed since our last conversation. In our last meeting you said some things I thought were basic from an industry standpoint, but kind of genius in the sense that they were right. You said AI does three things, and I just tweeted this, actually: one, changing the way you interact with information; two, generating software; and three, other areas like undifferentiated heavy lifting, such as documentation. Okay, great, that's the baseline. Now you're seeing the products and apps come out as chatbots, co-pilot-like human assistants, and predictive apps; that seems to be the state of the art today. Now they're going to another level, okay? We're starting to see models that need to move into production, and that's the biggest conversation going into re:Invent, and here at the SuperCloud conference we're having: how do I move from an experimentation demo, which you can do with the right clean data or synthetic data, into production, at scale, at a cost structure that works? So I have to ask your perspective on that. And two, as people build their own models, I won't say proprietary, but relative to their data, which is intellectual capital and property, they will need to interact with the big three or big four proprietary models through APIs, managing compute and horsepower. So this is kind of a multi-dimensional, multi-sided coin here. What's your reaction to that? Because this seems to be where the conversation is going. Do you see it the same way, and how would you react to that?

Yeah, and I think that is where, if you take a step back, it comes back to this: deploying generative AI at scale in the enterprise is a different kettle of fish than just having demos or consumer apps. Because, like you said, a lot of factors come into play. There's accuracy, there is cost, there is latency, there's performance and all of that. So it's a multi-dimensional, multi-faceted optimization problem that you need to do, and that is where having a choice of models is so important, because there are use cases where you may want to use a particular model that has been fine-tuned on your data. So getting back to your question, how do we make it easier for customers to deploy to production? I think Bedrock makes it really easy, because we have done a lot of the heavy lifting of taking these models, training them, optimizing them, and we're doing a lot of optimizations in terms of reducing the inference cost.
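On that inference-cost point, a back-of-the-envelope sketch of the unit economics; the per-token prices below are made-up placeholders, since real pricing varies by model and provider:

```python
# Rough unit-economics sketch: estimate the cost of serving prompts at scale.
# All prices are illustrative placeholders, not real Bedrock pricing.
PRICE_PER_1K_INPUT_TOKENS = 0.0015   # assumed
PRICE_PER_1K_OUTPUT_TOKENS = 0.002   # assumed

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single model invocation, in dollars."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS

# A bigger context window means more input tokens per request, so the
# same traffic costs more: 4k vs 32k tokens of context at 1M requests/day.
for context in (4_000, 32_000):
    daily = request_cost(context, 300) * 1_000_000
    print(f"{context:>6} token context: ~${daily:,.0f}/day")
```

This is also why a viral app can run into real money: cost scales linearly with both request volume and the amount of context stuffed into each prompt.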
The second thing I'll also say is that when you think about all of these models, there is an aspect of how you make them scale as these products are being used by a lot of consumers. That is where there's a lot of heavy lifting in terms of getting the right latency, the right traffic control, the auto scaling and all of that, and all of that is taken care of behind the scenes by Bedrock. Now, there might also be situations where customers actually build their own proprietary models, or take a model, fine-tune it and then deploy it. There are two options there. One, if you do it with Bedrock, you actually get your own private copy of the model. So when you fine-tune the model, because you're getting your own private copy, your data doesn't get into the base model. But there might also be cases where you want to take one of these open source models that works well for you, and that is where we have added a number of features in SageMaker so that we can reduce the cost by an order of magnitude. There are a couple of features now in private beta with customers that will hopefully launch at re:Invent, and those reduce cost by orders of magnitude. So these are all things we're really thinking through and innovating on, to reduce the cost and the latency and keep improving the performance.

Yeah, and I was talking with OctoML, one of your partners; Luis, a computer science professor and the founder. They do a lot of training and cost management. He brought up a couple of use cases where, if something goes viral, you could be in the millions of dollars. So managing costs is a big concern for customers, and lightweight models, where you can maybe get some visibility into the unit economics of the AI, are one factor. But if you zoom out, there are a couple of things going on I want to get your thoughts on. One is this notion of a context window, which comes up a lot around language and all the foundational models. How much data can you bring in? The tokens that go in first, what's the throughput? It reminds me of the old PC days when you had to look at performance. There are a lot of throughput and performance issues that need to be managed, and there are costs associated with them. And then there are the apps being developed, so let's start with the apps. You have AI wrapper-type apps, where people are wrapping some AI around the large language models. I call that the basic website, to use an analogy: you got the web, and then websites became the application for the web. We're seeing a lot of really cool AI wrappers, and I was kind of down on that, but now I'm thinking that's actually cool. Then you have AI native apps. And then you have kind of under-the-hood, platform-engineering-like apps. Do you see it the same way? And what's your comment on those three app layers, and how does Amazon fulfill that? Because you're seeing startups and enterprises deploying all three scenarios.

Yeah, so I think the way we are seeing it is kind of along the lines of what you mentioned. One is apps that already exist, but you accelerate or turbocharge those apps through generative models, so you can do things with those apps that you couldn't do before. Then there is the next set, which I think you called AI native, or AI from the ground up, where you really are rethinking the app and embedding generative AI from the get-go.
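Returning to the private-copy fine-tuning Bratin described above, a hedged sketch of what kicking off a Bedrock model customization job can look like; every name, ARN, S3 path, and hyperparameter below is a hypothetical placeholder, and the exact parameters vary by base model:

```python
import boto3

# Control-plane client (distinct from the "bedrock-runtime" inference client).
bedrock = boto3.client("bedrock", region_name="us-east-1")

# A sketch, not a recipe: all names, ARNs, and S3 paths are placeholders.
bedrock.create_model_customization_job(
    jobName="support-bot-finetune-001",
    customModelName="support-bot-v1",  # becomes your private copy of the model
    roleArn="arn:aws:iam::123456789012:role/BedrockFineTuneRole",
    baseModelIdentifier="cohere.command-text-v14",  # assumed base model
    trainingDataConfig={"s3Uri": "s3://my-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={"epochCount": "2", "batchSize": "8"},
)
# Training data stays in your account, and the customized weights live in
# your private copy, so nothing flows back into the shared base model.
```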
And the kinds of capabilities you can get in those situations are different. So we are seeing both of those happen, and we are doing some of it ourselves. If you think about CodeWhisperer, it was built from the ground up; without generative AI, that product isn't really there. If you think about HealthScribe, again, it's built from the ground up. But then there are also other situations where you're embedding generative AI into what already exists and making it more powerful and more useful. So I think we see both of those.

Talk about the throughput, one of the big areas again in these big changes you're going to see. I see a big wave coming back to the silicon. We've been harping on this for years; not that it's new to theCUBE, but it's new to everybody else. How fast can you go with the GPUs, the CPUs, compute performance at the silicon level? You've got that infrastructure layer powering this new generation of apps, whether it's context window expansion for tokens in the apps, or just managing costs so you can do the policy-based stuff that's familiar at cloud scale. How should customers think about that? What's the best practice, from an Amazon Web Services perspective, for managing the right horsepower and understanding the tuning? It's not just training; there's inference there as well, and how to integrate with the APIs of a large language model. What's the best practice for managing all the infrastructure?

I think it's first pretty use case specific, and that is where having choice and flexibility is important. Depending on your use case, you may want different approaches. In some use cases, the foundation model pretty much works well out of the box, and all you're doing is prompt optimization and prompt engineering; you can play around with the context window to let the model learn more from the information you're providing, in-context learning. There are also situations where you want some fine-tuning to get to the right accuracy level. In those situations, you take your model and train it some more on your data, and what that may also let you do is use a smaller model, because that model is more specialized for the tasks you want to get done. When you do that, it brings down your cost envelope. There are also situations where you're really looking for the highest accuracy and you want to restrict the answers to within a particular corpus of knowledge. In those cases, you may want to go with a RAG-based approach, where you're basically taking data sources and using retrieval augmentation to generate your answers from there. And then there are situations we have seen where you may want a bit of a hybrid approach, where some parts are done by a non-foundation-model approach and other parts, more of the interactive Q&A, are done by a foundation model. So it's pretty use case specific, and you have to do experimentation.

So from an Amazon Web Services perspective, you would say, look, one size doesn't fit all. One model isn't going to cut it. You need choice, you need flexibility. We agree. We see an integration of models. We think that's the API economy at the next level.
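A minimal sketch of that RAG-based approach, restricting answers to a corpus via embedding retrieval; embed() here is a stand-in for a real embedding model, and in practice the final prompt would be sent to a foundation model:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for a real embedding model (e.g., one hosted on Bedrock
    or SageMaker): returns a fixed-size unit vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank corpus passages by cosine similarity to the query embedding."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: float(np.dot(q, embed(doc))),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Retrieval-augmented prompt: ground the model in retrieved passages
    so answers stay within the corpus of knowledge."""
    context = "\n".join(retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = ["Bedrock offers a curated set of foundation models.",
        "Trainium is purpose-built silicon for ML training."]
print(build_prompt("What is Trainium?", docs))
```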
I know you've got a hard stop, so I want to get one final question in. As you look at the use cases and the experimentation in the developer community, as they get the good demos and look toward production at a low cost, and you guys are focused on that, a big question comes up around how to handle the data. A lot of data modeling is coming into this; you're seeing excitement around vector databases. You're seeing how people are using, you mentioned retrieval, how you get memory, not memory on the machines, but memory in the retrieval, with embeddings and whatnot. This is cutting edge stuff. So that changes the data modeling, and hence the observability equation. How do you know what to measure? If I change my embedding pattern or change my context window, that's going to change the results, and there are new metrics coming out. Can you share your view on how people should think about instrumenting and looking at this new kind of observability? What's your perspective on this?

So one of the things that we have started doing, and I think people will increasingly have to do this, is this: there's a lot of work on, okay, let's go look at the accuracy of the model. But just as you pointed out, John, it's not the accuracy of the model, it's the accuracy of the end-to-end solution, and there are many factors that go into building it. So you have to measure the accuracy of the end-to-end solution, which can often be different from just the accuracy of the model, and you may actually see the parts interacting. Then you also have to ensure that you have a data architecture that provides you all of the basic guarantees in terms of security and privacy and all of that: encryption at rest and in transit and so on. So we ourselves do a lot of that now, where we're looking at the end-to-end, not just the model performance, and ensuring we have the right data architecture so the data is handled in the right way. All of the basic security and privacy stuff, but also things like protecting against bias and eradicating toxicity; all of that needs to be handled.

I think there's a whole app performance management aspect of this that's going to unleash new things. I know Adam uses the three-steps-into-a-10K-race analogy. We like to stick with innings, because we love baseball: it's not even the game yet, it's pre-game. So, Bratin, I really appreciate you coming on and spending the time; I know your time is super valuable, and I know you've got re:Invent coming up. Is there anything you want to share about re:Invent content before it comes up, while we're here?

You know, I look forward to meeting you and others at re:Invent. I think it will be a great, great place, and I'm sure we'll have a lot of exciting announcements.

You don't want to share any here? No? We'll see you at re:Invent then, and hopefully we can talk more there. I know you guys are busting your chops there; I know you're working on it. Looking forward to it, it should be exciting. Thanks for taking the time to come on our SuperCloud for this keynote. I know you couldn't make it in person; really appreciate you coming into our remote studio.

Thank you so much. Thank you, John. Really nice being here.

Okay, Bratin Saha, VP of AI and ML services at AWS, breaking down what's going on: it's the data modeling in this new era that's feeding the AI.
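On Bratin's point about measuring the end-to-end solution rather than just the model, a hedged sketch of an evaluation loop; the pipeline callable and the scoring rule are placeholders for whatever your application and task-specific metrics actually are:

```python
import time

def evaluate_end_to_end(pipeline, test_cases):
    """Measure the whole solution (retrieval, prompting, generation),
    not just the model: accuracy and latency over a labeled test set.
    `pipeline` is any callable that produces the final answer string."""
    correct, latencies = 0, []
    for question, expected in test_cases:
        start = time.perf_counter()
        answer = pipeline(question)
        latencies.append(time.perf_counter() - start)
        # Placeholder scoring rule; real evals use task-specific metrics.
        correct += int(expected.lower() in answer.lower())
    return {
        "accuracy": correct / len(test_cases),
        "p50_latency_s": sorted(latencies)[len(latencies) // 2],
    }

# Re-run this whenever you change the embedding model, the context window,
# or the prompt template, since each of those shifts end-to-end results.
cases = [("What is Trainium?", "purpose-built")]
print(evaluate_end_to_end(lambda q: "Trainium is purpose-built silicon.", cases))
```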
It's about end to end, not just model training; there's inference, there are smaller language models and large proprietary ones. It's a fusion, a collision of innovation at the infrastructure level to power these apps, from AI wrappers to native applications to whole new sets of integrations. So it's going to be next gen cloud meets AI. It's the same game: performance at the hardware level and the application level, and whether it's cloud or on premises, it's a SuperCloud either way. Generative AI is a hot topic, and we'll be back with the next segment after this break.