Welcome back everyone, live CUBE coverage here in San Francisco for Google Cloud Next '23. CUBE team coverage, I'm John Furrier, host. Rob Strechay, CUBE analyst, leading the team there. We've got team coverage, Lisa Martin here, Dustin Kirkland, CUBE analyst. We've got two great guests. We're talking about infrastructure and the TPUs and GPUs, all the things powering the solutions that make all the AI work, which is critical. Everyone knows what's going on with the horsepower needed on the GPUs, and the TPUs, which are Google's signature. Mark Lohmeyer, Vice President and General Manager of Compute and Machine Learning Infrastructure at Google Cloud, and Sachin Gupta, VP and General Manager of the Infrastructure and Solutions Group. We call them the dynamic duo; they're here, really taking care of the physics and making sure it's running fast. Guys, thanks for coming on theCUBE. We really appreciate you coming on.

Yeah, we've been circling this interview all week because we just loved the demos yesterday on Duet AI, showing really next-gen AI: first-party apps crossing over, using data on both sides, deducing it, making reasoning decisions, taking action. It's really powerful and kind of shows the direction of where this is going. Everyone is completely loving the generative AI market right now, and with that comes the need for speed, right? So you guys are doing that. So give us an update on what you guys do in your groups. You've got two different businesses you're running, Mark. So explain how you guys are working on this real quick.

Yeah, sure. So I think we're seeing incredible customer excitement around what they can do with generative AI. And because we're responsible for infrastructure, we sort of look at, hey, how do we design and architect these end-to-end infrastructure stacks across hardware and software to meet the demands of this next generation of workloads? And it is clearly an inflection point in computing, right? If you look at the size of large language models, they've been growing on average 10x per year over the last five years. So that's 10 to the fifth over just five years. That's placing incredible demand on the infrastructure. And so at Google, we're really excited to be able to help customers meet those needs with a comprehensive and advanced stack that includes all these great hardware capabilities, GPUs, TPUs, storage, how we bring them together from a networking perspective, but also the software on top of that, right? Support for frameworks, the compilers, all the things that make it easy for AI researchers to do their jobs.

And your business is specifically what?

Yeah, so I'm responsible for compute and machine learning infrastructure. So this is general-purpose compute and then the core Cloud GPUs and TPUs.

And Sachin, what's your particular focus?

I look at storage, networking and distributed cloud.

Got it, okay, great. Just making sure we've got the lanes straight. Stay in your lane. So let's get into the drivers right now. What are you guys seeing with this event and this market, the key drivers in your business? Obviously the GPUs, everyone's talking about it. The TPUs here behind us are quite the impressive setup; they've got guards watching those things and billions of dollars' worth of gear there. What's going on with the drivers?

So it comes back to these workloads, right? If you sort of look at our customers, they're looking to train, to tune and to serve a broad range of models.
And against that, they want to have the right infrastructure options for each of those types of models. And so the fact that we support both GPUs and TPUs, in fact we have 13 different instance types across those, gives customers a lot of choice and flexibility to deliver awesome performance, but also at the right price point.

On the networking side, this has been a big thing, the Cross-Cloud Network. Just so we hear it right: it's not cross-cloud services, it's networking, right? I mean, that's the key differentiation.

Yeah, I can explain that. So it's about helping customers get access to the latest and greatest in AI and data services that reside on Google Cloud. So if you want to use Vertex, for example, if you want to use BigQuery or Spanner, but your data is sitting on-prem or sitting in another cloud provider, connecting that and securely bringing it into Google Cloud so that you can get the best of our services is what customers are looking for. That's what the Cross-Cloud Network is about: with a strong SLA, how do we provide that connectivity? How do we provide you the best of our security services but also the ability to bring your own, so that you're not locked into a closed environment? It's a very open ecosystem that we support. And once you have that connectivity fabric, you can get access to those Google services and run much, much more quickly.

And one of the things I just wanted to add is, right next to those GPUs and TPUs, you also need storage that's built for AI and data. Customers are looking for file access, but with object-storage scale and performance. And they're also looking for parallel file systems. Those are all the great innovations we were announcing this week.

Actually, you're the first person to say the word storage to us, so congratulations.

We're excited.

That excites me, because I thought that was a part of the message we didn't hear loudly enough. To your point, there's the physics of storage and how you connect it all together. I actually loved the networking, the global cross-cloud networking; that made a lot of sense. And storage is really the foundation underneath, that's where the data sits, right?

Storage, absolutely. When you're training these large models, if they hiccup, if they hit some sort of error in between, if they can't read the data or checkpoint the data fast enough, you slow down the overall effort, and months and months of work can go away or get slowed down. And so customers are looking for things like, give me FUSE capability, but I want it on top of GCS. That's why we introduced Cloud Storage FUSE. So you get file access; whether you're using TensorFlow or PyTorch, it doesn't matter. That was an innovation we had to bring out in storage to help these AI workloads. We're also partnering with Intel on their DAOS technology and bringing that into our Parallelstore product, which gives you much, much higher-performance parallel file systems. And that's more for HPC types of use cases that also leverage ML capabilities. So yeah, you need storage, you need networking, right next to those fantastic GPUs and TPUs and the software on top.
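For readers who want to picture that Cloud Storage FUSE pattern, here's a minimal sketch: a bucket mounted with gcsfuse shows up as ordinary files, so a plain PyTorch Dataset can read it with no GCS-specific code. The bucket name, mount point, and dataset layout below are hypothetical placeholders, not anything announced on stage.

```python
# Minimal sketch of the Cloud Storage FUSE pattern described above.
# Assumes the bucket was mounted first, e.g.:
#   gcsfuse --implicit-dirs my-training-bucket /mnt/gcs
# ("my-training-bucket" and "/mnt/gcs" are placeholder names.)
import os
from torch.utils.data import Dataset, DataLoader

class MountedBucketDataset(Dataset):
    """Reads training samples through the gcsfuse mount as plain files."""

    def __init__(self, root: str = "/mnt/gcs/dataset"):
        self.paths = sorted(
            os.path.join(root, name) for name in os.listdir(root)
        )

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int) -> bytes:
        # Ordinary file I/O; gcsfuse translates it into GCS object reads.
        with open(self.paths[idx], "rb") as f:
            return f.read()  # decode/transform as your pipeline requires

loader = DataLoader(MountedBucketDataset(), batch_size=32, num_workers=4)
```

The point of the pattern is exactly what Sachin describes: the training framework sees files, so the same loader works whether the data lives locally or in a bucket.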
We were geeking out yesterday about this, and glad you brought that up, Rob, because a lot of the work you guys have done at Google as a hyperscaler is on the physics, you know, architecting from I/O to all the subsystems around making that performance work at scale. And now in comes AI, and it needs that data and movement; it needs faster processors and subsystems. So this is like the perfect storm, in a way, for Google to leverage that scale. So the question I have is, what's the new tweak you guys are doing with the assets of Google from an infrastructure standpoint, and what's new? We saw the TPUs, the TPU v5e and the A3 VMs. What else do you guys have? Take a minute to explain the current state of the infrastructure and what's new.

Yeah, I can maybe start there, because it ties together.

That's why you're the dynamic duo.

So I think we talked about Cloud TPUs, Cloud GPUs, the importance of security, the storage and networking. The other area we're investing a lot in is the software on top of that hardware infrastructure. And when it comes to machine learning, you really need to do that at a systems level and optimize the software to work with the underlying hardware to get the performance and the scale at the cost our customers expect. And so we had some really exciting announcements in the software space as well. One of my favorites is something called Multislice for Cloud TPUs. What this enables you to do is basically aggregate individual smaller clusters of TPU v5e into a single larger cluster that you can then use to train the largest-scale models or serve models with great performance. And we can do that very cost-effectively, because you're taking these TPU v5es with amazing price-performance and aggregating them together into these larger systems. So we just see an opportunity to help our customers quite a bit with those software layers on top.

One other interesting announcement to highlight there, on the GPU side of things: together with NVIDIA, we announced that we are supporting JAX on top of NVIDIA GPUs. And this is a fantastic partnership between the two companies. JAX is an amazing framework for AI researchers. Together, we've optimized it through the OpenXLA compiler to deliver fantastic performance on top of NVIDIA GPUs, and then we're making that available to our joint customers. So the software is super important in these environments too.

Sachin, what's new on your end that's state-of-the-art?

Yeah, I already talked a little bit about storage with Cloud Storage FUSE and Parallelstore, and about the Cross-Cloud Network. But I wanted to touch a little bit on Distributed Cloud and build on what Mark shared about Vertex.

By the way, that was my favorite announcement. I don't think it got enough press, so definitely.

So with Distributed Cloud, sometimes customers just cannot move their data or cannot move their workloads into the public regions. They need something at the edge or in their data centers, and they're looking for the latest AI capabilities. We already support, through Vertex, AI services like translation, like speech-to-text, OCR capabilities. We're bringing in Workbench capabilities so they can develop their own models easily. And so it's super exciting, because they can now also start doing things like document translation, take Microsoft Office or PDF documents and convert them locally, without ever having to move that data into the cloud.
And so you can imagine any kind of more traditional ML capabilities, or as we move to generative AI, we think serving those and fine-tuning those right at the edge or on-prem is going to be super critical. And that's exactly what Google Distributed Cloud is good for.

And it also seemed like it was an opportunity to grow the ecosystem, especially with CSPs, because I think one of the references was Orange doing that. And to your point, having sovereign clouds where France is kind of particular, Germany's kind of particular; certain data can't move out, like financial services data can't leave the country.

That's exactly right. So we announced with Orange, and actually just yesterday we had an announcement about providing services with Google Distributed Cloud in El Salvador. And similar to the Orange use case, they're very excited about Dataproc running on GDC, so they can remove PII and sensitive data that the regulations prohibit from moving, and the rest they can take to the cloud. They can run analytics locally, filter with Dataproc, and then get the best of the edge and the best of the cloud.

Sovereignty is huge, sovereign clouds are hot. Now that's generally available, I think you guys announced on that product, just to clarify?

Yes, GDC has been generally available. We're making a new revision of hardware generally available; we've made the performance of the hardware much better, and it scales to many more racks. And then for retail customers, we needed a small form factor, so we introduced a three-node, 1RU system built for those stores, for the next generation of retail edge.

You mentioned software on top of the TPUs, and you've got the ecosystem developing through the cloud. What does your ecosystem look like right now? How would you describe the partners? On hardware, is it only Google, or have you got other hardware partners, other software vendors, ISVs? The GSIs are thrilled, by the way, on the AI side; they're building their own technology on top of Google. So you have that wave going on, we call it the supercloud wave. What does your ecosystem look like right now, and what's your vision on how you see that expanding?

Yeah, so the ecosystem is obviously absolutely critical, and our customers expect us to come to them with solutions, right, not piece parts. And so those partners are really, really important. Maybe I'd highlight two key areas. First, our compute hardware partners are absolutely critical. You obviously saw Thomas and Jensen on stage talking about how critical that relationship is and how closely we're working together. We're also working closely with other partners like Intel, like AMD and others, to make sure we're able to offer customers choice and flexibility at the hardware layer. And then at the software layer, I think it's a fascinating time, and we're really, really honored that over 70% of the GenAI unicorns have actually chosen to build their solutions, their models, on top of Google Cloud. And so that's both a technology engagement as well as a go-to-market engagement. Many of those partners have put those models into our Vertex AI Model Garden. And ultimately what this enables for our customers is they can choose the best model for their needs, they can run it on the right hardware platform, and that helps them ultimately deliver the results they need.
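As a hedged aside on that Model Garden flow: pulling a first-party model through the Vertex AI Python SDK looks roughly like the sketch below, and third-party and open-source models are selected the same way, by name. The project ID, region, and prompt are placeholders, and module paths have shifted between SDK releases, so treat this as a sketch rather than the definitive API.

```python
# Rough sketch of consuming a Model Garden foundation model via the
# Vertex AI Python SDK (google-cloud-aiplatform). Project and region
# values below are placeholders.
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="my-gcp-project", location="us-central1")

# A PaLM-family first-party model; OSS and third-party models in the
# Model Garden are likewise addressed by name.
model = TextGenerationModel.from_pretrained("text-bison@001")

response = model.predict(
    "Summarize the tradeoffs between TPUs and GPUs for LLM serving.",
    temperature=0.2,
    max_output_tokens=256,
)
print(response.text)
```

The design point Mark is making is that the model choice and the hardware choice stay decoupled: the same SDK surface fronts whichever model, and Google picks the serving infrastructure behind it.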
On your side, I can imagine the demand is high, especially on the cross-cloud networking. Latency aside, I don't know how you solve the physics problem, but bring your own cloud, this is going to be distributed, so clearly that's the path. I mean, people have got to be excited to be partnering with you.

The partnerships have been fantastic for us. In storage, I mentioned Intel, but we partner very closely with NetApp as well; we're offering a first-party file service built on NetApp. Then on the networking side, we announced partnerships with Broadcom's Symantec as well as Palo Alto Networks. We're taking their security service edge and bringing it into our backbone, which also helps lower latency by 35%, as I mentioned earlier. And then on Distributed Cloud, customers want to be able to run things like Elastic, things like MongoDB; we just announced a partnership with SAP, right in their data centers, sometimes even fully air-gapped. So it's a thriving partner ecosystem, and we're loving that.

You guys, as veterans in the industry, have got to be looking at this market and saying, wow, there is so much going on. And new changes, in a good way: architectural changes where customers rethink how to organize themselves. What you just described is kind of a new phenomenon; people are just going to jump into this: okay, I want to get better capability. We haven't heard edge much at the event either, but the edge is huge, right? I mean, I heard someone say, move the workload to the data, not the data to the compute. So that's something you're going to be doing, but also workloads are moving around. So the whole edge, the distributed nature of the infrastructure, I think is going to get rebooted. I see so many use cases where the efficiency of AI, if done properly with the infrastructure deployed in a certain way, will be amazing. Now the question I have for you guys is, what are you seeing as telltale signs or signals from the market on any kind of playbook or architecture at the edge, or an on-premise-cloud combo, for architecting out the performance?

Yeah, I'm not sure it's one-size-fits-all, right? And so I think our approach is to make sure that we try to understand the workloads, the use case, and provide the most optimized solutions for those, right? One of the things we like to talk about is, it's really about your cloud, your way, for customers. So what do you want to run in the public cloud? What do you believe needs to be on-prem or at the edge in your data centers? And how do we provide the networking capability in between to make that as easy and as secure as possible? And frankly, Mark I'm sure can chime in, we find customers who are doing very large-scale training, and it's all about our regions with the latest and greatest TPUs. But then we find customers who are trying to do visual inspection inside their retail store, and they need inferencing at the edge. And we want to be able to make sure that we have workload-optimized infrastructure for any of those use cases.

Yeah, that made total sense with the smaller retail version you were talking about, the one-rack-unit version for retail. I think that inference at the edge is going to be important.

And to what John was saying, we're also seeing that what edge is to one company is very different to another; a manufacturing floor or something of that nature is very different than a retail store.

Yeah, no, absolutely.
I think Sachin laid it out pretty comprehensively. Maybe the one thing I would add: the other thing we're hearing a lot from customers is the importance of speed, right? As you said, this market is incredibly dynamic. These companies are trying to move really quickly to capture the opportunity, so anything we can do to help them move faster and easier is hugely valuable. If you can take the time to train a model down from a month to a week, for example, that enables these companies to iterate so much faster. And then you can enable them to deploy that model to the edge so they can serve their customers with lower latency, and that delivers a better user experience. So a lot of this is, how do we help those companies innovate faster with better infrastructure?

And the training thing brings up a great point: at what point do lag and latency on training matter? Oh, it was trained yesterday, or a minute ago. So I like how extensions and embeddings are coming in to fill that gap on the front lines of using data in a way that's AI-friendly and enabled. That's very cool. I think that's something that's not talked about so much; we'll unpack that later on theCUBE. But new things are emerging that aren't obvious to the naked eye, so to speak, where you go, wow, that little tweak changes things.

Yeah, and in addition to the speed, one of the things many customers have talked to us about since the keynote yesterday is that we showed in Vertex AI how we support Google models, but we also showed Llama 2, an open-source model from somebody else, and we showed Claude, which is from a third party, Anthropic. And so they're like, hey, I can come to one place. As Mark said, I can move quickly, I know I can stay on top of things. And it's not just about Google models, it's completely open. We're supporting OSS and third party in the same place, and that's incredible.

It's an open garden; whatever you do, don't call it a walled garden.

I love the Model Garden. It's a great, easy-to-understand concept, even though you've got your foundation models in there. I got a request, Mark, if you don't mind: someone who's watching texted me and wanted you to explain, because they're not familiar with TPUs, Tensor Processing Units. They wanted to know what distinguishes a TPU from a GPU relative to all the AI stuff. Would you mind giving a quick tutorial on what a TPU, a Tensor Processing Unit, is, and how it compares or contrasts with a GPU?

Happy to. Let me give a little context on TPUs first, and then we can talk about TPUs and GPUs. So, you know, I think it was eight or nine years ago, someone at Google asked an interesting question: what if we wanted to enable everyone using Google Search to interact with it through voice for one minute a day? How much processing power would that take? As it turns out, it would take more than double the total amount of compute processing power that Google had deployed up until that time. So massive, right? Unfathomable, to be honest, and that was just for a minute a day. And that insight led, you know, the great minds at that time to think about how we needed to architect and design things differently to meet the needs of this next class of workloads, right?
And it was sort of that insight that led to the original TPU v1 many years ago. Since then, we've iterated through five different versions to TPU v5e today, every step of the way designing them to meet the needs of the workloads at that point in time. So we're really, really pleased to be able to offer that to our customers. With TPU v5e, the goal was really to make it more accessible to a broader range of customers and use cases. Now on GPUs, of course, many, many enterprise customers love GPUs, they love NVIDIA; they have built and optimized their models to work on top of that. And so we also partner really closely with NVIDIA to make those GPUs available. Ultimately, these customers are very technically sophisticated, so they look at the specific needs of the workload, and they can choose between different flavors of GPUs and different flavors of TPUs to meet their needs. One final point: as a result, we're starting to see more and more customers use both. One example of this is Character.AI. If you haven't played around with it, it's a really cool tool. They're going to be leveraging both our A3 NVIDIA GPU instances and our TPU v5e for different use cases in their workloads.

Awesome, thanks for explaining that. Guys, thanks for coming, I know you're super busy. In the last 30 seconds we have left, each of you quickly give a summary of how you see this event, what it means for your business unit, and your message to your customers. Yeah, go ahead.

So it's all about AI and data. And when you think about AI and data, you need to think about the software, the infrastructure, the entire stack, and how we make this easy and secure for customers. So as you said, you can't forget about storage, all right? If you forget about storage, you're going to be burning SSDs all day long; it's not going to work. So please think about storage, and we've got the right high-performance storage solutions to support the AI and data workloads. Same thing on networking: your data is in different places, perhaps in silos. How do you bring that together to get the best of our products like Vertex AI and BigQuery? And don't forget about the edge. If there's some workload, some data that simply cannot move, we've got a form factor, we've got services, we've got connected and air-gapped options to help our customers. So we're really excited to just continue working with customers and helping them along this journey.

Yeah, I think this is an inflection point in computing, right? If you look at the pressure and the requirements that this next generation of workloads is going to place on the infrastructure, already is placing on the infrastructure, it's unprecedented. And so to meet that need, we fundamentally believe you need to take a full-stack approach: look at how you optimize across software and hardware, across compute, network and storage, as we were just talking about. And by doing so, you can meet the needs of that next generation of workloads. So we're really excited to be able to work with so many great customers to try to help them achieve their goals.

Speed, power, architecture, the new ways, setting the AI table, so to speak; setting it up for the next-gen cloud solutions. Thanks for coming on, guys. Mark, Sachin, thanks for taking the time. The dynamic duo here, leading the businesses giving us more horsepower.
It's like Star Trek: Scotty, more power, you know? Thanks so much, guys, for coming on. Thank you. Had to get the Star Trek joke in there. Okay, live CUBE coverage here, we'll be right back with more. Day two of three days of wall-to-wall CUBE coverage. Team coverage, I'm John Furrier with Rob Strechay. We'll be right back after this short break.
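A footnote for the viewer who texted in about TPUs versus GPUs: part of what makes using both practical, as Mark described with Character.AI, is that a framework like JAX compiles through XLA to whichever accelerator the VM exposes, so the same program can run on a TPU v5e slice or an A3 GPU instance. A minimal sketch, with a toy model and made-up shapes:

```python
# Device-agnostic JAX: the same jitted function targets TPU or GPU,
# depending on what the VM exposes. Shapes here are illustrative only.
import jax
import jax.numpy as jnp

@jax.jit
def predict(params, x):
    # Toy two-layer MLP forward pass.
    w1, b1, w2, b2 = params
    h = jax.nn.relu(x @ w1 + b1)
    return h @ w2 + b2

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = (
    jax.random.normal(k1, (128, 64)), jnp.zeros(64),
    jax.random.normal(k2, (64, 8)), jnp.zeros(8),
)
x = jnp.ones((32, 128))

print(jax.devices())             # e.g. TPU cores on v5e, or CUDA devices on A3
print(predict(params, x).shape)  # (32, 8) on either backend
```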