Welcome back everyone to theCUBE's live studio here in Barcelona, Spain. This is theCUBE's live coverage of MWC24, formerly Mobile World Congress. I think they kind of still call it that, Dave. I'm John Furrier, your host, with Dave Vellante and Shelly Kramer. We still call it Mobile World Congress. Wrapping up day three with a great segment from a Qualcomm leader. Obviously everyone knows the name of the company. They do a lot of great work. They know telecom, they know devices. They've got the chips. Everything about wireless and mobile, they've been in the game for a long time. Ziad Asghar is here, senior vice president of product management, head of the AI technology roadmap. He's got the keys to the kingdom, and he's going to share that with us. Zia, thank you very much for coming on theCUBE. Really appreciate your time.

Thank you for having me.

Great featured interview to end day three. So first of all, I've got to just throw it out there. Product roadmap: that means you can see the future. Product management means you've got to deal with the reality of today. Absolutely. Which is a challenge, because there's a lot of hype, but the reality is matching up to the hype pretty quickly. So I'd love to get your perspective right out of the gate. Where are we with AI? Beyond the hype, what's the reality, what are you working on at Qualcomm, and how does that relate to the real world?

Yeah. So it's been amazing, right? Last year at Mobile World Congress, we started by showing something called Stable Diffusion, which is a text-to-image model, and we were able to run that one billion parameter model entirely on the device. In this one year, we've made amazing progress. If you go today to our booth, what you would see is we're running a seven billion parameter model, and it's a model that can understand images, so it's basically a LLaVA model. In the meanwhile, we have done large language models like Llama 2. We have an engagement with Meta.
So just think about it, right? A ChatGPT-like model is a 175 billion parameter model, and what we're running on the device is a seven billion parameter model. The models are improving, becoming a lot more capable and smaller, and that's opening up an amazing opportunity for on-device processing. It's literally like the center of gravity of AI is shifting towards the edge. And that's why it's so exciting for us that we are actually seeing devices launching with AI enablement. So if you go to the Honor booth or if you go to the Samsung booth, these are devices based on Snapdragon 8 Gen 3 from Qualcomm that actually have use cases running today on the smartphone. Things like translation, things like, you know, Circle to Search, things like being able to do transcription, to do summarization, amazing utility applications that people are using today. So I'm very excited about it.

You know, it's interesting, what you just said wasn't even possible a few years ago. But then when AI hit with ChatGPT and everyone saw that value, it took massive resources of GPUs and back-end cloud operations to do some of the things you're saying. Now the trend is it's going to the device, which means that's an offload of a resource now, energy. So a little save-the-planet action there too in terms of sustainability. Are we just getting started? Where do you see this going next? I mean, how fast is it moving, one? And when do the devices catch up in terms of processing at the edge, in this case the device, with some of the back-end stuff?

Hold on, I want to jump in here, because last year at this show you were talking about Stable Diffusion. And I remember going over and getting the demos and it was just mind-blowing. But it was very interesting, because people were kind of scratching their heads about it. What do we do with this? On device, how is that going to work? How does that make sense?
So, like you're saying, a short one year later, the amazing use cases, I mean it's really fantastic.

Honestly, it's amazing and encouraging to see. When you work on a technology, to get it this quickly into an end product is just superb. But I think there's very good reason for it. What do we see happening, right? A lot more people are coming in and using generative AI, number one. A lot more applications are coming that use generative AI, because there's an amazing amount of benefit that it brings. And because of that, as it scales, how do you cater to billions of people, not a few million people? Just like with the internet, right? Before the advent of smartphones, the internet was available to a very small number of people on the PC, and we were able to grow it massively once it came to the smartphone. It's very similar. Generative AI may have started on the cloud, but it's transitioning to the edge. It's coming to the device, and that brings with it an amazing amount of benefits. It democratizes the technology. It keeps what is private, private, as you process that data on the device. It does not go to the cloud, which means it stays private. Number three, it allows you to do that at a fraction of the cost compared to what it would take if you were doing it in the cloud. Remember, this is running on gigantic graphics engines in the cloud, burning hundreds of watts of power, but with Qualcomm technology we are able to run it in milliwatts of power. Amazing sustainability benefits. And then lastly, what is really amazing is that you can now personalize the generative AI experience in a very big way, because there is information about each one of us on the device, information that stays on the device and never leaves the device. I always give this example, right?
If I go to a large language model and ask a question, the answer should be different when I ask it versus when my seven-year-old boy asks it, right? But right now it's the same answer. If I do this on the device, the device knows who's asking the question and I can give you a far more pertinent answer. So what this really means is the generative AI experience on the device is far better and in many ways more complete than what you're able to get on the cloud. So I believe that.

By the way, it's not always the same answer, because sometimes it hallucinates and gives you a different answer. But that notwithstanding, I want to ask you about the parameter wars, because I've read and talked about what people say: well, when you go from 500 billion parameters to a trillion parameters, it really doesn't get much better, et cetera, et cetera. Now you're scaling it down. You must have seen a big difference between one billion and seven billion. Help us cut through the marketing hype and understand that delta in terms of parameter size and what we could expect as users.

Yeah, I think what we are seeing is that when a new model comes up, it's usually fairly large. Then people start to do curation. They do distillation. They do pruning. They do quantization. All techniques that we have honestly perfected at Qualcomm, because, well, we were trying to run these models on very small handheld devices. So those things are allowing us to do things that others cannot do. Number one. Number two, what's happening is when a new modality comes, the model might be very large, but over time people choose the right amount of data and that smaller model is still very performant. I gave the example of GPT-3 versus, say, Llama 2 from Meta. If you look at some of the comparisons, they come very close to each other, even though one is a 175 billion and the other is a 7 billion parameter model.
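To put rough numbers on that 175-billion-versus-7-billion comparison, here is a back-of-envelope sketch in Python. The figures are illustrative only: weight storage scales as parameters times bits per parameter, which is why a 7B model quantized to 4-bit integers can fit in a flagship phone's memory while a GPT-3-class model in 16-bit float cannot.

```python
# Back-of-envelope weight storage: parameters x bits per parameter.
# Ignores activations, KV cache, and runtime overhead; numbers are illustrative.
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

cloud_model = weight_memory_gb(175e9, 16)  # a GPT-3-class model in fp16
edge_model = weight_memory_gb(7e9, 4)      # a 7B model quantized to int4

print(f"175B @ fp16: {cloud_model:.1f} GB")  # ~350 GB: data-center territory
print(f"  7B @ int4: {edge_model:.1f} GB")   # ~3.5 GB: fits in a phone's RAM
```

The hundredfold gap in footprint, not just the parameter count, is what makes the on-device shift practical.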
So there is this new advent of what people are calling SLMs, or small language models. They're still very performant, still very capable, and that's what's allowing this great transition to happen. Another thing I would like to mention is what we're calling domain-specific models. You know, you can train a model on reams of data, on multiple books and all, but take, for example, a medical application. If you train the model only on that medical data, that model is going to be smaller, it's going to be less prone to hallucinations, and it's going to give a better response. I think that's where the technology is going.

The question I want to ask you on that point: you mentioned Llama 2 and catching up. The proprietary models have great penetration, but the open source ones are catching up in capability and adoption. Okay, we totally agree with you on the power law that's going to happen, both on volume and also specialty. We see exactly the same. The question is, if that open trend continues, you're going to have the scale and the low power on the device, and the developer market's going to be ripe. So I'd love to get your thoughts. What's your vision on the developer market coming in here? Because as this continues to move down the track, the device is getting smarter, faster, cheaper, open, scalable, distributed.

This is exactly what we are seeing, by the way. You really pointed to the trend that we are seeing, and as a result, what we've done here at Mobile World Congress is we've launched the AI Hub. The AI Hub is a great tool for a developer. If they want to take any of these innovations that we have done, we have placed about 80 curated models there. They've been tested on our platforms. You can go to the AI Hub, you can go to GitHub, you can go to Hugging Face, download the model, it creates a binary, you plug it into your application, and voila, right away you're able to launch an application.
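The download-compile-plug-in flow just described can be mocked in a few lines. To be clear, this is a hypothetical sketch of the workflow's shape only; the function names, fields, and target string below are illustrative stand-ins, not the real AI Hub API.

```python
from dataclasses import dataclass

@dataclass
class CompiledModel:
    name: str
    target: str
    artifact: bytes  # stand-in for a device-ready binary

def fetch_curated_model(name: str) -> dict:
    """Stand-in for downloading a curated, pre-tested model (e.g. via GitHub or Hugging Face)."""
    return {"name": name, "weights": b"\x00" * 16}

def compile_for_device(model: dict, target: str) -> CompiledModel:
    """Stand-in for the hub step that turns the model into a binary for a target device."""
    return CompiledModel(name=model["name"], target=target, artifact=model["weights"])

# "Download the model, it creates a binary, you plug it into your application."
model = fetch_curated_model("stable-diffusion")
binary = compile_for_device(model, target="snapdragon-8-gen-3")
print(f"loaded {binary.name} for {binary.target} ({len(binary.artifact)} bytes)")
```

The point is the reduced friction: the developer's application only ever touches the final compiled artifact.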
We want to reduce that friction for a developer to take these experiences onto a device. With AI Hub-like tools, we're seeing an amazing response, because people are seeing the benefit of being able to take them quickly to a device.

So real quick, on the clarification: AI Hub is where the developers go, and the news about the on-device gen AI, those are the capabilities you were just referring to. So it's the combination of the on-device enablement and AI Hub, which will be the watering hole for the developers, where they do their stuff.

But for on-device, to your point, what we're doing is continually moving the bar. We're making the AI engine inside our products much more capable. The CPU, the GPU, the Hexagon NPU, they're all becoming far more capable. We can do larger models. We can do more AI processing at lower power consumption. And along with that, an amazing AI stack that allows you to do this. Think about it this way, right? NVIDIA and other companies do a great job on the cloud side. What we are driving is basically the power of inferencing on the edge, with best-in-class hardware, software and assets for edge processing.

Constraints in the old days were motherboards and the sizes of the boards and the chips. Now power is a constraint. You mentioned power, huge issue. So open, check. I think that's going to end up being most of the long tail and the neck and torso. Maybe some proprietary models will hang around because of the size needed to get them. But if the open continues, low power is going to be the enabler. Where do you see that going? What's the impact of the power piece?

Yes, I think it's amazing in the sense that, think about it, let's say billions of people running these models and billions of use cases. You run this on the cloud versus you run it on the edge. Just look at the amount of difference it makes from a sustainability perspective.
I recently saw an article talking about the fact that to run an LLM query, you need almost a bottle's worth of water to cool those systems down. That's one LLM; now just scale it up. The advantages are massive, and open source models becoming as performant as closed models is a huge trend. I think that will really open up these capabilities to the masses. There's a lot of benefit that those companies that are opening up their models are able to derive, because people are building on top of the work that they have done. So we're going to see this trend continue: more on device, the center of gravity shifting much more towards the edge, more performant models coming, more open source models coming. And the power aspect really is what allows this to be available to everybody, not just a few select people.

Well, in that, I think, is the beauty of the Qualcomm AI Hub, because you've long been known for your developer support, your developer ecosystem and all that. So this is yet another tool for developers to allow them to create and innovate and scale and everything else.

Absolutely. By the way, we're seeing a great response already, because people can take these generative AI models, like Stable Diffusion, it's right there. You can pick it up and readily create an application. What I'm excited about is there are many applications that we might not even have thought about when we were designing these chips, and I hope that developers come up with those amazing ideas. That's why we do this, right?

What's the difference between a Hexagon NPU and just a regular old everyday NPU?

Yeah, so think about it this way, right? In the AI engine that we have inside our chip, there are three engines that can do an amazing job of AI processing: there's a CPU, there's a GPU and there's the Hexagon NPU. The CPU is great for relatively smaller models that you run quickly, things that are very latency sensitive. The GPU is good for heavy models, high-precision models.
But if you're talking about models that are running nonstop, if you're talking about pervasive AI, for example a copilot-like application on your PC where the AI model is continually looking at how you use the device and learning from it, if it's a sustained use case like that, you need an NPU from a power perspective. Number one. The Hexagon NPU is special because we have built it over the last six, seven years that I've been driving it to incrementally do more specialization, better performance on transformer networks, and much better scalar, vector and tensor processing, which maps very well to what a neural network looks like. And that allows us to do more AI processing than anybody else out there from a performance-per-watt perspective. And that's the...

Yeah, performance per watt is the key metric. I mean, you're absolutely right. You know, sometimes people just talk about TOPS, but it's not about TOPS. It's about inferences per second per watt. TOPS are cool, but that's it.

I agree. But performance per watt.

Okay, so if that happens, if the low cost happens, then you have this distributed model we were kind of talking about before we came on camera. You've got the combination of the edge, now the device with AI Hub, a developer platform, and you've got the centralized, say core cloud and more on-premise, whether it's a GPU cluster or whatever big clustered system. How far along are we in that distributed system? Where are we on the progress bar of advancement? Are we there? Good enough? A lot of experiments are going into production. Maybe some people are overhyping that, but when does it go full-scale production? When you start seeing that distributed environment, where are we on that, certainly at the edge?

You know, we're calling this the hybrid AI concept.
What this means is that today you can do perhaps the smaller models on the device, and then the larger models you basically run in the cloud. That's kind of how it is today. But what I envision is a far greater collaboration, almost like a symbiotic relationship, between the cloud and the edge. So for example, what you could do is a technique like speculative decoding, where part of the processing happens on the edge and the remaining processing happens in the cloud. That really takes a lot of the burden off the cloud, so you don't have to do as much processing there, but the consumer still sees the benefits of both. And that's what we want. We want the best of both worlds there.

Since you brought up speculative decoding, I've got to ask the question around how you see the interaction of two things: APIs, connection points, and how foundation models interact with each other. Whether they're LLMs on LLMs, or multimodal, graphics on graphics, there's going to be kind of a generative, maybe an assembly or compilation in the future. A compiler, maybe? I don't know. I mean, I'm a little bit out there in the dream state, but where do you see that going?

I think what's happening is people are creating this orchestration layer where a lot of this orchestration happens, right? And in this particular context, what you could do is, for example, with speculative decoding, there is a notion of a draft model. You can think of it as a student model, and then there's a target model, or a teacher model, that's running in the cloud. The teacher model is much larger than the model that's running on the edge. But with the two working in cahoots, you're able to do a lot more processing cheaply on the device with the student, or draft, model. You pass that information to the cloud, and the cloud does the assessment of, hey, did it do a good job or not? But it does not need to do the full processing.
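The draft-and-verify loop described above can be sketched in miniature. Both "models" here are trivial deterministic functions invented purely for illustration; the point is the control flow: the cheap draft model proposes several tokens, and the expensive target model checks them in one pass, accepting the agreeing prefix and correcting the first mismatch.

```python
def draft_next(context):
    """Cheap on-device 'draft' model: fast but sometimes wrong (toy stand-in)."""
    return (context[-1] + 1) % 10

def target_next(context):
    """Large 'target' model treated as ground truth (toy stand-in)."""
    nxt = (context[-1] + 1) % 10
    return 0 if nxt == 7 else nxt  # disagrees with the draft whenever it would emit 7

def speculative_step(context, k=4):
    """Draft k tokens, then accept the longest prefix the target agrees with."""
    ctx = list(context)
    proposed = []
    for _ in range(k):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)
    accepted = []
    ctx = list(context)
    for tok in proposed:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # target corrects, and we stop here
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted

# One verification pass accepts three draft tokens and corrects the fourth.
print(speculative_step([3]))  # -> [4, 5, 6, 0]
```

In a real system the target's single verification pass is far cheaper than generating every token itself, which is where the cloud-side savings come from.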
So that's where you get the benefits of both worlds, right? A very, very powerful thing. And then the models continue to evolve. Multimodality is something we're already showing running on the device. See, the way I see it is, in the past, something happened on the cloud and it used to take a long time for it to get to the edge. That gap is closing in a very, very fast way.

So speculative decoding is like we're back to client-server. The time to do the inferencing and get an outcome is less. It's more efficient, and you're not changing the outcome. I mean, the pendulum we always talk about swinging, it is client-server for AI in a conceptual way.

It's a very good metaphor that you bring, because what happened in that case, right? Something started with the server and moved to the client. Very similar to what I think will happen with AI. A lot of the inference is starting in the cloud, but we believe it'll come to the edge.

It's actually absolutely right on distributed computing. If you think about other paradigms, storage, data going to, say, a tactical edge, whether it's warfighting or the edge, there's a difference between highly available and high availability.

That's right.

Which is, does the data get fetched, or is it made available all the time at the edge? So you're getting at this notion of, hey, I'll offload to the cloud, but then I'll bring the availability I need to the edge, where I need it right there for low-latency touch points. I mean, similar concept. It's still storing things.

Absolutely. It's exactly like the heterogeneous compute idea that I talked about. You have heterogeneous compute within the device, with CPU, GPU and NPU. You can think of the cloud as an extension of the heterogeneous compute idea to a great extent, right? And you use the right engine for the right workload at the right time.

A lot of chatter about the LPU now. We've added a new XPU, the LPU. What are your thoughts on that?
Are you guys participating in that activity? You think the LPU, the language processing unit, you see that as a real opportunity? Do you guys have a solution there?

I think it's a massive opportunity for us. From a Qualcomm perspective, if you are able to do this language processing, natural language processing, on the device, it opens up so many amazing things, right? Just think about it. Let's say you're wearing an AR device. You cannot interact with that device by typing. You've got to interact with it by natural language, by talking with it. And as we make these models smaller and you're able to run them lightly on a glasses-like form factor, it completely changes the experience that you can have. I think that's what we are really looking forward to. And then to add to that, see what happens today with VR: a graphic designer sits there, creates some assets, and all of us see the same assets. But what if I could create that asset on the device based on the colors and the textures and the themes that each one of us likes? It changes the experience entirely.

Yeah, personalized.

Exactly.

Zia, great to have you on theCUBE. Great insights, again, a featured interview to end the day. As we have maybe a minute and a half left, I'd love to get your thoughts, since you run the roadmap. This is where you've got to pick your favorite children, okay? As always, something has to fall off, someone has to be prioritized. Again, the future is the roadmap; the reality is product management. Where are you on the map? Can you share a little bit of color on what you're focused on? What would you like to see come down sooner? What's a little further out? Give us a taste of what's on the product roadmap for Qualcomm.

Yeah, I mean, I have some favorite children. Let's just go that way.

All right, let's hear it.

So, basically, the key idea over here has to be, look, it has to add greater value to the consumer.
So I'm really very focused on the use cases that materially move the needle. We can do incremental stuff, but I keep telling my team we have to focus on those things that are new use cases, new experiences. And I think this is what AI is great at. It's able to take away the mundane, the routine, and really free up humans to do what they do an amazing job at: innovation, novelty, new ideas, right? That's what I want us to be able to do. And then what I'm also very focused on is there are some interesting societal benefits that we can bring. So I'll talk about two use cases which I really like, right? Today, you sit in a classroom full of students, and each one of them is getting the academic material delivered to them at the same pace. You know, what if they are using a tablet to get that material, and there's a camera on top that gets a very good idea of how much of that material a student is absorbing? And what if I could change the pace of delivery of that material, right? One student is getting it, one student isn't getting it, so I slow down how I'm delivering the material, because I have generative AI. I'm actually changing the lesson plan on the fly.

Personalized. It changes the way people are able to learn. I think personalization, that is a great example.

I think so. Think about it, right? Many countries have elderly populations, populations that are aging. Can you imagine that you have a box sitting over there that's able to aggregate all the sensor data that's coming in from these sources? And what if it can change the therapy? What if it can change the remedy for that patient, or that person, without actually having to go to the doctor? Or it can predict certain things that might be happening before they happen?
I think those are the things that really make generative AI very, very interesting to me, because those are the benefits that we can bring to society as well, instead of just creating certain content. So there's a lot more.

Well, there are new business models that come out of it too. There are new experiences. I think that makes the end user go, wow.

Yeah. Exactly. That wow moment brings the AI to the table, rather than just a better mechanism, you know, just some better software. On that point, I mentioned this in one of our summits, and I was talking to somebody today, right? Let's say today you want to book a restaurant reservation. You'll go to Yelp, find the restaurant. You'll go to the map. Is it close enough or not? You'll go to your calendar and figure out what time works for you, and then you'll get the reservation. What if you have a virtual assistant sitting on your device, and you say, reserve a restaurant for me tomorrow? It figures out your calendar. It figures out Yelp. It finds a place close to you, goes to OpenTable, makes the reservation for you, and you're done. See, this is the kind of human-machine interaction that I think we can enable in the future.

Exactly. You guys have a unique perspective, because you have the device form factor, but the device now changes to anything the user interfaces with: a TV, a plasma, any kind of interface, whether it's on a plane.

Camera.

You know, I was talking to Samsung, they have this demo where in first class you get a little dome, a sphere over your head, and you can look at whatever you want, personalize your trip.

Absolutely.

Watch the sunset.

Absolutely.

So as you get this interface, the data you're getting is going to be interesting. So the data advantage is huge.

Yeah, I think the data advantage is huge, and it's at multiple different levels, right? The benefit of the data on the device is the personalization aspect.
There is a benefit of that data at the telco level. There is a benefit of data at the cloud level. And I think that's the really interesting part. The number of moving parts in this area, like we were discussing, it's a very interesting field. You've got new models coming, new model topologies coming. You've got new compiler technology coming. You've got new hardware technology, new software technology, new tools. For a technophile, it's a great area.

You're in an amazing position with a highly differentiated strategy. I would ask, how much of that was luck, or was it luck by design?

You know, there's a saying: chance favors the prepared mind. And I think that's what has really helped us, right? We knew, we anticipated these larger models, so we were already working on things like quantization. People are running these models in 16-bit float. We're running them in 4-bit integer. I can do it far more cheaply. I can make a much larger model fit on the device than what others can do. So we really have prepared for a lot of this, we're ready, and we're actually preparing for the next gen now. And I tell you, we're just really getting started. There are a lot more techniques, models, capabilities and use cases in play.

We're going over, as usual, on these great interviews and insights. Final question, final, final question. As a product person and an engineer, share your thoughts on how cool of a time it is to be in this industry right now. I mean, opportunities, but there are challenges, problems to solve.

Yeah, but I mean, these are really interesting problems, right? I was telling somebody that, you know, you come into work and you're actually excited, and I'm actually kind of happy they pay me to do some really interesting stuff. Honestly, if I wasn't working, I'd still be looking at a lot of this stuff anyway.

You're a nerd like us. That's okay, Zia, you're geeking out. Total alpha geeks here.
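The 16-bit-float to 4-bit-integer shrink mentioned above can be illustrated with a toy sketch. This is a single-scale symmetric quantizer invented for illustration; production on-device pipelines use per-channel scales, calibration and outlier handling, which this deliberately omits.

```python
def quantize_int4(weights):
    """Map floats to integers in [-8, 7] with one shared scale; returns (ints, scale)."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid a zero scale for all-zero input
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [v * scale for v in q]

w = [0.12, -0.7, 0.33, 0.06]
q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)

# 4 bits per weight instead of 16: a 4x smaller footprint, at a small accuracy cost.
print(q)  # each value fits in 4 bits
print(max(abs(a - b) for a, b in zip(w, w_hat)))  # error bounded by roughly scale / 2
```

The trade is exactly the one described in the interview: a quarter of the memory and bandwidth per weight, in exchange for a small, bounded rounding error.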
Zia, thank you so much for your time.

Thank you for having me.

Yeah, that's great. Ziad Asghar, the senior vice president of product management, and he runs the AI roadmap at Qualcomm. Great stuff. And that wraps up day three here on theCUBE, in our CUBE studios in Barcelona at MWC. I'm John Furrier with Dave Vellante and Shelly Kramer. Thanks for watching. Thank you.