Good morning, AI fans, and welcome back to Supercloud 5, the battle for AI supremacy. This is special event coverage from theCUBE's Palo Alto studio here in California, in addition to our editorial team on the ground in Las Vegas, Nevada for AWS re:Invent. I am joined by my magnificent and brilliant co-host this morning, Lisa Martin. Good morning, Lisa. Good morning, darling, how are you? I am great, I really enjoyed our banter to start off the day. Wasn't that fun? It really was fun. Myth versus reality, brilliant title, by the way. Thank you. Well, I think that's really what we're all about here. We want to facilitate the dialogues that are not just the same hype that the big tech boys all enjoy talking about, but the tech and the enterprise software and the hardware that's going to power the future for the millions of us all around the world. Absolutely. Which is why I'm particularly excited about our next guest. Jonathan Ross first joined us in Denver at Supercomputing 2023 just a few weeks ago, but we are thrilled to have the CEO of Groq back with us in the studio today. Jonathan, thanks for coming back to hang out with us. Thank you so much for having me. It's a pleasure. We had such a good time in Denver. We needed another round. But just in case folks haven't had a chance to review that video at thecube.net, what does Groq do? Well, we accelerate AI. We make it fast. And I think we're going to get in a little trouble today by breaking a couple of speed limits. And we've got the radar out. I think we'll get pulled over. You know, we've got the three hashes on this track. You can go as fast as you want. We'll treat it like the Autobahn. No big deal. There we go. So we talked, inference is a real key here, and you talked about sequential problems when we were chatting before. I want to take people on the journey of how you started Groq a bit, to then get into the meat of where we are today. Back in 2016, you didn't start by building a chip.
In fact, you banned it. You started by building a compiler, and then you built a factory, and then came the rest of it. As a company and a person great at sequential problems, how did you know that was the right sequence of events to approach solving this? Well, so my background: I actually started the AI chip at Google. Casually. Well, it was a side project. It was unfunded. A side project? Yeah. OK. Call it 20% time. Impressive. He invented the TPU just as a hobby. NBD. I need new hobbies. NBD. Even the whole production studio is giggling with us. Yeah, yeah. Just a humble brag over here from Jonathan Ross, ladies and gentlemen. But I didn't make the software for it. And the DeepMind team had made a software program that could play the world's best players in Go. And they lost the test game, and they needed a little more power. So they went to the TPU. And we went from losing by quite a bit to winning by quite a bit, as was seen when we played Lee Sedol, the world champion. Yeah. And so the realization was that inference was going to need a lot of compute. Was that a turning point? Did you know that before that realization? No, no idea. And so when that happened, was that when the seeds for Groq started forming in your head? I think that's when I realized we weren't going to have enough compute. There were going to be the haves and the have-nots. That's what's actually happened. And I wanted to make sure that everyone would have access to AI, that there wouldn't be the haves and the have-nots. People use the term democratization. We discussed it even in Denver. How do we do that? I think there is a lot of hyperbole around that right now and a lot of good intentions. But what does it mean to make sure that there aren't haves and have-nots in AI? That's a great point. I would say AI is inevitable, but human agency is not. And I don't mean that in the sense of AI taking over; that's not going to happen. But I mean dictatorships, totalitarian regimes, people trying to influence.
And our mission is to make sure that that doesn't happen. It's to make sure that everyone has access to AI and that we all can benefit from it, that we continue to have human agency. And so we need to be essential to the infrastructure to help guide that. Talk about what makes Groq unique in that sense of guiding it. You know, Savannah and I were talking in our open about AI myth versus reality, and some of the things we were even talking about yesterday. Being in technology, we get to hear so many of the great use cases for AI, what's happening now. The mass media is talking about a lot of the doomsday. And so the doomers, yes, the general public is very fearful of that. How can Groq help start mitigating some of that fear and showing the world all of the positives that are already coming with AI? One of the biggest things is we keep thinking of AI safety in a defensive way, of we must stop the negatives. Yes. But I was just chatting with an AI safety expert this morning, and as we were talking, we came to the mutual conclusion that we need to play offense. Think about it this way. Imagine if you type a query in and it goes to a negative place, and it guides you. It says, you know what? Why don't we think about this a little bit? There's subtlety and nuance to your question. It's not as simple as you think, because the solution to a lot of strife between people is that subtlety and nuance. And we've never had the ability; just imagine if today you could have some of the smartest people in the room when people are trying to do bad things, just going, no, no, no, hold on a second. Let's think about this. So I think we have this ability to get on the offensive with safety and culture and make sure that people are seeing the full picture. I think you're absolutely right. And I think, I'm not even intentionally making a pun here, but it's gonna happen regardless.
When we're talking about language and lexicon, it's subtlety that makes a really big difference. It's like those analogies we always see where there's a comma in the sentence versus without the comma. And some of them are a little inappropriate, so I won't say them out loud on the show, but you get the value of punctuation, right? I think what you're talking about is the subtlety, or the importance of subtlety and nuance, within these models, and the ability to look across this perspective and give someone potentially a less biased perspective than they're currently having. Doubleplusgood. Yeah. 1984, right? The simplification of language is the enemy of communication between people. Well stated. Okay, how many soundbites do you think we can have over the course of the next 20 minutes? Let's set a record. Let's do it. I think, yeah. Challenge accepted. I really love that. So Groq's LPU, Language Processing Unit, another creation of your fine brain, will provide 10x higher throughput, 10x lower latency, and 10x lower cost than current industry standards. That is no insignificant amount. That is optimizing compute on three different axes. It's unbelievable. Simultaneously. If you weren't sitting here, I'm not sure I would believe you. How do you do it? Well, we've built a factory for AI. Rather than designing a single chip that runs a model, we built a system. And that system is capable of running these very large models completely inside the chips, in the memory, and just communicating small bits of data between them. And so when we run a model like Llama 2 70 billion or something like that, we're running that on hundreds or even thousands of chips. And the more chips that we add to it, the more economical it gets per chip, just like a factory. The bigger the factory, the more economical. And so we flip the whole script. It's like we've brought factories and mass manufacturing to language, to large language models, to AI. Which is exactly what it means, right?
AI needs scale to move the fastest it can. Absolutely. Is that a competitive differentiator, this factory that you build? It sounds quite unique. It's very unique. So the largest system that we're aware of in production today for inference is 24 GPUs. Except for us; we're at 576. Wow. I'm not the fastest mathematician, but there's a bit of a difference there. And growing. And growing. 20x and growing. Oh my goodness. So we talk about faster, cheaper; obviously compute and power are a big conversation across the board this week and in the whole AI conversation in general. You said something in an interview recently that really stuck with me, that I've been thinking about this morning. Rather than making claims like a lot of competitors are doing in this space, or folks trying to add to the noise, if you will, you solved an unsolved problem. You knew that it wasn't just about faster or cheaper. You knew that it was gonna be necessary to run those models. And you did that. So folks are finding themselves not able to run their models on certain systems outside of Groq. So we've solved the software problem. I know, crazy. So everyone else is struggling. Another casual statement. Well, you said we banned the development of chips at Groq for the first six months. That's true. We actually banned whiteboards. We banned everything, because people kept drawing pictures of chips and we needed to work on the software first. So we worked on the compiler. Once we had the compiler working, we designed the chip. It's an amazing discipline. It is an amazing discipline. How did you do that from a leadership perspective? Great question. Creating a culture like that, where saying 'thinking outside the box' is not even a fair statement here. How did you do that from your leadership position and get the folks on board to do this? Yes, outside the chip. We kept the team small. So we would not have been able to build what we built if we had thousands of people.
So right now Groq is 180 people, and we built our own chip. We built our own networking. We built our own system. We built our own runtime. We built our own compiler. We built our own orchestration. We have marketing. We have sales. We have HR. We have all of these things in 180 people. That's what most people refer to as lean, but actually executed. Can I ask how much capital you've raised, and how you've been able to do this at such scale with such a small team? So we've raised 400 million. Building chips ain't cheap, but that's probably one of the smaller amounts of any company in the space. But we've actually got chips working. We're actually ramping our production. And we have the software. We have over 800 different models, most of them just downloaded from Hugging Face: took the PyTorch code, compiled it, no humans modifying that code. That's impressive. Speaking of where you're headed, I'm sorry, is it a hundred million? One million chips. You're looking to have one million. In 24 months. In 24 months. You say that with such confidence. You feel like the machine's ready and oiled and you're about to start pumping them out? Oh, absolutely not. I mean, we're gonna be sleeping under desks. That's the hardest thing we've got to do. But the good news is, we know that we can get those chips to build the systems. It's the capacity. It's that 99th step before the hundredth. That's awesome. Yeah, we've got all the hard, I should say, we've got all the easy parts. We have the capacity. Now we have to do the hard part of actually doing it. I wanna dig into inference a little bit, because that is one of your core differentiators. It's how you're able to do what you do. Why is this such an important process in machine learning? So when you train the model, that costs you money. When you put it in production, you make money. Inference is when you make money. Training is when you spend money. So inference is pretty important if you're running a business.
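Ross's train-once, serve-many economics can be sketched in a few lines of Python. Every number below is an illustrative assumption for the sake of the arithmetic, not a Groq or industry figure:

```python
# Illustrative train-once / infer-many economics. All numbers are
# made-up assumptions chosen only to make the arithmetic concrete.

def breakeven_queries(training_cost, revenue_per_query, cost_per_query):
    """Queries needed before per-query inference margin covers the
    one-time training bill."""
    margin = revenue_per_query - cost_per_query
    if margin <= 0:
        raise ValueError("inference must earn more per query than it costs")
    return training_cost / margin

# Hypothetical: train once for $2M; each served query earns $0.005
# and costs $0.003 to serve, a $0.002 margin.
queries = breakeven_queries(2_000_000, 0.005, 0.003)
print(f"{queries:,.0f} queries to break even")
```

The point of the sketch is the asymmetry Ross describes: training is a one-time cost, while inference margin compounds with every query served, so the cost of serving is what dominates at scale.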
So the way to think about it is, when you were developing software, you would compile it. That's training. When you deployed it in production, that's inference. And so for machine learning, we've finally gotten to the point where we have models that are useful. Up until now, it was all about trying to get a model that worked. Now we have them. And this always happened. This is why we built that chip at Google originally: because we had trained a model, it was better than humans at the task, and we couldn't afford to put it in production. Yeah. So the whole conversation this week, the battle for AI supremacy. Obviously we're not quite at a point where we have a supreme AI leader; this is starting to sound darker than I expected as I say it out loud. But I do think that there are some players, or at least some brand names, that we're hearing with greater frequency than others. There's also a ton of noise and a ton of hype, NVIDIA obviously being one of the more popular brands when it comes to compute and power. Sitting next to you, I've got to ask: is there finally an alternative to NVIDIA for LLM inference? Well, why don't I let you be the judge? Oh, I love that. You ready? Yeah, we're absolutely ready, Jonathan. You got something to show us? I do. We'll have the wonderful team pull it up. So here we are looking at Groq. All right, so this is Llama 2 70 billion. This is the largest large language model from Meta. And why don't you ask it a question? Should we ask it if there's an alternative to NVIDIA? It's going to say no. For LLM inference? It'll say no. It actually, I tested it this morning. Well, let's see what it says. Let's find out what Groq says. Well, it's already done. Whoa. That's our brand. So this was, we did this deliberately. This is Lisa's first time experiencing Groq. You had the exact same response that I did, and that apparently everyone does. You said that wow is your brand. Everyone says wow. Lisa, why don't you pick something random?
Oh, we were talking about cupcakes, cooking and dogs yesterday as a potential AI business model. Let's ask Jonathan something fun about that. Oh yeah, here we go. Perfect. Oh, we got a recipe. We got a recipe. Pupcakes. Well, you got more than a recipe. Pupcakes. Rolled oats, very healthy. Oh my goodness. So now we just, and how can we make this better? We just learned how to cook cupcakes for puppies, correct? And to be very clear to the audience, this is all real time. None of this is prerecorded. We're showing you Groq and Jonathan's screen in real time. This is phenomenal. If you have googled something in the last few weeks, the results are not as quick as what we are seeing here on Groq. 330 tokens a second is where we're at. I can confirm, I was playing with it last night. It's absolutely true. You said something that I thought was really incredible and also indicative of when things really change, when the paradigm shift really happens in technology. You said that it's so fast, it's a different experience. And I couldn't help but think back to other technological and industrial and manufacturing revolutions that we've been a part of. And I'm wondering if this isn't a little bit of a Henry Ford moment, when we're going from the bicycle to the automobile. Is Groq the motorcycle ride into our future, to continue the analogy? This is the motorcycle for the mind. Yeah. Ooh, the motorcycle for the mind. Oh yeah. This is, my cheeks hurt from smiling. I am absolutely making cupcakes. You make cupcakes? Yes. How do you deliver this with such speed, and with such confidence as well? So our system is very reliable. Our background: we came from building very large data centers and very large systems and deploying them. And so the confidence is there because we built a system that doesn't have most of those components that tend to fail in a GPU. We also have redundancy in the system, resilience, we can do failovers, you name it.
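The 330 tokens a second quoted in the demo translates directly into user-perceived latency. A quick back-of-the-envelope in Python (the rate is the figure from the demo; the response lengths are illustrative assumptions):

```python
# Back-of-the-envelope: what a fixed 330 tokens/second output rate
# means in wall-clock time. 330 tok/s is the figure from the demo;
# the response sizes below are illustrative assumptions.

TOKENS_PER_SECOND = 330

def stream_seconds(tokens: int, rate: float = TOKENS_PER_SECOND) -> float:
    """Wall-clock seconds to stream `tokens` output tokens at a fixed rate."""
    return tokens / rate

per_token_ms = 1000 / TOKENS_PER_SECOND   # roughly 3 ms between tokens
print(f"{per_token_ms:.1f} ms/token")
print(f"short answer (100 tokens): {stream_seconds(100):.2f} s")
print(f"full recipe (500 tokens): {stream_seconds(500):.2f} s")
```

Determinism matters here too: per Ross, the system is synchronous, so arithmetic like this is not an average over runs but a schedule known at the start of generation.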
So this will be one of the highest-reliability systems. In fact, you may have noticed, through no fault of those who are actually building some of these services now, that they're often down; the hardware is just not that reliable. This is very reliable, but it's also very fast. And it's fast because it's deterministic and it is synchronous. So we know at the start of generating a token, exactly to the nanosecond, how long it's gonna take to generate. How do you avoid hallucinations, or dial them down? That's not us; that's the software. However, there are ways to do that. Got it. One of the simplest ways, and one of the use cases, you asked about use cases, that we're being asked to do is content moderation, because there's a speed element. Yes. And so now imagine if every time you post something, it can really quickly check to see if it follows the rules. Yeah. Really quickly. And that really struck a chord with me, because not only is that great for prevention in terms of harmful content getting out there, but that's the type of instance that affects humans in jobs right now, where they are quite unfortunately going through some extraordinary trauma in that role. It's humans moderating all of that right now. We've heard some of the horror stories; great reporting by Casey Newton on that a few years ago that honestly haunts my tech dreams to this day. Don't post bad things on the internet, people. But I think that that's a great, I mean, what a brilliant application for this, to prevent what we say we can't unsee. You could essentially make it so that folks could unsee something horrific that popped out; they would never see it to begin with. But we can go further. Rather than just saying no, you can't post that. Most of the people who are posting these things and doing this, if they really reflect on it, they don't wanna be doing that.
If you give them a little bit of time, if you could talk to them and say, okay, I'm gonna start counseling you, if you want, on how you can change your behavior. Cause most of these people don't wanna be doing what they're doing. And over time it can get better and better and better and reach more people. So it's not just a no, you can't do this. It's a no, but I'm gonna help you. Yeah, yeah, it's a yes-and. Exactly. And it's a thoughtful solution. The banhammer works in certain scenarios, but it's not the end-all, be-all. We can't just ban bad thoughts and assume it's gonna fix the world. This ties into what you were talking about. You talked about safety in the beginning. I love that you think that AI can make a better world for all of us. Why is AI the safest when we all have access to it? So this gets into control theory. I mean, it's funny, we talk about AI and all of a sudden our minds go blank on all the stuff we already know about science and engineering. But we know a lot about control theory. And control theory is about systems staying stable. There are three things when it comes to safety for AI. Number one, specifying what it even means to be safe. Fundamentally, what does it mean to be safe? We're not gonna agree. Everyone's gonna have their own definition, and that's part of the trickiness, right? Different countries. That's the nuance. Different people. So how do you deal with that? The second is, is it actually following those instructions? And third, how do we know whether or not it's following those instructions? So it's the specification, it's the following of that specification, and it's the checking. Those are all very hard. And here's the thing. Imagine a genie. Imagine you get a lamp and the genie comes out. Right, right, right. And you ask to become a billionaire. Great. But does it file the taxes for you? Does it start doing financial planning? Does it help you avoid the lottery winner's curse? Yeah, seriously.
Right? Or stave off your enemies that are now coming for you. Family. Yeah. We all just had Thanksgiving, a little drama at the dinner table, you know. But in all seriousness, there will never be something that just magically gives you what you want. There are gonna be tools, and everyone's gonna have access to them. If a small group has access, it becomes unstable. Just like in control systems: when you have a small number of dimensions, it's hard to keep it within the parameters. The more dimensions, the more stable. Think about it this way. If you have a small group of people in power, how does that work for society? Not well, speaking as somebody who studied political science. Well, there you go. So what you want is a bunch of voices all being able to have a say. And so that's why our mission is to make sure that everyone continues to have human agency. And that means that they have a say. They may not always get what they want, but they have a say. Well, we have a say. At least we're gonna have a say here. Yes, we do. In this wonderful conversation. Absolutely. This is getting really nice and introspective. I feel like I'm pondering humanity. I'm not even worried about the tech inside it anymore, which is really exciting. You just wowed Lisa here on the show. Literally. You wowed our production team, both here and in Denver. You wowed me immediately. I'm just a total textbook case study of apparently what everybody does when they get to start playing with it. How are you gonna wow us next? Oh, oh my gosh. I'm not gonna tell you in advance. I'm gonna keep that a surprise. Well, how long do we have to wait? Yeah. Oh, not long. I think the next surprise will probably, manage my expectations, Jonathan. Let's aim for something in January or February. Oh, okay. I think that's when we'll have something really, really interesting. I should have brought my magic ball. Darn it. I can't. Well, next time. Tomorrow. We'll have to train it on it.
You didn't answer the question. So I said, you could be the judge about AI supremacy. What do you think? I mean, you know what I think. I'm the one who invited you back on the show. You can take that as you want. And I know that I speak for John and a few of us here: extraordinarily impressive. I mean, last night I was playing with it, with everything across a variety of different topics, including things about myself. And one of the things that I could tell was remarkable was you've put up guardrails, and it doesn't tell private information about people. And I feel like certain AIs that I've played with will kind of reveal anything that's in the tank. There's no filter on what might not be appropriate. There's less data hygiene on, to your point, what's a suitable response to that query. And I noticed, I was like, wow, Groq respects my privacy. Well, I want to give credit where credit is due. We didn't create the model. This is the open-source model from Meta. This is them. Who would have thought they would have been looking after our privacy? Seriously. But they've done a really, really good job. And I think this is one of the really important lessons. I hadn't noticed that before. Yeah. But open source tends to provide better security, better results. And they've opened this up. People are doing things on top of it. We're learning. It's amazing. It is amazing. What types of customers, or it doesn't have to be customers, because I realize that might be revealing for you, what types of applications get you most excited as an individual, not just as a leader of Groq but as a human being? Well, I am excited about the content moderation angle. I'm also excited about real-time coaching, which is a thing that we're seeing. So people entering new jobs, getting real-time coaching as they're doing the job. I love that. And it has to be super low latency to work, especially when they're interacting with other people and it's coaching them.
Anything around improving websites and making them more tailored and specific to the users landing on them. But also, speed is quality. When you have the ability to iterate, you can ask questions like, how could this answer be better? Great, make it better. You can also run bigger models that are higher quality. So I think we're about to see a major jump in the capacity and capability of these models. What are some of the industries that you think this is gonna make, or is already making, a huge impact in? So startups are embracing this big time. Larger companies are going, yeah, we've got this thing that works kind of well; we don't wanna risk making it worse. The startups don't even have that function yet. I was at an event with 20 startup founders who are all Series A or seed, and 15 of the 20 were relying on large language models for core parts of their business. Wow. 75%. Yeah. Is that a cultural thing? The incumbents, the larger organizations, were, mm, what's working is fine. Status quo is good. Yeah. The challenger is coming in saying, mm-mm. Yeah, and those challengers, I've seen some demos that I can't talk about because it's their own stuff. There are things that literally made a billion-dollar business out of what someone put together in a weekend. Wow. At these companies. Woo. And they're just building these things internally, rapidly. It's crazy. It is crazy. I mean, we were talking to Johnny Dallas at Supercomputing as well, and he was saying engineers can build the same thing with just three people as they can with 300. You're a perfect example of that with 180 total employees. The world's your oyster. It is. What's your advice to other AI entrepreneurs who are thinking about getting in this space and wanting to be a challenger like Groq is? How do you advise someone? Give us the inspo. Yeah. Number one, differentiate. Do not just do what everyone else is doing.
I've heard so many people who are just doing the exact same thing as someone else. Yes. Number two, don't try and just take what humans do and automate it. That's not very interesting. Do things that were not possible before, the things that don't even exist in science fiction. Amen to that. Solve the unsolvable problems. And the never-before-thought-of problems, because I'm starting to see those happening. I think it's exciting. Lisa gave me a challenge yesterday to try to come up with an AI business around, like you just did with the cupcakes. Yeah. Cupcakes, cooking, and puppies. We can just play in that space. Yeah. Your side hustle is fruitful. It is. Speaking of fun things like cupcakes, you did something very fun in Denver that struck us all. We've obviously been talking about Llama here, Meta's model. You brought a live llama to Supercomputing. What was that experience like? We've got some live footage here of Bunny, I believe, is the llama's name. That's Bunny. And by the way, that's not exactly right. We didn't bring a live llama; we got two for the price of one. Bunny was actually pregnant. So, Llama 2. Oh my god. That's fantastic. I love it. And it made a huge impact. Everybody wanted a selfie. Well, I think that is so nice. And Jonathan, send our love to both the in-utero llama as well as that live beauty there, walking around the streets of Denver. They were such a hit. And our interview was so much fun. We wanted to make sure you felt at home here in the Cube Studio, and we brought a llama here to you today. So like you said, it's all about safety and making people feel good. We may have different definitions, but at least we both love llamas. Well, I think we need to fist bump on that one. There we go. Yes, I love it. I'm standing. You know what? On this note, since we've had such a fun morning and I can't see a darn thing, this sounds like a fantastic moment to close what is my favorite interview of Supercloud 5.
The battle for AI supremacy. Jonathan Ross, CEO of Groq, thank you so much for being here. Lisa Martin, absolutely fantastic to have you on my right. And thank all of you for tuning in, from our studios here live in Palo Alto as well as with our editorial team, live from Las Vegas at AWS re:Invent. My name's Savannah Peterson. You're watching theCUBE, the source for cloud and generative AI coverage. And llamas.