I want to solve AI. I want to unlock the mysteries of intelligence in the universe. We're going to build a superhuman driving agent for the sake of building a superhuman driving agent, but if we build something that can generically learn problems by watching humans solve them, we can solve a whole lot more than driving. George Hotz, the hacker slash entrepreneur who as a teenager became famous as the first person to jailbreak the iPhone, and then the PlayStation 3, is in a race to build the world's first fully self-driving car system. Hotz's company, Comma AI, which has raised $8.1 million, is competing against giants like Google offshoot Waymo, Amazon-owned Zoox, and GM's Cruise, which rely on custom vehicles with expensive sensor arrays, including lidar, to identify surrounding objects and people and navigate a predetermined path. Comma takes a radically different approach. This is all you need to drive a car. Yeah! I don't want to be the only company building this hardware. openpilot's open source. Come play in this beautiful ecosystem. We're paving the way for you. Maybe one of you out there. Its latest product, the comma three, runs on a smartphone processor and plugs into most new cars, taking over the built-in steering, gas, and brake systems. Comma's open-source software, openpilot, uses artificial intelligence to predict where a human would drive in real time. Now we can chill! Make driving chill! Self-driving cars are just the beginning of what all of this technology can allow. I want robots that clean my house and cook me food, right? It's the same tech. Hotz says his better-financed competitors have misled the public with hype about what their products can do and on what timetable. I think that this self-driving space is incredibly sad. Like, it went from companies aren't profitable, to companies don't have revenue, to companies don't have users. So George, the last time that we talked was January 2020.
It was about two months before the pandemic hit, or before the lockdowns, and I asked you to predict what was going to happen in the next two years. You said we could expect to see a wave of consolidations, these self-driving startups getting absorbed back into the parent company, or closures. And we've seen some of that happen. Zoox laid off over a hundred people. Uber ended its self-driving efforts, and Lyft ended theirs too. You think we'll see more things like that start to happen? Undoubtedly. I'm not sure whether Zoox had been bought by Amazon yet when we talked. The Nikola founder was indicted. Like, I'm not going to make a prediction that Waymo and Cruise are going to fold in the next two years, I don't really know, but yeah, the smaller ones are going to fold, the smaller ones where it's more of a scam. You saw Uber ATG fold into Aurora. That one seems like a hot scam right now. Aurora also just got permission to operate self-driving cars with a driver monitor in California. Cruise does have a permit to operate fully autonomous vehicles, whatever that means at this point, in California. So do you think that any of that matters? Investors can't even properly evaluate this space; why could governments possibly? The hard part's not the permit. The hard part's not the hardware. The hard part is completely the software. You and Tesla and, I believe, Mobileye have a different approach. Can you just explain what the difference is in your approach? We, Tesla, and Mobileye are all building cars that operate in a similar way to a human, where the intelligence is on the car. The intelligence is not in the cloud. What separates you or Tesla from a Waymo? When a Waymo decides where to go, that Waymo doesn't individually decide where to go. It's been programmed on maps to basically follow a line. With us and Tesla, we look at the road and we decide where to drive the car. This is also what humans do. If we can't solve it with vision, we are not solving AI.
The goal of Comma is to solve AI. A lot of the criticism that you've gotten is that you can't possibly perform as well as these systems that have all these sensors like lidar. Let's say there's a world in which the cost of lidar comes down dramatically, so that it would actually be affordable to integrate into your product. Is that something that you would do, or is that fundamentally misaligned with your vision system? There is an existence proof, in the case of humans, that you can drive a car with vision. We will drive a car with vision. It's just a question of time. Doesn't that make it a much harder problem to solve, though? Because let's say you're driving in the snow or whatever. If you have lidar, maybe the car can see; there are new technologies out that say they can see hundreds of yards ahead, whereas a human wouldn't be able to. So does that limit your short-run development at all, do you think? No. Most human accidents are not caused by failures of the human perception system. Distraction, reaction time, just kind of cognitive overload, right? I remember when I started driving, even relatively simple driving scenarios would kind of put me in a cognitive overload state. I think everyone remembers that when they were 17. It's like, do I make the turn? Does he make the turn? Is it me? Is it him? So it's that that causes accidents a lot of times for humans. Can you just talk about how your vision system works, in contrast to a system that is lane tracking, or that uses radar or other sensors integrated into it? We talk about using lane lines as kind of the original sin of Comma AI. We shouldn't have to use lanes. We want to drive end to end. We want to look at where a human drove and then we want to mimic that. We want to do the same thing as a human, without any reference to anything like lane lines or cars. The only thing we want to do is, imagine giving a human a picture of a road and saying: where do you drive?
So end to end, it's about intentionality and planning and where a human would go, not just mapping different things in the visual field. Yes. It never asks the question, what is this object? It asks the question, given this scene, where would a human drive the car? You can imagine scenarios. Have you ever been in a construction zone where there's a line painted in the middle of the road and it's a little bit confusing which one to follow, especially if you're using any kind of hand-coded detect-the-lanes policy? Our car, without the use of lane lines, does much better in scenarios like that. Also in scenarios like suburban streets, which don't have any lane markers at all, it will now stay to the right and it will move around a little for parked cars. Take lane changes, for example: there's nothing in the car where we coded anything about how to do a lane change. All we did was label, again automatically, in the ground truth where humans made lane changes and say: this is a lane change, right? There's nothing special about the words "lane change"; we could have called it anything. So can you just explain specifically how that is different from what Tesla's doing? Tesla has a hand-coded layer between perception and planning. Tesla has a layer where they're outputting an abstracted, human-understandable feature space from the cameras. We're not. So when we train, we train straight from pictures to plan. When Tesla trains, they train from pictures to labels and then have this quasi-hand-coded thing which goes from labels to plan. So you have no hand coding at this point? There's still a little hand-coded, but not really. We can now detect where you're supposed to stop for a red light or a stop sign, and we don't do this by ever coding in a red light or a stop sign. We do this by asking where humans stopped and stayed stopped for a while.
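The architectural contrast Hotz draws can be sketched as two toy pipelines. Everything below is a hypothetical stand-in for illustration, not Comma's or Tesla's actual code; the stubs only mark where learned models and hand-coded rules sit in each design.

```python
# Toy sketch: end-to-end vs. modular driving stacks. All functions are
# hypothetical stubs, not real openpilot or Autopilot code.

def learned_model(image):
    # one network trained by imitation: pixels straight to a trajectory
    return {"trajectory": "learned directly from pixels"}

def learned_perception(image):
    # network emitting a human-readable feature space (lanes, cars, lights)
    return {"lanes": ["left", "right"], "cars": ["ahead"]}

def hand_coded_planner(labels):
    # partly hand-written rules mapping those labels to a trajectory
    return {"trajectory": "derived from labels by rules"}

def end_to_end_policy(image):
    """Comma-style, as described: train straight from pictures to plan."""
    return learned_model(image)

def modular_policy(image):
    """Tesla-style, as described: pictures -> labels -> plan, with the
    labels-to-plan step quasi-hand-coded."""
    return hand_coded_planner(learned_perception(image))
```

The point of the sketch is where the gradient can flow: in the first pipeline the whole mapping from pixels to plan is learned; in the second, learning stops at the label layer and the final step is fixed by hand.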
And so when the machine looks at that scene, is there any way to see what it's seeing, or to know how it's arriving at that decision? We can ask it that question, but why do we care? The question is not, who cares if it stopped because it saw the stop sign or it stopped because it saw the word stop on the road. The only question we care about answering is: how many stop signs did you successfully stop at? I don't care how you came up with the answer. I care that you got the right answer. And so does this make it more difficult if you want to, let's say, go back in and understand how your system made a decision? Tesla has that layer, I imagine, so that they can go and say, well, we didn't detect this fire hydrant, or whatever it is. What I find funny about this is, after a car accident you can ask the driver what happened. And the driver is not accessing their perception states at that time to figure out what happened. They're actually usually making up a story, and a lot of times they're making up a story to make themselves look better. He came out of nowhere, man, right? Listen to drivers talk about this. If you really want to figure out what happened in a car accident, dash cam footage is incredible. If you have dash cam footage from both cars, you can figure out what happened much better than by asking the humans. So why should deep learning systems be any different? If you want to figure out why the car made a decision, there is no why. Just look at what the car did, and then you can figure out why it did that by reasoning about the system holistically, not by explicitly saying, well, there was a lane there; that's not a good way to do it. You consider Tesla like the iOS, the iPhone of the self-driving world, and you want to be the Android, the other half of that market, and work with all these different hardware manufacturers. How does openpilot compare to Autopilot, Tesla's offering, at this point?
We're trying to make driving chill. Maybe Tesla's slogan is, look at this crazy feature. And one of the things that is like that is right turns at an intersection, right? But you would sort of look forward to the point where we would start to see whole commutes, and longer trips, that your system could take over. Isn't that a big part of commuting, and why is that a problem that you haven't solved yet? You may be able to get to an arbitrary level faster using hand-coded stuff. We aren't targeting doing right turns or doing whole commutes. Okay, maybe we'll talk about the lowest abstraction level. If you're talking about something that hand-detects a car, like, I'm looking for taillights and I'm looking for wheels, you're working at a really low level. Nobody would advocate that anymore. But by the way, back in 2008 this was how people advocated for computer vision. If you want to detect a car, well, we'll detect the wheel, we'll detect the light, we'll detect the hood. Nobody does that anymore, right? So nobody works at this level. They work at a level which is a little bit higher up, where it's like, we're going to put a box around the car. And it's almost like they didn't learn that lesson. We take that lesson and go even a level up, where we're just going to build something that drives, and then we're going to solve it in a holistic way, not in a piece-by-piece sort of way. So you're saying that you're not going to specifically code for, let's say, making a right turn. Yeah. You're going to iterate the system, so let's say now it's like a teenager, or maybe it's like a five-year-old, and once it gets to be a teenager, then it'll make a right turn, when that evolution happens, the overall evolution. Yes. Is writing that type of code for this end-to-end holistic system any different from writing it for an iterative, sort of step-function-type, system?
Yes, and I'll phrase this a little differently. If you asked me to code a chess engine and I had two hours, I'm not going to build an end-to-end chess engine. I'm going to build some hand-coded crap with some very basic board evaluation function and some very basic search, and I could write you a chess engine, in two hours, that can beat me at chess. But I couldn't write a chess engine in two hours that's superhuman, and in fact there's no path from the chess engine that I wrote in two hours to one that's superhuman. The one that's superhuman requires a radically new architecture. I wouldn't build anything about pawns and rooks, because I'm not a chess expert; I don't know how to code that board evaluation function. But I do know how to code the win and the lose state in chess. I can code that, and I can code that perfectly. So I would build something that learns how to play chess by playing millions of games against itself, and I'm confident that if you gave me a month, I could build a chess engine that beats Magnus Carlsen. So it's the same thing with driving. If you want to see amazing results tomorrow, don't build any end-to-end stuff, but if you want to solve the problem over a 10-year time horizon, build this. And from a business perspective, Elon Musk is sort of famous for over-promising on his timelines; I think the first time he said that they were going to have full self-driving within the year was maybe three years ago. Do you think that feeds back into the way that they are approaching this problem, where they have to meet these individual intermediate deadlines and you don't? Yes, I do. I think that, from what I heard from the Autopilot team, one of their main goals was to get Elon's commute to happen without a disengagement. Hence the right turns. Yes. I don't think this is a good approach.
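Hotz's chess point, that you can hand-code only the win/lose state and derive everything else, can be illustrated with a toy game. This sketch uses Nim (take 1 to 3 stones; whoever takes the last stone wins) rather than chess, and exhaustive search stands in for self-play, but the principle is the same: no positional heuristics, only the terminal rule is coded.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(stones):
    """Value of a position for the player to move: +1 win, -1 loss."""
    if stones == 0:
        return -1  # the ONLY hand-coded knowledge: facing zero stones, you have lost
    # everything else is derived by search, no hand-written evaluation function
    return max(-value(stones - take) for take in (1, 2, 3) if take <= stones)

def best_move(stones):
    """Pick the move that leaves the opponent the worst position."""
    moves = [t for t in (1, 2, 3) if t <= stones]
    return max(moves, key=lambda t: -value(stones - t))

# The engine discovers on its own that multiples of 4 are losing positions.
print(value(4), best_move(5))  # -1 1
```

A chess analogue swaps the exhaustive search for self-play plus a learned value network, since chess is too large to enumerate, but the division of labor is identical: the programmer supplies the rules and the terminal states, and the system supplies all the "knowledge."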
I don't care if we solve self-driving cars in one year or 10 years or 20 years. We want to do it in the correct way, and the correct way is going to scale to way more than self-driving cars. If we build something that references cars and lane lines, those things are only really useful for driving. But if we build something that can generically learn problems by watching humans solve them, we can solve a whole lot more than driving with the same code, in the same way AlphaZero didn't just solve chess but also solved Go and shogi with the exact same code base. So the way that you put together most of your model is through big data, right? You have a lot of people out driving with commas; they give you data, and then you use that to train the vision system. And you have far fewer users than Tesla, right? There are hundreds of thousands of Teslas out there. So how are you competing with them in terms of having that amount of data to train your models on? We have 10 million miles of driving data sitting right over there, and 10 million miles is roughly 10 million minutes, and 10 million minutes is about 20 years of solid driving. This has to be enough, and the reason I know this is enough is that 20-year-olds can drive cars, right? So remember that it's not just learning the driving problem; it's also learning vision entirely. Humans take a lot of time to learn vision. You're like, well, humans actually learn to drive in like two years. And they do, but humans have to learn all of vision and all of color and all of shape and physics, and we have 20 years. That should be enough. And the data has shown it empirically: we're actually only training on a million of those 10 million, so we could 10x if we start to see improvement, but it's been diminishing returns ever since around 400,000.
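The 10-million-minute figure is easy to sanity-check: at roughly a mile a minute (an assumed 60 mph average, which is what makes miles and minutes interchangeable here), 10 million miles works out to about 19 years of continuous driving.

```python
minutes = 10_000_000           # ~10 million miles at an assumed ~60 mph average
hours = minutes / 60
years = hours / (24 * 365)     # years of nonstop, around-the-clock driving
print(round(years, 1))         # 19.0
```

So "about 20 years of solid driving" is a fair rounding, and it only holds under the mile-a-minute assumption; at lower average speeds the same mileage represents even more minutes of experience.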
And then for the edge cases, I guess my main question is: if you're taking this route, don't you need pretty much a general AI, an AI that's capable of reasoning like a human? Because let's say a ball comes across the road. A human would know, okay, there may be a little kid coming after that ball, but a machine would not know that unless it has a general knowledge base. Of course it would. How many times in its data set has it seen kids run after balls? And if it hasn't seen enough, we need more data. Okay, so it's ultimately a data problem. Of course it's a data problem. For humans, too, right: humans only know that kids run after balls because they've seen it at some point in their life. It's all a data problem. To believe that humans have any sort of general reasoning, I'm not sure how true that is. So you don't think that there's a distinction in terms of higher-level cognitive reasoning in humans that you'd have to reproduce for this type of system? No, especially not for driving. Humans largely drive on autopilot. When you think about the kid and the ball, it's instinctive; it's not like you're using the long human reasoning pathway. I mean, even the long human reasoning pathway, like you might use for chess or Go, yes, that looks a little bit different from largely feed-forward neural networks; it looks more like search. But we're pretty good at this too. Again, we're here to solve AI; we're not here to write hacks. In general, the long-term vision that you have, is this the right vehicle for what you want to do with solving AI? It still seems pretty right. It's interesting to think about what happens after we solve self-driving cars. I think that home robotics is harder; cleaning and cooking are harder than driving, because there's a million correct ways to chop an onion and there's only kind of one correct way to drive down a highway. There's always a joke about things being AI complete, and it's like, if you
can solve this task, it is AI complete, meaning you can't solve this task without solving AGI. I do not believe driving is AI complete. AGI being artificial general intelligence, so basically, you can recreate a human? Yes, AGI is a program that can do everything a human is capable of doing. We define humans as having general intelligence; whether that's true or not, it's a definition. So I do not believe that driving is AI complete. I believe that we can solve driving in such a way that we still can't solve cooking and cleaning. Now, from this perspective, it seems like if we solve cooking and cleaning, it might be AGI, but history has told you to bet against that over and over again, and as that goalpost starts to get closer, you realize, oh no, there's actually a whole lot more to it. Wow, we built robots that cook and clean really well, but there's still a whole lot more to what it means to be a human. And what, historically, are you talking about, in terms of people thinking something needs general intelligence when it doesn't? Chess. 1995: chess is AI complete; the only way you're ever going to beat a human grandmaster at chess is by solving AI. This turned out to not be true five years later. There's no way you're ever going to be able to detect chairs in images until you've solved AI. This was wrong 10 years later. There's no way you're ever going to be able to solve Go without solving AI. That was wrong five years later. People say this about driving. Watch, we'll do it without solving AI. And so ultimately the reason that you're taking the approach that you're taking, more than even for the self-driving application, is that it will be useful moving toward AGI and all these other applications, like cooking and cleaning? Yes, and I also do believe that it is the fastest way to get to superhuman self-driving cars, though that's an empirical claim that I may be wrong about. And what other applications can you foresee for this? You're talking about cooking and cleaning robots
after cars; is there anything else that you would see as one of the first applications beyond self-driving cars? Building robots, autonomous tractors, true autonomous farming, everything you can imagine being done by, maybe the word is, day laborers. Do you have an idea where you think the line is, where an AGI is actually necessary? There are some fundamental things to talk about. You can say that, for example, humans have about 20 petaflops of compute. If you want 20 petaflops now, you need an entire rack of Google TPU v3s. You can get a human's worth of compute, but it costs millions of dollars, and it costs thousands and thousands of dollars per hour for the power to run it. Maybe we can say that in like 2038, you'll be able to buy a human's worth of compute for a thousand bucks. So you don't think that there's a fundamental difference between, let's say, a human and a bird? We talk about it in terms of raw compute, right, but cognition is generally thought of as sort of a higher level; it doesn't flow directly from the structure that creates it, right? So do you disagree with that? The only difference between humans and birds is humans think they're special. That's the flippant answer, but no, I don't. I think intelligence is continuous, on a spectrum. For you to posit a step function, like a major step to get from the animals to humans, that seems like the thing that requires a justification, and I've never seen a good justification for that. Maybe there are paradigms of learning that you can make kind of a qualitative distinction on. The first paradigm of learning is genetic: the only way bacteria learn is by evolution. The second paradigm of learning is how, say, a nematode worm learns. And then you get into the mammals, which all share this kind of third paradigm of learning: you learn from your parents, you learn from society, you learn from culture. Maybe I'll posit a three and a half with writing, because we can
learn from humans 2,000 years ago without them actually being here. And then the fourth paradigm is like, I know kung fu. So computers are fourth-paradigm learners. Maybe that is a real distinction. And I think a lot of this is what makes people uncomfortable with artificial intelligence, or black-box algorithms in general, right? That we can't necessarily see into it to know how it's making a decision. Do you find that uncomfortable? I heard a good quote: people complaining about billionaires wasting money going to space are going to be shocked when they find out how much money governments waste. People who are concerned about machines being black boxes are going to be shocked when they find out about humans. Maybe I'd ask those people whether the old people telling people what to think really had their best interests in mind or not. This is also sort of a fundamental discomfort that people have with distributed systems, right, because you can't necessarily control them. I think this is almost a philosophical distinction between Google's way of viewing the world, which is, we're the cloud, we own everything, and, oh, gotta decentralize. It has this inherent advantage of trust: everybody can look at the distributed, decentralized system and trust it. It's also true with openpilot. If you want to decide how much to trust your openpilot, openpilot is open-source code. You can read it; you can modify it. If you want to decide how much to trust your Waymo, do you trust Google? Do you trust their organizational structure, which you have no insight into? And so how do you think that distributed systems, Bitcoin included, or crypto, but other things as well, how are they able to offer more value to individuals than the existing structures? The decentralized software is only so good. The Bitcoin network can only process a small number of transactions per second; Visa can process a lot more. Visa is technically better than Bitcoin, but in the long run this
flips. Bitcoin gets better every year, and Visa gets worse every year. Whoever Visa blocks payments to, at first it's just, you know, Julian Assange, and then it's sex workers, and then it's right-wing people, and then it's whoever the new people Visa decides to go after are. So Visa gets worse, more censored, and kind of more scummy every year. They're like, how can we extract more value from these people? How can we raise the APR with this weird clause that nobody's ever going to read in the contract? Another beautiful thing that I love about crypto is forks. You can't deploy a decentralized exchange to Ethereum and take 30% fees. Apple takes 30% fees in the App Store. Because if you took 30% fees, that's enough to fork. Now, if you take 3% fees, maybe not, but it forces a much more sane dynamic. This is free exit in countries, too; it's the same basic idea. You can't sustain tax rates that are insanely high, because people will leave, right? And if you could fork the country, people would leave even sooner. This is a great thing about it: it's very easy to make more digital land. I'm hoping that we're going to see a decade of decentralization. And do you see the risk of that being slower, or not happening, as mostly political? Given polarization, and this group wants to ban these people, and that group wants to ban those people, and they all rely on a sort of central authority that is capable of doing that, right? You hear people like Elizabeth Warren and other politicians now saying that we need to ban or regulate crypto really heavily for this reason, right, because they don't know where the payments are going. It could be buying guns; it could be buying drugs; you never know. So do you see that as a significant risk? This is why I don't do politics. It all happens on the wrong timescale. It happens on this small timescale. They all act like the world is falling, and then some new administration takes power, some new country
gains prominence, and this has happened throughout history. So no: in the long march of technology, technology wins. It seems like governments have been equally shitty throughout all of history, but when you look at technology, it's very rare that technology declines, whereas governments and corporations fail all the time. So what do I fundamentally trust more? Not organizations of people, but technology. The Bitcoin software, the Ethereum software, the blockchain software will improve year over year, slowly. These organizations, some will rise, some will decline, but someday the software will beat them all. And do you think that these distributed systems, like Bitcoin, one of the promises is that it can't be regulated past a certain level, right? I mean, we're finding out that it's not quite as anonymous as people think it is, but fundamentally, do you think that it's possible to build systems that are not touchable by existing power structures? No, I think power structures will always find ways. I think that how power structures come to prominence is usually by offering something, and the things that they will be able to offer will diminish with time as technology gets better. You can imagine that one of the things power structures fundamentally offer you is protection from violence. I wouldn't need that if I had an arm-based force field that could fend off bullets and knives and everything, right? The need for the power structure would decrease. So no, I don't think it's that power structures can't fundamentally regulate these things. I think it's that the way power structures gain power, and the way not just governments but also corporations gain power, is by offering people value, and as the value they can offer decreases, their ability to rise in the first place decreases. Why do you think that the 2020s are going to be fundamentally different, and we're going to finally get to this decentralized ideal? I can see, comfortably by the end of the decade, crypto money
technology which is actually technically capable of replacing Visa. This is not true in 2020; it certainly was not true in 2010; and crypto didn't even exist in 2000. But by 2030, I could see it getting to the point where, even extrapolating the growth of payment networks, crypto can actually exceed the technical capacity required. I've also seen a general decline in a lot of organizational structures. So one goes up, one goes down; the lines cross at some point. I'm open for this decade. How do you see this playing out on the ground? Is Comma going to change? Do you think that the tech industry will change in a fundamental way? I think one of the biggest changes is startups. Mark Zuckerberg dropping out of college and coming out to Silicon Valley was this narrative, and he was kind of one of the first people to do it. Today, if you drop out of Harvard, move to Silicon Valley, and raise money from institutional investors, you are not a maverick; you are a copycat. And it is fundamentally a different type of person who does the former and who does the latter. It needs to be a new type of thing. Who actually is breaking the mold? It isn't moving to Silicon Valley and taking money from institutional investors. That is the power structure; that is not disruption. What does actual disruption look like? Uber and Lyft are massively more expensive than they have ever been. Do you think that's a function of what we were just talking about, where the bills come due and they've been floating on all this VC money for so long? Is it the pandemic? Is it a combination of both? Something like Uber is interesting. People 10 years ago would talk about how Uber shouldn't be this middleman, that they're just a middleman extracting profits. Actually, 10 years ago the opposite was true: Uber was paying out to the drivers more than the riders were paying. There was no way to possibly compete with them. But as that starts to shift, and the more that starts to shift, the better the peer-to-peer
decentralized Uber looks. What do you think about the experiments that we've done on, like, universal basic income, basically giving people free money? How do you think that is going to play into how things evolve from here? Where's this free money coming from? Who has the authority to give free money? That's the interesting question, right? If you have two shitcoins, and in one shitcoin the owners have access to the mint function and just arbitrarily mint more whenever they feel like it, and you have another shitcoin where the owners transferred ownership to the zero address so nobody can call mint, well, the value of these coins is probably going to go like this. There certainly are tokens like the first kind, and they don't gain adoption. People don't like that. Why do those guys have access to the mint function and not me? I don't feel that good about this. Tell me the mint function in Bitcoin is a hash-rate competition, and everybody can participate, and here are the rules; I feel better about that. Yeah, that's a currency that I believe in more for the long term than the currency where it's arbitrary and guys are like, ah, but he really needs it, mint him some coin. All right, so the fact that, in a decentralized network where people aren't operating at the behest of some centralized authority, they choose not to buy coins that have unlimited printing, do you think that that's analogous to the dollar? I love competition. I love seeing competition. I love seeing competition for everything: competition for money, competition for televisions, competition for self-driving systems, and competition for governments. All the competition's great.