Daily Tech News Show is made possible by you, the listener. Thanks to you, Matt Zaglin, Kelly Cook, Scott Hepburn, and brand new patron, Ernesto. Everybody welcome Ernesto. Welcome, Ernesto. On this episode of DTNS, Andrew Mayne explains why thinking about AI in terms of movies and TV shows may be thinking too small. Plus, Canva is taking on Adobe, and why chatbots may not replace search engines. This is the Daily Tech News for Tuesday, March 26, 2024, in Los Angeles. I'm Tom Merritt. And from Studio Redwood-adjacent, I'm Sarah Lane. I'm the show's producer, Roger Chang. And joining us: founder of Interdimensional, former OpenAI science communicator, and author of Dark Dive, a thriller, Andrew Mayne. Welcome back. Hey, thanks for having me back. Thanks for being here. We're going to have a really good conversation about what's going on with OpenAI and video generation, and some ideas you probably haven't heard a lot of other folks talk about. So stick around for that.

Also notable: Apple announced WWDC, and thus its AI strategy update, will kick off Monday, June 10. And now, the rest of the quick hits.

Later this year, Sony will add a community game help feature that lets PS5 players upload gameplay to help developers make hint videos. If you opt in, the feature will automatically capture game clips when you complete certain activities. Moderators will review those and then publish them as a game help hint for other PlayStation players to watch. The published version will not have audio from your mic or webcam, and you can remove the clips if you want. Community game help will come to select games later this year.

Microsoft's doing a little shuffle at the top. If you recall, they recently created a new AI division and put Mustafa Suleyman in charge of it. Suleyman just left Inflection AI, along with a lot of the staff, to come to Microsoft, and he is also famous for being one of the co-founders of DeepMind, now run by Google. Mikhail Parakhin has been in charge of Copilot, Bing, and Edge, pretty much all of Microsoft's consumer AI efforts, for a while now, and was going to have to report to Suleyman. But apparently he did not want to. Instead, Parakhin has stepped aside from his AI role and will now report to CTO Kevin Scott temporarily while he explores new roles. Probably outside Microsoft. Maybe in a community theater, I have no idea. It appears Copilot, Bing, and Edge will stay under Suleyman's new AI division as part of the web experiences team. Meanwhile, what you're going to see in all the headlines is the announcement that Pavan Davuluri, previously responsible for the Windows and Surface teams, will now head a combined version of those teams called Experiences and Devices.

Google will release a new native version of its Chrome browser for Windows on ARM this week. Users of Windows on ARM machines could previously use the x64 version of Chrome in an emulated state, but it had slower performance. The announcement comes in advance of the launch of Qualcomm's Snapdragon X Elite chip, expected this summer, which will feature in multiple power-efficient ARM-based Windows machines.

Speaking of Arm, they are joining up with Intel, Google, Qualcomm, Samsung, and a few other companies to found something called the Unified Acceleration Foundation, or UXL, in order to create an open-source software suite for AI development. UXL launches with Intel's oneAPI, an open standard that does not tie coding tools to a specific architecture the way NVIDIA's CUDA platform does.
That architecture is powerful, but it's also scarce due to demand, and it's only available from NVIDIA. UXL says it will focus at first on open options for apps and will eventually support NVIDIA hardware and code. Of course, NVIDIA is not part of UXL, at least not now. Neither is Microsoft or AMD, and they are rumored to be developing their own alternative to NVIDIA.

Instagram and Threads have introduced a setting that lets you opt in to receive more political content if you so desire. The algorithms on both platforms limit political content recommendations from accounts that you don't follow. In content preferences, you'll now find a new setting called "Limit political content from people you don't follow," checked by default. You can uncheck it, of course, and get more politics in your feed if you want. Adam Mosseri, who runs both of these platforms, has said in the past that they wanted to do this by design, but it sounds like they realize people have different wishes, especially in an election year. Meta defines political content as posts about governments, elections, and social topics.

All right, let's talk search engines. David Pierce has an article on The Verge called "Here's why AI search engines really can't kill Google," and he took it upon himself to compare traditional search results from the likes of Google and Bing to AI-focused search engines like You.com and Perplexity, as well as Microsoft's Copilot. He broke down the comparison into what are considered by search engine professionals to be the three common types of queries. It's a great article, and I'd recommend everybody read it. We'll have a link in our show notes. But I thought for today we could walk through those three types of queries, talk about what David said he found, and then what we think of his findings.

The first one is navigation. This is the idea that you type the name of the website you want to visit into the search engine, and then you go there. This is why Facebook, or facebook.com even, is often the number one search term, because people do that very literally sometimes. And not a surprise, Andrew, that AI wasn't as good at that as a search engine, right? Yeah, I mean, when you want to use a directory, use a directory. Yeah, and that's the one thing that I think search is going to continue to be advantageous at, because it's simple. That's what search has been designed to do.

The second one is information. So things like: what time is it? What's the weather? Give me some sports scores. The results varied here. Google and Copilot were both pretty good. So you had a search engine and an AI-devoted engine that were both pretty good at this. Google did have the edge, according to David Pierce, by providing you a little more context. Things like stat boxes, knowing your location, et cetera. Those made the results more relevant, especially for things like weather. More evergreen information, though, like how many weeks are there in a year, was handled well by all of them. And very precise information was handled best by the AI tools. Things like: how do you take a screenshot on a Mac? Or what's the proper ratio of coffee grounds to water? Andrew, what do you make of these? This seems obvious as well, right? Yeah, I think it's very early days as far as AI and search. And the example I gave: if you just want to get to Google, do you need a big large language model to do that?
But I think the problem with search is that we've been conditioned through 20 years of Google search to think about search in a very narrow way. I just put a thing in there. My favorite demonstration of the limitations of traditional search: if you Google how many emancipated minors there are in the United States, you get a wildly wrong answer. And it's been like that for years. It's one of millions that are wrong, because all Google does is say, hey, this is the most popular answer to the question. It doesn't know if it's right or wrong, and nobody has bothered to question it, so there's no critical thinking that comes into play. And models have their disadvantages, hallucination and such, but AI search has gotten progressively better over the last 12 months. And I think a year from now it's going to be fantastically better, but traditional Google search, I don't know. And this answer, by the way, too: if you use Google search, they'll get this answer from the web, then they'll feed it into their LLM, and it'll give you the same answer. But no, there are not 20 million emancipated minors in the United States. Yeah, that would be a lot of emancipated minors.

I think the information thing is interesting, because something like "what time is it," I mean, I ask my Amazon assistant that all the time if my Apple Watch isn't handy, so it's odd for that to trip up the system while something more evergreen, like how many weeks are in a year, it gets right more of the time. It almost goes back to the conversation about having a wake word: let me tell you ahead of time how I want you to respond to this kind of question. And that's where it's interesting to me that AI feels like it has a long way to go to get results that, like you said, Andrew, we're so used to getting by doing something very simple, but in a kind of non-human way, for the query.

Yeah, I mean, it's a good point about knowing what you mean. And that's the problem LLMs struggle with, because when you ask how many weeks are in a year, do you want just 52 or do you want a precise answer? Because there are not exactly 52 weeks in a year, because of the way the Earth moves around the sun, et cetera. And models will struggle: okay, did we talk about a leap year? Because in a leap year, we've got another day, and we also add a few seconds some years. So that's where models will sometimes give you these answers that feel like, is that off? Or is it just struggling to understand what I mean? And the more they know about us and how we want to be answered, the better. That's one of the features OpenAI put in for ChatGPT: you can give it some background. I like these kinds of answers, I like this, I don't like that. And over time, all these models are going to be doing that, if you want them to.

I feel like the advantages that Pierce found for search are temporary advantages, things like knowing your location and context. Those are things that your chatbot could easily have access to if you want them to, right? Yeah, a big update has been adding the use of tools. If you ask ChatGPT a mathematical question now, it'll just create a Python calculation to answer it instead of using the LLM to do it. Or there are GPTs that will use your location and then ping Microsoft's map services or whatever to give you data on that. And I think we're going to see way more of that, where the model just says, oh, well, let me just use a tool, whether that be Bing. ChatGPT uses Bing all the time to get answers.
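For the curious, here's a minimal sketch in Python of the kind of tool use Andrew is describing: routing a math-flavored question to real code instead of letting the model guess. The keyword routing rule, the function names, and the stand-in model answer are illustrative assumptions for this example, not OpenAI's actual internals.

import calendar

def weeks_in_year(year: int) -> float:
    # Precise arithmetic done by a code tool rather than by the model.
    days = 366 if calendar.isleap(year) else 365
    return days / 7  # 365/7 = 52.14...; 366/7 = 52.28... in a leap year

def model_guess(question: str) -> str:
    # Stand-in for a bare LLM answer: plausible, but unverified.
    return "There are 52 weeks in a year."

def answer(question: str, year: int = 2024) -> str:
    # Crude keyword routing; a real system lets the model decide to call a tool.
    if "weeks" in question.lower() and "year" in question.lower():
        return f"{year} has {weeks_in_year(year):.2f} weeks."
    return model_guess(question)

print(answer("How many weeks are in a year?"))  # 2024 has 52.29 weeks.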
Yeah. The final, the third of the three common types of queries, and the least common type of query, is exploration. Those are questions that don't have a single answer. Pierce used "why were chainsaws invented" as an example, or "what is TikTok?" He found that Perplexity and You.com handled these very well, with concise answers and citations for further exploration. It does seem like this is where LLMs shine the most, right? Yeah, I think that's getting into a lot of the reason we use an LLM. We use search to say, tell me where I can find that information. You can use LLMs to say, bring that information to me. And that's very different, and we're still learning to think that way. My wife and I will spend a long time just going back and forth, having conversations about putting together meal plans and recipes and diets and stuff, because it goes back and forth. And traditional search doesn't do that, unless somebody wrote that blog post you're looking for. So the more we think about, oh, I can use it this way, the more capable it becomes. Yeah. And it's interesting. I thought of search engines in the 90s as: find a website for me, right? I need a website that talks about this. And search engines are still the best at that, and those are still the most common types of queries. But over the years, the search engines, I think, have tried to pitch themselves as "we can answer any question." And that's not a thing search engines are all that good at on their own, and chatbots and LLMs are better at that. HotBot, give me some Phantom Menace spoilers. I remember it well. HotWired, what was it? What was HotWired? Yeah, yeah, that's right. Yeah.

Online design company Canva has agreed to acquire professional design software company Affinity. This gives Canva an online service with 175 million users and native apps for Windows users, Mac users, and people who like Macs but also have iPads. Affinity's prices also undercut Adobe's suite of products, at least for now, when you compare them side by side. Affinity Photo, Affinity Designer, and Affinity Publisher can be sold in a bundle, but they can also be sold individually. Not a subscription model, though. It's a one-time fee, which obviously makes it pretty different from how Adobe does things. Andrew, I don't know if you use Canva or Affinity products, or both, but what are your thoughts on this acquisition? Well, one, I wonder if they timed this for after Adobe got denied the ability to acquire Figma. You have to wonder if this would have been looked upon in a different light otherwise. Canva certainly made a place for itself with a lot of different sorts of tools. Sometimes it feels like it's about 15 years out of date. So I certainly think buying a platform that's available in all these different ecosystems is a very good idea. And I wonder if they are gonna shift to a subscription model. That seems to be where they might wanna go: get you to continue to justify paying a monthly fee. It wouldn't surprise me if that's what happens to Affinity. Yeah, I mean, so if you buy all the Affinity products together, not annually (see, I'm doing it again), it's $115 for the three; each is otherwise sold for $49 apiece. Canva does have subscription plans, though.
So maybe the company is able to say, we'll do it all, depending on what you're looking for. Maybe Affinity's one-time prices go away. I think they're gonna fold it in. And I remember I was skeptical when Adobe went to the subscription model. I remember back in the day when Microsoft wanted to do it with Word, and I was like, I'm sorry, I'm never gonna pay a subscription for a text editor. But with a model like that, as long as Adobe keeps improving it by adding features, which they've done year after year, it is a much better product. So I think that's what Canva's up against: the idea that if you wanna have that sort of suite, give people a bunch of tools, get them in, and then I don't question it every year when I pay my Adobe subscription. I look at Canva and some of these things feel backwards. My publisher sent me something done in Canva and I got so angry, I wrote something that was better in like 40 minutes using ChatGPT, because it was just so frustrating. So I think it's a good move for Canva.

Yeah, Canva has got a very popular online service, and Affinity has very beloved offline competitors to Adobe's products. Together, on paper, they make a credible competitor to Adobe, but like y'all are saying, the proof is in what they do. And it seems too obvious for Canva not to include versions of Affinity Photo, Designer, and Publisher in a subscription, which is gonna turn Affinity fans off. Canva will bend over backwards to say, we're not gonna get rid of the offline versions or the solo versions or the bundled versions, and the universal licenses will stay. But people will be very skeptical of that, and they're gonna face backlash just against the idea of including it in a Canva subscription. Which, I think you're right, they're gonna do. They're gonna have to. That's just part of what Canva does.

I'm with you, Andrew. When Adobe first announced Creative Cloud, I was like, this is great, I'm gonna end up paying so much more over time. Which is actually true, but it's a better service for me. I can also bounce out for a couple of months if I'm like, you know, I'm really just not gonna need Photoshop right now, but I'll be back when I pick up a new project, that type of thing. It works better for me. The whole pay once and be done with it kind of sounds familiar to the conversation, Tom, you were having with Trisha Hershberger on the show yesterday. Same thing with games: do you pay once and just play, or do you maybe pay as you go for fun new features that might crop up as you go forward? I don't think there's a right or wrong answer here, but I am interested to see if Canva says, hey, we're gonna keep everything as it is with Affinity, because you know they're gonna upset people if they change things right away. Or do they change things down the road, or leave it as is? Yeah, I think it's good to see a competitor to Adobe in place too. I agree. I think Adobe has got such a war chest to spend on things like AI and everything else. And I think this is probably the most strategic move Canva could make.

Yeah, folks, if you have ideas of what you would like us to cover on the show, one way to let us know is our subreddit. We look at it every day for ideas, and they are included in the show. Submit your stories and vote on them. Go right over there right now: dailytechnewshow.reddit.com.
Last Friday, we talked about Bloomberg's report that OpenAI executives met with content studios, media executives, and talent agencies. Sam Altman got to go to some Oscar parties, and supposedly they were discussing ways studios could use OpenAI's tools. Makes perfect sense. Since then, select artists have apparently been given early access to OpenAI's text-to-video tool, Sora, and OpenAI has been showing some examples of their work on its blog. There's a video called Air Head, which tells the story of a person with a balloon for a head. And of course the balloon is AI-generated, we assume. The Golden Record is a fanciful journey into the creation of the golden record that NASA affixed to the Voyager probe, showing things that you can't show because there's no camera out there with Voyager. And Beyond Reality is a faux documentary about hybrid wildlife. If you like some cryptozoology, you might wanna check that one out.

Andrew, the obvious thing to talk about with generative video is Hollywood and TV production. But I know you think there's a bigger use for the technology, as something called world simulators. Explain a little bit about what that means and how that differs from what we're talking about here. Yeah, I mean, the purpose of Sora wasn't just to say, hey, wouldn't it be cool if we could do video too? The purpose of Sora, and I highly advise people take a look, there's the blog post and then there's the technical report, which goes a little bit deeper. Remember, OpenAI's goal is to build artificial general intelligence, super-intelligent systems, which means they need to understand how the world works. And you can learn a lot through text, but you need to learn through other things: mathematics, being able to run code. Here the goal is: can you build a thing that's basically a physics simulator? Can you have a system that can all of a sudden understand, if I drop a ball, what happens, et cetera? So really, think about Sora as simulating the world to make predictions, trying to understand and then learn from it. That also happens to be a really cool tool for making videos, but think of it as: how do you get to AGI? Just like you have a simulator inside your head, if you can predict and you can create models of how things will turn out, that's what Sora can do.

So this is different than a large language model. I think that's important to understand, right? Yeah, the way to think about it is, it's a lot like training a language model, and you can even train them together. When you train a language model, you take your words and you convert them into tokens. A common word might be just a single number, and a bigger word, like "Merritt," might be two tokens grouped together. You turn your words into mathematical numbers, just into numbers, right? And then you predict those sequences of numbers. When you wanna build an image generator, you do the same thing: you take a photo and you tokenize it, you break it down into tokens, and then you put that in as a sequence and you say, okay, here's maybe the first half of the image, predict what the rest of the image looks like. Or here's a label that says "dog sitting on birthday cake," what should those image tokens look like? So it's the same thing. And what they did with Sora is they take sequences of images and break each image down into sections they call spacetime patches. And so the model tries to predict what each section is going to do relative to the other sections, and then what they're going to do over time. That was kind of the big breakthrough here: rather than trying to predict frame to frame, each little section of each still is trying to predict some other part, and then going across time.
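Here's a minimal sketch in Python with NumPy of the two ideas Andrew just walked through: words become a sequence of token IDs, and a video becomes a sequence of spacetime patches. The toy vocabulary, the video dimensions, and the patch sizes are invented for illustration; they are not Sora's actual tokenizer or architecture.

import numpy as np

# Toy word-level vocabulary. Real tokenizers learn subword pieces, which is
# why a name like "Merritt" might split into two tokens; this mapping is made up.
vocab = {"dog": 0, "sitting": 1, "on": 2, "birthday": 3, "cake": 4}

def tokenize(text: str) -> list[int]:
    # Turn words into numbers the model can treat as a sequence to predict.
    return [vocab[word] for word in text.lower().split()]

print(tokenize("dog sitting on birthday cake"))  # [0, 1, 2, 3, 4]

# A stand-in 8-frame, 32x32, RGB "video": (time, height, width, channels).
video = np.zeros((8, 32, 32, 3))

def spacetime_patches(vid: np.ndarray, t: int = 4, h: int = 16, w: int = 16):
    # Chop the video into little space-time blocks, each one a "visual token."
    # Sora's technical report describes spacetime patches; these block sizes
    # are invented for the example.
    T, H, W, C = vid.shape
    blocks = [vid[ti:ti + t, hi:hi + h, wi:wi + w]
              for ti in range(0, T, t)
              for hi in range(0, H, h)
              for wi in range(0, W, w)]
    return np.stack(blocks)  # a sequence the model learns to predict

print(spacetime_patches(video).shape)  # (8, 4, 16, 16, 3): 8 patch tokens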
And the example we're looking at here is great, because it shows you, when you increase the amount of compute, how much more accurate it gets at predicting. Right, because if you're listening on audio, this is from the blog. They show how, with less compute, the dog would sort of grow three heads in the video, and over time it got to look more like a dog running around, right? But it's still doing a similar thing to the LLM, in that it is predicting what the next piece of data should be. And it is trained, literally it is trained, the same way: you just give it image tokens or video tokens and you put them in. So ChatGPT with vision, which is the version that understands photos, is an LLM that's also been told, hey, these are now image tokens, predict what they do. They're called multimodal models, because you can give them whatever kinds of tokens you want, and you explain what they are.

It's pretty fascinating to think that that can then turn into predicting physics, because it's not designed to predict physics. It's just something you can make it do. But you can do that with a language model too. If I give a language model enough examples and I say, what happens if I drop an egg, it'll say it falls. It will learn, it seems, over time. Yeah, and our understanding of physics, remember, is an abstract representation. We have a story or a narrative that allows us to make predictions about something, and that's the same thing LLMs are doing.

So what do you think a world simulator can do that we're not thinking about? Because most people think, oh, it's going to create a movie. What else should we be thinking about? I have an article coming out, and one of the examples I use is a little bit lengthy, I apologize, but it's worth it, I promise you. The first photograph of people was taken around 1838. Two people were in it, one guy's getting his shoes shined, and the camera happened to capture them. You had to leave the exposure open for so long, but they managed to get people in a photo. That was Daguerre. And if you talked to an artist at that time and said, hey, look, I took this photo of people, they might start to panic. Well, first they'd be like, ah, they're blurs, who cares? Then a year later: look, they're very clear. They might be panicking, going, what's going to happen to artists? In 1838, looking at a camera, you might be terrified. But try to explain motion pictures to that person. Try to explain the film industry. Imagine trying to explain, in 1838, somebody like James Cameron, who had a budget bigger than the entire United States defense budget in 1838, in unadjusted dollars, making a movie with a crew bigger than half the size of the United States standing army, with a box office return greater than the entire GNP of the United States in 1838. That is, again, unadjusted dollars, but the point is, that's insane.
You would be like, you sound like a lunatic. Why will our future be any different? In fact, we're accelerating. Things are moving faster. So that's hard for us to understand. And I normally try to avoid getting into the realm of what it will be like in 20 years. Like a lot of people, Tom, I've read books on futurism and all this sort of stuff, and the one thing I've learned is, we're really bad at it. The best estimates I find often come from going much, much further back, when people weren't biased by our sense of the present today. So what it'll be like, I don't know. But look at that 1838 example. Yeah, so in other words, we won't know until we get this tool and start playing around with it what we could do that we never thought we could do before. Think about somebody playing Pong, and try explaining Grand Theft Auto or, you know, Cyberpunk to that person. It would sound just surreal to them. And then somebody else would be like, well, the amount of compute you would need for that would be impossible. You wouldn't be able to do that. You're like, for a game? Well, guess what.

I mean, even the somewhat basic concept of using tokens, because the prompts are really just pieces of math and predictions, that would have been really hard for somebody to wrap their head around. And this is a tech person like myself. Five years ago, I'd be like, what? If you want something computer-generated to do something fantastical, then you need some skilled artist to do that fantastical thing. It can't be done otherwise. And now we're talking about, well, it's really good at physics, and if you want the physics to work weirdly, then you have to do extra prompts. It reminds me of search, really. It's like, okay, that search result wasn't what I wanted, let me get more creative. And you iterate, you iterate, and then you finally get where you want to go.

Well, Sarah, you know, one of the things we did, I was at OpenAI when we released DALL-E. And the first thing we did is we went out to artists and gave them access. Now, I had started at OpenAI as a prompt engineer, and I would be the prompt whisperer internally: how do we get this thing to do the thing? I felt really good about my prompting skills, and so when I played with DALL-E, I could do some cool stuff. Then we gave it to artists. Once they got it, they just totally blew away anything I could do. And I realized, oh, wow. When you put these tools in the hands of people who understand art, not just how to hold a paintbrush or what tool to use in Photoshop, but really understand what it means to create something, they excel. And I find that people who are incredibly talented artists thrive when you give them new tools. People who are more mediocre tend to be frightened. And I think we're gonna see this with Sora. We're gonna see exceptionally talented people just blow us away using the same tools that you or I might have access to. Yeah. Because they know the vocabulary better than we do, because they work with it every day. So they have run up against those limits of what you can do with the current tools, and I think they see a little farther on where it could go. Yeah. I remember in art class being amazed when a teacher said, you know, air is not invisible.
And I never thought about that. I'm like, what? Yeah, that's a great example of the sort of thing you don't think about if you haven't studied it.

All right, before we get out of here, let's take out the mailbag. We got a good one from Lee, who writes: I was in Phoenix in February and made it a point to ride in a Waymo while I was there. Took two trips at night, and the Waymo drove better than I would have. Yes, it was cautious, but not overly cautious. It took its time pulling out after a stop sign, but it would go as long as the car in the lane in front was at a safe distance and safe speed. It slowed for every speed bump and handled a roundabout fine. Watching the monitor, it was comforting to see it picking up every car and pedestrian in the area. Like many big cities, Lee says, Phoenix has a homeless problem. The Waymo picked up people sitting 20 or so feet from the road that I hadn't noticed. I had a great experience on both of my rides. I think of driverless tech as a teenage driver who will learn from every mistake and teach every other teenager the lesson it learned. After riding a Waymo, I look forward to having level four or five autonomy for the average consumer. Lee also says: I will be in Austin for Brian's eclipse party, looking forward to meeting Tom or anybody else that might also be attending. Oh yeah, if you haven't checked that out, the Founders' Day eclipse party at Brian Brushwood's place in Austin is happening April 8th. I will look forward to saying hi to you when we're there, Lee. Thanks for writing in. And you know what, Allison Sheridan also described her trip in a Waymo in Phoenix as like a teenage driver, so I think that's pretty apt.

And then Dan, oh, go ahead, Andrew. My first time in a Waymo: OpenAI had their DevDay, their big one-day developer conference, and I was there with a group of developers I worked with. We went to the after-party, and when it was time to leave, somebody called up a car, and we all got inside of a Waymo. And as we're driving away, we're like, we just left an AI event inside of a Waymo. My brain was trying to process it, because the night before, Devo was performing across the street. I'm like, there's so much intersection going on there. Yeah, a lot of synchronicity there. We had Devo performing back in the TechTV days, if I recall. A Christmas party. I think it was a Christmas party, yeah.

Dan wrote in on our Patreon, actually, and said: the amazing Trisha Hershberger makes an appearance, and I am an avoider of games with micropayments as well. It feels like most of those games never let you become great at them with only a free version. You want to be the top dog in the game? You must pay. As a marketer, though, I totally get the concept of micropayments and see why it's lucrative. You make an addicting, fun game to play for free. You want them to excel at the game? You provide an easy and very low hurdle to jump to make that conversion. Thank you, Dan, as always, for commenting over there on the Patreon. Yeah, thanks to everybody who writes in. Patreon messages are always appreciated, as are emails: feedback@dailytechnewshow.com.

Andrew Mayne, you are also appreciated. Let folks know where they can keep up with you when you're not on our show. Easiest is on Twitter, a.k.a. X, at AndrewMayne. Also AndrewMayne.com, I've got a blog there. I talk a lot about AI and stuff and try to explain it in sort of normal-person terms. I write books; on Amazon, type in Andrew Mayne.
You'll see my novels. My current one is Dark Dive, available now, a thriller in my Underwater Investigation Unit series, which is basically about a police diver in South Florida who goes on some very interesting cases. Five books in that series, is that right? That series, yeah, five books, yeah. Wow, it's amazing how prolific you are with good stuff. Oh, you're one to talk, Tom, come on. Go check it out, folks. Look up Andrew Mayne at a bookstore near you.

Patrons, stick around for the extended show, Good Day Internet. Reactions to new tech these days seem more negative than positive. Is tech guilty until proven innocent these days? We're going to discuss. Just a reminder, we do the show live, and you can catch it live Monday through Friday at 4 p.m. Eastern. That's 2100 UTC. Find out more at dailytechnewshow.com/live. We're back tomorrow with Scott Johnson, parsing Phil Spencer's take on the state of video games. The DTNS family of podcasts: helping each other understand. Diamond Club hopes you have enjoyed this program.