This 10th year of the Daily Tech News Show is made possible by its listeners, thanks to all of you, including Jeffrey Zilks, Tony Glass, and Phillip Lass. Coming up on DTNS: why is Google's Bard having such a hard time keeping up? Atari beefs up its game title selection, and what does the future of AI mean for humanity? Researcher Ruby Justice Thillo shares thoughts with us. This is the Daily Tech News for Thursday, April 20th, 2023. From Studio Redwood, I'm Sarah Lane. In lovely Cleveland of the Ohio, I'm Rich Stroffolino. From Petaluma, I'm Megan Morrone. And I'm the show's producer, Roger Chang. Megan Morrone, we're so glad to have you back on the show. Welcome. I'm glad to be back. It was nice to see your beautiful face in real life. The last time I was on the show, we actually got to see each other and breathe each other's air. It was amazing. We did. Megan and I saw each other IRL. It doesn't happen that often these days, turns out. And the rest of you, you know, will just stay remote. But before we get into the quick hits, real quick: Twitter has begun removing blue check marks from legacy verified accounts. Mine was included. I am no longer verified unless I want to pay $8 a month, which I don't. Twitter had previously promised to do this, so not a big surprise, but you never know with Twitter these days. Let's get into more quick hits. The chip shortage from the last few years has done a bit of a 180. TSMC forecasts that chip revenue this quarter will be worse than expected due to drops in demand for its chips across sectors, from phones to servers. TSMC says it expects a continued decline in the second quarter, then improvement coming in the second half of the year. But it's not just TSMC. Bloomberg reports that Taiwanese export orders dropped 25.7% in March, and orders for tech components like semiconductors fell 29.4%. That is the largest fall in 14 years.
Microsoft announced that starting April 25th, its multi-platform Smart Ads product will no longer include Twitter as one of the platforms. Now, unless you're running internet ads all the time, you may be asking: what is that? The tool lets customers manage multiple paid ad campaigns, including on Google, Facebook, Instagram, and LinkedIn, from a single interface, and it used to support Twitter too. Microsoft will also remove Twitter from its social media management tool for advertisers, called Digital Marketing Center. The decision seems to be a response to Twitter's increased API fees. Amazon announced that it launched its Anti-Counterfeiting Exchange, or ACX, designed to help retail stores label and track marketplace counterfeits. This is part of Amazon's efforts to crack down on organized crime on its platform by mimicking data exchange platforms in the credit card industry to find scammers and then identify their tactics. Individual stores and Amazon marketplace sellers can both contribute information and records anonymously to flag counterfeiters in a third-party database, or use the database to avoid doing business with shady partners in the future. Snap is releasing its My AI chatbot for free to all Snapchat users. The bot is powered by OpenAI's models and was previously available to paid Snapchat subscribers, the Snapchat Plus subscribers. My AI can also be added to group chats by mentioning it, and you can change its name and use a custom Bitmoji for it as well if you just want to make it a little bit more personal. My AI does things like recommend filters or suggest places to visit. It'll soon get the ability to respond with images as well. And Snap CEO Evan Spiegel said he has used My AI to do things like create bedtime stories for his children and plan birthday activities for his wife. In other Snap news, they also opened their revenue sharing program for public Stories content to all creators, as long as you have 50,000 followers and 25 million monthly views.
It was previously limited to select creators. This shares revenue from ads that run between a user's Stories content. And filling out our trio of Snap news here at the Snap Partner Summit: Kara Swisher asked Evan Spiegel if he thinks TikTok should be banned, and he said, "We'd love that," but added it would be a dangerous precedent. Google announced it merged its two main artificial intelligence research units, Brain and DeepMind, into a single unit, which will now be called Google DeepMind. The unit will be led by Demis Hassabis, who is the co-founder and CEO of DeepMind. Google purchased DeepMind for about $500 million in 2014, which sounds like a steal with numbers these days. All right, Rich, let's talk more about the future of AI. The future and how it's going to be built, right? Since Google opened up its Bard chatbot last month, comparisons to other chatbots, maybe specifically OpenAI's ChatGPT, have been less than kind if you look at the aggregate. Now Bloomberg sources say Google employees have been well aware of issues with Bard. There are some pretty juicy quotes in the piece, including someone calling the system a pathological liar, and another calling it worse than useless. Some engineers were recommending Google maybe delay releasing it, but Google ended up going forward anyway. Of course, when it comes to Bard, ChatGPT, or really any other large language models, or LLMs, these all require a massive corpus of text to train on. So what's in these datasets is pretty important. It's how LLMs learn what words to string together in response to a user query. If an LLM is really smart autocomplete, this is how it figures out what to autocomplete.
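To make that "smart autocomplete" framing concrete, here is a deliberately tiny sketch. This is not how Bard or ChatGPT actually work under the hood (those are transformer networks with billions of parameters), and the toy corpus and function names are invented for illustration, but it shows the core idea: the model learns from a training corpus which word is most likely to come next.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count, for each word, which words follow it in the training text."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            model[current][following] += 1
    return model

def autocomplete(model, word):
    """Return the continuation seen most often in training, if any."""
    followers = model.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

corpus = [
    "the cat sat on the mat",
    "the cat chased the dog",
    "the cat sat on the porch",
]
model = train_bigram_model(corpus)
print(autocomplete(model, "the"))  # "cat" -- the most common follower of "the"
```

Scale the corpus up to trillions of words and swap the counting for a neural network, and you get the gist of why the contents of the training set matter so much: the model can only ever echo patterns in what it was fed.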
To understand these datasets better, reporters at the Washington Post and researchers at the Allen Institute for AI analyzed Google's Colossal Clean Crawled Corpus, much more boringly referred to as the C4 dataset, which includes content from 15 million websites. Sarah, what were they looking at? What did they find? Okay, so looking at the site tokens... I'm only laughing because some of these naming conventions need to do better than this, but okay, that's a different topic for another time. Looking at the site tokens, the Post found that the most frequently cited sites were patents.google.com, Wikipedia, and the subscription-only library Scribd. Google filtered this content before using it for training, using the open-source "List of Dirty, Naughty, Obscene, and Otherwise Bad Words." Basically, you know, just trying to get the bad stuff out of there so people don't get unsavory content, or don't try to make it give them unsavory content. And there are other filters as well. However, the Post found hundreds of pornographic sites, and other sites associated with things like hate groups, that were escaping the filter. So speaking of research, Megan, you've done a lot on this topic. Of what we've laid out here, what stands out to you about what's being handled well versus what isn't? Well, I'm super fascinated with this because it's the first time I've really felt like AI is about to steal my job, and it probably will. But what's fascinating about what the Washington Post looked at is that they really looked deep into the words on the sites that these generative AI tools are using to produce this content. And it's amazing, and there are so many interesting things. And with AI, so often it's like we're scared of the wrong things.
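As a rough illustration of what that kind of blocklist filtering looks like, here is a minimal sketch. The three-entry blocklist is a hypothetical stand-in for the real list, which has hundreds of entries, and the actual C4 pipeline is considerably more involved:

```python
import re

# Hypothetical mini blocklist standing in for the real "List of Dirty,
# Naughty, Obscene, and Otherwise Bad Words" (hundreds of entries).
BLOCKLIST = {"sex", "obscenity", "slur"}

def passes_filter(document):
    """Drop any document containing a blocklisted word, C4-style.
    Matches whole words only, case-insensitively."""
    words = set(re.findall(r"[a-z]+", document.lower()))
    return words.isdisjoint(BLOCKLIST)

docs = [
    "A guide to restoring vintage arcade cabinets.",
    "Safe sex advice from a public health clinic.",  # benign, still dropped
]
print([passes_filter(d) for d in docs])  # [True, False]
```

The second document shows the over-blocking problem: a benign sexual-health page gets dropped because it contains a listed word, while, as the Post found, plenty of genuinely bad pages slip through the filter anyway.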
First of all, the copyright symbol appeared more than 200 million times in the dataset, which is just interesting to me because, you know, if you use any of this generative AI, it's not telling you that things are copyrighted. It's just giving you an essay on Twelfth Night, or writing a press release, or just doing any of your work for you, and it's not saying any of it is copyrighted. Like you said, some of the sites are questionable: Kiwi Farms, Stormfront, 4chan. Those sites weren't blocked. And some of the problems that they saw are just problems that have been around for decades when we're talking about the internet. For example, it could be blocking some non-sexual LGBTQ content, which I know parental controls have had a problem with forever. Like, someone is honestly looking for sexual health information and it's blocked because it's associated with that long list of dirty words, or some faulty thing that they put in there. I think what's interesting too is not necessarily what data it's scraping, because, I mean, it's Google, we're talking about Google, so they've been scraping this information for a long time, but it's how the AI is using it. So for example, some people are sometimes surprised to know that voter registration is public, but it is. It's public. You can look and see what party people are affiliated with. So scraping that info is one thing, but then these models could use it in very unknown ways. Like, if you ask for, oh, I need a biography on Sarah Lane, it might include your Democratic Party. You know, I mean, your party. Your Democratic Party. It's fine. Whatever party. Who knows, Sarah? I don't know. I know. You know, you would never once guess. But yeah, that's a really good example, though. And I think the fact that this is public data...
So there's nothing inherently wrong with that. But then, is it going to be used to describe me in another way, in a way where I'd be like, hmm, that's not data I would have shared willingly, you know, in a blog post or that sort of thing? But I think what a lot of people, myself included, are still wrapping their heads around is that the data that is collected is not stored somewhere. These models aren't just amassing terabytes, petabytes of data. They're simply trying to learn what should probably be said about a particular person. For example, myself: maybe I lean a certain way politically. Does that say something more about me, based on the other data that has been collected? And that's what these models do. Yeah, and then there's the whole other issue, and Megan, you were touching on this one, when we're talking about data that's public, like voter registration. Very specifically, over in Europe, the training sets for large language models are really becoming a super big deal. You know, we've talked about it on this show: Italy is ordering OpenAI to stop processing data, and there's a deadline of April 30 to meet some demands to make sure that they're not violating GDPR protections over there. And it's some pretty serious stuff. We were talking about potentially getting consent from people to have their data in those datasets, to scrape that data, and meeting right-to-be-forgotten rules. Really tricky stuff. And it's not just the Italians on a lark here; regulators in France, Germany, and Ireland are also looking into OpenAI's data collection specifically. And in fact, the European Data Protection Board has set up a ChatGPT-specific task force to kind of coordinate on this. So I don't think any of these are going away.
And it kind of speaks to this idea that under GDPR and its protections, there is a difference between data that is public and data that can be used just because it's public. It doesn't mean you can use it without consent. It's a very different regime, and a very, very different set of criteria, that we're seeing in Europe, for sure. I think there's also just the fact that we don't know the accuracy. So many people, because of the way it works when you put something into ChatGPT or Bard, just the way it types it all up, it looks like magic and it sounds right. So think about where this data comes from. The analysis ranked all these sites by subject, so in technology, Medium is ranked 56th in tech content, and, full disclosure, I used to work for Medium. That's part of how I know that most of it is unedited, un-fact-checked, user-generated content. So there might be a lot on Medium that is from a very well-known source, and some stuff is fact-checked, some stuff was edited, but not everything. And I'm not sure that there's a way for this dataset to really know, in the way that it needs to when we're using it, what is accurate, especially if Medium is ranked 56th in tech content. Well, and to that point, I was really surprised to see how much personal blogs in general, just your everyday blogs, were scraped and used for this. It makes a lot of sense; I'm sure there was a lot of content there. But again, that's deeply personal in some ways, and performative in ways that you might not get from a news site or from Wikipedia or something like that.
And again, all of these are put together to train these models to generate, to predict what text is going to give you a desirable response. But the fact that it's such deeply personal stuff, and also that they're not scraping the major social media platforms, things like Facebook and Twitter, gives it a very specific, I feel like performative, set to build off of. And speaking of accuracy, that's the thing we keep having to repeat: these models have no idea that they're even supposed to be accurate. Accuracy is not the goal of them. The goal is to have a convincing response. Accuracy is almost beside the point with these models in a lot of ways. Yeah, the models don't care about accuracy. The model doesn't know what that is. The model says, this is what I have to work with, and this is probably the sentence that makes the most sense based on what I have to work with. And yeah, Medium is a great example. I mean, not everybody is posting super personal stuff on Medium. But who's posted some super personal stuff on Medium as of late? You know, like health things. I could have just made up a bunch of stuff, and would that be part of the data that becomes something it's trained on and spits out to somebody else, so that that next person is like, sounds like it's true? I mean, that happens all the time. Well, we're going to talk a little bit more about AI and who to trust and who not to trust in just a sec. But just a reminder that you can join our conversation in our Discord. We'd love to have you there. You can join by linking to a Patreon account at patreon.com/DTNS. All right, on the latest episode of A Word with Tom Merritt, Tom talked with cyber-ethnography researcher and writer Ruby Justice Thillo about humanity's place in an age of AI-enabled technologies. Where do we humans fit in?
We have an excerpt from that conversation, focusing on how people evaluate the veracity, or truthfulness, of information that they find on the internet, so let's listen in now. The idea that you would get information and that information would be just 100 percent true is a very new thing. You know, usually, if you grew up before the internet, for your research you would probably check a book or two to get some information, and there was at least a little breadth in how we acquired knowledge. And we've been primed, really, to go to one place and find one piece of information and be like, okay, this is Wikipedia, this is true. I even remember when Wikipedia came out, the teachers being like, God, don't trust Wikipedia. You've still got to go to Encarta, or whatever, the library. Yeah. And that was a big thing, because people maybe didn't believe that there could be only a single source of truth, especially on the internet. And so I think we've carried over that assumption: if something has the stamp of Microsoft, or is something we see on the news, why would somebody put something out that lies? You know, I don't think it's wrong to expect that as a consumer, even if they say, oh, it's not always going to be correct. The last paper that I read, I think it was for LaMDA, the Google system, had about 80 percent groundedness, which means 80 percent veracity. You have similar things with GPT-3 and GPT-4. So four things out of five are going to be true; one out of five is going to be, probably, a hallucination, as they call it. That's a lot, actually, when you think about it: one out of five. But I think we expect, on the internet, to be served at the very least a semblance of truth. And if it's not true, it's at least ideologically correct. No, that's a great point. I don't know what the average veracity of my average conversation with my neighbors is.
But I wouldn't be shocked if it was around 80 percent. You know, I definitely hear people saying wrong things all the time. I know I've said things that I'm like, oh, that turned out to be wrong. But we're more forgiving in that interpersonal relationship than we are with the machine. I was really fascinated by what you said about the authority, right? Oh, it's coming from a machine, so we give it extra authority. Is that something we can get over? Is that something we just need to untrain ourselves from, or is it just endemic and we need to adapt to it differently? The truth of the matter is that we see the internet as this repository of knowledge. And when we ask a machine that is connected to the internet a question, we assume it can go and retrieve that information. That's one of the affordances of the internet: a lot of knowledge is on it, and through Google or whatever, you can access that knowledge. I think, speaking as a designer, it's a bit of a design flaw, right? In the same way that a cigarette package would have an indication of what the risks or dangers might be, there may need to be some more salient features, or more salient design, around the fact that this is a tool in testing. What the AI says is not factual. They say it a little bit, you know, it says it in the fine print, but it is not a prominent feature of the design of the interface of these programs. It's not a habit for us yet. I guess that's what I'm wondering: can we turn it into a habit, like, oh no, that's a chatbot, of course I don't trust everything it says? There's something also about the maybe post-fake-news moment, where these two things might converge and push us to a culture of double-checking.
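A back-of-the-envelope way to see why that one-in-five rate is "a lot, actually": if you treat each claim in an answer as independently about 80 percent likely to be grounded (a simplifying assumption; real errors are not independent), the chance that a whole multi-claim answer is fully correct falls off fast.

```python
# Rough arithmetic on the ~80% groundedness figure cited for LaMDA.
groundedness = 0.8
for claims in (1, 3, 5, 10):
    p_all_true = groundedness ** claims
    print(f"{claims:2d} claims -> {p_all_true:.0%} chance every claim is grounded")
```

Under this assumption, even a five-claim answer is more likely than not to contain at least one hallucination, which is part of why the habit of double-checking matters.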
We've spent the last six or seven years, since the election of Trump, just being well aware that the things in the media... by the way, that was the case before that as well, but it became salient then. Exactly. The things we read were not always true, and people became very adamant about saying, I've got the fact checkers, and this whole choreography of people looking at the things that people say to make sure that they were true. And potentially one of the positive effects of that is that if we identify, as a culture, that the information we get from the media might be wrong, and the information we get from the internet might be wrong, and the information we get from AI chatbots might be wrong, it might push us to a culture of double-checking and verifying. But of course, the onus is on the user, and the user is a person with a limited amount of time, and one can assume that they will only check things that are super important to them. That whole conversation just made me... like, my handle is Mr. Anthropology, so just talking about these tools being used in culture, this whole conversation is fantastic. But what really got me thinking, to that point, is that I remember in college I had professors tell me, in your bibliography, do not put Wikipedia, do not even start with Wikipedia. That was a moment when we, even in an academic culture, were learning how to use this resource of the internet at that time.
And so to think about ChatGPT in that way: it can seem so disruptive, and it may very well be extraordinarily disruptive long term, but this is also the same way it went with my cell phone, the same way with smartphones or computers or whatever. These are cultural products, and it's going to take some time, but this conversation kind of gave me, I don't know, some hope, because culture is extraordinarily resilient to these sorts of things. I'm curious, Megan, what did this get going in your brain? Well, it definitely made me think about how we just spent the last 15 minutes talking about how inaccurate it all is, and the fact that a lot is inaccurate, and it always has been. Like, I went to college long before Wikipedia, so we were sourcing from books in the library. And I think as a journalist, exactly, part of what I do every day with my reporters is: where did this come from, where was this sourced? The job of a journalist is to collect facts, put them together, and tell the person what's important, and I feel like that is absolutely not what ChatGPT or any generative AI can do. They just can't do that. So I think that's what's scary. It's not necessarily the inaccuracy, it's the lack of saying where you got that fact. A journalist's name, the byline, is there; everybody knows the words we're speaking on the show are all coming from us. I don't want to be all "get off my lawn, ChatGPT," but that is kind of what I'm saying.
It is really interesting, though, that as we see this kind of generative AI get into industry, certainly with even just the OpenAI integrations we're seeing in these verticals with Microsoft's stuff, we are seeing sourcing become much more important. I still want to say that ChatGPT is the tech demo for how this stuff works, and we're just starting to see the idea of this at scale. And when we do see this at scale, when we see it in business, in industry... I'm not saying that this will in any way be perfect, or that we will not fall on our faces on this, but sourcing is a big part of what we probably need to do, I don't know, much better. It is interesting to see that when it's being used for business, for specific business products, sourcing is at least a little bit part of that conversation right now. Indeed. Well, if you want to hear the rest of that interview, be sure to head on over to awordpodcast.com, that's the letter A, word, podcast, dot com, to get the full episode of A Word with Tom Merritt and his conversation with Ruby Justice Thillo. If you love Tom, you'll love him getting into deep conversations with cool people on fascinating topics. And I have a feeling that if you're listening to this show, you probably want to check it out. I know I love it. Check it out. I do too. I mean, this is Tom in his element. Tom is great at this sort of stuff. You know who's also great at stuff? Atari. And that stuff is buying things back. The company announced it acquired more than 100 PC and console titles launched in the 1980s and the 1990s from companies like Infogrames and Accolade, also adding Accolade's trademark to its vault. Now you might say, well, hold on, didn't some of these games... weren't they already Atari games?
Yeah, so Atari now owns Demolition Racer, and some of these were Atari titles at one point. IP has moved around; companies buy things from other folks. So it's a coming home of sorts. Last month, Atari also snagged Nightdive Studios and the IP of 12 Stern Electronics arcade classics, including Berzerk and Frenzy. Atari plans to re-release already existing games on modern consoles and create new adaptations of past storylines. Who's into it? I love this idea that Atari is this very specific amalgam, this aggregator of IP that had value in what feels like a very narrow window. I know some of these titles go into the '90s, but we're thinking, like, HardBall! or Demolition Racer and that kind of stuff. It seems like a very specific era of them. They've kind of said, this is our lane, this is what we're going to do. If people think Atari, they think of a wood-panel console that was sitting under your giant CRT TV. So the name Atari is so honest with that, I think it's smart that whatever the incarnation of this company is, it's kind of doubling down on that. It's like the vinyl community, right? Atari's kind of like, listen, we know that some of you, not every gamer, but many gamers, especially of a certain age, really care about this stuff. There's nostalgia. It evokes emotion. And to have it under the Atari umbrella makes a lot of sense to me. All right. Well, let's check out the mailbag and see what you have been saying. We got one from Samir. This is based on our conversation about AI and drones helping kill weeds on farms yesterday with Scott Johnson. Samir says, first-time feedbacker. So thank you, Samir. Please do keep it up. Samir says, I think a great use for the tech in your last episode would be to spray paint where spots are found, or to clean buildings outdoors, glass or otherwise, where needed.
Samir's obviously talking about the idea that the precision of a drone that is trained on how a farm works could also work for, yeah, a skyscraper, or a building that needs just a little bit of paint. The whole thing doesn't need to be painted, but just a little bit of paint. And that's a really good use case for this as well. Send out the paint drones. Yeah, I kind of love that. The other thing I really love is that Stephen wrote in, and he had some thoughts about AI and using it for accessibility. He wrote in and said: I'm blind, and I use audio descriptions where available. However, due to licensing issues and the fact that it's a very niche area, the amount of content is somewhat limited. There is a surprising amount of content with audio descriptions available, he says, to be fair. Sometimes it's available, but not on a particular platform, or maybe just not at this time. For example, he says, you could find a film with audio descriptions on one streaming platform one year, and then it goes away, and when it comes back it doesn't have them anymore, and vice versa. So my plan was to see how well AI, ChatGPT in this instance, would do at writing descriptive scripts for a blind audience. He did some testing. Here we go. He said of his first attempt: not bad, missed all of my favorite bits, but still not bad at all. Then on his second attempt, he said, ChatGPT did the thing where it just makes stuff up, but I think I like the AI version better. So, all right, maybe once we have text to video, we can feed in the good one that ChatGPT did, and then we can get a video out of it. Text to video exists, but yes. It has a long way to go. Long way to go. And we'll be covering the first feature film next year. That's right. Yeah. Was this really a Michael Mann or was it AI? No one knows yet. Let's hear from the director himself.
Well, while we're waiting for that to happen, we want to thank you, Megan Morrone, for being with us on the show today. Let folks know what you're up to these days. Well, I am freelancing mostly, and I am working currently at HR Brew, which is part of the Morning Brew newsletter company. The content that I'm editing is mostly about workplace issues. So, anybody who is in HR, or is kind of working at a company where you're the person in charge of dealing with all of the issues around return to work and, you know, just everything. Which are many. Yes: whether you should use ChatGPT for work, all those things. So check it out. I think you would like it. There are lots of GIFs. Also JIFs. Also known as GIFs. Either way. Yeah, listen. Our brand new bosses, of which there are three, who might say GIF or JIF, we don't know yet, but what we do know is that they are Ernest, Adonis, and Bede. And they just started backing us on Patreon, and that really made our day. Thank you, Ernest. Thank you, Adonis. Thank you, Bede. And Adonis, Ernest, and Bede, and all our other patrons: remember to stick around, I implore you, for our extended show, Good Day Internet. We'll be talking about BuzzFeed News, pouring some out for it because it's shutting down, and how that plays into other recent newsroom shutdowns. So stay tuned. Just a reminder: you can catch our show live. DTNS is live Monday through Friday at 4 p.m. Eastern, 2100 UTC, and you can find out more at dailytechnewsshow.com/live. We're back doing it all again tomorrow with Len Peralta drawing the top tech stories and Rob Dunwood talking tech with us. Don't miss it. This show is part of the Frogpants Network. Get more at frogpants.com. I hope you have enjoyed this program.