Daily Tech News Show is made possible by its listeners. Thanks to all of you, including Dustin Campbell, Tim Deputy, and Brandon Brooks. Coming up on DTNS, Section 230 heads to the Supreme Court, sort of. A standard to make voice assistants more accessible. And Andrea Jones-Rooy is here, helping us understand how bias creeps into algorithms. It's Halloween. This is the Daily Tech News Show for Monday, October 3rd, 2022 in Los Angeles. I'm Tom Merritt. And from Studio Ribbit, I'm Sarah Lane. I'm the show's producer, Roger Chang. And joining us, Andrea Jones-Rooy, data scientist, comedian, circus performer, and host of the Majoring in Everything podcast. Welcome, Andrea. Thank you. It's great to be here. It's good to have you.

Let's start right off with a few tech things we all should know. Deliveroo opened Deliveroo Hop, its first brick-and-mortar grocery store, in partnership with the chain Morrisons on New Oxford Street in central London. Customers don't have access to store shelves. Instead, they can use digital kiosks, with orders picked by store employees and bagged within minutes, or pick up orders placed in the Deliveroo app. Staff will also pick items for the service's delivery couriers. Stores will offer over 1,700 grocery items, including ready-to-eat meals. A little like Instacart starting a grocery store here, I guess.

YouTube TV rolled out support for subscribing to channels without having to sign up for the $65 YouTube TV plan. So you could just get HBO Max through YouTube TV, or NBA League Pass, or MLB.TV, or Showtime, or Starz, or Hallmark Movies Now, Cinemax, Epix, and more. No channels from the base plan are included in the a la carte options. Prices range from $2 to $30 a month, depending on which channel you're adding. Each is available for a 7-day free trial as well, if you just want to see. Give me ESPN, and that is a done deal. It's the only reason I subscribe to YouTube TV.

Twitter started rolling out the ability to edit tweets to Twitter Blue subscribers in Canada, Australia, and New Zealand. This allows for editing a tweet up to 5 times within 30 minutes of posting, if you really need to think about it. Edited tweets will display a timestamp showing when they were edited, with the edit history viewable by clicking through. So if you're curious, you can know. Twitter says that the feature will also come to the US soon.

The Los Angeles Unified School District confirmed that a ransomware organization began publishing exfiltrated information about students online. The attack occurred over the Labor Day weekend, almost a month ago now, right, with the threat group issuing a ransom demand on September 22. The district decided they would not negotiate and would not pay the ransom. Bleeping Computer reports that folder names in the leaked data suggest it might include things like social security numbers, passport information, and some secret and confidential documents on students. NBC Los Angeles reports that law enforcement sources say it includes legal records, business documents, and even some confidential psychological assessments of students.

Last week, several outlets, starting with the Daily Mail, reported that actor Bruce Willis had licensed his digital rights to a company called Deepcake to make ads and movies and TV shows that starred digital versions of him. Deepcake has used Willis' likeness in an ad in Russia, which we mentioned on Friday's DTNS. Over the weekend, a spokesperson for the actor told the BBC that Willis had no partnership or agreement with the company.
Deepcake told the BBC that, quote, he gave us his consent and a lot of materials to make his digital twin. Also clarifying, the wording about rights is wrong. Bruce could not sell anyone any rights. They are his by default. It sounds like Deepcake got a little out over their skis in promoting what they were doing, but otherwise, you know.

Alright, let's talk a little more about two cases headed to court. What do we got, Sarah? Alright, so the US Supreme Court has agreed to hear appeals in two cases regarding an online platform's liability for messages posted by users. We've talked about this a lot in the past. One case, Gonzalez versus Google, involves a 23-year-old US citizen who was killed in the Paris terrorist attacks in November of 2015. Relatives of the victim accused YouTube of passing along content that encouraged the attacks and of sharing revenue with the people behind it. The 9th Circuit Court of Appeals upheld a dismissal of that case. It found the claims over the postings were all protected by Section 230 and ruled that the plaintiffs could not prove that the revenue sharing was a sufficient connection. Alright, so the family is appealing Section 230 directly there, but the second case, Twitter et al. versus Taamneh, involves a Jordanian citizen killed in a 2017 terrorist attack in Istanbul. Relatives accused Twitter, Google, and Facebook, all three, of aiding and abetting terrorists in violation of the US Anti-Terrorism Act. That case did not touch on Section 230 issues, but it had more evidence of revenue sharing, like AdSense accounts. They had the receipts, basically, and so the 9th Circuit let that case proceed, and so that appeal is saying the revenue shouldn't count as a connection. The 9th Circuit decided these cases together, along with a third similar case that isn't being appealed. So it's notable that the Supreme Court took both of them, because they were similar, at least as far as what was being alleged, yet Section 230 was deemed sufficient in one but not relevant in the other. Several of the circuit judges criticized Section 230 in their opinions. And given that the Supreme Court justices have previously written about their concerns with Section 230, it's kind of reasonable to expect that the Supreme Court will review the Section 230 element in these cases.

So Tom, what is Section 230? Ah, I'm so glad you asked. Yes, I am. We have a whole episode of Know A Little More on this. If you want all the details, go to knowalittlemore.com and look for About Safe Harbor. But here's the short version. Since 1959, in a case called Smith versus California, the standard had been publisher versus distributor. So bookstore owners weren't expected to read every book before they sold it, but publishers were. So if you were a publisher of something, you were liable for what was in the book. But if you were a distributor, a bookseller, you weren't liable for what was in the book. When the 1990s rolled around, though, CompuServe and Prodigy provided interesting new twists on the question when they ran online forums and chat rooms. Were they publishers of the user comments, and therefore should have known what was in all the user comments? Or were they distributors, like a bookseller? CompuServe didn't moderate its content, which sounds wonderful. So it was deemed by the court to be a distributor and therefore immune. Prodigy, on the other hand, did employ moderators. And because of that, the court ruled, well, you're choosing what stays up and what doesn't.
You're a publisher, and now you're liable for what anyone says on your platform. Congress recognized that, if they let it continue like that, this would encourage platforms not to moderate their content, leaving them full of libelous and dangerous postings. So they included Section 230 in the Communications Decency Act, which says that, quote, no provider or user of an interactive computer service, basically the platforms, shall be treated as the publisher or speaker of any information provided by another information content provider, mostly users. In other words, if you try to make your forum a decent place to post things in, the government didn't want you to get punished.

Now, the first test of the law came in 1997, Zeran versus AOL. AOL was accused of not removing posts that tied Zeran's phone number to the Oklahoma City bombing, and he wasn't connected to it. So he sued. And the Fourth Circuit wrote that it would be impossible for service providers to screen each of their millions of postings for possible problems. That was back in 1997, mind you. And doing so would restrict speech, which was the reverse of the intention of Section 230. So they said it's not AOL's fault that the stuff got passed along; go after the people who posted it.

Since that time, there have been exemptions carved out of Section 230. You're not immune from federal criminal liability under Section 230. If Facebook, you know, is liable for a crime, they can't get out of it by saying Section 230. You're not free from intellectual property claims. You're not free from facilitating sex trafficking. But anything else is largely under Section 230.

All right, now back to the cases before the Supreme Court this term. In a concurrence, Circuit Judge Marsha Berzon noted that, if not bound by circuit precedent, she would hold that the term publisher under Section 230 reaches only traditional activities of publication and distribution, such as deciding whether to publish, withdraw, or alter content, and does not include activities that promote or recommend content or connect content users to each other. She's basically talking about moderation as far as internet platforms go. In other words, she's saying, leave moderation protected, but establish that recommendation is not protected. The Supreme Court is going to weigh in on that. Supreme Court hearings will take place this term. A decision will come sometime before the court recesses next June.

And Andrea, I'm curious where you think this line should be drawn. If you were a Supreme Court justice, do you have an idea of where you would come down on this? Well, I think my main thought at the moment, and maybe this is unfair to the Supreme Court justices, is that there's no way they understand what's going on. Maybe I'm being horribly ageist, but I hope they're bringing in consultation from people who study this stuff more closely. I would say that there's something, and this is getting to some of the stuff we'll be talking about later, there's a line in there somewhere related to the recommendations. Because if you can show that you're more likely to see some kind of dangerous, harmful, something piece of content, gosh, how do you define that? I don't know. But if you can show that they're making money, that they're profiting from eyes on the screen, if they're promoting this kind of content, and they can link that content somehow to an event, I feel like there's a line in there, but there's a huge technological barrier.
I'm not a lawyer, but I almost feel like the legal decision is going to be clearer than the implementation. Yeah, to me it's going to come down to: okay, let's assume that the content in question violates the Anti-Terrorism Act. Nobody's debating that. Is Google liable for an algorithm they're not supervising? Again, just like the postings in 1997, they aren't looking at the content, but the algorithm kicks it out. Are they liable for that algorithm? And we're going to be talking a lot about algorithms today. And that is a very interesting question, and one that I don't know that the courts are competent to really decide, not because they're old, but because the law doesn't contemplate that part of it. It doesn't contemplate algorithmic recommendations. You could turn it into a question of: if it were a person doing this, would we think that was problematic? We may not actually need the algorithm to be unpacked in order to say, well, Google built something, and Google is providing a service, and Google is making us all more likely to see this piece of information now. And yes, if we can agree that this piece of information runs against the counter-terrorism laws, then I feel like it's actually not that much of a stretch to assign blame. One of the things that we talk about in bias in AI is accountability, and I think actually Google should be held accountable for this. And that's what it's going to turn on: Google is going to say, oh, look, it's not a person, it's an algorithm, and we don't always know what the algorithm is going to do. Obviously, if it was a person, we wouldn't have recommended this. And whether that's a sufficient defense or not, that is, of course, the question.

Well, the University of Illinois at Urbana-Champaign. Tom, you're familiar with that college? Go on. Partnered with Microsoft, Meta, Amazon, Apple, Google, and several nonprofits, including the Davis Phinney Foundation and Team Gleason, on the Speech Accessibility Project, seeking to improve voice recognition for communities with disabilities and speech patterns not factored into those AI algorithms that we've alluded to thus far. Speech interfaces are critical for communication and expression if you're unable to express yourself in other ways, and yet they are often not usable by the people who would benefit from them the most. Building out a dataset that can reach a wide community required partnering with big tech companies on the infrastructure component of this. The university is going to recruit paid volunteers, collect voice samples from them, look to create a private, de-identified dataset out of that, and then use that to train machine learning models. Yeah, so instead of each of these companies building their own separate and maybe duplicative initiatives, the Speech Accessibility Project will provide a central dataset. At least that's the idea. U of I says it hopes to benefit those with amyotrophic lateral sclerosis, which you might know more commonly as Lou Gehrig's disease, as well as Parkinson's disease, cerebral palsy, Down syndrome, and a wide range of medical and non-medical conditions that affect speech. Efforts will initially focus on American English. Now, this is, you know, just breaking today, Andrea, but I'm curious about your first-glance response to it as a data scientist. What does this look like to you? I mean, I think it's very exciting. I feel like I came in hot being skeptical of algorithms, and I will continue to be.
But I think part of it is, you know, what are the consequences of using these algorithms, and what are the stakes of the algorithms being incorrect? And I think the application seems very promising. And, you know, I think we all know folks, if not personally then friends of friends, who would benefit from such a thing. I have some friends and family with ALS and other diseases in my life who I think would benefit greatly. So I think it's great that it's being offered. Of course, I get nervous when I hear about, you know, de-identified data, because there's a lot more to de-identifying data than just stripping out names. A lot of times you might be able to piece it back together and come up with a way of triangulating where that data came from. So I would want to know more about how they're collecting it and how they're treating it to be anonymous. But generally, I think this is one of the more exciting and heartening applications of tech and AI that I've seen. Yeah, it feels like they're really pushing this as far away from controversy as they can. They're paying the volunteers, and they have to volunteer to do it, so you have to actively consent. Then they're going to still try to de-identify it. I mean, at that point, they could just get people to say, if they know who I am, I guess it's fine. But they're like, no, we're still going to try to keep it private. It's going to be shared amongst multiple places. It's going to be held by a university, so that, you know, there's not a lot of motivation to keep secret what's going on with it. Yeah, I'm with you. I think there's a lot of good things about this. Devil's always in the details, though, of course. Right, right. I'm sure two years from now we can all get back together and say, oh my gosh, they did this horrible thing. Maybe I'm being pessimistic about this, but part of it is that our own ethics and understanding of how these things work change. But generally speaking, this is one of the few pieces of news about tech where I was just like, wow, great. Perfect. Yeah, feels good. I love it. This is a good thing. Yeah.

Well, folks, if you have a good thing to tell us about, you're like, hey, you missed this good thing, send us an email. Our email address is feedback@dailytechnewsshow.com.

In January 2023, a New York City law will go into effect requiring companies to conduct a bias audit. That's what it's called in the law, a bias audit, on automated employment decision tools. And then once they've finished that audit, they must post the results of those audits on their website. Now you may ask, what is a bias audit? Well, the law says that a bias audit means an impartial evaluation by an independent auditor. And you may say, well, what does impartial mean? And what's the evaluation? And who counts as an independent auditor? The New York City Department of Consumer and Worker Protection is working on guidelines for all of that, but they don't have a timeline for when those guidelines are going to be available. Financial audits are old hat for businesses, and there are lots of accepted practices for what counts. But what does an AI audit look like, Andrea? Is there such a thing? Is there even a standard? If there's a standard, I don't know what it is. Part of the challenge is that, well, let me back up. This goes back to what I was saying earlier, where I think the law is actually easier than the implementation.
Not to say that the law is easy, but with the tech implementation, for a lot of these models there is no real way to validate or test them. Or the true test is to roll them out and then go back a year later and say, who did we rule in, and who did we rule out on hiring? And what did we, or whoever this independent auditor is, and I agree with you on asking those questions, think of it? So it's in some ways difficult to actually evaluate without actually implementing it, which goes against the entire purpose of getting ahead of the bias in the first place. On the other hand, it does depend on what type of model they're using. So part of the coverage of this that makes me nervous is that they're saying, well, it's AI, it's this black-box AI thing and we don't really know what's going on. But every model is different. Every company that's providing these models to different organizations is different. The organizations are implementing them differently. If there were one singular test, I assume it would only work if every company were using the exact same algorithm in the exact same way, and I simply doubt that's the case. I will also go back to say, you know, anyone who's done any statistics will think back to regression diagnostics, right? We used to do them, we still do. A very simple linear regression, where we say, what's the association between X and Y? And even then it's not easy to tell what's bias. We look at, you know, 10 different graphs and a bunch of different numbers, and we try to think really hard about whether we've got some issues in our model. That gets astronomically more difficult as you get to AI. So I don't know what an AI test looks like. That said, there are examples of tests out there. They just take a lot of work and a lot of care, and they tend to be very tailored to the actual algorithm. You just flashed up a study from ProPublica that carried out a very long investigation into bias. It took an extremely long time, and, you know, if they can roll that out everywhere, great. But there's a lot of ifs wrapped up in that. And I think you hit on a lot of the big ones right up top.

So let's assume that they come up with an agreed series of methods. Maybe there are several of them. I think a lot of folks in the audience are going to be asking, how is a machine biased anyway? It's a machine. Let's not anthropomorphize this thing. It's just cranking through. How does bias work there? Yeah. So I'm so glad that you asked. I work with a lot of companies as a data scientist, in particular around people analytics. So this is exactly the area that I spend a lot of time working with companies on: hiring, recruiting, promoting, right? And all of that is about saying, all right, we've got some data about some humans, and we wish we could run it through some computer where that computer could tell us which humans are most likely to be successful in the next role, successful at this company, or whatever. And the other example that came up was the same sort of thing: we're making a guess about who's likely to recommit a crime if they're granted parole. Banks use this to figure out who's going to be more likely to repay their loan. So that's the kind of algorithm that we're talking about.
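To make the kind of model Andrea is describing concrete, here's a minimal sketch in Python. Everything in it is invented for illustration, assuming a toy dataset and scikit-learn; no real hiring tool's features or data are represented.

```python
# A toy "who looks like our successful employees?" model.
# All data is invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Past employees: [years_experience, elite_school (0/1), referral (0/1)]
X_past = np.array([
    [8, 1, 1], [6, 1, 0], [7, 1, 1], [2, 0, 0],
    [3, 0, 1], [5, 0, 0], [9, 1, 0], [1, 0, 0],
])
# Label: was this person rated a "top performer"? The label is itself a
# human judgment, so any bias in past ratings is baked into the training data.
y_past = np.array([1, 1, 1, 0, 0, 0, 1, 0])

model = LogisticRegression().fit(X_past, y_past)

# New applicants are scored by resemblance to the past "top performers."
applicants = np.array([
    [4, 0, 0],   # four years of experience, no elite school
    [4, 1, 0],   # identical, except for the elite-school flag
])
print(model.predict_proba(applicants)[:, 1])  # P(top performer) for each
```

In this toy data the elite-school flag happens to track the top-performer label perfectly, so the model leans on it hard: two applicants with identical experience get very different scores, and that gap came entirely from how past employees were labeled. That's the dynamic Andrea lays out next.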
And there are three big ways that I think about the bias that can go into them, maybe a fourth depending on how far we want to go with this, right? So the big one that we all think of first, and that I believe this article and this possible legislation are talking about, is the algorithm itself. Most algorithms, not all, but most algorithms that I've seen working in this space basically say: okay, we have patterns from the past. Let's, quote, learn those patterns, fit those patterns according to some set of rules, like matching means or finding what's nearby or whatever it is we're doing. Then we're going to take new data, here are new incoming employees or people who have applied for the job, and say, what features do these employees have that make them look like the employees that are currently doing well in the company, however we've defined that? And the computer will say, aha, of these 100 employees, these 12 are most likely to succeed. What that algorithm is usually doing, not always, but usually, is figuring out who's most similar to the leaders. And that's okay in principle. But if you think that in practice humans in the past have probably been at least a tiny bit biased, then all the algorithm is doing is learning the bias from previous data. So unless we're starting with some kind of bizarre, engineered, made-up simulated data, which has its own wild problems, right, of some utopian equality, which I don't think is what we want to do, all we're doing is training the machines to replicate who's at the top.

So the other piece of this is, what are we measuring to begin with? And maybe that is what we want to do, and there's a version where that's not so problematic. But the second part of the bias is the data itself. The data that we collect on humans reflects our own biases. If you think of the data that we might have, and the data that I've seen, we tend to have things like: where did someone go to school? What did they major in? Did they work at a peer company that was successful? Did they get high performance evaluations? Did they get promoted quickly after they got hired? And all of those things might be indicators of great talent, but they're probably also picking up some other biases along the way. So the things that we're measuring and mapping people to also tend to be biased in terms of what it is we're even looking at. Maybe what we want is data on: is this person a team player? Do they let other people disagree with them? Do they welcome con... There's probably stuff that we want to know and would want to match on, but we don't have that in the data. The data reflects these other biases, like, well, if they didn't get an MBA from Stanford, then we don't want them, right? Right, yeah. They're not as good as the person who did. Exactly.

So the third version of the bias is, what do we do with that information once we have it? The algorithm spits out recommendations and says, okay, here are the four people that you should hire. And then the challenge is that humans look at that and say, well, it's from an unbiased source, so this is the objective choice. So even if you spot bias... I've seen companies do this with performance evaluations. They generate a bunch of data, and the computer says, these people should be promoted. And then the leaders sit down and say, actually, I think it's these other people, but then they go, but it's the computer. And so they default to it, thinking that it's objective, which is its own warped thing.
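That first kind of bias is the sort of thing a bias audit could at least try to measure. One common ingredient, sketched below with invented numbers, is comparing selection rates across groups and computing an impact ratio, in the spirit of the EEOC's traditional four-fifths rule of thumb. This is a generic illustration; the NYC law's actual audit methodology, as noted above, hasn't been published.

```python
# Sketch of a selection-rate comparison, one common bias-audit ingredient.
# Group names and counts are invented for illustration.

# Suppose the tool screened 100 applicants from each group and
# "selected" (recommended for hire) these counts:
selected = {"group_A": 30, "group_B": 12}
screened = {"group_A": 100, "group_B": 100}

# Selection rate per group.
rates = {g: selected[g] / screened[g] for g in selected}
print(rates)  # {'group_A': 0.3, 'group_B': 0.12}

# Impact ratio: each group's rate relative to the highest group's rate.
top = max(rates.values())
impact = {g: rate / top for g, rate in rates.items()}
print(impact)  # group_B comes out at 0.4, well below the 0.8 rule of thumb
```

A check like this says nothing about why the rates differ, which is part of why Andrea says real audits take a lot of work and have to be tailored to the actual algorithm.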
You know, there are more issues, but I don't know, those are pretty depressing ones. So the first two feel like problems with the metric, right? If you have a problem with your metric, which is, okay, our metric is that the current leaders are examples of ideal leadership, and if that isn't true, then you don't want to use the current leaders as the metric. Or if the metric for a successful employee is this list of things, and you're like, ah, but that isn't always true, then you don't want to use those things. And then the last one is a bias in favor of machines. It's like, well, if a machine said it, I guess I should believe it, which, if the first two are true, is certainly not going to be trustworthy. It kind of turns into something where I feel like, if I were a human hiring somebody, sure, there are all these parameters you have to look at. And then there are all these exceptions, the rogue situations where you have to treat things differently. And if a machine can't do that, then it becomes problematic.

Well, the other thing to keep in mind for all of these is that any machine learning AI, or even a statistical model, and I keep going back to statistics because a lot of the instincts we have for statistics, if any of us have instincts around statistics, apply to AI... they're all going to be wrong in some way. They're all going to make errors. If they're not making errors, we're overfitting, and we have another problem, and that's not great. So that means we're going to get it wrong some of the time. And that's okay if what we're trying to do is get a broader understanding, like, oh, generally people who majored in this tend to do well. Even if that's problematic, we could still learn something, right? Or we might say, on average, we're pretty good at picking these people. But these are people's lives at stake, so our tolerance for error, I think, ought to be quite low. With some of these models, in the example of parole in the article that was posted earlier, it was correct 68% of the time, which is kind of high for some models. But if you think about that, it means that people are wrongly being kept in jail and denied parole, or put back out on the streets and recommitting crimes. That's not a very good rate. And we're really messing with people's lives every time we make, say, a Type 1 or a Type 2 error.
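To put rough numbers on that point, here's a small worked example showing how a single accuracy figure hides the split between Type 1 errors (false positives) and Type 2 errors (false negatives). The 68% comes from the parole example above; the error counts are made up for illustration.

```python
# The same 68% accuracy can hide very different harms.
# All counts below are invented for illustration.

total = 100            # parole decisions
correct = 68           # matches the accuracy cited above
false_positives = 20   # Type 1: predicted "will reoffend," wouldn't have (wrongly denied parole)
false_negatives = 12   # Type 2: predicted "won't reoffend," but did (released, reoffended)

accuracy = correct / total
print(f"accuracy: {accuracy:.0%}")                        # 68%
print(f"Type 1 errors (false positives): {false_positives}")
print(f"Type 2 errors (false negatives): {false_negatives}")
# Another model could hit the same 68% with the errors split 2 and 30,
# and each split harms a completely different set of real people.
```

Which is part of why accuracy alone is a poor audit target when the stakes are individual lives rather than averages.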
Yeah, it feels like everybody's head is at, well, let's just improve the metrics and the dataset that we're using to train the algorithm, which I'm not saying you shouldn't do. Right. But there's a lot that really needs to be done that I don't hear as many people talking about, which is, let's also moderate our expectations for how useful this is. They use these kinds of algorithms in medical practice all the time, but the medical practitioners are very aware that they shouldn't blindly follow the algorithm, or they're going to kill a patient. And so they use them, but there's a better awareness there of, this is a tool that I'm going to use to inform my decision, but I'm responsible for the decision, that you're not seeing in hiring practices. Exactly. And a lot of it is the fault, I would say, of the companies that are trying to sell this technology. And frankly, it's the fault of every advertisement you ever see on television, or any YouTube ad that says, thanks to AI, we're predicting the... We're all being fed this stuff that AI is this magical thing that can cure all our problems. And it's incredibly powerful and awesome, but it's still just a tool. I'm not in medicine, but I'm hopeful you're right that doctors think of it as a piece of information and another perspective. Whereas a lot of people who are professionals, experts in their own areas, who understandably don't interact with AI, have no reason to ask whether this is biased, because they're just being told that it's not, that it's objective.

Well, AI may not be able to solve everything, but Sarah, I'm pretty sure BeReal can. So we've been talking about BeReal's rapidly growing popularity for a while. I'm still not on board, but some of my co-hosts are. It's achieved another milestone, though, recognizing its growing status, and that is a Saturday Night Live sketch. In its 48th season premiere, SNL more or less explains exactly how the app works. But they do it in an SNL way, showing hostages in a bank robbery demonstrating the app when they get a notification during a stickup. Everybody has to do a BeReal, because, you know, if you do it later, you're going to ruin everything, and everyone's going to know that you didn't do it at the exact time. You have to do it right then, or everyone will know you did it later. Like much comedy, it's funnier if you watch the clip than if I explain it to you. But yes, I think BeReal has officially arrived. Did you see this, Andrea? I haven't. I'm disappointed in myself. Are you on BeReal? I'm not. Maybe I should be on it. Are you on it? I am. I am the other co-host that she was referring to, who is on BeReal. I wondered, yeah. You know my co-hosts. Some of her other co-hosts. Some of them really like BeReal. Tom's been talking about BeReal for, I mean, gosh, what are we talking about, going on a year now? Yeah. And literal things I've said to Sarah about it are said by the hostages in the bank robbery situation in the SNL skit. They do a great job of covering things Sarah has said, things I have said. They're all in it. It's too invasive. Tom's like, no, no, but it doesn't have to be. They stole our lines. Maybe they were watching you all and then said, all right, here's our sketch. Here it is. I think they might have been. You could sue. Hi, Kenan. Yeah, BeReal. Also, yeah, welcome back, SNL. I love my Saturday nights.

Also, thanks so much to you, Andrea Jones-Rooy, for being with us today. Let folks know where they can keep up with all that you do. Yes. If you can't get enough of me ranting about AI bias, you can follow me on the internet at jonesrooy, J-O-N-E-S-R-O-O-Y, on all the social medias, and at jonesrooy.com. Beautiful. Well, we're so glad to have you on the show today. Please come back early and often.

Also, thanks to some new bosses that we got over the weekend. We've got Alex, we've got Al Spaulding, we've got Ogenefer Giro, Michael and Frank. They all just started backing us on Patreon. Big, big, big thanks to you, Alex, Al Spaulding, Ogenefer Giro, Michael and Frank. Ah, that's great. It's good to start the month strong with some new patrons. Yeah. Really appreciate that. Thanks, y'all. Feeling good, feeling good.

Speaking of patrons, do stick around for our extended show, Good Day Internet. We roll right into it after DTNS wraps up. And just a reminder, we do the show live. You can catch the show live Monday through Friday at 4 p.m. Eastern, 2000 UTC. Find out more at dailytechnewsshow.com/live.
We are back tomorrow talking corporate network security with Rod Simmons. Join us. Talk to you then. This show is part of the Frogpants Network. Get more at frogpants.com. Diamond Club hopes you have enjoyed this program.