Daily Tech News Show is made possible by its listeners. Thanks to all of you, including Dustin Campbell, Tim Deputy, and Brandon Brooks. Coming up on DTNS, Section 230 heads to the Supreme Court, sort of. A standard to make voice assistants more accessible. And Andrea Jones-Rooy is here, helping us understand how bias creeps into algorithms. It's Halloween. This is the Daily Tech News for Monday, October 3rd, 2022 in Los Angeles. I'm Tom Merritt. And from Studio Ribbit, I'm Sarah Lane. I'm the show's producer, Roger Chang. And joining us, Andrea Jones-Rooy, data scientist, comedian, circus performer, and host of the Majoring in Everything podcast. Welcome, Andrea. Thank you. It's great to be here. It's good to have you.

Let's start right off with a few tech things we all should know. Deliveroo opened Deliveroo Hop, its first brick-and-mortar grocery store, in partnership with the chain Morrisons on New Oxford Street in central London. Customers don't have access to store shelves. Instead, they can use digital kiosks, with orders picked by store employees and bagged within minutes, or pick up orders placed in the Deliveroo app. Staff will also pick items for the service's delivery couriers. Stores will offer over 1,700 grocery items, including ready-to-eat meals. It's a little like Instacart starting a grocery store here, I guess.

YouTube TV rolled out support for subscribing to channels without having to sign up for the $65 YouTube TV plan. So you could just get HBO Max through YouTube TV, or NBA League Pass, or MLB.TV, or Showtime, Starz, Hallmark Movies Now, Cinemax, Epix and more. No channels from the base plan are included in the a la carte options. Prices range from $2 to $30 a month, depending on which channel you're adding. Each is available for a seven-day free trial as well, if you just want to see. Just give me ESPN and that is a done deal. It's the only reason I subscribe to YouTube TV.
Twitter started rolling out the ability to edit tweets to Twitter Blue subscribers in Canada, Australia, and New Zealand. This allows for editing a tweet up to five times within 30 minutes of posting, if you really need to think about it. Tweets will display a timestamp showing when they were edited, with edit history viewable by clicking through. So if you're curious, you can know. Twitter says the feature will also come to the US soon.

The Los Angeles Unified School District confirmed that a ransomware organization began publishing exfiltrated information about students online. The attack occurred over the Labor Day weekend, almost a month ago now, right? With the threat group issuing a ransom demand on September 22nd. The district decided they would not negotiate and would not pay the ransom. Bleeping Computer reports folder names in the leaked data suggest it might include things like Social Security numbers, passport information, and some secret and confidential documents on students. NBC Los Angeles reports law enforcement sources say it includes legal records, business documents, and even some confidential psychological assessments of students.

Last week, several outlets, starting with the Daily Mail, reported that actor Bruce Willis had licensed his digital rights to a company called Deepcake to make ads and movies and TV shows that starred digital versions of him. Deepcake has used Willis's likeness in an ad in Russia, which we mentioned on Friday's DTNS. Over the weekend, a spokesperson for the actor told the BBC that Willis had no partnership or agreement with the company. Deepcake told the BBC that, quote, he gave us his consent and a lot of materials to make his digital twin. Also clarifying, the wording about rights is wrong. Bruce could not sell anyone any rights. They are his by default. Sounds like Deepcake got a little out over their skis in promoting what they were doing, but otherwise, you know.
All right, let's talk a little more about two cases headed to court. What do we got, Sarah? All right, so the US Supreme Court has agreed to hear appeals in two cases regarding an online platform's liability for messages posted by users. We've talked about this a lot in the past. One case, Gonzalez versus Google, involves a 23-year-old US citizen who was killed in Paris in terrorist attacks in November of 2015. Relatives of the victim accused YouTube of passing along content that encouraged the attacks and sharing revenue with those who posted it. The 9th Circuit Court of Appeals upheld a dismissal of that case. They found the claims against the postings were all protected by Section 230, and ruled that the plaintiffs could not prove that revenue was a connection. All right, so the family is appealing Section 230 directly there. But the second case, Twitter et al. versus Taamneh, involves a Jordanian citizen killed in a 2017 terrorist attack in Istanbul. Relatives accused Twitter, Google, and Facebook, all three, of aiding and abetting terrorists in violation of the US Anti-Terrorism Act. That case did not touch on Section 230 issues, but it had more evidence of revenue sharing, like AdSense accounts. They had the receipts, basically. And so the 9th Circuit Court let that case proceed, and the appeal is saying the revenue shouldn't be a connection. The 9th Circuit decided these cases together, along with a third similar case that isn't being appealed. So it's notable that the Supreme Court took both of them, because they were similar, at least as far as what was being alleged. Section 230 was deemed sufficient in one, but not relevant in the other, and several of the circuit judges criticized Section 230 in their opinions. And given that the Supreme Court justices have previously written about their concerns with Section 230, it's kind of reasonable to expect that the Supreme Court will review the Section 230 element in these cases.
So Tom, what is Section 230? I'm so glad you asked. Yes, I am. We have a whole episode of Know a Little More on this. If you want all the details, go to knowalittlemore.com and look for About Safe Harbor. But here's the short version. Since 1959, in a case called Smith v. California, the standard had been publisher versus distributor. Bookstore owners weren't expected to read every book before they sold it, but publishers were. So if you were a publisher of something, you were liable for what was in the book. But if you were a distributor, a bookseller, you weren't liable for what was in the book. When the 1990s rolled around, though, CompuServe and Prodigy provided interesting new twists on the question when they ran online forums and chat rooms. Were they publishers of the user comments, and therefore should have known what was in all the user comments, or were they distributors, like a bookseller? CompuServe didn't moderate its content, which sounds wonderful. So it was deemed by the court to be a distributor and therefore immune. Prodigy, on the other hand, did employ moderators. And because of that, the court ruled, well, you're choosing what stays up and what doesn't. You're a publisher, and now you're liable for what anyone says on your platform. Congress recognized that, if they let it continue like that, this would encourage platforms not to moderate their content, leaving them full of libelous and dangerous postings. So they included Section 230 in the CDA, which says that, quote, no provider or user of an interactive computer service, basically the platforms, shall be treated as the publisher or speaker of any information provided by another information content provider, mostly users. In other words, if you try to make your forum a decent place to post things in, the government didn't want you to get punished. Now the first test of the law was 1997, Zeran versus AOL.
AOL was accused of not removing posts that tied Zeran's phone number to the Oklahoma City bombing, and he wasn't connected to it. So he sued, and the Fourth Circuit wrote that it would be impossible for service providers to screen each of their millions of postings for possible problems. That was back in 1997, mind you. And doing so would restrict speech, which was the reverse of the intention of Section 230. So they said it's not AOL's fault that the stuff got passed along; go after the people who passed it along. Since that time, there have been exemptions implemented for Section 230. You're not immune from federal criminal liability under Section 230. If Facebook is liable for a crime, they can't get out of it by saying Section 230. You're not free from intellectual property claims. You're not free from claims over facilitating sex trafficking. But anything else is largely under Section 230. All right, now back to the cases before the Supreme Court this term. In a concurrence, Circuit Judge Marsha Berzon noted that, if not bound by circuit precedent, she would hold that the term publisher under Section 230 reaches only traditional activities of publication and distribution, such as deciding whether to publish, withdraw, or alter content. She's basically talking about moderation as far as internet platforms go. And it does not include activities that promote or recommend content or connect content users to each other. In other words, she's saying, leave moderation protected, but establish that recommendation is not protected. The Supreme Court is gonna weigh in on that. Supreme Court hearings will take place this term. A decision will come sometime before the court recesses next June. And Andrea, I'm curious where you think this line would be drawn. If you were a Supreme Court justice, do you have an idea of where you would come down on this?
Well, I think my main thought at the moment, and maybe this is unfair to the Supreme Court justices, is that there's no way they understand what's going on. Maybe I'm being horribly ageist, but I hope they're bringing in consultation from people who study this stuff more closely. I would say that there's something, and this is getting to some of the stuff we'll be talking about later, there's a line in there somewhere related to the recommendations. Because if you can show that you're more likely to see some kind of dangerous, harmful, something piece of content, gosh, how do you define that? I don't know. But if you can show that they're making money or they're profiting from eyes on the screen, if they're promoting this kind of content, and they can link that content somehow to an event, I feel like there's a line in there, but there's a huge technological barrier. I'm not a lawyer, but I almost feel like the legal decision is gonna be clearer than the implementation. Yeah, to me, it's gonna come down to, okay, let's assume that the content in question violates the Anti-Terrorism Act, and nobody's debating that. Is Google liable because of an algorithm that they're not supervising? Again, just like the postings in 1997, they aren't looking at it, but the algorithm kicks it out. Are they liable for that algorithm? And we're gonna be talking a lot about algorithms today. That is a very interesting question, and one that I don't know that the courts are competent to really decide, not because they're old, but because the law doesn't contemplate that part of it. It doesn't contemplate algorithmic recommendations. You could turn it into something where you ask, if it were a person, would we think that was problematic? I actually don't know that the algorithm needs to be unpacked in order to say, well, Google built something, and Google is providing a service, and Google is making us all more likely to see this piece of information now.
And yes, if we can agree that this piece of information violates the counter-terrorism laws, then I feel like it's actually not that much of a stretch to blame, well, one of the things that we talk about in bias and AI is accountability. And I think actually Google should be held accountable for this. Yeah, and that's what it's gonna turn on. Google's gonna say, oh, look, it's not a person, it's an algorithm, and we don't always know what the algorithm's gonna do. Obviously, if it was a person, we wouldn't have recommended this. And whether that's a sufficient defense or not, that is, of course, the question.

Well, the University of Illinois at Urbana-Champaign. Tom, you're familiar with that college. Oh, all right. Partnered with Microsoft, Meta, Amazon, Apple, Google, and several nonprofits, including the Davis Phinney Foundation and Team Gleason, on the Speech Accessibility Project, seeking to improve voice recognition for communities with disabilities and speech patterns not factored into those AI algorithms that we've alluded to thus far. Now, speech interfaces are critical for communication and expression if you're unable to express yourself in more usual ways, and yet they are often not usable by those who would benefit the most from them. Building out a dataset that can reach a wide community required partnering with big tech companies on the infrastructure component of this. The university is going to recruit paid volunteers, collect voice samples from them, look to create a private, de-identified dataset out of that, and then use that to train machine learning models. Yeah, so instead of each of these companies building their own separate and maybe duplicative initiatives, the Speech Accessibility Project will provide a central dataset. At least that's the idea. U of I says it hopes to benefit those with amyotrophic lateral sclerosis.
You might know that more commonly as Lou Gehrig's disease. Also Parkinson's disease, cerebral palsy, Down syndrome, and a wide range of medical and non-medical conditions that affect speech. Efforts will initially focus on American English. Now, this is just breaking today, Andrea, but I'm just curious about your first-glance response to it as a data scientist. What does this look like to you? I mean, I think it's very exciting. I feel like I came in hot being skeptical of algorithms, and I will continue to be, but part of it is, what are the consequences of using these algorithms, and what are the stakes of the algorithms being incorrect? And I think this application seems very promising. And I think we all know folks, maybe if not personally, then friends of friends, who would benefit from such a thing. I have some friends and family with ALS and other diseases in my life who I think would benefit greatly. And so I think it's great that it's being offered. Of course, I get nervous when I hear about de-identified data, because there's a lot more to de-identifying data than just stripping out names. A lot of times you might be able to build it back together and maybe come up with a way of triangulating where that data came from. So I would wanna know more about how they're collecting it and how they're treating it to be anonymous, but generally I think this is one of the more exciting and heartening applications of tech and AI that I've seen. Yeah, it feels like they're really pushing this as far away from controversy as they can. They're paying the volunteers, and they have to volunteer to do it, so you have to actively consent. Then they're still gonna try to de-identify it. I mean, at that point, they could just get people to say, like, if they know who I am, I guess it's fine. But they're like, no, we're still gonna try to keep it private. It's gonna be shared amongst multiple places.
It's gonna be held by a university, so there's not a lot of motivation to keep secret what's going on with it. Yeah, I'm with you. I think there's a lot of good things about this. Devil's always in the details, though, of course. I'm sure two years from now we can all get back together and say, oh my gosh, they did this horrible thing. Maybe I'm being pessimistic about this. But part of it is our own ethics and understanding of how these things work change. But generally speaking, this is one of the few pieces of news about tech that I was just like, wow, great. Perfect. Yeah, feels great. I love it. This is just a good thing. Yeah. Well, folks, if you have a good thing to tell us about, you're like, hey, you missed this good thing, send us an email. Our email address is feedback at dailytechnewshow.com.

In January 2023, a New York City law will go into effect requiring companies to conduct a bias audit. That's what it's called in the law, a bias audit, on automated employment decision tools. And once they've finished that audit, they must post the results of those audits on their website. Now you may ask, what is a bias audit? Well, the law says that a bias audit means an impartial evaluation by an independent auditor. And you may say, well, what does impartial mean? And what's the evaluation? And who counts as an independent auditor? The New York City Department of Consumer and Worker Protection is working on all of those guidelines on these audits, but they don't have a timeline for when those guidelines are gonna be available. Financial audits are old hat for businesses, and there are lots of accepted practices for what counts as one. But what does an AI audit look like, Andrea? Is there such a thing? Is there even a standard? If there's a standard, I don't know what it is. Part of the challenge is that, well, let me back up.
This goes back to what I was saying earlier, where I think the law is actually easier than the implementation. Not to say that the law is easy, but with the tech implementation, for a lot of these models there is no real way to validate or test them. The true test is to roll them out and then go back a year later and say, who did we rule in and who did we rule out on hiring, and what did we, or whoever this independent auditor is, and I agree with you on asking those questions, think of it? So it's in some ways difficult to actually evaluate without actually implementing it, which goes against the entire purpose of getting ahead of the bias in the first place. On the other hand, it does depend on what type of model they're using. So part of the coverage of this that makes me nervous is they're saying, well, it's AI, it's this black box AI thing, and we don't really know what's going on. But every model is different. Every company that's providing these models to different organizations is different. The organizations are implementing them differently. If there's one singular test, I assume it would only work if every company is using the exact same algorithm in the exact same way, and I simply doubt that's the case. I will also go back to say, anyone who's done any statistics will think back to regression diagnostics, right? We used to do, we still do, regression diagnostics. A very simple linear regression, where we say, what's the association between X and Y? And even then it's not easy to tell what's biased. We look at 10 different graphs and a bunch of different numbers, and we try to think really hard about whether we've got some issues in our model. That gets astronomically more difficult as you get to AI. So I don't know what a test looks like. That said, there are examples of tests out there.
They just take a lot of work and a lot of care, and they tend to be very tailored to the actual algorithm. You just flashed up a study from ProPublica that carried out a very long investigation into bias. It took an extremely long time, and if they can roll that out everywhere, great. But there's a lot of ifs wrapped up in that, and I think you hit on a lot of the big ones right up top. So let's assume that they come up with an agreed series of methods. Maybe there are several of them. I think a lot of folks in the audience are gonna be asking, how is a machine biased anyway? It's a machine. Let's not anthropomorphize this thing. It's just cranking through. How does bias work there? Yeah, I'm so glad that you asked. So I work with a lot of companies as a data scientist, in particular around people analytics. This is exactly the area I spend a lot of time working with companies on: hiring, recruiting, promoting, right? And all of that is about saying, all right, we've got some data about some humans, and we wish we could run it through some computer, where that computer could tell us which humans are most likely to be successful in the next role, successful at this company, or whatever. And the other example that came up was the same sort of thing: we're making a guess about who is likely to recommit a crime if they're granted parole. Banks use this to say, how could we figure out who's gonna be more likely to repay their loan? So that's the kind of algorithm that we're talking about. And there are three big ways that I think about when I think about the bias that can go into them. Maybe a fourth, depending on how far we wanna go with this, right? So the big one that we all think of first, and that I believe this article and this possible legislation are talking about, is the algorithm itself.
So most algorithms, not all, but most algorithms that I've seen working in this space, basically say, okay, we have patterns from the past. Let's, quote, learn those patterns, fit those patterns according to some set of rules, like matching means or finding what's nearby, or whatever it is that we're doing. Then we're gonna take new data, so here are new incoming employees or people who have applied for the job, and say, what features do these employees have that make them look like the employees that are currently doing well in the company, however we've defined that? And the computer will say, aha, of these 100 employees, these 12 are most likely to succeed. What that algorithm is usually doing, not always, but usually, is figuring out who's most similar to the leaders. And that's okay in principle, but if you think that in practice humans in the past have probably been at least a tiny bit biased, all the algorithm is doing is learning the bias from previous data. So unless we're starting with some kind of bizarre, engineered, made-up simulated data, which has its own wild problems, right, some utopian equality, which I don't think is what we wanna do, all we're doing is training the machines to replicate who's at the top. So, okay, the other piece of this is, what are we measuring to begin with? And maybe that is what we wanna do, and there's a version where that's not so problematic. But the second part of the bias is the data itself. The data that we collect on humans reflects our own biases. So if you think of the data that we might have, and the data that I've seen, we tend to have things like, where did someone go to school? What did they major in? Did they work at a peer company that was successful? Did they get high performance evaluations? Did they get promoted quickly when they got hired?
And all of those things might be indicators of great talent, but they're probably also picking up some other biases along the way. So the things that we're measuring and that we're mapping people to also tend to be biased in terms of what it is that we're even looking at, right? Maybe what we want is data on, is this person a team player? Do they let other people disagree with them? Do they welcome con— there's probably stuff that we want to know and we'd want to match on, but we don't have that in the data. The data reflects these other biases, like, well, if they didn't get an MBA from Stanford, then we don't want them, right? Yeah, they're not as good as the person who did. Exactly. And so the third version of the bias is, what do we do with that information once we have it, right? The algorithm spits out recommendations and says, okay, here are the four people that you should hire. And then the challenge is that humans look at that and they say, well, it's from an unbiased source, so this is the objective choice. So even if you spot bias, and I've seen companies do this with performance evaluations, they generate a bunch of data and the computer says, these people should be promoted, and then the leaders sit down and say, actually, I think it's these other people. But then they're like, but it's the computer. And so they default to it, thinking that it's objective, which is its own warped thing. There are more issues, but I don't know, those are the depressing ones. So the first two feel like problems with the metric, right? If you have a problem with your metric, which is, okay, our metric is gonna be that the current leaders are examples of ideal leadership, and if that isn't true, then you don't wanna use the current leaders as the metric. Or if the metric for a successful employee is this list of things, and you're like, ah, but that isn't always true, then you don't wanna use those things. And then the last one is a bias for machines.
It's like, well, if a machine said it, I guess I should believe it, which, if the first two are true, is certainly not gonna be trustworthy. It kind of turns into something where I feel like, if I was a human that was hiring somebody, sure, there are all these parameters that you kind of have to look at, and then there are all these exemptions. You have a rogue situation where you have to treat it differently, and if a machine can't do that, then it becomes problematic. Well, and the other thing to keep in mind for all of these is that any machine learning AI, or even a statistical model, and I keep going back to statistics because a lot of the instincts we have for statistics, if any of us have instincts around statistics, apply to AI, they're all going to be wrong in some way. They're all gonna make errors. If they're not making errors, we're overfitting, and we have another problem, and that's not great. So that means that we're gonna get it wrong some of the time. And that's okay if what we're trying to do is get a broader understanding of, like, oh, generally people who majored in this tend to do well. Even if that's problematic, we could still learn something, right? Or we might say, on average, we're pretty good at picking these people. But these are people's lives at stake, so our tolerance for error, I think, ought to be quite low. And some of these models, take the example of parole in the article that was posted earlier: it was correct 68% of the time, which is kind of high for some models. But if you think about what that means, people are wrongly being kept in jail and denied parole, or put back out on the streets and recommitting crimes. That's not a very good rate. And we're really messing with people's lives every time we make, say, a type one or a type two error.
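The point about type one and type two errors can be made concrete with a small sketch. The confusion-matrix counts below are invented for illustration (chosen so overall accuracy lands near the 68% figure mentioned above); they are not from any real parole model:

```python
# Illustrative sketch with made-up numbers: a model that is "68% accurate"
# can still hide very different harms in its two kinds of mistakes --
# type one errors (flagging someone who would not reoffend, so parole is
# wrongly denied) and type two errors (clearing someone who does reoffend).

def error_profile(tp, fp, tn, fn):
    """Summarize a binary classifier from its confusion-matrix counts.

    tp: correctly flagged as high risk
    fp: wrongly flagged (type one error: parole wrongly denied)
    tn: correctly cleared
    fn: wrongly cleared (type two error: reoffends after release)
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # share of people who would NOT reoffend but were flagged anyway
        "false_positive_rate": fp / (fp + tn),
        # share of people who DID reoffend but were cleared anyway
        "false_negative_rate": fn / (fn + tp),
    }

# Hypothetical cohort of 1,000 parole decisions.
profile = error_profile(tp=230, fp=180, tn=450, fn=140)
print(profile)  # accuracy is 0.68, yet both error rates are near 30%
```

Accuracy alone hides the split: in this invented cohort, roughly 29% of non-reoffenders are wrongly flagged and 38% of reoffenders are wrongly cleared, which is the asymmetry Andrea is pointing at.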
Yeah, it feels like everybody's head is at, well, let's just improve the metrics and the dataset that we're using to train the algorithm. Which I'm not saying you shouldn't do, but there's a lot that really needs to be done that I don't hear as many people talking about, which is, let's also moderate our expectations for how useful this is. They use these kinds of algorithms in medical practice all the time, but the medical practitioners are very aware that they shouldn't blindly follow the algorithm, or they're gonna kill a patient. And so when they use them, there's a better awareness of, this is a tool that I'm gonna use to inform my decision, but I'm responsible for the decision. And that's what you're not seeing in the hiring practices. Exactly, and a lot of it is the fault, I would say, of the companies that are trying to sell this technology to these companies. And frankly, it's the fault of every advertisement you ever see on television, or any YouTube ad where it says, thanks to AI, we're predicting... We're all being fed this stuff, that AI is this magical thing that can cure all our problems. And it's incredibly powerful and awesome, but it's still just a tool. And I think you're right. I'm not in medicine, but I'm hopeful that you're right that doctors think of it as a piece of information and another perspective. Whereas a lot of people who are professionals and have been experts in their own area, and who understandably don't interact with AI, have no reason to say, well, why would I understand that this is biased? Because they're just being told it's not, and that it's effective.

Well, AI may not be able to solve everything, but Sarah, I'm pretty sure BeReal can. So we've been talking about the rapidly growing popularity of BeReal for a while. I'm still not on board, but some of my co-hosts are. It's achieved another milestone, though, recognizing its growing status, and that is in a Saturday Night Live sketch.
In its 48th season premiere, SNL more or less explains exactly how the app works, but they do it in an SNL way, showing hostages in a bank robbery demonstrating the app when they get a notification during a stickup. Everybody has to do a BeReal, because if you do it later, you're gonna ruin everything, and everyone's gonna know that you didn't do it at the exact time. You have to do it right then, or everyone knows you did it later. Like much comedy, it's funnier if you watch the clip than me explaining it to you, but yes, I think BeReal has officially arrived. Did you see this, Andrea? I haven't. I'm disappointed in myself. This is the intersection. Are you on BeReal? I'm not. Maybe I should be on it. Are you on it? I am. I am the other co-host that she was referring to who is on BeReal. You know my co-hosts. Some of her other co-hosts. Some of them really like BeReal. Tom's been talking about BeReal for, I mean, gosh, what are we talking about? Like going on a year now? Yeah. And yes, literal things I've said to Sarah about it are said by the hostages in the bank robbery situation in this SNL skit. They do a great job of covering things Sarah has said, things I have said, they're all in it. It's too invasive. Tom's like, no, no, but it doesn't have to be. No, they stole our lines. Maybe they were watching you all and then said, all right, this is our sketch, here it is. I think they might have been. Hi, Kenan. Yeah, BeReal. Also, yeah, welcome back, SNL. I love my Saturday nights. Also, thanks so much to you, Andrea Jones-Rooy, for being with us today. Let folks know where they can keep up with all that you do. Yes, if you can't get enough of me ranting about AI bias, you can follow me on the internet at jonesrooy, that's J-O-N-E-S-R-O-O-Y, on all the social medias, and at jonesrooy.com. Beautiful. Well, we're so glad to have you on the show today. Please come back early and often. Also, thanks to some new bosses that we got over the weekend.
We've got Alex, we've got Al Spalding, we've got O'Janefragiro, Michael and Frank. They all just started backing us on Patreon. Big, big, big thanks to you, Alex, Al Spalding, O'Janefragiro, Michael and Frank. Ah, that's great. It's good to start the month strong with some new patrons. Really appreciate that. Thanks, y'all. Feeling good, feeling good. Speaking of patrons, do stick around for our extended show, Good Day Internet. We roll right into it after DTNS wraps up, but just a reminder, we do the show live. And you can catch the show live Monday through Friday at 4 p.m. Eastern, 200 UTC. Find out more at dailytechnewshow.com slash live. We are back tomorrow talking corporate network security with Rod Simmons. Join us, talk to you then. This show is part of the Frog Pants Network. Get more at frogpants.com. Time and club hopes you have enjoyed this program.