Okay, good morning, 800 people. Who did that? So this is called Consequences of an Insightful Algorithm, and this talk is a toolkit for empathetic coding. We're gonna be delving into some really specific issues and examples of uncritical programming, and the painful results of doing things in ways that were benignly intended.

I wanna start off with a content warning, because I'm gonna be delving into examples that deal with a number of very sensitive topics: grief, PTSD, depression, miscarriage, infertility, sexual history, consent, stalking, racial profiling, and the Holocaust. While none of those are the main topic, they are examples that are gonna come up, so anyone who feels the sudden need for coffee, please go ahead; I won't feel at all offended. That's about 10 minutes in, so you've got some time to think about it.

Algorithms impose consequences on people all the time. We're able to extract remarkably precise insights about an individual, but the question is: do we have a right to know what they didn't consent to share, even when they willingly shared the data that leads us there? And how do we mitigate the unintended consequences of doing that?

So let's start by asking a very basic question: what is an algorithm? It's a step-by-step set of operations for predictably arriving at an outcome. That's a very generic definition, and usually when we talk about algorithms, we're talking about those of computer science or mathematics, patterns of instructions articulated in code or in mathematical formulas. But you can also think of algorithms in everyday life: patterns of instructions articulated in all sorts of ways, for example as recipes, directions, or even a crochet pattern.

Deep learning is the new hotness right now for data mining. Essentially, it's algorithms for fast, trainable artificial neural networks. This is a branch of machine learning that's been around since the early '80s, but mostly it's been locked in academia, in part because of difficulties with scale. Very recently there have been a number of breakthroughs that have made it possible to finally put this into real production, so it's become realistically possible to extract really meaningful insights out of big data, and I'm talking about in production.

Another way to think of this is that deep learning relies on an ANN's automatic discovery of patterns within a training data set, and those patterns are applied to drawing intuitions about future inputs. In terms of process, the inputs we call training data can be various things: an array of words, images, sounds, objects, concepts. And when I say an array, it's like a few terabytes. So you have inputs, and then execution is simply running a series of functions repeatedly on that array. Those iterations are referred to as layers, and with each layer it gets more and more precise, more fine-grained, running each of those functions on the previous set of findings. So it's really getting down to a very precise level of thousands of factors, and all without having to label or categorize the data. You're just throwing the data at it.

And notice what that means, because deep learning is premised on a black box. The ANN has drilled down to those tens or hundreds of thousands of factors, but they're really subtle. Its conclusions have predictive value, but we don't know what it's basing them on.
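To make that loop of layers a little more concrete, here is a minimal sketch: a toy two-layer network in plain NumPy on four made-up training examples. The sizes, learning rate, and task are purely illustrative, and nothing here is how a production deep learning system is actually built, but the shape is the point: inputs go in, the same kind of function is applied layer after layer, the weights get nudged on every iteration, and what's left at the end is a pile of numbers that predicts well without explaining itself.

```python
# Toy sketch of the idea, not a production system: a tiny feed-forward
# network trained on four examples (XOR) with plain NumPy.
import numpy as np

rng = np.random.default_rng(0)

# "Training data": 4 examples of 2 features each, with known outcomes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Two layers of weights. After training, these numbers ARE the black box:
# they have predictive value, but they don't explain themselves.
W1 = rng.normal(size=(2, 8))
W2 = rng.normal(size=(8, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward pass: each layer is just a function applied to the previous one.
    hidden = sigmoid(X @ W1)
    pred = sigmoid(hidden @ W2)

    # Backward pass: nudge the weights to reduce the error, then repeat.
    delta_out = (pred - y) * pred * (1 - pred)
    grad_W2 = hidden.T @ delta_out
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)
    grad_W1 = X.T @ delta_hidden
    W1 -= 0.5 * grad_W1
    W2 -= 0.5 * grad_W2

# Typically ends up close to [0, 1, 1, 0], but the weights say nothing about "why".
print(pred.round(3))
```

Scale those two weight matrices up to many layers and terabytes of input, and you have the black box just described.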
So right now, this is a technology that's driving major advances in a variety of areas: medical diagnostics, pharmaceuticals, predictive text (including Skype's), voice-activated commands like Siri, fraud detection, sentiment analysis, language translation, and even self-driving cars. More specifically, today we're gonna look at some really concrete examples of those, including ad targeting, behavioral prediction, image classification, and facial recognition.

But all of this sounds a little abstract, so let's look at a really simple, concrete example that's kind of fun. This is MarI/O, an ANN that teaches itself how to play Super Mario World. It starts with absolutely no clue whatsoever about its world, about even the concept of rules or gaming. All it does is manipulate numbers, and it notices that sometimes things happen. It notices that some of those things cumulatively produce outcomes. So it's learning movement and play via a purely self-training session, and it does this in those layers over 24 hours of experimentation, which leads it to identify patterns and use those patterns to predict insights, such that it's actually able to play the game.

So speaking of games, let's play one right now. It looks something like bingo, but it's called Data Mining Fail. Insightful algorithms are full of pitfalls like these, so by looking at case studies, we're able to explore some of the things on this board. Are you ready? All right, here are some that we're gonna play with.

The first one is Target. In the retail sector, the second trimester of pregnancy is referred to as the holy grail. The reason is that it's one of the few times in our lives when all of our shopping habits suddenly come up for grabs again. Our buying loyalty, our store loyalty, our brand loyalty, everything we do in terms of spending, suddenly we're rethinking it, and it's a great opportunity, obviously, to create a lifetime customer, potentially even a lifetime family of customers. So this is valuable stuff. Most retailers are only able to use data to spot a pregnancy around the third trimester.

So one day, a few Target marketers walked across the office and asked one of the programmers a really simple question: if we wanted to figure out whether a customer is pregnant, even if she doesn't want us to know, could you do that? This is a really interesting challenge, right? I mean, would you think about it? Would you wonder and experiment a bit? He actually did come up with an algorithm, and it turned out to be very reliable. So they started sending out ads for maternity, ads for infant care.

One day a man comes into one of the stores and he's very angry. He's yelling: how dare you? You sent this to my teenage daughter. Are you trying to tell her to have sex? Are you trying to tell her to get pregnant? Now, the store manager obviously is not in charge of the national mailers. Nevertheless, he apologized, and the man went away. The man came back the next day and said: I owe you an apology. It turns out there are some things going on in my household that I didn't know about. My daughter is pregnant.

Other people complained too, and Target took a lesson from that. They thought about it real hard, and they decided the best thing to do was to deceive and manipulate people. They do the same ad targeting, but now they couch those ads among unrelated products. They don't care about those. They're still sending those same ads, but the customer doesn't know.
So you might as well have something like, for instance, diapers and cologne in the same ad. And the attitude is: this is really great, because as long as a pregnant woman thinks she hasn't been spied on, as long as we don't spook her, it works. Please don't do that.

Shutterfly was in a somewhat similar situation. They sent out emails saying: as a new parent, congratulations, it's time to send thank-you notes for your birth announcement. Not everyone who received those emails had actually had a baby. This was a little awkward, although some people found it quite amusing. Not everyone did. "Thanks, Shutterfly, for the congratulations on my new bundle of joy. I'm horribly infertile, but I'm adopting a cat." "I lost a baby in November who would have been due this week. It was like hitting a wall all over again." Shutterfly responded that "the intent of this email was to target customers who have recently had a baby." Well, duh. That's not an apology. It's barely even an explanation. This caused real harm.

A few months ago, Mark Zuckerberg excitedly announced that he's gonna be a father, and he wrote about a series of miscarriages that he and his wife dealt with as a couple. He said: you start imagining who they'll become and dreaming of hopes for their future. You start making plans, and then they're gone. It's a lonely experience. Juxtaposition: Facebook Year in Review.

It's been around for a few years. It was considered essentially beta, and it was mainly something that was self-selecting: you could pick through your posts from the past year and create a little sort of memorial. This past year, they decided to do it algorithmically, and so your newsfeed would fill with the wonderful moments of your past year. What they failed to take into account is that our lives are constantly changing. The relationships we've had, the jobs we've had, our memories don't necessarily stay the same. Things that were joyous back then may not be now.

Eric Meyer coined the term "inadvertent algorithmic cruelty," and he defines it as the result of code that works in the overwhelming majority of cases but doesn't take other use cases into account. So why does Eric get to name this? Because he's one of the people it happened to. This is a picture of my daughter, who is dead, who died this year. The Year in Review ad comes up, and it keeps coming up in my feed, rotating through different fun and fabulous backgrounds as if celebrating her death. And there's no obvious way to stop it.

Eric calls on us to increase awareness of, and consideration for, the failure modes, the edge cases, the worst-case scenarios. Obviously I hope to do that here today, but I'd also really like you to carry it forward to others, and with that in mind, I'm gonna give you my first recommendation: be humble. We cannot intuit internal states, emotions, or private subjectivity. Not yet, anyway.

Eric's blog post was just last December, and it garnered a lot of attention because, hey, he's Eric Meyer. It got attention within the industry, obviously. It also got it from the mainstream media. And there's really this question of how you avoid blindsiding someone with unpleasant stuff annually. Facebook must have done a bit of introspecting on this problem, because it's not easy. Three months later they introduced a change, and it's called On This Day. It's daily reminders of fun, trivial stuff. Five years ago today you became Facebook friends with somebody. Two years ago you went hiking. A year ago you had dim sum. And notice what they said: you know, we care about you.
The implication is: we get it this time. Here's a memory from three years ago, and we think you'll like it. Aw, puppy. On this day you posted a picture of your dog. "Thanks, Facebook, for picking today to hit me with this dumb feature and remind me that my dog died three years ago today." "Sometimes Facebook's On This Day sends me memories from high school, and you know, it's triggering. I did not enjoy high school. I want and need to forget. You do not get to decide which parts of my past I should keep fresh in my mind and which parts I walk away from. Fuck you."

We as programmers have to learn from mistakes. We need to learn from ours, and we need to learn from others'. We need to decide that harm-full and harm-less are not consequences that balance each other out.

When Fitbit started out, as you know, it essentially gamified various personal metrics: how many jumping jacks you did, how far you walked, your weight loss and weight gain. It also had another feature. It had a sex tracker, and it was public by default. Can we talk about Jeff for a minute? Remember the generic definition of an algorithm: step by step to a predictable outcome. The algorithm here was treating all data as the same. The outcome was making everything public, because we're all competing, right, on all of our metrics? This is the fail. Users didn't know. This was public without warning. Some data is different from other data. We don't get to just put it all out in public as if all data is equal.

Uber. All right, so obviously most of us need some sort of internal ops tools, right? Maybe for monitoring, performance tuning, business metrics. Okay, so we know this. Uber's is called God View. Uber did not limit access to admins or restrict it to operational use. Employees, including drivers, could freely identify any passenger and monitor that person's movements. Drivers had access to God View's records as well. Even a job applicant was welcome to access these private records. Managers felt free to abuse God View for non-operational purposes altogether, such as stalking celebrities' rides in real time and showing it off as party entertainment. This isn't negligence; this is abuse of an algorithm.

So you might remember a few years back, the research group at OkCupid used to blog about things they were learning from aggregate trend data, and that blog focused on sharing insights into simple ways that you, as an OkCupid user, could use the dating site well. That's its purpose, right? Uber used to blog about its data too, with a crucial difference: it's not about improving the customer experience of the service. In fact, if you look real closely, Uber can and does track your one-night stands. This is purely invading people's privacy, not for business purposes, but purely for the sake of judging and shaming people. That is not a predictable consequence of signing up for a rideshare service.

Google AdWords. A few years back, there was a Harvard study. It looked at two different sets of data: one on AdWords itself, and one on a site hosting AdWords. Both had the same ad templates. What they did was throw some names at it: ones that were strongly correlated with black people, and ones that were correlated with white people. They ran searches on those first names, attached to the last names of real people, to see what would come up. And what came up was this: a black-identifying name was 25% more likely to result in an ad implying that the person had an arrest record. Examples like this.
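The researchers' actual tooling isn't reproduced here, but the shape of that audit is easy to sketch. In the hypothetical Python below, fetch_ads is only a stand-in for "run this search and collect the ads that come back," and the name lists are illustrative; the idea is to vary nothing except the first name and compare the outcomes.

```python
# Hypothetical sketch of a paired ad audit. fetch_ads() is a placeholder,
# not a real API, and the name lists are illustrative only.
BLACK_IDENTIFYING = ["DeShawn", "Latanya", "Darnell"]
WHITE_IDENTIFYING = ["Geoffrey", "Emily", "Brendan"]
LAST_NAMES = ["Smith", "Jones", "Williams"]

def fetch_ads(full_name: str) -> list[str]:
    """Placeholder: in a real audit, this would query the page hosting the ads."""
    raise NotImplementedError

def arrest_ad_rate(first_names: list[str]) -> float:
    """Fraction of returned ads that imply an arrest record."""
    hits = total = 0
    for first in first_names:
        for last in LAST_NAMES:
            for ad in fetch_ads(f"{first} {last}"):
                total += 1
                hits += "arrest" in ad.lower()
    return hits / total if total else 0.0

# Same ad templates, same last names; only the first name varies.
# A gap between arrest_ad_rate(BLACK_IDENTIFYING) and
# arrest_ad_rate(WHITE_IDENTIFYING) is the discriminatory outcome,
# visible without ever opening up the algorithm itself.
```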
The AdWords algorithm focuses on predicting what we click on. That's it, right? The real world is irrelevant to AdWords. Its job is simply to figure out what makes us click, based on what it knows about us and on the activity of other people, what it knows about them, what it has observed. What we see in this is our collective bias, being both reflected back to us and reinforced. Data is generated by people. It's not objective. It's constrained by our tunnel vision. It replicates our flaws. It echoes our preconceptions.

Twitter. Accidental algorithmic run-ins. A phrase that's obviously familiar, but just a little different from what Eric was talking about. Joanne McNeil coined this term. She didn't give it a formal definition, but it can be roughly summarized as classifying people as similar in ways where carelessness creates scenarios that are harder to control and prepare for. Typically we're talking about some version of a recommendation engine. Essentially, it means you're trapped by a recommendation system that is determined to show you someone similar whom you'd actually rather avoid. It's a false positive that can't easily be detected algorithmically or corrected by the user. Sometimes that similarity factor is pretty trivial, right? My boyfriend's ex. Sometimes the factor connecting you to the person is intensely upsetting. If you are stalked by a former coworker, Twitter may reinforce this connection, algorithmically boxing you into a past that you're trying to move on from. Your affinity score with your harasser will grow and grow with every person who follows them at Twitter's recommendation. And notice something: just like AdWords, the algorithm doubles down on its false certainty with every action that third parties take. You're not in control of it. Similarity algorithms become, in effect, a proxy for harassers. And many of these systems don't give you any way to turn that off.

Flickr and Google Photos. All right, so yeah, when facial recognition started becoming mainstream, we saw plenty of humorous examples like this, right? Lots of mistakes. Early versions of iPhoto helpfully detecting faces in your baked goods. That's a harmless mistake, right? It's a false positive, but it doesn't really matter. Here's another one. You probably remember this one from six months ago. This is Microsoft's HowOld.net, and it's actually using deep learning itself to take facial recognition to the next level. It's drawing intuitions about age based on nothing but your photo, and it's assigning tags for gender, also looking at nothing but that photo. And of course, inevitably, this is pretty new for them, and it's gonna make some mistakes along the way. These ones look pretty harmless too, right? You're not really using them for anything.

Some false positives are not funny, such as the next one, which is also from just six months ago. Flickr classified this as children's playground equipment. This is Dachau, the concentration camp gate with the infamous motto "work will set you free." Notice something else here: the gray tags are the photographer's manually added ones. The white tags are Flickr's. This is a consequence of algorithmic hubris: treating human understanding as irrelevant to machine intuition, treating data as inherently neutral, which, as we know, especially with something like this, it isn't. Flickr tagged this man as an animal. Originally, it also tagged him as an ape. And that's a comparison that we all know has a particularly troubling history in America.
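Neither Flickr's nor Google's pipeline is public, so treat the following only as a guess at the general shape of the failure: a hypothetical auto-tagger that publishes any label its model is confident about, with nothing flagged as sensitive and no way for the photographer's own tags to veto the machine's guess. Everything here, including classify and the label names, is illustrative.

```python
# Hypothetical sketch only; this is not Flickr's or Google's code.
# classify() stands in for a trained image model returning (label, confidence).
def classify(photo: bytes) -> list[tuple[str, float]]:
    """Placeholder for a model call."""
    raise NotImplementedError

CONFIDENCE_THRESHOLD = 0.8

def auto_tag(photo: bytes, user_tags: list[str]) -> list[str]:
    """The failure mode, roughly: any label the model is 'sure' about goes
    straight onto the photo. No category is treated as sensitive, and the
    photographer's own tags never get a chance to veto the machine's guess."""
    machine_tags = [label for label, score in classify(photo)
                    if score >= CONFIDENCE_THRESHOLD]
    return user_tags + machine_tags

# One of many possible mitigations: hold back labels that need a human look.
SENSITIVE_LABELS = {"animal", "ape", "playground equipment"}  # illustrative

def auto_tag_with_review(photo: bytes, user_tags: list[str]) -> list[str]:
    machine_tags = []
    for label, score in classify(photo):
        if score < CONFIDENCE_THRESHOLD:
            continue
        if label in SENSITIVE_LABELS:
            continue  # route to human review instead of auto-publishing
        machine_tags.append(label)
    return user_tags + machine_tags
```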
So let's be clear here. This isn't a criticism of machine learning. This isn't about any one company or coder. We are all subject to the same pitfalls that they fall into. Here's Google Photos, just a month after that last one: "Google Photos, you fucked up. My friend's not a gorilla."

All right, so we've seen this happen at least twice, right? How does it happen? For one answer, you have to go all the way back to the 1950s. Kodak was developing color film, and they optimized for white skin. Black skin was not a relevant market for them. So lab technicians every day had to calibrate equipment based on cards they called Shirley cards, optimizing for detail in white-skinned women and in white accessories, making sure that this is where the detail showed up. All of which means our development processes have had to respond to this legacy. It's generations old, but the tools used to make film, the science of it, are not racially neutral. We are throwing terabytes of data at these systems, and some of that data is old. It carries this legacy with it. And even today, as we've moved to digital sensors, think about it: no one was gonna allow an entirely different model of exposure. We would have said these cameras suck, forget digital. They had to mimic the same kind of exposure that we'd been used to for decades. So here we have what is essentially sensor noise and inadequate sensors contaminating data even now. Black skin is not represented in detail the same way. And that's a hard problem to deal with, which makes it really tempting to just avoid thinking about it at all.

Affirm. You might not be as familiar with them. They are a consumer lending company. They extend smallish amounts of credit for buying particular consumer goods. They've also now expanded into loans for things like coding schools. Their application is super simple. They take just a few factors. All they're looking at is your name, your email, your mobile number, your birthday, and your social security number. And from that, they go hunting. They sometimes ask for a bit more information, like, for instance, your LinkedIn or GitHub profile. What's wrong with that? Practically everyone here has one, right? 2% of open source contributors are women. This immediately introduces bias just by asking the question at all. Behavioral factors are looked at too, including how long it takes you to type or how much you pause while reading the terms of service, which must be awkward for people like Stephen Hawking.

These are algorithms for reinforcing privilege. Because remember, an algorithm is just a procedure for reliably coming to an outcome. This is reliable. It's up to us to take into account the impact that those outcomes are gonna lead to. The outcome here is reliably identifying privileged people and reliably excluding most people who don't have an abundance of privilege. This isn't about their creditworthiness. It's about their ability to have access to the privilege of paying at all. Deep learning looks at a deluge of random data points and learns how to assign labels to them. How about factors like this, the assumption about how long it takes you to read something? Well, you know, sometimes it's because you have a kid that you're chasing after. There are lots of reasons that have absolutely nothing to do with your creditworthiness, and yet you're being judged by them. How does this give comprehension of meaning and context? Without those, bias is always gonna run rampant. The immense power of machine intuition is absolutely irreplaceable.
I'm not questioning that. It's not a replacement for comprehension, though. Alan Turing reminds us that if a machine is expected to be infallible, it cannot also be intelligent.

So Affirm analyzes the applicant's social media accounts, including things like Facebook. So do some other companies. In 2012, Germany's biggest credit-rating agency considered evaluating applicants based on their Facebook relationships. That's really weird, because a Facebook friend is not necessarily at all the same as a friend-friend. And what about when the Facebook friend really is a friend-friend? Facebook recently defended a patent that pushes even further than Affirm does: making credit decisions about a person based on the unrelated credit histories of their Facebook friends. This is an algorithm with the potential to deeply intrude on and alter a person's relationships, if you have to opt out of knowing and interacting with people just to be able to get your own loan. This is about being financially shamed and punished by an algorithm.

"It's important to maintain the discipline of not trying to explain too much," says Affirm's CEO. Adding human assumptions, he noted, could introduce bias into the data analysis. What the fuck? Dude, data's not objective. Data always has bias inherent in it, at minimum from how it was collected and how it was interpreted, and every one of those flaws and assumptions in that first training data set, and in the original functions it was passed to, of course influences the algorithms being generated, and thus the outcomes for the next data thrown at them. Affirm says it assesses 70,000 factors, though no one outside knows for sure. How many of those have potential for discriminatory outcomes? How would anyone even know? It's not like someone can tell you what criteria led to a decision. This is completely different from the credit lending we do now. Rationales from the algorithm can only be seen from inside that black box. So I took a photo for you of the inside of a black box. Making lending decisions inside a black box is not a radical new business model. It's disruptive, but it's a regression. What it's disrupting is fairness and oversight.

Algorithms always have underlying assumptions: about meaning, about accuracy, about the world in which that data was generated in the first place, about how code should assign meaning to it. Underlying assumptions influence outcomes and consequences. Right now we're in an arms race, a data mining arms race. Major players have been making big bets on deep learning and those opaque intuitions. And yeah, for the moment, quality varies, like Microsoft's example, but we need to remember that deep learning is all about iteratively drawing predictive intuitions at extremely fine-grained levels. All these things matter, which means these systems are growing both more precise in their correctness and more damaging in their wrongness, and that presents a dilemma for us.

I really believe we can flip the paradigm, because we do care about getting this stuff right. We do wanna be empathetic coders. So the question becomes: how do we flip that paradigm? I'm gonna give you some starting points. One: consider decisions' potential impact on others. How might a false positive affect someone, such as those Shutterfly customers or those Twitter users? How might a false negative affect someone, like being denied a loan?
How many other ways can an algorithm's intuition be superficially correct and yet deeply wrong in the human context, like that photo of Dachau, or Eric Meyer being reminded of his daughter? We can do things like project the likelihood of consequences to others and minimize negative consequences to others. And if you notice, I keep repeating "to others," because this is the kind of thing we're really good at thinking about for ourselves and our companies. We're not so good at thinking about impact on the users. You could think about this much like the Hippocratic oath: doctors pledging to first do no harm. Let's try that.

We also need to be honest and trustworthy, and obviously we're gonna try to do those things just because they're good. But in this case, we also really need to be able to be trusted when we make mistakes, because we're definitely gonna make them. We need to be able to say: it was a mistake, it was an honest mistake, we will correct this, it will not happen again, we apologize. And be believed. This is the value of making sure this is part of our development process in the first place. And that also means it's really important to build in recourse, so that someone can easily correct things when we do reach a conclusion that's wrong. We have to provide others with full disclosure of limitations and call attention to signs of risk and harm to those people.

And we need to be visionaries about creating more ways to counteract: counteract biased data, biased analyses, biased impacts. We need to anticipate diverse ways to screw up. As long as the teams charged with defining data collection, use, and analysis are less diverse than the intended user base, we will keep failing them. Black people already know that photos are not so great for them. It's not a surprise. We must put decision-making authority in the hands of highly diverse teams. Culture fit is the antithesis of diversity: superficial variations are allowed to exist, but unique perspectives get suppressed, because the whole point of culture fit is to avoid challenges to groupthink. Unidimensional variety is not diversity either. Diversity is wildly varied on as many dimensions as possible: differing origins, differing ages, differing assumptions, differing experiences. Diversity is where there is no majority that you can identify.

We need to ask for permission, with the default being no, you don't have permission. We need to focus on the many who are eagerly willing to share themselves and who are enthusiastic about giving consent. They wanna be known, they wanna be served better. We can serve them well. And when I say permission, I don't mean adding it to the terms of service and the mile-long privacy policy. It doesn't have to be elaborate. It could be something as simple as putting the algorithm in the hands of users, like letting Twitter users make the announcement themselves: hey, you should really follow this person. As long as we're doing stuff like Follow Friday, how hard would it be for something like Twitter to simply extract that and turn it into the list of recommendations? We're already making these recommendations. The algorithm is ignoring them. It's things like just giving a checkbox. A pregnant person is not, in fact, the only person who has a stake in the pregnancy, right? Here's one person shopping, but there are many. There may be partners, grandparents, neighbors, friends who would happily buy stuff for that pregnant person and for that baby. And think about what a win that is.
A checkbox here is really a whole lot better than trying to invade someone's privacy.

We need to audit outcomes constantly, because, as I mentioned, big black box. What I mean by that is the approach most commonly used in things like auditing for housing discrimination and job discrimination. The idea is really simple. You put in applications that are identical except for just one factor, for instance race or age or income, and you look at the outcome. If you get a different result, then you know there was discrimination based on that factor. You don't have to look at the algorithm. The outcome itself tells you what you need to know. This is something we're gonna need to use a lot to check our work.

And part of that means we need to commit to data transparency and algorithmic transparency. Both of them. I know you're thinking that this is a really hard conversation to have internally, and it feels like an unrealistic one. Too many companies keep thinking that proprietary is the only way to win. You know, I think back to how it was really not that long ago that we were fighting for open source in our toolkit, and we really pushed back on companies that insisted proprietary was the only way to go. And we were right. We're professionals. We know that transparency is crucial for drawing insights that are genuine and useful. So please start the conversations. Argue for increasing transparency, because it's for the sake of a better product: cleaner features, fewer bugs, stronger tests, happier users, public trust. Because we build stuff that matters.

Amy Hoy is really harsh on this, but she's right: if your product has to do with something that people are deeply affected by, either care about it, or quit and go live in a cave and stop hurting people. It is so easy to unthinkingly build an app full of data mining fails like these. Building differently requires awareness, critical thinking, and most of all, deciding as a whole team to take a stance, to say: listen, here's the deal. We do not build things here without understanding the consequences to users. This is the way we work. This is good. This is our process.

We're hired for more than just code. We're not code monkeys. We're hired as professionals to solve problems, to apply our expertise and judgment about how to solve problems. Code isn't what we do. Code is how we enable. Our role is to be opinionated about how to make code serve a problem space well. So when we're asked to write code that presumes to intuit people's internal lives and act on those assumptions, as professionals we're gonna have to be people's proxies, be their advocates, say no on their behalf to using their data in ways they have not enthusiastically consented to. Say no to uncritically reproducing systems that were biased to begin with. Say no to writing code that imposes unauthorized consequences onto people's lives. In short, refuse to play along. Thank you.