This talk is a toolkit for empathetic coding. We'll be delving into some specific examples of what that means, and of the painful results that uncritical programming can produce even when our intentions are benign.

I want to start with a content warning, because I'm going to be delving into some sensitive areas: examples dealing with grief, post-traumatic stress disorder, depression, miscarriage, infertility, racial profiling, Holocaust surveillance, sexual history, and consent. None of these are the main topic, but I will be touching on them. If that's something you're uncomfortable with, you've got about five to ten minutes before that material starts coming up, so you have a little time to think about it.

Algorithms impose consequences on people all the time. We are able to extract remarkably precise insights about an individual. But the question is: do we have a right to know what they didn't consent to share with us, even when they willingly share the data that leads us there? And we have the question of how to mitigate the unintended consequences of that.

So, thinking about what an algorithm is at a really basic, generic level: it's just a step-by-step set of operations for predictably arriving at an outcome. Usually we think of this in terms of computer science, or perhaps mathematics, where it's patterns of instructions articulated in code or formulas. But you can also think of algorithms in everyday life: patterns of instructions articulated in all sorts of ways, including recipes, maps, crochet patterns.

Deep learning is the new hotness right now in machine learning. It is, essentially, algorithms for fast, trainable artificial neural networks. The technology has been around for decades, at least since the 1980s, but until recently it existed at more or less theoretical scale, locked inside academia. Thanks to advances just in the past two or three years, deep learning has become realistically able to extract insights out of the vastness of big data, in production.

It's a particular approach to building and training artificial neural networks that you can think of as a decision-making black box. You've got inputs: just arrays of numbers representing something. It might be an object, it might be words, it can be much more abstract concepts. Then you run a series of functions repeatedly on those arrays, and by repeatedly I mean iterations in which the analysis gets more and more fine-grained. And then you have output: a prediction of properties in all of that data, which will be useful for drawing intuitions from future inputs, as long as they're similar to the training data set.

This is driving major advances in a number of areas, including data analysis, data visualization, NLP, and computer vision. It's even being used right now for things like self-driving cars. Today we're going to look at some examples of practical uses, including behavioral prediction, image classification, face recognition, and sentiment analysis.

If this is starting to sound intriguing, you can try it out a little. ConvNetJS allows you to do some deep learning experimentation in the browser. Obviously that's not going to get you the speed benefits, but it's an opportunity to try out some different models. There are a number of frameworks and libraries available for most major languages; Ruby is one of the few exceptions right now.
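To make that decision-making black box a little less abstract, here is a minimal sketch, in Python for lack of a mature Ruby option: an array of numbers goes in, a stack of repeated functions runs over it, and a prediction comes out. The layer sizes, weights, and input here are all invented for illustration; a real network learns its weights from training data rather than drawing them at random.

```python
# A minimal sketch of the "black box": an input array of numbers is pushed
# through repeated layers of weighted sums and nonlinearities until an
# output prediction falls out. All sizes and values here are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

def layer(x, weights, bias):
    # One iteration of the repeated function: weighted sum, then squashing.
    return np.tanh(x @ weights + bias)

x = rng.random(8)                  # the input: just an array of numbers
w1, b1 = rng.normal(size=(8, 4)), rng.normal(size=4)
w2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)

hidden = layer(x, w1, b1)          # coarse-grained features
output = layer(hidden, w2, b2)     # finer-grained prediction
print(output)                      # a "prediction" about the input
```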
So I hope that if this is something you find interesting, maybe one of you will be the one to bring this tooling to Ruby as well.

Deep learning relies on an artificial neural net's automated discovery of patterns within a training data set, and then applies those discoveries to draw intuitions about future inputs. That's a little abstract to think about, so let's look at a really concrete little example. This is MARIO, an ANN that teaches itself how to play Super Mario World. It starts with absolutely no clue. It doesn't understand its world, it doesn't understand rules, it doesn't understand gaming. All it does is manipulate numbers and notice that sometimes things happen. The ANN learns movement and play via self-training: over and over, across a 24-hour period of experimentation, it identifies patterns and then uses those patterns to start predicting what to do. By the end, it truly can play the game.

And speaking of games, let's play one right now. It looks something like this. It's kind of like bingo, the craziest game of bingo you've ever played, but let's give it a shot. Insightful algorithms are full of pitfalls, and by looking at case studies we can explore some of the ones on this board.

In the retail sector, the second trimester of pregnancy is referred to as the holy grail. The reason is that in the second trimester of pregnancy, women start to change their purchasing habits, and change them in a big way. Brand loyalty, store loyalty, all the habits they've built up are suddenly up for grabs. For retailers, this means the opportunity to lock someone in, not just for purchasing decisions during this period but potentially for the rest of their life, and their family's.

Target is a US department store chain. It came up with a predictive algorithm that pretty reliably detects when someone is in their second trimester, just from the purchasing habits they're starting to form. That is powerful, because most retailers couldn't find that out until around the third trimester, so Target got a jump start.

Until one day a man came into a store, really angry, yelling at the manager: how dare you send my daughter coupons full of pregnancy things? Are you telling her to get pregnant? She's just a teenager. This is outrageous. And the manager, who of course is not in charge of what a big national chain of stores mails out, said, I apologize, I'm so sorry, obviously we don't intend that. The man went away, came back the next day, and said to the manager, I apologize, because I spoke to my teenage daughter, and it turns out there are some things I didn't know. Was she ready to tell him that day? Did she want to have that conversation with someone who was angry and confused? Target put her in that situation.

So here Target had these ads, full of stuff about pregnancy, and they started to realize that people like this were not okay with that. They took a lesson from it, and the lesson was: let's hide things better. So they changed the ads. Now you have ads for all that pregnancy stuff next to things like lawn mowers and aftershave, all sorts of things to give the impression that it was purely by coincidence that you got some of these ads and some of those ads, that there's no meaning behind it. In fact, there is meaning. It's the exact same targeting, done more subtly. And the reason is that what they concluded is: as long as a pregnant woman thinks she hasn't been spied on, as long as we don't spook her, it works.
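Target has never published how its model works, but the shape of it is ordinary supervised learning. Here is a deliberately toy sketch: the purchase features, the data, and the labels are all fabricated, and a real system would use thousands of signals, but the ethical point survives even at this scale. The model happily scores people who never consented to being assessed.

```python
# A toy sketch of Target-style behavioral prediction. Target's real model is
# proprietary; here we invent a few purchase features and fit a logistic
# regression to labeled shopping histories. Every feature and data point
# below is made up for illustration.
from sklearn.linear_model import LogisticRegression

# rows: [unscented_lotion, zinc_supplements, cotton_balls] purchase counts
histories = [
    [3, 2, 4], [2, 3, 3], [4, 1, 5],   # shoppers known to be pregnant
    [0, 0, 1], [1, 0, 0], [0, 1, 0],   # shoppers known not to be
]
labels = [1, 1, 1, 0, 0, 0]

model = LogisticRegression().fit(histories, labels)

# The model now emits a "pregnancy score" for any new shopper, whether or
# not that shopper ever consented to being assessed this way.
print(model.predict_proba([[2, 2, 3]])[0][1])
```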
Shutterfly, you may recall, is a photo processing service that people use to make various things such as cards, calendars, et cetera. They sent out a message that was essentially: congratulations on your new bundle of joy; it's time to buy a bunch of cards from us announcing how wonderful it is that you now have a baby. Some people laughed at this and said, I haven't been pregnant; also, I'm a dude. Other people responded very differently. One wrote: thanks, Shutterfly, for the congratulations on my new bundle of joy. I'm horribly infertile, but thanks, because hey, I'm adopting a kitten. Another: I lost a baby in November who would have been due this week. It was like hitting a wall all over again. Shutterfly responded: the intent of the email was to target customers who had recently had a baby. Yes, duh, we could tell. The point is that they had false positives in there, and those false positives had impact.

A few months ago, Mark Zuckerberg announced with considerable excitement that he's going to be a father soon. He also wrote about a series of miscarriages that he and his wife had dealt with as a couple. He said: you feel so hopeful when you learn you're going to have a child. You start imagining who they'll become and dreaming of hopes for their future. You start making plans, and then they're gone. It's a lonely experience.

Facebook Year in Review: many of us are familiar with this. It's been going on for a number of years. This past year they decided to get more algorithmic about it, looking at criteria that probably indicate an exciting development: lots of likes, lots of activity on particular posts, stuff like that. What they failed to take into account is that our lives are constantly changing. We have relationships that change, jobs that change. Not every memory stays the joyous one it once was.

Eric Meyer coined the term "inadvertent algorithmic cruelty," and he defines it as the result of code that works in the overwhelming majority of cases but doesn't take other use cases into account. Why does he get to be the one who names this? Because he's one of the people it happened to. He wrote: this is a picture of my daughter, who is dead. Who died this year. The Year in Review ad keeps coming up in my feed, rotating through different fun and fabulous backgrounds, as if celebrating her death. And there's no obvious way to stop it. He asked us to increase awareness of, and consideration for, the failure modes, the edge cases, the worst-case scenarios. I hope that's what I can do today, and that we can spread it beyond here.

With that in mind, here is my first recommendation for all of us: be humble. We cannot intuit internal states, private subjectivity. Not yet, anyway.

Fitbit, when it started out, had a sex tracker. And the thing about this sex tracker is that it was public. I heard that "ouch." "Vigorous effort." All right, the other thing I do is I'm a certified sex educator. I look at this and there are really two options here: congratulations, or we need to talk. People did not know it was public. They were unwittingly sharing this information. Remember, an algorithm is a set of operations for predictably arriving at an outcome. This was the result of engineers not thinking about the different meanings of data, just collecting and sharing it. We might be willing to compete on things like how many steps we've taken or how many calories we've taken in. That doesn't mean everything is something we want to share and compete over. Some things are just private. And that was not thought through in designing that control panel.
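What might thinking it through look like? Here is one sketch. Every field name and category below is invented, but the principle is the one this talk keeps returning to: sharing defaults to no, and sensitive categories require an extra, explicit yes.

```python
# A sketch of privacy-by-default sharing, the control that public-by-default
# tracker lacked. All names and categories here are hypothetical.
from dataclasses import dataclass, field

SENSITIVE = {"sexual_activity", "weight", "sleep"}

@dataclass
class ActivityLog:
    category: str
    detail: str
    share_this_entry: bool = False   # default is always "no"

@dataclass
class Profile:
    public_opt_ins: set = field(default_factory=set)

    def can_publish(self, log: ActivityLog) -> bool:
        # Publishing requires a per-category opt-in, and for sensitive
        # categories, a per-entry confirmation on top of that.
        if log.category not in self.public_opt_ins:
            return False
        if log.category in SENSITIVE and not log.share_this_entry:
            return False
        return True

user = Profile(public_opt_ins={"steps"})
print(user.can_publish(ActivityLog("steps", "12,000 steps")))               # True
print(user.can_publish(ActivityLog("sexual_activity", "vigorous effort")))  # False
```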
Uber. Most of us, if not all of us, need some sort of internal ops tools: monitoring, performance tuning, business metrics, whatever. Uber's is called God View. It basically allowed them to track cars and track passengers. In part, you could see things like where demand was clustering and be able to send out more cars. But they didn't limit access to admins or restrict it to operational use. Employees throughout the company could freely identify any passenger and monitor that person's movements in real time. Drivers also had access to God View records. Even a job applicant was welcome to access those private records. Managers felt free to abuse God View for non-operational purposes altogether, such as stalking celebrities' rides in real time and showing it off as party entertainment. And just to give you an idea of how horrifying God View really is, here's an actual code excerpt. This is so knowingly inappropriate. Seriously: autoplay, true? What the hell? And of course there's also this other reference, to a background image, and that's pretty telling too.

If you recall, OkCupid, the dating site, used to blog about things it was learning from aggregate trend data. That blog focused on sharing insights into simple ways that OkCupid users could use the dating site better: how to be more effective at whatever goal brought you to the site at all. Uber used to blog about its data as well, with a crucial difference: it was not about improving customers' experience of the service. If you read that blog, Uber can and does track your one-night stands. Why are they even looking at this? This is purely invading people's privacy, not for any operational reason, not for any service reason, but purely for the sake of judging and shaming. And if you think back to all those people who had access to those records, that becomes all the more problematic. This is not a predictable consequence of signing up for a ride-share account. No one ever bought into this.

Google AdWords. There was a study at Harvard of AdWords, looking at the differing ad templates offered for essentially the same service. The researchers came up with two lists of first names: one highly correlated with black people, one highly correlated with white people. On the first list you might have names like, say, Latanya; on the other list, names like Jill. Then they looked at what you find when you search for real professors with those first names. AdWords returned very different results. A black-identifying name was 25% more likely to return an ad implying an arrest record. And remember what the algorithm does: it's just focused on predicting what we click on. It's not meant to reflect the real world in any way; that's irrelevant to what AdWords does. Its job is simply to figure out what makes us click. Based on what it observes about us, and what it's observed of other people, it refines that: it starts with all possible templates and then whittles down to the ones we've shown the most interest in. So what we see here is our own collective bias being reflected back to us, reinforced, and validated.
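Google hasn't published how its template selection works, but a simple epsilon-greedy bandit, sketched below with invented templates and click rates, is enough to reproduce the dynamic the study describes. The algorithm knows nothing about arrest records or race, only clicks, and it still converges on the biased template, because that's what we collectively click.

```python
# A simplified sketch of how click optimization "whittles down" ad templates.
# This epsilon-greedy bandit is a stand-in, not Google's actual system.
import random

templates = ["Jill Smith, arrested?", "Jill Smith, contact info"]
shows = {t: 0 for t in templates}
clicks = {t: 0 for t in templates}

def click_rate(t):
    return clicks[t] / shows[t] if shows[t] else 0.0

def biased_audience_clicks(t):
    # Stand-in for collective bias: the "arrested?" wording draws more clicks.
    return random.random() < (0.12 if "arrested" in t else 0.06)

random.seed(1)
for _ in range(10_000):
    if random.random() < 0.1:              # explore occasionally
        t = random.choice(templates)
    else:                                  # otherwise exploit the current winner
        t = max(templates, key=click_rate)
    shows[t] += 1
    clicks[t] += biased_audience_clicks(t)

# The biased template ends up shown far more often, with no model of the
# real world anywhere in the loop.
print({t: round(click_rate(t), 3) for t in templates}, shows)
```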
Data is generated by people. It's not objective. It's constrained by our tunnel vision. It replicates our flaws. It echoes our preconceptions.

Let's look at Flickr and Google Photos. As I said, deep learning deals with things like image analysis, which includes a variety of tasks, including facial detection. We've seen this technology around for a while. You remember how it was in the beginning? Pretty funny mistakes, right? A harmless, funny mistake: it's a false positive, but, you know, whatever. More recently, Flickr classified this as children's playground equipment: Dachau. The white tags you see there are Flickr's; the gray tags are the photographer's. This is an algorithm treating human knowledge as irrelevant to machine intuition, and treating data as inherently neutral. Flickr tagged this man as an animal, and an earlier version also tagged him as an ape. That's a comparison which, in the US, has a particularly ugly history, and that's what makes it all the more damaging and upsetting. This, unlike that iPhoto picture, is not in the past; this was four months ago. And three months ago, there was this from Google Photos.

How does this even happen? Well, for one answer, you have to go back all the way to the 1950s. When Kodak was first creating film stock, they optimized for finding detail in white skin. They weren't interested in people with black skin as customers. To make sure they were capturing detail in white skin, photo lab technicians every day used these cards, called Shirley cards, to calibrate their equipment: making sure it accurately represented as much detail as possible in white skin, and that the whites were very true. Black skin was treated as completely irrelevant to film stock. The problem is that this algorithm has been reproduced all these decades later. Even in digital, we're still replicating it, because when we turned to digital photography, it's not as though we could suddenly have rendered images completely differently; people would have just said, wow, this thing is so broken, this sucks. Of course we continued to make images exactly the same way, and the problem got carried forward.

Affirm is essentially a credit lending agency. It specializes in just a few consumer goods, and its target market, for the most part, is young, fairly well-off people. It makes an assessment of creditworthiness based, originally, on just a few factors. The only things you have to submit are name, email, mobile phone number, birthday, and the last few digits of your federal identification number. From there, it really goes to work. It does things like look at how long it takes you to enter that information. So Stephen Hawking really sucks as a credit risk. So do parents, or anyone who is routinely distracted; they're going to be treated as lesser. It looks at things like GitHub activity. Why is that a problem? Well, 2% of open source contributors are women. If you start getting into people of color, especially women of color, the numbers become fractional. This algorithm is just reinforcing privilege, because you have to remember: an algorithm is just a procedure for reliably arriving at an outcome. That means it's up to us to take into account what impacts those outcomes are leading to. Affirm analyzes applicants' social media accounts, including GitHub but also others.
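Affirm's actual model is a black box; the toy score below, with entirely made-up weights and thresholds, just shows how quickly "neutral" signals like form-entry speed and GitHub activity turn into proxies for disability, caregiving, and access.

```python
# A toy sketch of proxy-feature credit scoring. The real factors are unknown;
# these two, form-entry speed and GitHub activity, are the ones the talk
# calls out, with invented weights. Neither measures creditworthiness;
# both measure privilege and ability.
def toy_credit_score(seconds_to_enter_form: float, github_commits: int) -> float:
    score = 600.0
    if seconds_to_enter_form > 90:   # penalizes disability, distraction, parenting
        score -= 50
    score += min(github_commits, 500) * 0.2   # rewards those with leisure and access
    return score

# Two equally reliable borrowers, very different scores:
print(toy_credit_score(seconds_to_enter_form=45, github_commits=400))   # 680.0
print(toy_credit_score(seconds_to_enter_form=180, github_commits=0))    # 550.0
```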
There are other companies doing similar things, and going beyond them. In 2012, Germany's biggest credit rating agency considered evaluating applicants' Facebook relationships. And this year, Facebook itself has defended a patent that pushes even further down this road: making credit decisions about a person based on the unrelated credit history of their Facebook friends. We all understand that friends in real life and Facebook friends are not necessarily related to each other at all; these are not necessarily overlapping circles.

Affirm's CEO defended this kind of thing, saying, in effect: gee, you can't really look at this too closely, because then you'd start to introduce bias. What? Data is not objective. It always has bias, inherent at a minimum in how it was collected and interpreted. Every flaw and assumption in a training data set, and in those original functions, has unrecognized influence on algorithms and the outcomes they generate. Affirm says its algorithm uses 70,000 factors. How many of those have potential for discriminatory outcomes? How would they know? How would an applicant know? It's not as though someone can tell you what criteria led to a decision. Rationales for that algorithm can only be seen from inside the black box. Which is why I took a picture from inside a black box. Making lending decisions inside a black box is not a radical new business model. It's a regression. What it disrupts is fairness and oversight.

In many countries, financial institutions and their regulators are paying close attention to these new models. I love this. Sarah Mei went on a rant about this recently, and as she says, many regulations are in place to correct for systemic, structural bias against minority groups. We need to be asking ourselves: do we want that to be disrupted?

Right now we are in an arms race, a machine learning arms race. Facebook, Google, Apple, Microsoft: they have all made huge investments in companies doing this work and in the technology underlying it. They're making big bets on opaque intuitions. For the moment, quality varies, but we need to remember this: deep learning is all about iteratively drawing intuitions at extremely fine-grained levels. Which means those intuitions are growing more precise in their correctness, but also more dangerous and damaging in their wrongness. That's a dilemma for us to take seriously as developers. Algorithms always have underlying assumptions: about meaning, about accuracy, about the world in which the data was generated, about how code should assign meaning. Underlying assumptions influence outcomes and consequences.

But we do care about getting this stuff right. We want to be empathetic coders; we wouldn't be here otherwise. So the question is: how do we flip the paradigm? We can do some things like taking lessons from professional ethicists. It turns out our profession has some. They are from the Association for Computing Machinery, which I'm sure every one of us has heard of. I've adapted a few of their guidelines, along with some from other professional ethicists.

We need to consider a decision's potential impact on others, asking questions like: how might a false positive affect someone? For instance, those Shutterfly customers. How might a false negative affect someone? For instance, being denied a loan. And how might an algorithm's intuition be seemingly correct and yet deeply wrong about human context? Like that photo of Dachau. Like the memory of Eric Meyer's daughter.
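One way to turn those questions into something a team can act on is to measure error rates per group rather than only in aggregate. The data, group names, and labels below are fabricated; the shape of the check is the point.

```python
# Making "how might a false positive affect someone?" measurable: compare
# false-positive and false-negative rates across groups. All records here
# are invented for illustration.
from collections import defaultdict

def error_rates_by_group(records):
    stats = defaultdict(lambda: {"fp": 0, "fn": 0, "n": 0})
    for group, predicted, actual in records:
        s = stats[group]
        s["n"] += 1
        s["fp"] += int(predicted and not actual)   # flagged, but shouldn't be
        s["fn"] += int(actual and not predicted)   # missed, but shouldn't be
    return {g: {"fp_rate": s["fp"] / s["n"], "fn_rate": s["fn"] / s["n"]}
            for g, s in stats.items()}

records = [  # (group, model said "deny loan", person actually defaulted)
    ("group_a", True, False), ("group_a", False, False), ("group_a", False, False),
    ("group_b", True, False), ("group_b", True, False), ("group_b", False, True),
]
print(error_rates_by_group(records))
# If one group's false-positive rate is much higher, harm is not being
# distributed evenly, and no one inside the black box will tell you.
```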
We need to weigh the likelihood of consequences to others, and minimize the negative consequences to others. You'll notice I keep saying "others." We're very good at thinking about consequences to the business; not so good at thinking about consequences beyond it.

We need to be honest and trustworthy. Not just because these are the right things to do, but also because we will fail from time to time, and we're going to need to have earned the right to say "that was an honest mistake, we're so sorry, we're going to fix this" and be believed. Part of that is that it's also really important to build in recourse from the beginning, so that someone can easily correct our mistaken conclusions when they're wrong.

We need to provide others with full disclosure of limitations like these, and call attention to signs of risk or harm to others. And this is a really important one: we need to be visionaries about creating more ways to counteract bias: biased data, biased analyses, biased impacts.

Finally, we need to anticipate diverse ways to screw up. When the teams charged with defining data collection, data use, and data analysis are less diverse than the intended user base, we're just going to keep on failing those users. Which is why we must put decision-making authority in the hands of highly diverse teams. And what do I mean by highly diverse? Well, culture fit is the antithesis of diversity: superficial variations are allowed to exist, tolerated, but unique perspectives are suppressed. Unidimensional variety is also not diversity. Diversity is wildly varied on as many dimensions as possible: differing origins, ages, assumptions, experiences. Diversity is when there's no such thing as a detectable majority anymore.

We also need to ask for permission, with the default being no. This is the nature of informed consent and enthusiastic consent. We need to focus on the many people who eagerly share themselves and are enthusiastic about giving consent to be known more and served better. There were plenty of people who would have loved coupons for the things happening in their lives. We don't have to be nosy in order to serve them well. We can ask questions: what do you want in the future? What do you want right now? Would you like us to be really aggressive about giving you the best possible things for that?

We also need to audit outcomes constantly. Audit testing is used widely in monitoring for discrimination in fields like hiring and housing. The idea is that you send in two essentially identical inputs, say resumes that are completely identical except for one variation, such as a name that implies something about the person. If you send in those identically qualified resumes or applications, whatever they are, the outputs, the decisions, should be the same if the system isn't biased. We can test our systems the same way. Did we find different outcomes? If so, there's a problem.
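As a sketch, that audit can literally be a test in the suite. The scoring function below is a hypothetical stand-in for whatever opaque model is actually making the decision; the pattern is to hold every field constant except one, and assert the outcome doesn't move.

```python
# A sketch of audit testing as a unit test: send in matched pairs that are
# identical except for one signal (here, a name) and assert the outcomes
# match. `score_applicant` is a hypothetical stand-in, not any real API.
def audit_paired_inputs(score_applicant, base_application, field, variants, tolerance=0.0):
    failures = []
    baseline = None
    for value in variants:
        application = {**base_application, field: value}
        score = score_applicant(application)
        if baseline is None:
            baseline = score
        elif abs(score - baseline) > tolerance:
            failures.append((value, score, baseline))
    return failures

base = {"income": 52_000, "years_employed": 4, "name": ""}
failures = audit_paired_inputs(
    lambda app: 0.8,            # stand-in model; plug the real one in here
    base, "name", ["Latanya Sweeney", "Jill Smith"],
)
assert not failures, f"identical inputs, different outcomes: {failures}"
```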
Because remember, from the outside all we get is the output of a very black box. There isn't a rationale to consult on the inside. Which is why we also have to commit to data transparency and algorithmic transparency, and I do mean both. This is the part that's such a hard conversation to have internally. It wasn't that long ago that we had to fight for the legitimacy of open source in our professional toolkit. I remember those arguments. We did push back, and we were right to. We are professionals. We know that transparency is crucial for drawing insights that are genuine and useful. We can argue for increasing transparency because it makes for a better product: cleaner features, fewer bugs, stronger tests, happier users, public trust. These are entirely legitimate arguments to be making. And we need to make them, because we build stuff that matters.

This is going to sound harsh, but as Amy Hoy says: if your product has to do with something that deeply affects people, either care, or quit and go live in a cave and don't hurt others. We're hired for more than just code. We're not code monkeys; we're hired as professionals to apply our expertise and judgment about how to solve problems. Our role is to be opinionated about how to make code serve a problem space well.

When we're asked to write code that presumes to intuit people's internal lives and act on those assumptions, what we can do as professionals is be proxies, be people's advocates. Say no, on their behalf, to using their data in ways they have not enthusiastically and knowingly consented to. Say no to uncritically reproducing systems that were biased to begin with. Say no to writing code that imposes unauthorized consequences on their lives. In short, refuse to play along.

Thank you.