Hi, I'm Colin — it's really lovely to see you all. I'm going to be talking about data ethics and how to apply ethical reasoning to our own work around data in this session. Pretty much every talk in this track has been a real slayer, so I have four very tough acts to follow. But that's been really good, because this talk is a little bit of a connective thread between them. And that thread is this: as developers, we've got the chance to do work that has a lot of impact on humans. Sometimes the software we write is really transformative, sometimes it's really dangerous, and a lot of the time it's somewhere in between, or both. So this talk is about what that means for us in our day jobs and how to handle that responsibility.

And just as a flag: even though this talk includes the cool thing of lying about Pringles, it's a little heavy at times. There are a few things we're going to cover that I want to flag. We're going to lead this talk off by talking a little bit about death, and eventually we're going to discuss the work of the D.C. Abortion Fund, an organization that works in abortion access. I want to flag that in advance just in case anyone is in a position where that's going to be upsetting. I'm not here to ruin anybody's good time at this conference, and I won't be offended if anybody wants to tap out based on that. That's totally chill.

So let's dive in. A philosopher in the 90s had this thought experiment. Imagine an invention that would guarantee personal freedom of movement for everyone — something that would let individuals go anywhere they wanted, near instantaneously. Imagine teleportation, or the Star Trek transporter, being a thing thanks to this invention. A lot of the ways distance prevents us from doing stuff would just no longer be a factor. Think of how incredible that would be, and think of how it would impact your life. But there is a cost, and the cost is really not pretty. The fuel powering this invention is actually mass human sacrifice. In order for everyone to teleport, we have to randomly select 30,000 people for the invention to violently kill off every year.

Now, if you were in a position to write software for this invention, what do you do? Do you go in full steam, reasoning that the freedom it enables is worth the trade-off of randomly killing off one of its users every twenty minutes or so? Do you decide that the cost of human death is just too great for you, and you want no part of exposing people to a risk like that, despite the very real and very incredible benefits it stands to give everyone? Do you maybe try to land somewhere in the middle — say, by restricting who can use it, which reduces the death toll but also means only very wealthy people are going to be able to teleport around? More broadly, do you feel, as a developer, like you are prepared to work through this question with what's in your toolbox right now?

The point of the thought experiment is that that's basically how cars work. They let us go basically anywhere, whenever we feel like it, and that empowers us with a really incredible amount of personal freedom. There's real power in that: it lets us pursue opportunities and escape from terrible situations. And a lot of people die in car accidents every year, pretty much at random, and that's more or less an accepted consequence of having that freedom of movement.
Now, relatively few of us are ever going to work on something that reshapes the fabric of the world the way cars have, and generally our software is not so extreme as to rely on violent death to operate. But many of us spend a lot of time working on things that take humans, and the data they generate, as fuel of some kind. It's way less extreme, but we're still exposing the people who use our stuff to some risk in some way. When humans and their data are involved in the stuff we build, ethical questions are naturally going to pop up. What that means for us as working developers is that if we're using human data, we need to be sure we're doing right by the people generating that data for us, and we need to be able to work through the questions our work raises.

And really, there's never been a better time to start thinking seriously about how we use people's data as developers. There's a lot going on in our industry right now, and a lot of it does not leave the general public with a positive impression of technology or technologists. There's a great blend of mistakes compounded by cover-ups, lots of pulling some kind of switcheroo on how someone's data is used, or people watching RoboCop and thinking that it's an awesome idea on some level. One common thread between a lot of these scandals — and what makes them scandals instead of garden-variety business oopsies — is that they're all very ugly misuses of people's data that put those people at some kind of risk. Experian has made a business of analyzing people's individual financial records as a service. Facebook's entire profit model is that it knows enough about people to make advertising on its platform more effective than elsewhere. And Palantir is writing algorithms to predict which neighborhoods should have more cops in them.

So when we're talking about these scandals, what we're really talking about is developers implementing stuff using people's data on some level. That leads us to our other common thread, which is that business people — management — didn't write this code. Developers like you and I did. Developers wrote these predictive policing algorithms. They wrote the permission structure that allowed third-party data mining. Working programmers like you and me are the hands that build these systems out. So we stare down these ethical questions all the time.

I'm a believer that it's completely normal to run into these problems on the regular, and that we should really be running at them headfirst. Systems that use human data can do some really incredible, life-improving stuff, and it's honestly pretty sweet that our day jobs give us these opportunities. And also, we shouldn't shy away from the fact that there are costs and risks involved in doing this work. My own opinion, and a premise of this talk, is that these human ethical problems are in the same class as technical problems. Just as your code is going to have consequences if you write N+1 queries all over the place, it's going to be really iffy if you don't account for the risks you expose people to when you use their data to power your systems. We are implementing these systems, and we should be actively thinking about these consequences and risks. So I want to get our brains moving, so that when we're faced with a difficult ethical decision about how to use data, we aren't staring it down like a deer in headlights. And here's my rough game plan for trying to get us started on that.
We're going to start by defining ethics as developers, to help us figure out what questions to ask about our work. Next, we're going to talk about identifying our values, so that when it comes time to do some ethical reasoning, we know what's important to us as individuals and as a group. Different groups have different values, so we're going to start broadly, from kind of a professional level, and then narrow things down to a team level. And finally, we're going to walk through a few real decision points from a project at the D.C. Abortion Fund I worked on recently, and apply this way of thinking to the very thorny problems that came up over the course of our work. As part of that, we're going to talk about how to handle it, and what changes when money starts to enter the equation.

So — please do not leave this talk and tell any philosophy professor friends of yours that I said this — but here's a working definition of ethics we can use for the course of this talk. When we're facing ethical questions in our work, what that means is that we're making a decision on how to act in the context of our shared and personal values. Let's break that down a little bit. There are two things about this that really jump out at me. One is "shared and personal values," and the other is the word "act."

With respect to shared and personal values, what I think that means is that ethics is about doing the right thing, where the right thing is something we have to tease out of the values of our community, our profession, and also ourselves. But what this definitely means is that ethics is never a hard and fast set of rules we can apply. We are firmly in the arena of best practices and rough consensus, rather than a natural law like gravity or Ohm's law.

As for the word "act," I think what this gets at is that we're in a position of real power to do things according to what we want. To illustrate this point, I'm going to pull up a tweet by dril from the Internet, and use this as an excuse to drink some water real fast while we internalize it. So this is a great joke, but it's also kind of a great example of what I'm talking about when we're talking about the power to act one way or another. Even when we have very linear A-B-C directions, like an online survey asking us what we think about a product like Pringles, it's still up to us to hit the button that says we know what Pringles are. And by the same token, it's on you as a developer to act ethically with data, and you can't pass that buck back up to management. Even when you're getting ordered around, or you're on a tight deadline, on some level you are in the position of power to actually carry this out. I know I keep hammering on this, but I think there's real power in being the people who are actually responsible for implementing these systems, and that gives us the chance to do a lot of shaping of the world as we want to see it.

And another quick note about shared values, to bring us back a step. This is the kind of thing that can mean a lot of different things to a lot of different people. If you're like me and always looking for a good baseline, I would strongly recommend the Association for Computing Machinery's (ACM's) Code of Ethics and Professional Conduct. It's a pretty in-depth document, but it answers the question of what shared values might mean for us specifically as developers, and gives us something to build on top of.
So we're going to go through an exercise now of establishing our values: first as a profession, then as an organization, and then as a team. Starting with the Association for Computing Machinery. I won't review its code blow by blow here, but I would summarize the core parts as follows. One, acknowledge that everyone is a stakeholder in computing, because modern life pretty much runs on computers. Two, avoid doing harm and be respectful of people's privacy. And then there are three or four or five sections that boil down to the idea that YOLO is not a good ethical approach to programming when it comes to security, infrastructure, or protecting people. So this is a baseline we can use as developers, which we can supplement with organizational and team values.

Organization-wise, the D.C. Abortion Fund's values flow straight from its founding mission: to remove cost barriers to abortion. In other words, it's here to help individuals pay for abortion care who couldn't otherwise afford it on their own. Sometimes abortion is a hard choice for people, and sometimes it isn't. But people should be able to make the call on their own, independently of whether or not they have a few hundred dollars handy. Here are a pair of relevant values we can extrapolate from that mission. One: in the course of funding abortion care, we exist to empower people to carry out their own choices. And two: it's okay to ask personal questions if we think there's a good chance it's going to help us work better.

Now, you know how the Rails doctrine involves some real trade-offs, like handing programmers sharp knives as opposed to protecting them from things like monkey-patching base classes? This is the same kind of idea: we prioritize these values at the expense of something else that's really important. The value on intrusive questions is a really good example of this in practice. What we lose from having this be a value is that it's really stressful to have someone ask you personal questions, and that's not a great position to put people in, even though the money they're getting isn't going to be impacted by their answers. There's a real power imbalance there. At the same time, it's extremely important that we know about the population we're serving, so if there's something we want to know that we think is going to help how we deliver service, it's worth asking those questions — very gently and kindly, but still asking them.

Finally, as an engineering team working on this project, we had our own set of values. First, we acknowledged as a team that collecting data is inherently kind of risky, so for each individual data point we collect, we need to have a good reason why, and weigh that reason against the additional risk we expose our patients to. This comes at the expense of being able to hoover up tons of ancillary data and have analysts swim around in it after the fact. Second, because we're working with a population experiencing financial hardship, people who are sometimes in difficult or scary situations, it's important we not experiment on them. And this is tough, because it means there are optimizations we're never going to be able to make, things about the population we serve that we're never going to be able to learn, and that's a tough pill to swallow. But in the course of development, we have a team agreement to not get too wild. That's hard, but them's the breaks, and it's really important to us.
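To make that first team value a little more concrete, here's a minimal sketch of the kind of thing it can translate into in a Rails app: a scheduled job that scrubs identifying fields off of old records. This is purely illustrative — the model, the column names, and the retention window are all hypothetical, not DCAF's actual code or policy.

```ruby
# Purely illustrative: hypothetical model and column names, and an assumed
# retention window -- not the actual DCAF codebase or policy.
class ScrubStalePersonalInfoJob < ApplicationJob
  RETENTION_PERIOD = 6.months

  def perform
    # Find case records that haven't been touched within the retention window
    # and blank out the fields that could identify the patient, keeping the
    # non-identifying data we actually use to evaluate our operations.
    Patient.where("updated_at < ?", RETENTION_PERIOD.ago).find_each do |patient|
      patient.update!(name: nil, phone: nil, email: nil)
    end
  end
end
```

The point of something like this isn't cleverness; it's that the retention decision lives in code where it runs automatically, instead of in a policy document somebody has to remember to act on.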
So, having identified our values as a profession, as an organization, and as a team, let's get into some case studies. Here's a very rough framework of questions we can ask about these case studies, which descend from the values we've identified. This is not a hard and fast formula, but a set of questions that help us make decisions on whether or not we're handling data ethically.

One: is what we're doing, or what we're thinking about doing, in line with our values as a profession, an organization, and a team? And if there are conflicts between our values here, how do we balance them? This is here, in part, because it's important to step back and gut-check things sometimes. So, coming back to it: is asking people for data in the first place fair? In my opinion, the answer is no, not really. People reach out to abortion funds because abortion funds have money and they don't, so there's always going to be some kind of power imbalance. But unfortunately, we need to know some information about patients if we're going to be able to help them out and get better at helping others in the future. So this concern takes a bit of a backseat to our value of empowering clients to carry out their choices, which is something so important to us that we're willing to move forward anyway.

Two: what's the potential harm in doing what we're thinking about doing? This could be anything from making people feel kind of lousy to an actual, real, scary threat to their safety. That has to be out in the open if we're going to make fully informed judgment calls. This descends pretty directly from the ACM code's premise of avoiding harm, and also from our team value of data collection being risky business.

Three: if we told someone we were doing this, would it come as a surprise to them? Would they be mad about it? Is this something they think they signed up for by working with us? This question comes from the team value of not doing mad science, and also the ACM value of protecting people's privacy. For DCAF, we decided that people would be kind of surprised if we told them we were going to hold on to their data forever. So we decided we were ethically obligated to shred anything resembling personally identifiable information after a certain amount of time, and not to collect things unless they were tied very directly to something we wanted to use to evaluate our operations.

Four: what happens if you blow it? What happens if you screw up somehow with somebody's data? What bad things happen, and is there a way to fix it? We get this from the ACM's caution about not YOLO-ing your way through development. In DCAF's case, we decided that because we were shredding data, we were implementing good, effective guardrails that could get us out of a lot of trouble in worst-case scenarios.

So, all told, the potential power dynamic and harm were things we weighed pretty heavily here, and they made us change how we implemented a lot of stuff. We probably wouldn't have stripped personally identifiable information if we weren't worried about a SELECT * ending up on Pastebin. But this was firmly in line with our organizational value of empowering patients, and nothing was really scary enough to be a red light, so we went for it. So, do any of these calculations change when you're trying to keep a business afloat? I think to an extent they do.
The D.C. Abortion Fund is mission-driven and has the luxury of optimizing for service, and that's not necessarily the case when you're a for-profit business. I think that's also totally fair, because having a job is pretty important, and we're all trying to provide for our families, loved ones, and ourselves here. It's not a get-out-of-jail-free card, but it probably does shift your values a little bit. I think that if the D.C. Abortion Fund were a for-profit operation, we'd be a little more willing to bend the team value about collecting data always being risky. Probably this means information that's not immediately useful for problems right in front of us, but which seems pretty safe, is on the table now. I still think there are some things where the potential benefit will never outweigh the harm, but we might allow collecting data we don't need for immediate analysis, whereas we wouldn't before.

So, speaking of data collection being risky: our next case study is something our engineering team revisited and reversed course on — our decision to track whether or not a patient was in the United States legally. This is a piece of information that is legitimately, really dangerous to have out in the open for many people, but also a piece of information that can vastly improve service delivery at the same time. We were collecting it at first, and we stopped six months later because we thought the worst-case scenario was getting a little too close for comfort. And to be clear, this is an example of something where I think we had a blind spot, and we screwed it up because of that.

So let's first talk about why we would want to collect this piece of dangerous data. Social services organizations often work with people who are undocumented, and the D.C. Abortion Fund is no exception. To effectively serve them, we need to meet them where they are. That can mean working with them in their preferred language, or through an intermediary organization, or suggesting providers who are known to be friendly. Imagine a Salvadoran immigrant who moved to Virginia a few months ago. We probably want to be able to point them to the kind of abortion provider that has the resources to serve them well — maybe one with Spanish speakers on staff, maybe one that has worked with immigrants before, or maybe one that's willing to cut a discount for people who don't have health insurance. So if people were willing to divulge it — and of course they didn't have to — whether or not someone was undocumented was a really useful piece of information for getting them better service than they would have gotten otherwise. We could make sure to send people to an abortion provider that was a better fit for them. And organization-wise, it was really useful to know how many undocumented people we were seeing, and whether or not it was worth making an investment in that part of what we were doing.

So the first time we considered this, around launch, it passed the empowerment value test with flying colors. The harm it could potentially do seemed pretty remote at the time, and it seemed unlikely that, in the event of a Pastebin dump, anyone was going to care that much about it. I think, in retrospect, this is where our blind spot was. I think we really underestimated the threat there, and we didn't get the risk right. And I want to be honest about that as a mistake we made. So here I am, admitting it to you, a bunch of strangers, on stage. So when we rolled this out, the D.C.
area was not really experiencing banner-headline immigration enforcement, so we decided to let it ride. But perhaps you remember an event that happened on or around November 9th, 2016, that had a large, cascading impact on how undocumented people are treated in the United States. You may also recall this resulted in public policy decisions that led to Immigration and Customs Enforcement — ICE — conducting a lot of raids and arresting a significant number of people in a very public way. Prior to this, I could have reasonably argued that while collecting the status was risky, the benefit we gained from it in empowering patients outweighed the risks, because the risks seemed so small, at least based on our understanding at the time.

Two things made us reevaluate that and come down on the other side. The first was that we expanded past just D.C. Another city's abortion fund adopted our software as their case management system, and they had a much different experience with ICE in their neck of the woods. So they were significantly more reluctant to keep tabs on immigration status than we were. To them, this was not so much tracking useful information as handing them a Swiss Army knife and saying, "use this for everything except the bottle opener, which you must never, ever, ever use." And that's just an accident waiting to happen. Also around this time, ICE activity picked up everywhere, and hard. All of a sudden, ICE is checking papers and arresting people outside of courthouses, and people are legitimately having discussions about whether ICE agents can arrest people at church, and stuff like that. The worst-case scenario was always really terrifying, but it started to feel a lot more plausible. And so the development team decided that the risks to patients were now greater than the benefits, and it was time to cut the data. So we did. For some people, this is going to mean worse service than they'd get otherwise, which is a huge bummer, but we make up for that by not having a filterable list of targets cops would be interested in if raw patient data ever makes it onto Pastebin.

Even though our engineering team kind of ate mud on this — and it's very embarrassing for me personally, because it happened because of a blind spot — I think this example raises a particularly important point about ethically handling data. Just as every scrap of JavaScript I've ever written has been completely obsolete and useless six months later, the rest of the world keeps changing too. The safe-ish option one day can turn into a powder keg under your nose a month or two later, and it's really important to have some kind of mechanism to revisit this, to see if your ethical reasoning holds up over time.

So if you're a for-profit operation, again, this calculation might change some for you. It's much more possible that business requirements dictate you work with dangerous personal information. My opinion is that if so, you're on the hook for some additional things. One is that you need to account for putting people in harm's way by having your app's security very tight and carefully considered, and not getting anywhere near YOLO-ing it. The other is that if you blow it, I think you need to be able to realistically offer some kind of redress — some way to protect people and offset the harm you've caused, if possible.
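By the way, the "cutting the data" part is mechanically pretty unremarkable — roughly a migration like this sketch, with hypothetical table and column names (this isn't DCAF's actual schema):

```ruby
# Illustrative only: hypothetical table and column names.
# Dropping the column deletes the stored values along with the field itself;
# old database backups are a separate problem and need their own expiration plan.
class RemoveImmigrationStatusFromPatients < ActiveRecord::Migration[5.1]
  def up
    remove_column :patients, :immigration_status
  end

  def down
    # The values are gone for good; rolling back only restores an empty column.
    add_column :patients, :immigration_status, :string
  end
end
```

The migration is the easy part, though. The harder obligation is that last piece — offering some real redress if the data has already put someone at risk.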
An example of this from outside of software: I drive a Ford, and they recently found out that Fusions they shipped in 2007 have this cool thing where the airbags spray shrapnel when they deploy, which is not a great feature in an airbag. That's the car company's problem to fix, and they have systems in place to fix it through manufacturer recalls. By the same token, if something goes wrong and you've put people at risk with their data, it's your problem to fix, and you need to have a game plan for that. It's reasonable to ask you to plan accordingly.

Finally, a thing we considered doing very seriously was taking some of the data we collected and doing some predictive modeling on it. What if we went beyond just forecasting financials and were able to make some kind of individual-level guesses about whether someone might benefit from additional care? This sounds potentially very useful, but it didn't make the cut, and here's why. Value-wise, it's really in line with empowering clients, but it very directly conflicts with the team value of not doing mad science. So while it could potentially result in higher-quality service — and that's real — it felt a little icky to our engineering team, and we decided not to do it. I also don't think it passes the "would this surprise someone?" test. I think if we told people we were doing predictive modeling with their data, they'd be a little surprised. It's borderline, but I think this is just out of bounds enough that we're not comfortable doing it. So we wound up dunking this idea in the trash, and the reason for that was the potential harm, and not really having a good way to offer redress if we got it wrong. We weren't super comfortable having an opaque algorithm be a factor in how we helped patients along. Even if it resulted in better service for a portion of them, it didn't feel fair to have everyone not be on equal footing when they called us.

And again, if you're a for-profit operation, how does the calculus change here? I think the X factor is what happens when you get it wrong. What's the impact, and how are you supposed to fix it? If you can't offer a good answer to that question, it's time to think really hard about whether you want to be responsible for that. I have one of those internet banks, and they think I'm really interested in buying a house. But I live in D.C., and houses are too expensive, so I can't, and their model's wrong. The impact of that is that they waste some money and I recycle some mail every month. Compare that, though, to Palantir and New Orleans secretly implementing predictive policing algorithms. They flag some hotspot neighborhoods to put more cops in, and as a result, they've really cranked up police presence in those neighborhoods. Another similar thing in this domain is modeling gang activity for individuals. If the model is wrong or needs some adjustment, there's not really a realistic way to fix it for the people who are affected by it, and for individuals in neighborhoods swept up in this, the damage is pretty permanent. All of this is to say: probably don't be moving fast and breaking things if the worst-case scenario involves permanent damage to people.

We've covered a lot of ground here, so let's try and boil it down to a few takeaways. The first is that as working developers, we need to be ready to work through ethical problems the same way we work through technical problems, so we don't blow up somebody's world by accident.
The second is that a very viable way to work through these ethical problems is to think very hard about our values and about how power works where we are, and to account for that as we figure out how we're going to implement things in our day jobs. Finally, it really sucks that there's not a hard and fast formula we can use for this kind of thing, like Big O notation or Ohm's law or whatever. But these are questions we're going to encounter a lot, and as we work through them as individuals and as groups, we do wind up getting better at it. And let's face it, we're not doing anyone any favors by shying away from these problems or punting on them. So let's not shy away from ethical questions as we use data, and instead make it a point to run at them head first, because otherwise we're going to do wrong by other people.

That is pretty much all I've got. I want to say thanks real quick to Jamie, the curator of the ethical decisions track. This has been a full day of fantastic talks, pretty much from start to finish, and a lot of hard work went into that, so I want to recognize it. I also want to say thank you to the technical staff — the conference person back there, and the sound person, Eric. Thank you very much for making this go off without a hitch. And since you all sat through a long talk about data ethics, please enjoy this picture of Olga, a cat who really enjoys attention. As you can see, my contact information is up there, so you should feel free to hit me up if you want. Nobody really wants to see me awkwardly stumble through questions, but I'll be chilling out for a minute if you want to talk about anything. Thank you all very much for coming. I really appreciate it.