Live from Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020. Brought to you by SiliconANGLE Media.

Hi, and welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University, covering the fifth annual WiDS, Women in Data Science, conference. Joining us today is Lucy Bernholz, who is a Senior Research Scholar at Stanford University. Lucy, welcome to theCUBE.

Thanks, thanks for having me.

So you've led the Digital Civil Society Lab at Stanford for the past 11 years. Tell us more about that.

Sure. The Digital Civil Society Lab actually exists because we don't think digital civil society exists. So let me take that apart for you. Civil society is that weird third space outside of markets and outside of government. It's where we associate together, where we as people get together and do things that help other people. It could be the nonprofit sector, it might be political action, it might be the eight of us just getting together and cleaning up a park or protesting something we don't like. That's civil society.

What's happened over the last 30 years is that everything we use to do that work has become dependent on digital systems. And by digital systems I'm talking about everything from our phones to the infrastructure over which data is exchanged. That entire digital system is built by companies and surveilled by governments. So where do we as people get to go digitally to have a private conversation, to say, hey, let's go meet downtown and protest X and Y, or let's get together and create an alternative educational opportunity because we feel our kids are being overlooked, or whatever it might be? All of that information we might exchange, all of that associating we might do in the digital world, is being watched. It's all being captured, and that's a problem, because history, political science, and democratic theory show us that when there's no space for people to get together voluntarily, take collective action, and do that kind of thinking and planning and communicating just between the people they want involved, democracies fall.

So the lab exists to try to recreate that space. And in order to do that we have to, first of all, recognize that it's being closed in. Secondly, we have to make real technological progress. We need a whole set of different kinds of digital devices and norms, we need different kinds of organizations, and we need different laws. That's what the lab does.

And how does ethics play into that?

It's all about ethics, and it's a word I try to avoid, actually, because, especially in the tech industry, and I'll be completely blunt here, it's an empty term. It means nothing. The companies are using it to avoid being regulated. People are talking the talk about ethics, but they don't want to talk about values. And you can't do that. Ethics is a code of practice built on a set of articulated values. If you don't want to talk about values, you're not really having a conversation about ethics. You're not having a conversation about the choices you're going to make in a difficult situation. You're not having a conversation over whether one life is worth 5,000 lives, or whether everybody's lives are equal, or whether you should shift the playing field to account for the millennia of systemic and structural biases that have been built into our systems. There's no conversation about ethics if you're not talking about those things.
As long as we're just talking about "ethics," we're not talking about anything.

And you were actually on the ethics panel just now. So tell us a little bit about what you talked about and what some of the highlights were.

I think one of the key things about the ethics panel here at WiDS this morning was, first of all, that it started the day, which is a good sign. But it shouldn't be a separate topic of discussion. This conversation about values, about what we're trying to build, who we're trying to protect, how we're trying to recognize individual human agency, has to be built in throughout data science. So it's a good start to have the panel at the beginning of the conference, but I'm hopeful that the rest of the conversation won't leave it behind.

We talked about the fact that, just as civil society is now dependent on digital systems it doesn't control, data scientists are building data sets and algorithmic forms of analysis, and both of those things are just encoded sets of values. If you try to have that conversation at just the math level, you're going to miss the social level. You're going to miss the fact that it's humanity you're talking about. So it really needs to be integrated throughout the process: talking about the values of what you're manipulating and the values of the world that you're releasing these tools into.

And what are some key issues today regarding ethics and data science, and what are some solutions?

Well, this is the Women in Data Science conference. It happens because five years ago the organizers realized that women are really underrepresented in data science and decided to do something about that. That's true across the board, and it's great to see hundreds of women here and around the world participating in the livestream. But as women, we need to make sure that as we're thinking about the data and the algorithm, the data and the analysis, we're thinking about all of the people, all of the different kinds of people, languages, abilities, races, ages, you name it, that are represented in that data set, and that we understand those people in context. They may look like just two different points of data, but in the world at large we know perfectly well that women of color face a different environment than white men, right? They don't walk through the world in the same way, and it's ridiculous to assume that your shopping algorithm isn't going to reflect, in some way, that difference they experience in the real world. It's fantasy to imagine it's not going to work that way.

So we need different kinds of people involved in creating the algorithms; different kinds of people in power in the companies who can say, we shouldn't build that, we shouldn't use it; and a different set of teaching mechanisms, where people are actually trained to consider from the beginning what's the intended positive, what's the unintended negative, and what are some likely negatives, and then decide how far they go down that path.

Right. And we actually had on Dr. Rumman Chowdhury from Accenture, who is really big in data ethics, and she brought up the idea that just because we can doesn't mean that we should. Can you elaborate on that?
Yeah, well, just because we can analyze massive data sets and possibly build some kind of mathematical model that, based on a set of value statements, might say this person is more likely to get this disease, or this person is more likely to excel in school in this dynamic, or this person is more likely to commit a crime. Those are human experiences, and while analyzing large data sets might, in the best scenario, actually take into account the societal situation those actual people are living in, trying to extract that kind of analysis from its social setting is, first of all, absurd. Second of all, it's going to accelerate the existing systemic problems.

So you've got to weigh that kind of calculation: just because we could maybe do some things faster or with larger numbers, what are the externalities that are going to be caused by doing it that way? The actual harm to living human beings, should that just be ignored so you can meet your shipping deadline? Because if we expanded our time horizon a little bit, and you look at some of the big companies out there now, they're now facing those externalities, and they're doing everything they possibly can to pretend they didn't create them. That loop needs to be shortened, so that somewhere in the process, before you release some of these things, you can actually sit down and say: in the short term it might look like we'd make X profit, but spread that time horizon out to 2X, and you face an election, in the world's largest, longest-lasting, stable democracy, that people are losing faith in. Is that the right price to pay for a single company to meet its quarterly profit goals? I don't think so. So we need to reconnect those externalities back to the processes and the organizations that are causing those larger problems.

Because essentially having externalities just means that your data is biased.

Data are biased. Data about people are biased because people collect the data. This idea that there's some magic de-biased data set is science fiction. It doesn't exist. It certainly doesn't exist for more than one purpose: even if we could, and I don't think we can, de-bias a data set to then create an algorithm to do A, that same data set is not going to be de-biased for creating algorithm B. Humans are biased. Let's get past this idea that we can strip that bias out of human-created tools. What we're doing is embedding those biases in systems that accelerate and expand them. They make them worse.

So I'd spend a whole lot of time figuring out how to improve the systems and structures that we've already encoded with those biases, and use that to inform the data science. In my opinion, we're going about this backwards. We're building the biases into the data science and then exporting those tools into biased systems. And guess what? The problems are getting worse. So let's stop doing that.

Thank you so much for your insight, Lucy. Thank you for being on theCUBE.

Oh, thanks for having me.

I'm Sonia Tagare. Thanks for watching theCUBE. Stay tuned for more.