Hi, my name is Casey Canfield and I'm an assistant professor in engineering management and systems engineering at Missouri University of Science and Technology. I'm here to talk about framing transparency as an ethical responsibility in PhD data science.

I teach a graduate-level course called Advanced Engineering Management Science. This is a new course that I developed when I joined the faculty in 2018, and so far I've taught it twice. The course was largely inspired by the applied data analysis course I took from Alex Davis at Carnegie Mellon University. For the students, this is their first exposure to statistics, data science, and programming in R. It's a small graduate-level course that is required for all engineering management PhD students.

In the second week of class, when they're still getting their feet wet with R, I introduce the concept of ethics and frame the importance of learning statistics in the context of scientific integrity. We talk about how data science doesn't just predict the future, it causes the future, which is a quote from Cathy O'Neil, who wrote Weapons of Math Destruction. She argues that the most dangerous algorithms are important, scalable, and secret. I encourage my students to work on things that are important and scalable, because that's how you make the world a better place, but their work should not be secret. If we want to avoid secrets, we need to be transparent. As Richard Feynman put it, scientific integrity entails bending over backwards to show how you might be wrong. We want to document everything and make it public.

I also introduce the concept of the nine circles of scientific hell. Neuroskeptic drew this comic, inspired by xkcd and Dante's Inferno. The levels range from overselling in level two, which is related to overstating the results of our analyses, to inventing data in level nine. In class, we talk about how there is a gray area between unethical data analysis and misconduct that involves intentional deception.
We want to do ethical data analysis, which means not deceiving our audience with statistics. But we also have to worry about accidentally deceiving ourselves. There is a large body of research on heuristics and biases that influence human perception. For example, there's confirmation bias, the tendency to focus on evidence that is consistent with our expectations, and hindsight bias, the tendency to perceive events as predictable after they have occurred. These biases affect all people, including scientists.

Before class, I ask students to read Silberzahn's paper, "Many Analysts, One Data Set." I really like this paper because it goes to great lengths to show that there are many different ways to approach a single data set, and that even the most conscientious data analysts can disagree. This emphasizes that there's a structural problem that can best be solved with transparency. The graph shown here is from their Nature paper that summarizes this work. Basically, they had 29 teams analyze the same data to determine whether dark-skinned players are more or less likely to receive red cards in professional soccer games. Of the 29 teams, 20 found a statistically significant correlation between skin color and red cards. While discussing this paper, I highlight the implications of drawing conclusions in the face of uncertainty and how this relates to how they are going to do data science.

To address this problem of all the degrees of freedom a data analyst has, we need structural solutions. This isn't about getting rid of the bad apples. We talk about the challenges for reproducible science and the opportunities to shift the environment in which we work to be more transparent. For the purposes of the class, we focus on developing skills related to pre-registering studies, understanding methodology, employing checklists for reporting, and making our code readable by other humans.
After class, I have the students knit their first R notebook and write a paragraph about which of the nine circles of scientific hell sounds like the hardest to avoid and why. Most students focus on overselling and post hoc storytelling. I'd love to hear your ideas about how to teach ethical data analysis. I can be reached at canfieldci at mst.edu. I've also included the full references here. Thanks.