My name is Simon, and I'm an undergrad in statistics and sociology at Reed College. I'm also currently interning with RStudio, working on the tidymodels R packages. All the source code for the analyses, the full paper, and these slides are on my GitHub at simonpcouch.

So imagine that a data set appears in the wild, whatever that looks like for you, and there's a column called gender in it. What would you expect the most common entries to be? Let's suppose instead that the title of that data column was actually sex. Does that change your answer? In my experience, those columns are almost inevitably filled with entries along the lines of male and female, regardless of column name.

This project tried to quantify how our conceptions of social difference are reflected in the data structures we choose as data scientists and package maintainers. I'm just going to share results related to sex and gender here, but I discuss other social categories in the paper.

So this first figure I'll show gets at the question I asked a minute or two ago: is there a relationship between the name of the data column and the entries inside of it? I would hope that columns called sex would contain labels that are understood to describe sex categories, such as intersex, female, and male. The same goes for gender.
I might hope to see entries like non-binary, woman, and man.

To address this question, I downloaded 2,500 packages from CRAN and searched for columns describing sex and gender effects inside of the data they export. Regardless of column name, the entries in those columns are nearly exactly the same: entries along the lines of male and female are dominant.

It turns out, too, that even though the entries in those columns are the same regardless of the discipline a data set comes out of, data sets intended for use in biological applications were more likely to refer to columns measuring sex and gender effects as sex, as compared to data sets not tagged as for use in biological applications, illuminating the supposed separability of sex and gender in the hard sciences.

I know this was incredibly brief, but I only have a couple minutes, so I want to end with a quote from one of the many intersectional feminists whose work formed the theoretical foundation of this project. Given the elevated social status of the fields we engage with as statisticians and data scientists, it's especially important in the hard sciences that we're clear about what we mean when we refer to social categorizations. We can't claim objectivity in our research if our practices as foundational and elementary as data structures do not reflect this nuance.

This is a quote from "Whiteness as Property" by Cheryl Harris: "When handling the complex issue of group identity, we should look to purposes and effects, consequences and functions. What must be addressed is who is defining, how is the definition constructed, and why is the definition being propounded."

If you're interested in learning more, or wondering how on earth the data underlying that graph could have been generated, I encourage you to read the linked paper. Thank you.
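
As a rough illustration of the column search described in the talk, here is a minimal Python sketch. It is not the analysis code from the paper, which lives in the speaker's repository. The toy datasets, the name pattern, and the abbreviation table are all assumptions made up for this example; the idea is only to show how entries might be tallied by column name.

```python
import re
from collections import Counter

# Hypothetical toy datasets standing in for data exported by packages.
# Illustrative only; this is not the data from the talk.
TOY_DATASETS = {
    "survey_a": {"sex": ["Male", "female", "F"], "age": [31, 45, 27]},
    "trial_b": {"Gender": ["male", "Female", "male"], "dose": [1, 2, 1]},
    "panel_c": {"gender": ["woman", "man", "non-binary"]},
}

# Assumption: a column counts if its name is exactly "sex" or "gender",
# ignoring case.
NAME_PATTERN = re.compile(r"^(sex|gender)$", re.IGNORECASE)

# Assumed handling of common abbreviations; a real analysis needs more care.
ALIASES = {"m": "male", "f": "female"}


def normalize(entry):
    """Lowercase and trim an entry, expanding known abbreviations."""
    text = str(entry).strip().lower()
    return ALIASES.get(text, text)


def tally_entries(datasets):
    """Count normalized entries, grouped by the lowercased column name."""
    counts = {}
    for columns in datasets.values():
        for name, values in columns.items():
            if NAME_PATTERN.match(name):
                key = name.lower()
                counts.setdefault(key, Counter()).update(
                    normalize(v) for v in values
                )
    return counts


counts = tally_entries(TOY_DATASETS)
for column_name, counter in sorted(counts.items()):
    print(column_name, dict(counter))
```

Comparing the two tallies side by side is one way to ask whether the entries differ by column name, which is the question the first figure addresses.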