My name is Malia Mnogulik, and I'm presenting today, with Marla Schluter, our study on tools for determining equitable representation of women in LIS publications. A 2017 article, "Gender in the Journals: Publication Patterns in Political Science," inspired this current study. It noted the underrepresentation of women authors in 10 top political science journals compared to the number of women in the field of political science. We wanted to know if there was a similar gap in publication in academic librarianship, a profession dominated by women, as there was in a profession dominated by men.

These are the three main tools we used for this project. EBSCOhost Integration Toolkit and OpenRefine enabled us to obtain data from EBSCO's API. Once it was in OpenRefine, we were able to easily reorganize and clean it, and R made it possible to predict author genders and collect and analyze statistics.

Our first step was to determine the top journals in the field of academic librarianship and get citation data. Together, these journals cover major subfields of academic librarianship and are fairly prominent. While we could have obtained this information through our library discovery tool or publisher websites, getting all the articles for every issue for every year for 15 years for 10 journals manually would have been laborious. So we knew we had to find a way to automate getting the data.

The precedent study gave us some ideas on how to do the project, but we made some changes. Both web scraping and using an API are ways to automatically collect citation metadata for the articles we were interested in. We used EBSCOhost Integration Toolkit because through EBSCO we have access to all of the journals we are focused on. This is a tool that helps users make requests to the API. The info page provides access to lists of the shorthand ways to refer to things like ISSN or publication dates, so a properly formatted request can be made. The search page is where requests are made.
The parameters of the request are plugged in and it can be used to pull up the records in question, but really the URL it forms is what we were interested in. Once it's clear how the fields from the search page are formatted for the URL, the URLs can be tweaked for all of the records and be used independently of this tool. We used the URLs to make API requests from OpenRefine, so the data would appear in an environment where it could be easily cleaned and reformatted for the next steps in our process.

This is a view of OpenRefine. After we obtained the data, some cleanup was inevitable. Our automated method meant that the data was structured in XML instead of being in a table. Again, this is why we used a cleanup tool capable of parsing our XML fields into their own columns. Eventually, even article rows with many authors had a separate column for each one. Once the project was edited to our satisfaction, we used the extract button to collect the JSON script for all of the cleanup commands used on this project. The script was then saved and applied to our projects for other journals, so they were cleaned quickly and in exactly the same manner.

We then pared down the dataset to remove non-scholarly work such as editorials, reviews, correction notices, letters to the editor, speeches, and tables of contents, to focus our study on abstracts, articles, biographies, case studies, proceedings, and reports.

Just as getting the article data manually would have been too time-consuming, so would obtaining author genders by hand, since each article had anywhere from 1 to 20 authors. So the process of deriving or predicting gender had to be automated as well. This was done in the precedent article by analyzing author first names to predict gender. This method predicts gender by comparing the first names in one dataset to another authoritative dataset of first names and associated genders.
This is a commonly used method for determining gender, but it treats gender as binary, which raises the question: what about non-binary individuals in academic libraries? The only source of information we could obtain on non-binary individuals working in academic libraries was an unpublished ACRL survey done in 2018. It showed that individuals who identify with a non-binary gender made up 1% of the overall pool of respondents, and those who preferred not to respond regarding gender made up 2%. Because our study is focused on the number of women in the field, a majority figure, we decided not to alter our method for the reported 3% value. This is one of the limitations of the study.

One of the driving forces when considering the question of equity is not looking to see if there is an equal number of authors. Rather, it is looking to see if there is equal representation when comparing the proportion of genders within the profession to the proportion of genders of authors represented in the literature. While this cannot be a perfect one-to-one comparison, it can be one measure of equity in academic librarianship. ARL statistics are often used because they are produced annually and provide detailed demographic information. However, we would argue that they are not necessarily representative of the profession because they cover only research libraries and, as you will see, compared with both ACRL and ALA, there is a potential for overrepresentation of men.

This is a view of RStudio. In order to predict gender for the authors, we use the gender package in R, which has several data sets or methods available. We use the SSA method, which contains United States Social Security Administration data from those born between 1880 and 2012. For each person recorded by the Social Security Administration, there is a birth year, a first name, and sex.
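As a rough illustration of how records like these can be turned into a prediction, here is a minimal Python sketch. It is not the R gender package and not our actual code. The handful of records, the `predict_gender` function, and the fields it returns are all made up for illustration; the real SSA data covers far more names and people.

```python
from collections import Counter

# Hypothetical toy records in the shape described in the talk:
# (birth_year, first_name, sex). These rows are invented, not real SSA data.
TOY_RECORDS = [
    (1960, "marla", "F"),
    (1962, "marla", "F"),
    (1971, "marla", "F"),
    (1975, "jordan", "M"),
    (1980, "jordan", "M"),
    (1985, "jordan", "F"),
    (2005, "jordan", "F"),
]


def predict_gender(name, year_min, year_max, records=TOY_RECORDS):
    """Return the sex most often recorded for `name` among people born in
    the given year range, with the proportion for each sex.
    Returns None when the name does not appear in that range."""
    counts = Counter()
    for year, first_name, sex in records:
        if first_name == name.lower() and year_min <= year <= year_max:
            counts[sex] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    proportion_female = counts["F"] / total
    return {
        "name": name,
        "gender": "female" if proportion_female >= 0.5 else "male",
        "proportion_female": proportion_female,
        "proportion_male": 1 - proportion_female,
    }


print(predict_gender("Marla", 1940, 2000))
print(predict_gender("Jordan", 1940, 2000))
print(predict_gender("Zyx", 1940, 2000))  # name not in the toy data
```

The year range matters: the same first name can lean toward a different sex depending on which birth years are included, which is why a range of plausible birth years is supplied along with each name.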
So for each first name and birth year, or range of possible birth years, we pick, the SSA data set will return a gender based on the sex most often associated with that name, as well as columns that tell the proportion of individuals with that name who were associated with a particular sex. So the idea is to first build a data set that has all of the author first names from our full data set in one single column along with a range of birth years, and then compare our data set to the SSA data set. After running the package using the SSA method, several columns have been added to our data frame: gender, which is the predicted gender, proportion male, and proportion female. Names whose predicted gender had a proportion of less than 70%, or names that were not identified from the SSA data set, were looked up by hand from online bios through the authors' affiliated institutions, which often included preferred pronouns.

I'll be talking about results in three parts. First, the overall results. Next, I'll show the results of authorship in the journals. Finally, I'll show the results of one journal compared with other studies of journals and gender over time.

We analyzed the results in three ways: the total sample, the sample with at least one author affiliated with a US institution, and the sample with authors only affiliated with US institutions. We broke out the sample because the goal of the study was to compare the gender composition of authorship with the gender composition of the profession. However, it was challenging to find international statistics to reflect the total sample.

First, the total sample has a lower rate of overall women authors at 56%, which further decreases when looking at solo authorship, at 54%, but increases for multiple-author articles with a woman as primary author, at 58%. This is compared with the sample of articles with at least one author from a US-affiliated institution. This shows an overall rate of 61%.
In the same sample, women make up 62% of primary authors and 58% of solo authors. In the last sample, US only, women make up 62% of authors overall and 63% of primary authors. However, there is a 10-point decrease when looking at solo authorship in this sample, at 52%.

There is another way of looking at the overall results, with the average of women in the profession added in. What is interesting about looking at the data this way is that solo-author publications, no matter the sample, are lower than the overall sample of authorship. However, what the results also show is that US affiliation of authors does have an impact on the percentage of women authors, whether it be all authors or at least one author affiliated. This can be due to a number of factors we have not isolated, but could include, depending on geography, non-US-affiliated authors coming from LIS programs rather than libraries, which could have a more even distribution of men and women if they are at all similar to US institutions. It could also be due to the journals we have selected, which are more US-based, which would also skew results. No matter the breakdown or sample, there is no authorship combination that comes close to the average of women in the profession, which is 74%.

Overall, we can see that over the past 16 years, four of the journals have met ARL's gender proportion of 63%. However, none of the 10 journals in our study have come within 13% of ACRL's 77% reporting, or the average of 74%.

We wanted to see if there was a difference within the time frame of the study, since it was a large amount of time. This chart shows the results broken out into five-year time chunks to show changes over time in gender representation. Five of the 10 journals had continual increases over 15 years, four had initial increases and then decreased, and one decreased over the time span. This journal, Library Hi Tech, never reached 50% representation of women authors in our study.
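For anyone who wants to try a similar five-year breakdown on their own data, the grouping can be sketched in a few lines of Python. This is a minimal sketch and not our R analysis. The `share_of_women_by_period` helper, the 2005 start year, the journal name, and every author row are hypothetical and invented for illustration. Only the 74% benchmark comes from the figures in this talk.

```python
from collections import defaultdict

PROFESSION_SHARE = 0.74  # average share of women in the profession cited in the talk

# Hypothetical author rows: (journal, publication_year, predicted_gender).
# The journal name, years, and values are invented for illustration only.
TOY_AUTHORS = [
    ("Journal A", 2005, "female"),
    ("Journal A", 2006, "male"),
    ("Journal A", 2011, "female"),
    ("Journal A", 2012, "female"),
    ("Journal A", 2013, "male"),
    ("Journal A", 2016, "female"),
    ("Journal A", 2017, "female"),
    ("Journal A", 2018, "female"),
    ("Journal A", 2019, "male"),
]


def share_of_women_by_period(rows, start_year, span=5):
    """Group author rows into consecutive `span`-year periods per journal and
    return {(journal, period_start): share of authors predicted female}."""
    women = defaultdict(int)
    total = defaultdict(int)
    for journal, year, gender in rows:
        # Snap each year to the first year of the period it falls in.
        period_start = start_year + ((year - start_year) // span) * span
        key = (journal, period_start)
        total[key] += 1
        if gender == "female":
            women[key] += 1
    return {key: women[key] / total[key] for key in total}


shares = share_of_women_by_period(TOY_AUTHORS, start_year=2005)
for (journal, period_start), share in sorted(shares.items()):
    gap = PROFESSION_SHARE - share  # positive means below the benchmark
    print(f"{journal} {period_start}-{period_start + 4}: "
          f"{share:.0%} women, gap to benchmark {gap:+.0%}")
```

Counting author rows rather than articles means an article with many authors contributes many rows. Whether that is the right unit depends on the question being asked, so treat it as a choice in this sketch, not a recommendation.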
When we look at it in smaller time spans, there are more indications of improvement than the overall results alone might indicate. In fact, Library Quarterly shows a significant increase in the last five years and reaches 70%. However, none of the 10 reached the threshold of 74%. We do see an increase from four to five journals that meet ARL's threshold of proportionality within the last five years. But overall, if the goal is equity, there may still be some work to do.

Finally, we wanted to show an example of one journal over time with the incorporation of older studies. Though they use different methodologies for determining gender, together they show a pattern over the course of 30 years of representation of women authors increasing by 20%. From our work, CRL has one of the highest representations, though if we go by the profession, it is not entirely representative. However, we see a stall after 2014, after a big push in the 90s and the early 2000s. This is a trend we saw with four of the journals in our study: initial increases and then a decrease.

There is more to be learned from the large sample we have. We are in the process of coding the articles by subject to determine publishing trends, such as whether women are publishing on certain topics within certain journals more than others. We hope this will help get to some of the whys. The precedent article we discussed also had a series of subdiscipline articles by several authors, if anyone is interested. We are also wondering if what has been reported about the impact of COVID-19 on faculty productivity and gender will impact LIS research.

Thank you for your time, and we welcome your questions.