started. Thanks for being with us today. I'm Cliff Lynch. I'm the director of the Coalition for Networked Information and I welcome you to the last of the project briefings for the second day of week two of the CNI fall 2020 virtual member meeting. And we have quite an interesting session to bring the day to a strong conclusion. A couple of things before I introduce our speakers and describe the session a little bit. We are recording this. The recording will be available later. There is closed captioning, and please turn that on if that's helpful to you. We do have a chat running, and feel free to use the chat to comment or to introduce yourself as you wish during the session. We have four presenters, and at the end of the session Diane Goldenberg-Hart from CNI will be moderating a question and answer session. You have a Q&A tool at the bottom of your screen and you can queue up questions at any time. Please feel free to do that as they occur to you and we will take all the questions at the end. And with that, let me introduce our presenters: Mary Feeney, Professor Anita Huízar-Hernández, Jeffrey Oliver, and Megan Senseney. This is a very multi-perspective set of speakers, which is really appropriate for a multi-perspective topic like this. We have a little bit of many things that CNI is interested in woven together here: the creative use of a digital corpus as part of pedagogy in the classroom, the application of data science techniques to interrogate and explore this corpus as part of a digital scholarship activity, and engagement across libraries, faculty, and other groups within the institution. I think that everybody's going to enjoy this greatly. With that, I thank our presenters for joining us, and I'm going to hand it over to Megan, who will lead off the presentation. Thanks for being with us, and thank you all. Thank you, Cliff.
So can a library's distinctive collections activate and invigorate faculty networks for collaborative pedagogy? Can our collaborative efforts be mutually beneficial for librarians, disciplinary faculty, and their students? Can we introduce diversity concepts into a wide range of undergraduate classes in the humanities and humanistic social sciences by converging all of our efforts around a single shared dataset? These were some of the driving questions that were motivating our team to apply for a collections as data grant last fall, and a year later I am pleased to report that all signs point to yes. Today we'd like to share with you four different perspectives from our newspapers as data project. I will provide a general overview and situate our work within the broader context of the collections as data initiative. My colleague Mary Feeney will discuss her experiences managing the project and also provide some additional context for our selected collection. Jeff Oliver will describe our approach to creating what we call a walled garden for text analysis by processing the underlying data and creating a user-friendly computational environment. And Anita Huízar-Hernández will reflect on her experiences as a faculty member embarking on a collaborative digital pedagogy project in the midst of a pandemic. We will close with some thoughts on the sustainability of the outputs of the project, some future plans for using these data in the classroom, and some opportunities to replicate this model in the future. Many of you have probably been following along with the two major collections as data projects that have been led by Thomas Padilla in collaboration with a number of his library colleagues. Both collections as data projects have been focused on fostering approaches to developing cultural heritage collections that support computationally driven research and teaching.
In Collections as Data: Part to Whole, the project extended that initiative by awarding funding to project teams committed to developing models that support collections as data implementation and a holistic reconceptualization of services and roles that support scholarly use. So at the start of 2020, our team was awarded one of six grants for cohort two of the Part to Whole project, which is generously funded by the Andrew W. Mellon Foundation. Each of the funded project teams' proposals was required to demonstrate commitments from librarians, administrators, and disciplinary faculty to establish use and implementation models, as well as new collections as data. Further emphasis was given to data that represent and amplify the perspectives of minoritized groups, which are often sorely underrepresented in library collections. We're one of two projects that are focusing on multilingual newspapers, and I believe we are unique within our cohort in choosing to focus on pedagogical uses of collections as data. To understand a bit more about who we are and what motivates us, I'd like to share a little context about the University of Arizona, located in the city of Tucson, 65 miles from the US-Mexico border. We reside on the ancestral territories of the O'odham, more specifically the traditional homelands of the Tohono O'odham, which are historically shared with the Yoeme of the Pascua Yaqui Tribe. As a land grant institution, we serve 22 federally recognized tribes within the state of Arizona. In 2018, the university was designated as a Hispanic-Serving Institution. And in the libraries, we acknowledge how important it is to a student's sense of belonging to see their culture and heritage represented in our collections. We also recognize that we are located at a centuries-long site of colonial contact and conflict, which has witnessed borders drawn and redrawn through the 19th century with the Treaty of Guadalupe Hidalgo and the Gadsden Purchase.
The stories of this region, when fully told, are composed of Indigenous voices, Chicanx voices, and the voices of Spanish and Anglo settler colonists and Mexican citizens and American citizens and many more. The University of Arizona Libraries' Special Collections seeks to amplify the voices that are so often underrepresented in these stories through our dedicated focus on the borderlands. So when the collections as data call was released, we naturally turned an eye toward our borderlands collections. And we quickly homed in on two: the historic Mexican and Mexican American Press collection, and a set of regional papers that were digitized with funding through the National Digital Newspaper Program. Mary will share a little bit more about the rationale behind our selections and the idiosyncrasies of our data. But for now, I'd like to note that what we really liked is the idea of making these resources more computationally accessible, for a range of different reasons. First, they're a vital part of our regional history. Second, they demonstrate the cultural and linguistic diversity of the region. And third, we were already aware of a lot of faculty on our campus who are actively using these resources for their research and teaching. In prior documentation of collections as data implementation models, we had observed the University of Miami's approach to making collections more accessible by organizing and sharing plain text files through a GitHub repository. And they were looking toward facilitating text analysis with common digital humanities tools like AntConc and Voyant. We were also interested in the possibility of using text analysis methods to explore our collections. So with all of this in mind, we convened a group comprising library specialists from our research and learning department, the Office of Digital Innovation and Stewardship, and Special Collections, along with five disciplinary faculty members that we knew had already been working with these collections.
And all together, our group began brainstorming our potential collaboration. We immediately liked the idea of using a single shared dataset in different ways. And we liked how the dataset served as a point of convergence and also as a catalyst for cross-pollination. None of our disciplinary faculty self-identified as digital humanists, but everyone on our team was really eager to explore new methods and technologies. Several faculty in our group are responsible for teaching methods classes within their programs, and others expressed interest in incorporating a module on text data mining in courses that cover subjects that benefit from the use of historic newspapers. For the librarians focused on data science and digital scholarship, this represented a golden opportunity to introduce data literacy concepts into a host of new classrooms. And as we continued to refine our idea, several requirements emerged. We needed a solution that could process messy newspaper OCR, that's optical character recognition, in both English and in Spanish. We needed it to integrate flexibly into courses in history, rhetoric and composition, journalism, and Spanish and Portuguese. And we also wanted to introduce basic text data mining concepts with minimal technical troubleshooting. So based on these requirements, we ultimately chose to pursue a technical implementation model centered on the use of Jupyter notebooks, and Jeff will share more on that shortly. After a flurry of really enthusiastic brainstorming about all the different potential uses of the data, some of which may ultimately end up as new collaborative ventures within the group, we decided to focus, for our classroom purposes, on text analysis activities that track trends over time. Now at the start of 2020, we had no idea what trends would emerge over the course of our project.
So I'd like to hand the presentation over to Mary now, who will talk about managing the project and also speak a bit more about our selected collection as data. Great. Thanks, Megan. Hi, everyone. Before I get into more details about the dataset, I do want to talk a little bit about our project team and discuss some of the challenges with project management. Next slide, please, Megan. So I'm the news research librarian and liaison librarian for history and journalism. And as Megan mentioned, I'm also the project lead, but I want to emphasize what a truly, truly collaborative project this has been. As Megan said, we have colleagues from different departments in the library, and we have faculty participating from four different academic departments, some of whom you see pictured here in pre-COVID times, obviously, with no masks and sitting right next to each other. Everyone has brought different perspectives and knowledge to this project, and it's been really exciting to see what we've been able to do together. Next slide, please. So managing the project was met almost immediately with the challenge of the pandemic. The UA campus closed in mid-March, soon after we had officially started, and we had to reconfigure activities such as an in-person workshop for the disciplinary faculty that was to be held in the spring, classes being delivered wholly online in the fall, and a student symposium in December that's been reconceived as a virtual asynchronous presentation in January. We also needed to make adjustments to the grant budget and make multiple contingency plans back in the spring, when we didn't even know what the fall was going to look like. So for example, some of our budget was for in-person events, bringing in a speaker, and travel to conferences. Well, most of that's not going to happen, so we had to think about how to adjust our expectations, plans, and where the funding would go.
I just want to note how flexible and supportive the Collections as Data: Part to Whole grant administrators have been through all of this. It's been amazing. There was also the challenge of staying focused and on track during a stressful and uncertain time, from scheduling to Zoom fatigue, working from home in different circumstances, and the university-mandated furlough and pay cut on top of all that. But luckily, we had all met together several times during the proposal phase, so we had established a good working group. And this is a really engaged group of colleagues who have now been meeting monthly on Zoom. Next slide, please. So now about our dataset. Next slide. So newspapers are used in research across many, many disciplines, whether as sources of current news, as corpora for content analysis, or as primary sources about past events, places, people, and topics. Newspapers have also been used for computational analysis to reveal trends across time and across publications in projects such as Viral Texts, America's Public Bible, and Mining the Dispatch, just to name a few. Next slide, please. Our dataset is composed of selected newspapers, as Megan has mentioned, from the UA Library's historic Mexican and Mexican American Press, whose webpage you can see here. That's a digital collection we created several years ago, about eight years ago. And then we also had newly digitized Arizona newspapers from our partnership with the State Library of Arizona through a National Digital Newspaper Program grant that just ended last year. So there were several digitized newspapers we could have used from both of those collections. In our recent NDNP grant partnership, a really big focus of our grant proposal was to add newspapers from underrepresented communities in Arizona. So that also became a large part of our focus for this project. Next slide, please. So you've seen this slide before.
Titles include newspapers from African American communities in Phoenix, Arizona, and Spanish-language newspapers from Phoenix and Tucson. And then for comparative research, we also included newspapers from predominantly white, English-speaking communities, all published in the Southwest borderlands. We also decided to focus on two time periods, 1915 to 1922 and 1941 to 1959, and explore topics within those time periods such as women's suffrage, the Mexican Revolution, the Bisbee deportation, the 1918 flu pandemic, immigration, and World War II. We chose the newspapers for this dataset for a few reasons. First, they're all published within the Arizona borderlands, which we really wanted to focus on and highlight, as opposed to all the other newspapers that were available to us, published in other parts of the state. Second, they enable students and scholars to engage with voices that were previously unavailable in digital archives. Some of these newspapers had never been seen outside of microfilm and physical collections. And third, through our collaborative discussions with the disciplinary faculty during that brainstorming phase about what courses they teach, we were able to narrow it down together to these eight newspapers. Next slide, please. Now I want to talk a little bit more about each of the papers, just to give you a feel for what we're working with. And to do that, I'm going to show you pieces of two different tutorials created for the project. These lessons, along with other information about the project and links to the dataset that Jeff's going to talk about, are all available on our project website, freely available and open to use. So part of that page you can see here, along with the URL to access it. So one lesson was part of the workshop that the library faculty delivered online for the disciplinary faculty in the spring. It was meant to be in person; we ended up doing it over a period of a few weeks on Zoom.
The other lesson I'm going to talk about was created for the students to learn about the newspapers prior to actually using them for text mining. Next slide, please. So if we were live, this would be an interactive timeline, but these are just some screen images. In this lesson, the timeline shows each of the newspapers in chronological order of when they first started publication. And I'm going to zip through these fairly quickly, but you can learn more about each title, if you're interested, from our website. So this first one is the Border Vidette. It's a white-owned newspaper published in Nogales, Arizona, which shares a border with Nogales, Sonora, Mexico. Next slide, please. And then next, we have the Bisbee Daily Review, also a white-owned paper, which was published in what was a booming mining town at the time. Next slide, please. Here we have El Tucsonense, which was really the cornerstone of the Mexican American Press digital collection, and it's also the cornerstone of our collections as data project. It was the longest-running Spanish-language newspaper in Tucson, and it was published for most of the first half of the 20th century, from 1915 to about 1959. This was an important newspaper in the Mexican American community in Tucson. It covered local, national, and international news, and it carried advertisements for the many Mexican American-owned businesses in Tucson. Next slide, please. Another newspaper we've included is the Phoenix Tribune. This is the first newspaper in Arizona published by and for the African American community, and it started in 1918. Next slide, please. El Sol is another Spanish-language newspaper we've included in the dataset. This one was published in Phoenix for many years by a married couple who were very involved in the community, both civically and politically. Next slide, please.
So another newspaper of the African American community in Phoenix was the Arizona Sun, which was published for more than two decades, from 1942 to 1965, so it covered a really big chunk of mid-20th-century Arizona history. Next slide, please. We've also included this newspaper, The Apache Sentinel, which was only published from 1943 until early 1946, for the Black soldiers who were stationed in a segregated unit of the Army at Fort Huachuca, which is located only about 15 miles north of the U.S.-Mexico border. And next slide. And then finally, the Arizona Post, which is self-described as the first Anglo-Jewish publication in Tucson, Arizona. It started in 1946, and it's still being published today as the Arizona Jewish Post. Next slide, please. Now this image is just another representation of the newspapers in the dataset, the same information that is in the timeline but represented in a different way. This was from the lesson for the students, focusing on the locations where the newspapers were published and emphasizing their place in the Arizona borderlands, since that was such a key part of what we were doing. As a student would scroll through the story map to read more about each newspaper, they can see where it was situated in southern Arizona geographically and in relation to the other newspapers. So you can see in this newspaper dataset we have a range of dates, a range of geographic points within southern Arizona and the borderlands, and also voices from some of Arizona's diverse communities. Okay, next slide, please. So we recognized going into the project that there is important context for using newspapers generally in research, but more specifically some inherent issues associated with text mining. And we thought it was really important to talk to both the disciplinary faculty and the students about these limitations. So we included that in our discussions and in our tutorials. Next slide, please.
So for example, in the broader context of newspaper publishing in Arizona, the newspapers in our dataset are a small subset. In 1915, for example, where our dataset begins, there were over 60 newspapers and periodicals published in Arizona, according to the Ayer directory. Our dataset includes three titles from that year. Our dataset is roughly 90,000 pages. For context, Chronicling America, which I'm sure you're familiar with, has over 17 million pages as a whole. We're also working with a mix of daily and weekly newspapers, papers of varying runs from a few years to decades, and different lengths for each issue of each paper. So how would we account for these variations in analyzing the results of text mining? Next slide, please. Another aspect that we knew was really key for students to understand, and this applies any time one is going to use newspapers for research, but especially here where they're looking at word frequency, is what terminology is used in historical newspapers. Different words may have been used than what we use today, or terms may have meant different things at different times. But also, historical newspapers reflect the language and attitudes of their time, so they contain sensitive, offensive, racist, and outdated terms and images. So how might that affect what students see in their text mining, and also what words would they need to choose to use to find word frequency? Next slide, please. And finally, we considered the issues related to digitization and optical character recognition. You can see a little bit of a snippet from a newspaper front page from El Sol: what the image looks like, and then next to it a clipping of the OCR text. And so we knew we were going to be dealing with misspellings because the text was misread. There are breaks in words due to narrow newspaper columns, and other idiosyncrasies of OCR.
So we knew there would be these errors, but while the files are not 100% accurate, they still enable us to do useful searching and mining of text. That is something else we had to keep in mind while teaching the students about computational analysis. So now that you've gotten a quick overview of the newspapers in our dataset and more about some of the things we had to consider, I'm going to turn it over to Jeff. Thanks, Mary. So as Mary was talking about, we have this dataset, and in order to actually start using it, we needed to do a little bit of preparation. So the next slide outlines some of the steps that we took to actually do this preparation. Can we have the next slide? There we go. So when we started the project, the data were in single text files, one for each page, and they're all hosted, at least the ones we have so far, on the Library of Congress's Chronicling America project. But to address this challenge that Mary just outlined about making appropriate-level comparisons between papers that were published at different frequencies, what we wanted to do is create a file for each issue or each volume, whether that was published on a daily basis or a weekly basis. Because what that can allow us to do is calculate something like the relative word frequencies for an individual issue, and I can talk more about that in a little bit. So how did we actually do this? The next slide outlines some of the steps that we took. We took great advantage of some Python libraries that allowed us to use the Chronicling America API. So we went from 90,000 individual page files, programmatically retrieved those from the Chronicling America website, and then consolidated them into a little over 12,000 issue files, or files that represented the text from a single issue of a newspaper.
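For readers curious how that page-to-issue consolidation step might look in code, here is a minimal sketch against the public Chronicling America API. This is not the project's actual script (which lives on its GitHub page); the LCCN, date, and helper names are illustrative, though the URL patterns follow Chronicling America's documented endpoints:

```python
# Illustrative sketch: consolidate Chronicling America page-level OCR
# into a single issue-level text. Standard library only.
import json
from urllib.request import urlopen

BASE = "https://chroniclingamerica.loc.gov"

def issue_json_url(lccn, date, edition=1):
    """URL of an issue's JSON metadata, which lists its pages."""
    return f"{BASE}/lccn/{lccn}/{date}/ed-{edition}.json"

def page_ocr_url(lccn, date, seq, edition=1):
    """URL of the plain-text OCR for one page (sequence number) of an issue."""
    return f"{BASE}/lccn/{lccn}/{date}/ed-{edition}/seq-{seq}/ocr.txt"

def consolidate_pages(page_texts):
    """Join page-level OCR strings into a single issue-level text."""
    return "\n".join(t.strip() for t in page_texts)

def fetch_issue_text(lccn, date, edition=1):
    """Download every page of one issue and return the consolidated text."""
    meta = json.load(urlopen(issue_json_url(lccn, date, edition)))
    pages = range(1, len(meta["pages"]) + 1)
    return consolidate_pages(
        urlopen(page_ocr_url(lccn, date, seq, edition)).read().decode("utf-8")
        for seq in pages
    )
```

Looping `fetch_issue_text` over a title's issue dates and writing each result to disk yields one text file per issue, which is the shape of the roughly 12,000 files described above.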
We also made all of these data available on our recently launched University of Arizona Research Data Repository. This is great because, while the individual files themselves are not that large, the entire dataset combined adds up to some hefty sizes, and so the Research Data Repository is a really nice place where we can host these large data files. I should also mention that all of the code that was created to actually do this retrieval and assembly is available on the GitHub page we're using for this project. And I should say that Python made life easier. This is one of the challenges with these sorts of projects when you're code switching, because the language I normally work in is R, and so I spent a lot of time creating problems for myself by forgetting how Python indexes differently from R, but I managed to work through it. One of the things that made this project a lot easier, on the next slide, is actually the API and the setup that Chronicling America has. They have a really well-documented API for navigating the collections. And then also, and sometimes more importantly, there's a very responsive support team. So there were a couple of questions I had about how best to use the API, and the folks at Chronicling America were quick to respond and very helpful in providing great solutions. So on the next slide: we've got our data, we've taken those scanned texts and converted them to text files, so what's next? On the next slide, we had to do a little bit of preparation ourselves. The first thing was to actually train ourselves. So there were a few library-led workshops. Mary led a great workshop giving an overview of the collections, and then Megan provided a really great introduction and context for what text data mining is and what it's actually good for. And I think that a lot of folks didn't have that much exposure to it.
And so this was really great for the rest of the project team to learn about this. And then, as Mary mentioned before, there's a LibGuide created for the project, which is really nice. I think Mary might have undersold it earlier, but there are really, really good resources in that LibGuide. And so on the next slide: we had some idea now of what text data mining is, and the question is, how can we actually use these resources? How can we use this approach in the classroom? If we go back one slide real quick, we held a great ideation session that was based on these three resources: the language of the State of the Union, a big data project on the 1918 flu, and text data mining on how the Bible is quoted in U.S. newspapers. So these are examples of how text data mining was used in practice, and they really provided faculty partners concrete examples of how you can actually use text data mining. From this, the next slide was: what are we going to do in the classroom? So the goal was to do text data mining using the Python programming language in the classroom. That presents a couple of challenges. One is that few of the students had any experience with Python. If any of you have had the opportunity to have a class where you ask folks to install something like Python, you know that it's not always the easiest thing to do, and it's even harder when we're in a remote modality like we are now. But thankfully, there's a platform called Jupyter Notebooks that allows you to create an environment that students can use, and we had a resource available to us through CyVerse, the NSF-funded CyVerse program, called Atmosphere, which allowed us to create virtual machines and host these Jupyter Notebooks. And that way, the students didn't have to worry about installing anything.
The only thing they needed was a web connection and a web browser, with, of course, the caveat that that's not necessarily a given for everybody and that it still remains a bit of a challenge. So this was our goal. And on the next slide, we actually had to test things out. The faculty partners were really good sports about this, because we sort of tested this workshop out on them, starting with a two-hour workshop on text data mining in Python. It was very important to us to involve the faculty in this before we went to the classrooms, because the faculty actually know the students' capabilities and what they are going to need in the classroom. And so it was really good. We had a nice two-hour workshop getting into text data mining. I should say that all of the lessons that we developed are available through a couple of hosting services. MyBinder is used for most of them. These are actually Jupyter Notebooks that are up and available publicly, so if anybody wants to get in there, use them, and actually do some text data mining themselves on the collections, they're available for use there. And the faculty were really good about providing feedback for improving these workshops. So what we did, if we get the next slide, is we took the feedback from the faculty and we streamlined the lessons. We've gone from a two-hour lesson, where we did the crazy thing of starting to talk about regular expressions, to a one-hour student lesson where we decided to skip regular expressions. We also restricted the dataset down to three years and only three titles, both for the pragmatic reason of only trying to wrangle that much data in a classroom setting, and because there were some cyberinfrastructure issues that we were encountering that kept the dataset somewhat small. So we delivered this workshop twice synchronously, in classes while students were there.
And then we also recorded the lesson, because some of the students weren't able to attend, and we also wanted to have this resource available for anyone who missed the class. This shorter lesson is also available through the MyBinder hosting service, which has proved quite useful. On the next slide, though: because it's 2020, things happen. One of the things that happened is that the resource we were using, the Atmosphere platform, decided to refocus what it was going to support, and so it looked like we probably were not going to be able to use that resource to host our lessons. After a mild panic on my part, a couple of my library colleagues on the research data team, Fernando Rios and Chun Li, suggested another option, which was to take the notebooks, which were already on GitHub, and link them into a service called Binder, which provides basically a free hosting service for lightweight Jupyter notebooks. And so we did that, and it was great. It actually ended up being a little bit easier to work with than the initial implementation. So that's working great. On the next slide: after the lesson that we went through with the students, we also created one more Jupyter notebook, which was sort of a walled garden. What it is, is it allows the students to do some sort of canned analyses. They are still being asked to do things like manipulate Python code and change variable settings, so to some degree they get their hands dirty with some programming, but we're not asking them to do something like write a Python program from scratch. That wouldn't be fair to the students, and I don't think it would produce very successful learning outcomes. The other thing about this walled garden notebook is that it did allow access to all of the data, so all eight titles for all of the years that we have data. What we did is we just embedded code that used the Research Data Repository's API to actually retrieve those data.
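For anyone looking to replicate this hosting approach, linking a GitHub repository of notebooks into Binder generally requires only a dependency file alongside the notebooks. A minimal sketch, where the file layout and package names are illustrative rather than this project's actual repository:

```text
repository/
├── lesson.ipynb          # the Jupyter notebook(s) to serve
└── requirements.txt      # packages Binder installs into the environment,
                          # e.g. requests, pandas, matplotlib
```

Binder then builds and launches the environment from a URL of the form https://mybinder.org/v2/gh/<user>/<repository>/HEAD, so nothing needs to be installed on the student's machine.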
Because as I mentioned before, the files may not be large, but the entire dataset is, and it's a little bit too big to be asking GitHub to do that heavy lifting. So we just have the retrieval of the data be a dynamic part of the lesson. And for those canned analyses, there are three things that the students were allowed to do. They're allowed to look at relative word frequencies over time for a single newspaper. They could also do the same thing for a pair of newspapers if they want to do a comparative analysis. And then you can also do comparative analyses looking at different terms through time. And I would just mention that we're looking at relative word frequency. So rather than just counting the frequency, counting the number of times a word occurs, what we do is correct for the length of that issue. So ultimately we're getting the proportion of that issue that is dedicated to the term of interest. And what this does is allow us to compare proportions across different titles or different terms, rather than relying on simple counts. It also means we don't have the issue of the differences in length between newspapers. So a weekly newspaper may be much longer in number of pages than a daily paper, but by looking at proportions rather than raw counts, we avoid making inappropriate comparisons. So this is great. Of course, on the next slide, this is actually a quote that we saw from one of the students, because again, it's 2020. With this hosting service, Binder, the problem was that we were maxing out the memory. So I went to the Binder community, and this is the important part, where you really rely on the community: they suggested another hosting service, which is provided by the Leibniz Institute for the Social Sciences (GESIS), which allowed us a much, much larger RAM allocation on the notebook. And it's really allowed things to go a lot further.
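The relative word frequency correction described here can be sketched in a few lines of Python. This is an illustrative example, not the notebook's actual code; the tokenizer is deliberately simple and assumes only (possibly accented) Latin letters, to cover both the English and Spanish titles:

```python
# Illustrative sketch of relative word frequency: the count of a term in an
# issue divided by the issue's total token count, so that long weekly issues
# and short daily issues can be compared on equal footing.
import re

def tokenize(text):
    """Lowercase and split on anything that isn't a (possibly accented) letter."""
    return [t for t in re.split(r"[^a-záéíóúüñ]+", text.lower()) if t]

def relative_frequency(issue_text, term):
    """Proportion of an issue's tokens that match the term of interest."""
    tokens = tokenize(issue_text)
    if not tokens:
        return 0.0
    return tokens.count(term.lower()) / len(tokens)
```

A raw count of 2 means different things in a 60-word issue and a 6,000-word issue; plotting this proportion per issue over time, for one title or a pair of titles, is the kind of comparison the canned analyses support.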
And one of the things I think this quote does a really good job of highlighting is that it demonstrates the need for more of these kinds of educational opportunities, because I think it's one of those situations where the students are, to some degree, petrified of breaking things. And in reality, they're not breaking things. It actually was great, because it revealed a bug in our code. But unfortunately, I think a lot of times, when faced with things like the Python programming language, the immediate response is apprehension. So whenever we can build in opportunities in classrooms to engage and provide accessible learning experiences with these sorts of resources, I think it's a great opportunity to take advantage of, and that's one of the great things about this project. So at this point, the next slide, I'm going to turn it over to Anita to hear about how this has actually been going in practice in the classroom. Thanks, Jeff. Okay, so next slide. I'll talk a little bit, bridging from what we just covered, about how we as faculty were also students, and I think we had some good lessons learned from that in terms of how we would implement this in the classroom. One was just navigating the challenges of a virtual environment. We had originally imagined these workshops happening in person, and so by experiencing what it was like in a virtual environment ourselves, we had a better sense of how to prepare our students for that experience. The faculty involved in this come from history, English, Spanish, and journalism; none of us have a background in computer science or information science, and so there was a learning curve for all of us in terms of new vocabulary and just building confidence with these new tools.
And so I think being confronted with so much new information so quickly as disciplinary faculty helped us get a sense of the degree to which we would need to put our students' minds at ease, so that they would be willing to approach this kind of tool. Next slide, please. So we had a number of challenges that were COVID-related in terms of remote teaching. The course schedule was shifting rapidly up until almost the last minute. One of our classes didn't seem like it was going to make enrollment. I was originally supposed to implement this in an upper-division special topics course, and then that class didn't make, so I needed to do it in a very different, more general course. So imagining exactly how we would be able to scale this, depending on the topic, was something we needed to be flexible about. Also, we ended up implementing this as a group in a variety of synchronous and asynchronous courses. In some courses, they were able to build in a workshop during class time, but in my case, my course is entirely asynchronous, and so while attending live was an option, students more often relied on watching the recording. We also have a mix of levels, intermediate and advanced undergraduate as well as graduate, and the ability of those different levels of students to adapt to COVID, and what they were struggling with, I think varied. The synchronous virtual workshop, of course, we needed to reimagine. And then I just wanted to add student fatigue and instructor fatigue. A lot of folks are just running on less energy, and so being really sensitive to students, and to the way they perhaps feel more overwhelmed at this particular moment than they would have otherwise, was really important.
And for me personally, approaching the students with a real spirit of generosity, being flexible, even in my case with deadlines and things like that, while making sure to foreground that this was an exciting opportunity for the students, was really important. Next slide. So this is a quick list of the classes that were involved. We have a graduate-level rhetoric course; two different history classes, one in public history and one on the natural history of disasters; and a journalism course. And the last one is my course, which is an introduction to literary analysis. Mine is the only course that is entirely in Spanish rather than English. In these classes, depending on the course, the way we decided to implement this particular focus really varied. So for example, in the journalism course and the upper-division English course, this was a semester-long project that the students were really able to get into in more depth. In the other courses, this was more like one particular assignment or one particular unit. In my case, it was one sort of mini-module, and the students were all required to do an oral presentation and had the option to also explore this in more detail in a final paper. I had four students out of 30 who decided to explore it in more detail. So I will speak to my experience of how this is going. I am teaching an asynchronous online course, an introductory course: Introduction to Literary Analysis. This is the final class for Spanish minors, of which we have thousands at the U of A, and it's also the gateway course for Spanish majors. You have a mix of folks: some are interested in pursuing more upper-division courses in literature, but for a lot of them this is their final course, and their first course that's really focused on literature and cultural production, as opposed to grammar. The typical structure of the course is narrative, poetry, theater, and film; it's divided by genre.
And so in my course, I also had an extra module in the middle, between poetry and theater, that was newspapers as data. The students were a little bit thrown by the different focus and scale of this, and so I tried to scaffold it with them, and scaffolding is so hard with students in an asynchronous online environment. You send them messages, course announcements in our learning management system, and emails, but you don't really know if they're reading them. But I tried my best to maintain a really positive tone and to frame this as a break from our textbook and from the other sorts of activities we were doing: an opportunity for them to do something more hands-on, and something more closely relevant to their own experience, since it's about this particular region. In my course, as well as some of the other courses, we decided to focus specifically on the 1918 pandemic, since that time period was covered in our dataset. For their assignment, they were tasked with making an oral presentation, which is always an assignment in this kind of course. So they made a voice-over Adobe Spark video, and they used the Jupyter Notebook and wrote very light Python code to create three different images. I left it pretty open for them to create images depending on their particular interest, looking for words that they thought might show something interesting about the 1918 pandemic. Okay, next slide. So I have a few examples here from some student presentations. The first one: all of my students looked at El Tucsonense, which is a Spanish-language newspaper. In some cases they compared El Tucsonense to other newspapers that were written in English, but they had to have at least one visualization from El Tucsonense. So this person searched the terms influenza and gripe; that was really common, and a lot of the students chose these terms.
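A student visualization of the kind described might be produced with a few lines like the following. This is a hypothetical sketch with made-up frequency values; the students worked inside the project's prewritten notebook, whose actual plotting code is not shown in the talk.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display, e.g. on a hosted notebook
import matplotlib.pyplot as plt

# Toy relative-frequency series for two terms in one title (invented values,
# standing in for the proportions computed from the OCR text):
months = ["1918-08", "1918-09", "1918-10", "1918-11", "1918-12", "1919-01"]
influenza = [0.000, 0.001, 0.006, 0.004, 0.002, 0.003]
mascara = [0.000, 0.000, 0.002, 0.003, 0.001, 0.000]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(months, influenza, marker="o", label="influenza")
ax.plot(months, mascara, marker="s", label="máscara")
ax.set_xlabel("Issue month")
ax.set_ylabel("Relative frequency (share of words)")
ax.set_title("Term frequency in a Spanish-language newspaper (toy data)")
ax.legend()
fig.savefig("term_frequencies.png")  # one of the student's three images
```

Because both series are proportions rather than raw counts, the two curves sit on the same scale, which is what lets students read the "waves" directly off the chart.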
A lot of students also chose the term máscara, which is mask in Spanish; so, sort of the terms that you would expect. In their oral presentations, if we were to listen to this section of their videos, a lot of them discussed how impacted they were by seeing multiple waves in the visualization. I think especially for us here in Arizona, at this particular moment, where we had a very high number of cases in the summer, then things decreased, and now, as with the national trend, we're trending upwards in COVID cases again, seeing that that had happened previously was really meaningful to the students: that this had a precedent, that sort of wave structure to the pandemic. Next image. So this student also looked for the word mortalidad, and I thought they had a great discussion of how this term is and is not useful. Mortalidad is a cognate of mortality in English, and they were interested in how this did and did not follow the pattern of the previous slide, comparing the two. There's a bit of a lag, a bit more of a spike, and you wouldn't necessarily be able to connect this particular word, which could be applied to all kinds of different things, to the pandemic specifically; you would have to be careful in asserting that causality. So I thought this was a good example of students thinking through what's at stake in text mining. Okay, next slide. Then I also had a student who looked for the term revolución. 1910 to 1920 is the Mexican Revolution, or the height of the Mexican Revolution, and so this student was wondering about the comparison between the pandemic and other important events happening at the time, and the relative frequency of how much those different things were discussed. I thought this was a great example of another topic that would have been really on people's minds, and that El Tucsonense did discuss a lot. Next slide.
So in addition to the three visualizations they needed to include and talk about in their oral presentation, I also had students answer these reflection questions. I was really impressed with number one, on the advantages and limitations of text mining. It really showed that the students had paid close attention to the wonderful lessons that Mary and Megan had put together, and it speaks to the clarity of those lessons, because all of them were able to articulate what all of us wanted them to understand: this is a great tool, it is really useful in all these different ways, but you also need to contextualize it. Distant reading doesn't take the place of close reading, and you have to think critically about all the different factors you would need to take into consideration before you could say something really definitive from just one visualization. And then in terms of their experience with text mining, I thought this was a really interesting question. The vast majority of them were surprised by how approachable using the Jupyter Notebook was once they got into it. None of my students had experience with coding, and everyone universally said that when they opened it, it was terrifying, and they had no idea what they were looking at. When they heard we were going to do this, they were just overwhelmed. But once they actually got into it and watched the recording of the synchronous workshop that Jeff offered, which was paced really well and was completely at their level, I think they were really surprised that they were able to do it. (I had one student attend the optional synchronous workshop; everyone else watched the recording.) That said, I did have, I think, 10% of students who said this was really difficult for them, and even with the recording, even with the walled garden, it was just a struggle to do this particular assignment.
And those students remarked that they thought it would have been easier, and they would have felt more comfortable, had they been able to do this in an in-person synchronous class. Then in terms of other applications, students had lots of great ideas: social media, contemporary newspapers. One student brought up privacy, which actually started a kind of lively debate among the students, because they have to comment on each other's oral presentations. So that was interesting; the students really disagreed about whether or not doing text mining on social media, for example, was a violation of privacy. And with that, I'll turn it back over to Megan to conclude. All right. So we are closing out the fall 2020 semester in the next few weeks, and we're already receiving informal student feedback on their experience, much of which is awesome. We are interested in conducting assessments both within and across each course that participated in our project, so we will be distributing feedback surveys with one set of shared questions alongside more course-specific questions. We had originally hoped to conduct an in-person student symposium at the end of the semester, which Mary mentioned, where select students could present on their projects and their experiences in conversation with the project team. But in our transition to conducting the project in a fully remote environment, we are opting instead to hold a virtual symposium, which will happen in January of 2021. Maybe a little less awesome; that's not what we had hoped to see. But our goal is to further reinforce the learning experience, offer students an opportunity to present their findings, and bring students from lots of different courses into conversation with each other. We are really excited to better understand how this form of cross-curricular engagement could enhance student learning and inform our approach to continued collaboration moving forward.
Following the symposium, the project team will produce a white paper. It will document our implementation model, and it will also reflect on our project outcomes and avenues for sustainment. At this point, we see several paths forward. We know that our existing dataset and computing environment can be permanently integrated into future iterations of the courses that were taught this semester. Our faculty partners have already expressed plans to use the data in other courses in their portfolios as well, some as soon as spring of 2021. Some members of the project team are also looking to incorporate use of the dataset into their research agendas, which we're looking forward to seeing more of. Over time, we can create new notebooks, and we can leverage that same digital infrastructure to apply different text mining techniques, or apply the techniques we have to different datasets. The foundations of our project support that kind of incremental and modular approach to growth, which we think sustains components of the overarching project rather than maintaining the initiative in its current state. But we do ask ourselves what it would take to reproduce this model around a different dataset with different collaborators, either here at the University of Arizona or at another institution. I think the success of this project is really grounded in identifying a collection that we already know has strong faculty use. We took a birds-of-a-feather, affinity-based approach to bringing people together around the collection and discussing what we might do pedagogically. We know this work requires faculty to engage with the libraries regularly, to commit to learning new methods, and to revise their pre-existing courses. So with project funding, we were fortunate to be able to provide some summer salary support for each faculty member.
And we were able to buy out some of our librarians' time for managing the project and preparing the data and the environment for us. But it's important to acknowledge all of the labor involved for all of the people involved; if we were doing this without funding, we would need to come to agreement about expectations and commitments over the course of a year-long initiative. From a technical perspective, this project works really well with text-based collections that have been digitized and OCRed ahead of the collaboration, which sets you up at the starting point for moving toward processing and organizing text files for analysis. The work of building that walled garden for classroom-based analysis can occur iteratively, in communication with faculty, during the summer prior to implementation, as faculty are adjusting their syllabi to account for the project. So at this point, we'd be especially interested to know what questions you might have about implementing this model in other contexts, and what other questions you have about the project, so we can improve our own documentation and our assessment work as we wind down the project. Thank you. Thanks, Megan, and thank you to all of our presenters. That's a really interesting project, and it's great to hear all the different perspectives, too. Anita, I was really fascinated by some of your reflections on the classroom experience, so thank you for sharing all of that. And thanks to all of our attendees: if you have any questions, please type them into the Q&A box now, and we'll be happy to field those. If you'd like to join us up here, raise your hand; I can unmute you, and you can ask your question or make your comment live. So, Megan, you did say that some faculty members are already talking about using this in future semesters, and some are even thinking about applications for their own research. I thought that was really interesting.
And given Anita's reflection about a percentage of her students commenting on the difficulty of the tools, are you hearing similar things from other faculty members who've been using this through this phase of the project? Anita and Mary could chime in on this. One thing that's been interesting is that we've been hearing students have actually appreciated having something they felt they could do hands-on, especially with asynchronous courses being what they are, and being remote. Some of the things that might have been more hands-on, like an archival visit to special collections, haven't been happening, so that was one of the things they picked up on. I think Catherine Moore is the faculty member who called that out to us. But Anita, do you have anything else our faculty colleagues have been reflecting on with that? It's just a big challenge. I think part of the difficulty for some students is internet access, which I think we're dealing with nationally. Students are living in really different kinds of situations in terms of stable internet. Just this morning I couldn't meet with a student because her internet connection wasn't reliable enough. So I think it's not even necessarily the way this lesson is set up, or the Jupyter Notebook, or the synchronous workshop. I think, and that's why I wanted to call out student fatigue, that it's a lot of other things as well. The normal student resiliency, and I find our students have to be particularly resilient normally, is not there. And so when they're dealing with multiple barriers, technical and otherwise, it's easy to get discouraged. Even despite that, the fact that it was such a low number of students in my case, I found really amazing. And even the ones who did find it difficult, by the time they worked through it and created something, enjoyed that it was hands-on, and it was meaningful to them. That's so interesting.
Mary, did you have something to add to that? Well, I was going to mention that I sat in on a couple of Jeff's synchronous workshops, and the students, even when they were nervous, were really excited. I think that does point to the hands-on aspect. In one of the sessions, with the graduate archival students, I was just really impressed with what they were already picking up on, so that was exciting to see. It looks like we have a question; I don't know if you want to address that. Sure, let me just read that aloud here. Nathan comments: first of all, what a project to see through during a period of classroom difficulty, shifting modalities, fatigue, et cetera. And he asks: what do you think will change for future iterations of the project once the limitations of the pandemic lift? It's a really good question. I can say that for me, pedagogically, I usually teach upper-division, 400-level special topics courses, and that was what I was originally planning to implement this in. I am normally one of those faculty members who takes students to special collections, so the way I had originally conceived of this was doing a real deep dive into close reading and distant reading, how they can complement each other, and how each sort of needs the other to give a more complete picture of what you're looking at. So in a future iteration, when we are able to be in the same room and go to special collections and do all of those things, I'm really excited about doing more of a semester-long project, and also opening it up to other topics. The 1918 pandemic just really spoke to students; it's on everyone's mind, and they really wanted to know about it, I think, because life eventually did go on after that. And they found it really comforting, I think, to see that if you go far enough in the dataset, there are no more waves.
And I think they really needed that. But I'm interested, as we go forward, in returning to some of the other topics we had originally built into this proposal, like the Mexican Revolution, World War II, the Bisbee deportation, and various other local events, while still holding on to the pandemic in some way, maybe in conversation with other events, like that one student putting the revolution and the pandemic in conversation. Yeah, and I would just add that the other thing I would love for us to do in future iterations, which we had planned, is that culminating in-person event, because it was going to give us a chance to have students interact across classes. We were going to have a speaker come in, and it's just always nice to be in person, right? So hopefully that's something we'll see in the future. But of course, this was grant funded, and so I think one of the big questions for us, which Megan talked about, is how we continue doing this work going forward when it's just, quote, part of our regular work. So that'll be really interesting to see, whether that's online or in person. There's that aspect of it. Excellent point. Yeah. So stay tuned, basically; that's the message. Yeah, but thanks for the question, Nathan. Yeah, absolutely, that was a really good question. Well, thank you all for your really wonderful contributions to the meeting. This was an excellent session, and fascinating to hear about your project. We are a little bit past time, so I'll go ahead and close things down. Thanks, everyone, thanks to our attendees, and have a good rest of your day or evening, wherever your time zone is. Many, many thanks, and particularly Megan and Jeff, thanks for doing the marathon. Have a great day. It was a delight. Thank you. All right. Take care, everyone. Bye-bye.