Good afternoon, everyone. Welcome to session six. This is our last session of the day, and it is titled "You Can't Access a Reference That Isn't There: Interventions that Promote the Persistence of Web-Based Evidence in ETDs." It will be presented by Sarah Potvin, Kathy Anders, and Tina Budzise-Weaver from Texas A&M University Libraries, along with Martin Klein from Los Alamos National Laboratory. My name is Ellen Amatangelo and I'll be your moderator for the session. A quick reminder to please keep your audio muted and your video off during the presentation portion unless you are presenting or you're asked to participate. You can use the chat tab to ask questions, which will be addressed during the Q&A portion, and I'll monitor those questions as well in case anything comes in while the presentation is happening. That's all I have for my intro, and I will turn it over to the presenters.
Great. Hi. Let me go ahead and share my screen. Sorry. I was hiding my... Can everyone see my presentation?
Yes.
So, hi. I'm Kathy Anders. I'll start by introducing all of us. We had a number of scheduling and holiday things come up, so I'm going to give the presentation on our behalf. Tina Budzise-Weaver, Sarah Potvin, and I are all formerly of the Texas A&M University Libraries. When the libraries had a big shift, we moved into academic departments. So, Sarah Potvin and I are in English and Tina Budzise-Weaver is in the School of Performance, Visualization and Fine Arts, and we have been working with Martin Klein, who is a computer scientist in the Research Library at Los Alamos National Laboratory. This presentation actually builds on a presentation we did two years ago, talking about reference rot. A little bit of background: Sarah Potvin formerly worked in digital scholarship, Tina Budzise-Weaver was an academic librarian for performance studies, and I was a graduate studies librarian in the A&M Libraries.
So, what we are here to talk to you about today is reference rot in ETDs. And so, we'll start by asking: what is reference rot, and what is going on with reference rot in ETDs? Reference rot is a term that Martin Klein, working on some projects, helped to introduce. It covers two known problems in web-based materials. One is link rot, where you go to a page and the thing you want isn't there. It's broken, right? You get a 404 or something like that. The other is content rot, which is where you go to a page, but it's changed. So, if we want to talk about both link rot and content rot, both of which are issues in web-based materials, we talk about reference rot.
So, just to give an example of what we mean by rot. This is an example of a page where the link has rotted. If you go here, you get a 404 error. Sometimes you get a custom 404 error, but whatever the content of that page was is no longer there when you click on that link. So, that is broken. Drift, on the other hand, is where the link still resolves, it still takes you to a page, but that page has probably changed over time. This was from an ETD we looked at, and this was a link to something called the Philoctetes Project, which I believe was a theater project for veterans. This was the way it looked when the student was citing it, or as we're trying to do it. And this, if you go to the Philoctetes Project today, is what it looks like. So, it's an online gaming site. Rather strange drift, but that's what happens. Oh, not gaming. Gambling. Sorry. Different. Gambling.
Okay. Sorry. I don't know what happened. All of my slides went forward. Nope. Hang on. My bad. I don't know why it's going the wrong way. It's going to go real fast this way.
So, what are some of the concerns with reference rot in ETDs, and with reference rot more broadly?
We might think that the citation is there, and so, well, we know what it means. If we think of citations as solely for the purpose of giving credit to a source, then a broken link is unfortunate, but it's perhaps not as concerning as it is if you think that you are losing the record of, and access to, your sources and your data. If you think of references as a collection of sources that are being referenced in the scholarly work, and that there is some dependence of meaning between the scholarly work and the thing it's referencing, then losing access to that can be problematic. If the source is not there, then for people trying to review the work later or work on reproducibility projects, their data is gone. And that's really an issue.
In some cases, there might not be another copy of the material. I think a lot of times we think of web links as being kind of optional if they're to an article published in a journal or to a book, a Google book, because we think that there are archived online versions of that that are unlikely to break. Sometimes that's true. However, sometimes it's not. Things on the web actually don't live forever. And so, particularly if you're looking at research from marginalized populations, or at particularly ephemeral works like blog posts or social media posts, there might not be another copy of the source that's being cited.
And that leads to my fourth point. Part of why we started looking at reference rot in ETDs is that our current project stemmed from a previous project, which was to see what was going on with reference rot in A&M ETDs. Texas A&M is a very large school. We have many, many students graduating each year, and we're handling a very large number of ETDs coming through. So, we couldn't run this analysis on all of the ETDs. In our previous presentation at USETDA, we were, at the time, in the middle of doing this analysis, I don't remember the results.
What Sarah and Tina and I did was look at our corpus of performance studies ETDs that were filed from 2012 to 2020. We were trying to see which ones were perfectly functional, in that the links went to the thing they seemed to indicate they wanted to go to, and which ones suffered from reference rot. In that set of ETDs, about 26% of the links were rotted. They didn't resolve, they were just broken. And just over 20% had what we called diminished functionality, which means that they suffered from some sort of significant drift where they weren't going quite to what seemed to be indicated in the citation. Now, this is an admittedly small sample size. I don't think we can draw conclusions about the whole corpus from that, but there was more reference rot than we expected there to be. With only about 52% of the links being perfectly functional over a relatively short period of time, this seemed significant to us.
Now, what we see is that this is likely going to continue to be a growing problem in ETDs. This is an analysis of our entire publicly available ETD corpus. Some of our ETDs are not publicly available, but you can see that there's an upward trend in the number of URIs, which includes a number of things, that are in ETDs being submitted over time. At the very end, when we get up to 2022, we have a pretty marked increase in the number of ETDs being submitted that have URIs in them.
With that as background, trying to understand what reference rot is, we felt like we had established that it was an issue in ETDs. So Sarah, Martin, Tina and I decided to do a study to see what A&M students knew about reference rot and whether or not they were concerned about it. We ran two portions of our study. And I'll just say that our research was all IRB approved by A&M and Los Alamos.
We kept all of our information confidential, only collecting demographic information about our students but no identifying information, so we don't have any student numbers or names or anything like that. So, we did two things. First, we sent a broad survey to graduate students in February of 2023. This was just a large, qualified survey. The second part of the study was that we did two workshop interventions to try and teach graduate students about reference rot. We did that in April of this year. We had it catered. It was about one and a half hours, and we collected pre and post tests from the participants in those workshops.
I want to pause here for a second. I know we have a Q&A session at the end, but I just want to make sure that I covered the topic of reference rot, in terms of comfort levels or understanding levels. Does anyone have any questions about what reference rot is? Is that a good answer?
I'm not seeing anything in the chat, but yeah.
Okay. Great. Good.
So, just some of our initial findings from the survey. I'm sorry, the text on the side is small, but one of the things we found is that students are citing mostly web sources. These might have appeared in print at some point as well, but students access them through the web. So, a huge percentage have most or all of their sources coming from some sort of source that has a URL. They got it through the web. Students also know that sometimes things break. Not all the time, but a significant number of the students have at least at some point encountered some broken links. They know enough to know what they are when they encounter them on the web. Only a very small percentage has never encountered a link that was broken, changed, or had otherwise become inaccessible.
Like, they can generally get to what they think they want to get to. So, they both are citing web sources and have had experience, at least, with encountering a broken link. But they have not thought about their own links. Even though they have this experience and they're doing this citing, most of the students either haven't thought about it or expect that the resources they cite will be available in five years at the same URL. So, they're not necessarily taking what they thought about up there and carrying it over here.
Now, I should say that this was a very broad survey sent to, I want to say, around 12,000 or more graduate students. We picked graduate students at the College Station campus who file ETDs. So, we didn't include campuses like our dental campus or our law school, and we didn't include professional schools. So, most of them are filers of ETDs, but this broad student population hadn't really thought about what was going to happen to their resources.
So, that was our survey. And then we did this workshop. Part of our goal is to try and figure out how best to address this problem. How do we make graduate students aware of this reference rot issue? And then, what would be one way of addressing it? We could address it through getting graduate students to better preserve their own works by making permanent link options. Another option would be to automate a system where links are preserved.
So, with this workshop, we were trying to get a better sense of what students thought about these issues. This was a much smaller, much more specialized group. We think some of these students came particularly because they were interested in the topic of reference rot and web preservation. Of the students who came to our workshop, more of them either didn't know or thought that materials on the web were permanent. But some did.
And this is not exactly surprising given the sort of long-standing narrative that nothing ever disappears from online, when people are thinking about their own online personas and their own online reputations.
So, after the workshop. We saw that the workshops raised some awareness, because before, most students hadn't considered it, or a significant number considered materials on the web permanent. But afterwards, the overwhelming majority of students saw reference rot as an issue for their own scholarly work. If you think back to our survey, when we were asking whether or not students thought that their own links might break, many of them didn't or didn't know. Here we see that after this workshop, they did.
The students liked the workshop, or I should say they found it informative. Most students found it somewhere in the range of very informative to somewhat informative, with no one saying none. And I should say that these surveys were anonymous. We did not ask for any identifying information. We didn't know the participants. So, they weren't just trying to be nice. So, a workshop might be one way of raising this awareness about reference rot.
However, when we think about our takeaways and our conclusions, the first thing we noticed, kind of just off the top, is that it's really hard to get students to attend optional workshops. We sent out recruitment emails for catered workshops to over 12,000 students and had fewer than 30 attendees over two days. And this was good catering too, from a very popular local restaurant. Additionally, attendees had the option of getting credit for our graduate professional development certificate.
So, while we're sort of hopeful and we'd like to raise awareness and education around reference rot, the issues involving it, and the options for mitigating it, getting students to come to workshops about it is difficult, at least in our experience.
We also asked students how they would like to deal with this. We had a very low sample size, I should say. This is not in any way definitive at all. But we did ask students how they would like to have an intervention, how they think this would best be addressed. The first option is automation or tools. So, having some sort of tool that will go through and archive all of your web links for you, maybe in Vireo or the thesis submission process, or some sort of browser add-on or something like that. Another set wanted some sort of online guide or instructions, like a tutorial or PDF or something like that online to tell them what to do. The largest percentage wanted a workshop or class. Now, these are students who are already attending a workshop or class, so that might already be their preference. And a couple expressed a sort of non-specific desire just to get more education about this somehow.
As I was saying earlier, even though some students may want workshops, they may not be feasible. We had an extremely low yield despite our free food. And with at least 12,000 graduate students who are in plans that require the filing of an ETD, workshops would be extraordinarily difficult to scale in any sense. So, the general idea is that even if we raise awareness, we're going to need some sort of tool for addressing reference rot.
And one of the things we're thinking about is how automated that could be and how it might be deployed: to find the links and extract the URL references from an ETD; to mint a permanent link or make an archival link through one of a number of services that do that; and then, as the last step, to put these persistent links back into the ETD such that someone who's opening the file can get to the archival link. So, those are kind of the steps that are necessary, and we're trying to figure out what would be the best way to do those steps.
Part of what we wanted to do with this presentation was talk to y'all, because you are the thesis professionals. We'd like to have a conversation about whether your office has the capacity to administer training or tools. Does this seem like something that a thesis office could take on? We recognize there are a variety of answers to that. What would be helpful for you to address reference rot? Do you think that's even something that your office should do? If not there, where else might it be? And then, if anyone's interested, we're trying to gather interest from future authors. So, can I open it up for discussion?
Sure. Are you wanting people to just chime in?
Yeah, that would be great. If people just unmuted and chimed in, that would be fantastic.
Well, I have something to start off. I am the institutional repository administrator.
Okay.
So, I do feel some, I guess, responsibility for reference rot. But I don't know how to actually deal with it, I guess. I think I need my own workshop before I can train other people.
Right. Okay. So, a workshop, like more training just about what reference rot is and options for dealing with it, or mostly options for what we can do about it. Okay. We've got a chat. "Same." Okay. All right. Oh, sorry. I didn't need to go back. Would anyone be willing to share, like, do you think this is part of your wheelhouse?
How would you find all the links in an ETD? Yeah, this is one of the questions that we would have to address for sure. There is software that can crawl and find them. So, it's possible to do. It's just got a couple of steps.
This is coming from the chat: "I'm an IR manager librarian too, and I'd appreciate education on how to teach others about it and deal with it." Okay. Yeah. Sure. Like, put together maybe a kit, a kit that has some background readings and presentation slides, maybe something like that. "A kit would be great." Okay. "I think this is an issue, but not sure who can fix it." Yeah. I fit in tools to be used. Okay. Yeah. I think this is part of our question too. Oh, you don't allow in-document hyperlinks, but allow web addresses. Okay. That's interesting. Yeah. We actually just looked at web addresses in our study. "Software would be the best, more efficient." Yeah. Exactly. It's just probably not going to scale.
I think one of the questions is where this process would kind of live. Who would be doing that? Would it be a library? Would it be a thesis office, or would it be the university itself? Who manages that? Who looks at that? And we're trying to get a sense from you folks. Do you think that would be your library, your thesis office, something like that?
I haven't looked into it in a few years, but archive.org supposedly does archiving of websites. Yeah. So, there are a number of archiving services. It's a matter of getting all of the links archived. They all kind of have pluses and minuses, from permalinks to, you know, the Internet Archive, a variety of archivers that do that. There doesn't seem to be, at the moment, a good site for archiving multimedia. So things like videos, that's hard to archive.
You don't have... okay, sorry. Maybe you have a thesis dissertation office.
They're able to handle these types of issues. We don't have such a thing at BYU, and we don't have the manpower to be able to check all the links in the ETD. Some colleges might check these things. Yeah, I don't know that anyone actually is checking links in an ETD. It's a tall order. Okay, you have no... okay, small liberal arts undergrad college here. It would fall on the advisors and librarians for educational advisors. Yeah, a lot of advisors. That's a good point. We have no thesis office.
Zotero automatically captures screenshots. Could those be part of the ETD package? Maybe? That's it. We've talked about Zotero. One of the things I think we sometimes run into is copyright questions when we actually put images back in things. So some things can get away with library issue, like library protections. But yes, doing something like making a Zotero add-on that could start minting those might be a possibility.
How are we cutting edge on the topic? No, we are, well, I would say, Martin is one of the reference rot experts. He really is at the top of the reference rot game, so to speak. There are some other really good resources out there. We have an article by Massicotte and Botter, which I don't have the link to or the name of in front of me, but they do really good work. We have an article coming out on that initial project that I was talking about, on performance studies ETDs, which I think is coming out in 2024 in portal. So there are definitely people working on web archiving, for sure, and on reference rot. And people have started looking at it in ETDs. It's just not giant.
"I would appreciate the toolkit." Yeah. Okay. So, is it fair for me to characterize this discussion as saying a toolkit would be helpful for education, both for yourself and for other folks, and that, if this were going to be administered, some sort of tool would do the checking?
Yes, it is also a citation issue, which involves the disciplines. Sorry, I got distracted. Absolutely. Part of this is that if you have a citation system that doesn't require a link, or has said we're past links, that's an issue. And so there might not be a link to link to. Yeah. Okay. So, a kit, a tool eventually, and then it sounds like there are really a variety of responses in terms of who does what at different institutions. Thank you, everyone, for that. I am happy to just... oh, great.
Would it be helpful to provide training for campus departments doing the approvals? What stage of approvals? I mean, yes, but are you talking about, sort of, ETD chairs, or the thesis office members themselves who are doing the approvals?
Sorry, I'm going to chime in. Since I'm the moderator, I can speak, right? So, that was my question. I was just thinking, our campus departments have their own approvals that theses and dissertations go through. Would it be helpful if, during that process, they double-checked links to make sure that they're valid? I don't know if that's that helpful in the long term, but maybe up front it could help catch some of those that are not useful.
Yeah. I see what you're saying. I think, actually, it's not just checking to see if the links are broken at the time. It's to mint an archival link and then put that archival link into the ETD so that someone can access it in the future, because these links definitely break over time. Something that's there when they're approving it might not be there in five years, so we need archived web pages and archived web links. But a solution to that could be something that could go to committees. Okay. I see. So, you've got a couple of steps. It would be you training the librarians, who would then go to the advisors, and then it would go into, like, the best practices.
So, it would be like a train-the-trainer type model. In terms of tools, part of this is definitely raising awareness and education, but some of this is also giving people a tool to do this relatively quickly, because it takes a long time to archive all your links. If you've got 150 references or something like that, then you're sitting there on the Internet Archive a lot. So, if anyone is interested in developing a tool or thinking about a tool or something like that, let us know. We're getting funding for, like, a larger grant project.
Yes, absolutely. So, as Elizabeth said, I think you might need a hierarchy of what might break. Links to actual journals would be less of a problem than sites. Yes, that is definitely true. Although I will say we were surprised by some of the things that broke. Part of it is a question of who's doing the archiving and maintenance, right? If we're talking about journals, where these things have been archived many times by librarians, then that's less of a problem. Even if the link breaks, you can go find the article. But there are other things that we thought would be more durable that, when we looked farther down, were surprising. I mean, maybe it shouldn't be surprising at the moment, but government data was surprisingly breaky, surprisingly reference rotted. So, there is kind of a thought about what might break. There's this issue of links breaking and having a tool to handle this en masse. But then there's also, in some ways, the need to educate students about how to represent their sources in their work. If you're referencing a conversation or an interview or something like that, you might need to write parts of that into your thesis, thinking about the way you describe things, particularly ephemeral ones.
Do y'all have any questions for me? We have, I think, six minutes left.
It almost seems like we need to archive on our IRs. I mean, our IRs kind of are archives. But do you mean an archiving tool on your repository? Yes, no, that's not too far-fetched. That's exactly what we're talking about. We need some sort of tool that crawls a document, pulls out the URIs, sends them over to something that mints the archived versions, and then puts those archival links back into the document. And it could go a couple of places. It could go into citation manager software, like an add-on for an open citation manager, something like that. It could go into Vireo, or whatever the submission system is. If all of your works are getting put in an institutional repository, maybe there could be some sort of in-between step there. But yes, that's the sort of thing we're talking about. Yeah, that's an option in Vireo, supplemental files.
Well, I will go to my thank-you slide and say thank you. This was a really helpful discussion. I liked chatting with you all, having my chat up. Thank you for all of your ideas. That is very helpful. Feel free to contact us too if you want to talk more.
Thank you so much. That was very informative. A lot of things I hadn't thought about, and now I need to think about them. So, thank you.
Well, thank you so much. So, we are going to close out this session. This is our last session of the day, but we hope to see you all tomorrow at eight, or sorry, eight o'clock Mountain Time. I'll just put that out there because I'm in Mountain Time. And it's 10 o'clock Eastern. So, please join us at the same link and we will see you all tomorrow. Have a great afternoon.
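Illustration of the tool discussed in the session: the steps the presenter describes (crawl a document, pull out the URIs, get archived versions, put the archival links back) might be sketched in Python roughly as follows. This is only a sketch under stated assumptions, not the presenters' implementation, which the session does not describe. The function names extract_urls, annotate_with_archives, and wayback_lookup are hypothetical; the URL pattern is deliberately simple and will miss or truncate some links (for example, URLs that contain parentheses); and the lookup only finds snapshots that already exist in the Internet Archive through its public availability endpoint, rather than minting new ones.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import Callable, Dict, List, Optional

# Simplified pattern (an assumption for illustration): stops at whitespace,
# angle brackets, parentheses, and square brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")
TRAILING_PUNCTUATION = ".,;:"


def extract_urls(text: str) -> List[str]:
    """Return the unique URLs in text, in order of first appearance."""
    seen: Dict[str, None] = {}
    for match in URL_PATTERN.finditer(text):
        seen.setdefault(match.group(0).rstrip(TRAILING_PUNCTUATION), None)
    return list(seen)


def annotate_with_archives(
    text: str, lookup: Callable[[str], Optional[str]]
) -> str:
    """Insert '[archived: ...]' after each URL for which lookup finds a copy."""
    cache: Dict[str, Optional[str]] = {}

    def _replace(match: "re.Match[str]") -> str:
        raw = match.group(0)
        url = raw.rstrip(TRAILING_PUNCTUATION)
        trailing = raw[len(url):]
        if url not in cache:  # query each distinct URL only once
            cache[url] = lookup(url)
        archived = cache[url]
        if archived is None:
            return raw  # leave the citation untouched when nothing is found
        return f"{url} [archived: {archived}]{trailing}"

    return URL_PATTERN.sub(_replace, text)


def wayback_lookup(url: str, timeout: float = 10.0) -> Optional[str]:
    """Ask the Internet Archive availability endpoint for the closest snapshot.

    Makes a network request; returns None when no snapshot is reported.
    """
    query = urllib.parse.urlencode({"url": url})
    endpoint = f"https://archive.org/wayback/available?{query}"
    with urllib.request.urlopen(endpoint, timeout=timeout) as response:
        data = json.load(response)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

Calling annotate_with_archives(reference_text, wayback_lookup) on the plain text of a reference list would perform the lookups over the network. Taking the lookup as a parameter is a design choice made here so that a different archiving service, or a stub for testing, can be swapped in. Extracting text from a PDF and writing links back into the submitted file are separate problems that this sketch does not attempt.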