I'm Rachel Curtis. This is my colleague, Laura Davis. We are both digital project specialists at the Library of Congress, and though our titles are the same, we do different work. So we're going to go through our workflows and how we use open source tools in our daily work. Like I said, I'm Rachel Curtis. I am the project coordinator for the American Archive of Public Broadcasting, so I manage all of the recorded sound and moving image files that come to the library through the American Archive of Public Broadcasting. It's very similar to a community-based acquisition process for the library, and it's not really something the library had undertaken before this project started. So I'll give an overview of what I've done to handle the digital material coming into our section, how I manage it, and some of the challenges and open source tools that I've found useful for my project. I'm going to start at the beginning, before the library had hired either Laura or myself, with how the library was handling digital material. And it was kind of a wild west. Hard drives that came in, either through copyright deposit or as gifts to the library, were usually shelved, sometimes catalogued as items themselves, and just set aside for later processing. The files that came in were stored in a server space we call embargo, but it is basically a digital closet: items are difficult to pull back down from it; it's a storage space, basically. There was no one person on staff dedicated to doing born-digital ingest; it was handled by different people as needed. Files were ingested on an as-needed basis, there was no coordination between staff, there was no workflow documentation, and there was a pretty big digital backlog. As larger batches of files started to arrive with contractual obligations, like the American Archive, it became very obvious that dedicated staff members were needed to manage this content, process it in a timely manner, and create documentation. When I was hired about three years ago, there were already plans in motion to hire two more digital project archivists: Laura, and then Keith Paramore, who works with recorded sound material. So there were sort of three things that were the catalyst for this. There's the lovely picture of how we were receiving hard drives. There was an increasing backlog of material, which is more of Laura's purview, so she's going to get into that. There was the History Makers project, and the American Archive. I mention the History Makers project here because a lot of the processes put in place for ingesting that content were duplicated for the initial ingestion of the American Archive files. There were contractual obligations, so files had to be processed in a timely manner. Workflows had to be developed for our systems, because our internal library systems are really geared toward ingesting material digitized in our labs from library content that already has a database record. History Makers and American Archive files come from outside our institution, and just as digital files, so you have to create the metadata records for them and plan an ingest path for them. A lot of that work, when I started, was falling on our video lab supervisor, and it wasn't something he could be saddled with; he had other responsibilities he had to cover.
So I was hired to take this up: Lauren Sorensen was my predecessor, and I came on after she left to take up this project. The American Archive of Public Broadcasting, which a lot of you here have probably heard of, was a Corporation for Public Broadcasting funded project that WGBH and the Library of Congress later took on. It initially funded a two-year position at the library to coordinate the arrival of the 70,000 initial preservation files. Lauren's work was focused on metadata mapping, pulling records down from WGBH's archival management system into our MAVIS database, and putting technical specifications in place with the input of James Snyder. By the time I started, there were already grants in place for the American Masters digitization project, NewsHour, and the NET cataloging project. So when I came on I was really tasked with dealing with some of the outstanding issues from that initial ingestion of preservation files, and then with the new grant projects that were going to be coming in. I briefly touched on this: the library's role in the American Archive is really long-term preservation. We do provide onsite access to the material digitized through the project, and we have joint policy decisions and governance with WGBH. And now that we've finished our NET cataloging project and have a better handle on that material, we're doing in-house digitization and sending files to the American Archive for access online, which is great. When I started, it was quickly obvious that there was no documentation of any of the workflows, so that was one of my first tasks. I also recognized that some of our in-house proprietary software was not sufficient to handle this project, so I started advocating for more open source tools to be officially accepted by the library so we could work through these challenges. Some of my challenges when I initially started were these large digitization grants that were in place (except for NET, which was a cataloging project). Anything digitized, usually from analog material, that an institution wants to add to the American Archive usually needs grant funding. So American Masters, recently Riverside Church received a CLIR grant, Peabody, PBS NewsHour, and these are large projects: anywhere from 4,000 to, in NewsHour's case, 8,000 assets being digitized, files that need to be managed as they come into the archive. These projects take a lot of time, from managing just the administrative things, it's about two years, to ingesting the actual preservation files. In addition to that, a lot of material is coming from multiple sources, from small local institutions to larger national institutions, all of whose staff have varying degrees of inventory of what they have and of technical expertise; no one institution is the same. So it's kind of a mixed bag as we work with people to give them as much help as we can, because we really want their material, and to work with them to get their preservation files. We also get a variety of file formats. This is just an example, based on projects that we've already ingested and that are in the works, of the whole range of stuff. The library's preservation standard is JPEG 2000, but we also get uncompressed files, QuickTime, MPEG, ProRes, MP4 files, AVI, MXF, HD, just a whole range; especially when it comes to born-digital content, what they have is what we get. So that is also another challenge.
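To make that format survey concrete, here is a minimal sketch of how incoming files could be profiled with MediaInfo from Python, via the pymediainfo wrapper. The talk only says MediaInfo is part of the toolkit; the drive path, CSV layout, and use of pymediainfo are illustrative assumptions, not the Library's actual script.

```python
# A rough sketch of surveying the formats on an incoming drive with MediaInfo,
# via the pymediainfo wrapper. The directory path and CSV columns are
# illustrative, not the Library's actual workflow.
import csv
from pathlib import Path

from pymediainfo import MediaInfo

def survey_formats(drive_root: str, report_path: str) -> None:
    """Walk a mounted drive and record container/codec info for each file."""
    with open(report_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "container", "video_codec", "width", "height", "duration_ms"])
        for path in Path(drive_root).rglob("*"):
            if not path.is_file():
                continue
            info = MediaInfo.parse(str(path))
            container = video_codec = width = height = duration = None
            for track in info.tracks:
                if track.track_type == "General":
                    container = track.format
                    duration = track.duration
                elif track.track_type == "Video" and video_codec is None:
                    video_codec, width, height = track.format, track.width, track.height
            writer.writerow([str(path), container, video_codec, width, height, duration])

survey_formats("/mnt/incoming_drive", "format_survey.csv")
```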
And then there's making the case for open source tools. The library, for a variety of reasons, has a long bureaucratic process for accepting open source tools, despite maybe some of the security risks, or just to make the library more comfortable making these official tools that we can work with. Some of the things I used to make the case are that we're able to manage projects with more agility and more collaboratively across multiple institutions, keep track of project progress (especially when multiple institutions are involved), manage multiple file formats, and manage inconsistent metadata. So I'm going to go through a couple of case studies: namely the NewsHour project, which was a grant-funded project where, because it was a grant, we had a little more control over what files we were going to receive, how we were receiving them, and the timeline; and then my second case study is working with smaller institutions, where it's a little bit more of a mixed bag and a little more time-intensive in some cases to keep track of all the material coming in. So for NewsHour, like I said, this was a project to digitize analog content. WETA and, let's see, NARA held the tapes from NewsHour, which spanned a whole range of formats. The preservation files came to the Library of Congress, access files went to WGBH, and preservation files also went back to NewsHour. This was a CLIR-funded grant that ran over two and a half years. We're just wrapping up this project, which has been very successful, but it really was the first project where I started to make the case that I needed some open source tools. Like I said, over 8,000 tapes were digitized, in a whole range of formats, stored in three different locations, and then the library agreed to digitize about 570 outstanding tapes from this project. There were some inventory issues, so WGBH hired a contractor to perform an item-level inventory. We weeded out any duplicates, and that work was really necessary because, as I said, we needed to have that metadata in place in order to accept any sort of files into our system. And my first task was to get the library to accept OpenRefine, because I really needed it to normalize the data; otherwise it was going to be a lot harder for me to create batch records. Then, as I was going through and planning with my team to ingest these files, we really built on the lessons learned during the initial CPB-funded digitization of 40,000 hours. Initially, file delivery had been on LTO tape; we decided to shift to hard drives. That was easier to manage because I didn't have access to our LTO tape drives, but I did have access to computers that would allow me to offload hard drives. We needed to do metadata cleanup and normalization, and OpenRefine was really necessary for that step. For the metadata ingestion, we needed to do a bunch of batch updates to our database. We had initially been using XSLT for the initial CPB project; we switched to Python, which is a little easier for me to wrap my head around, and I actually found the scripts more manageable. That lets me do batch updates into our system very easily. We refined the ingest process, and I collaborated with staff at the library to set up automated workflows.
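As an illustration of the cleanup step OpenRefine was brought in for, here is a small sketch of the same kind of normalization done in plain Python: trimming whitespace, normalizing dates, and flagging duplicate identifiers before batch records are built. The column names (asset_id, air_date) are hypothetical; the real inventory columns aren't described in the talk.

```python
# Illustrative normalization/deduplication pass over a vendor inventory CSV.
# Column names are hypothetical stand-ins.
import csv
from datetime import datetime

def normalize_date(raw: str) -> str:
    """Try a few common date spellings and return ISO 8601; pass through if unrecognized."""
    for fmt in ("%m/%d/%Y", "%m/%d/%y", "%Y-%m-%d", "%B %d, %Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return raw.strip()

def clean_inventory(in_path: str, out_path: str) -> None:
    """Trim whitespace, normalize dates, and flag duplicate asset IDs."""
    seen_ids = set()
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + ["duplicate"])
        writer.writeheader()
        for row in reader:
            row = {k: (v or "").strip() for k, v in row.items()}
            if "air_date" in row:                   # hypothetical column name
                row["air_date"] = normalize_date(row["air_date"])
            asset_id = row.get("asset_id", "")      # hypothetical column name
            row["duplicate"] = "yes" if asset_id and asset_id in seen_ids else "no"
            seen_ids.add(asset_id)
            writer.writerow(row)

clean_inventory("vendor_inventory.csv", "vendor_inventory_clean.csv")
```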
And that's going to be a theme both Laura and I are going to touch on: automate as much of this as possible, because that just makes it far more efficient and takes a lot of the manual work out of it. We also added a quality control component. We requested a pilot batch from the vendor to confirm quality, and then we had monthly deliveries. The initial ingestion of 40,000 hours had arrived as one big batch to the library on LTO tapes, with no mechanism really in place to say, "I'm sorry, we can't read this file, there's something wrong with it, we need a new copy." That process wasn't really in place. With NewsHour, we really needed that, so we could go back to the vendor and say, please re-digitize this tape or re-wrap this file, there was something wrong with it, and deliver a new copy. And then we used Google Docs to track issues across all three institutions, WGBH, NewsHour, and ourselves, for both the preservation and proxy files. WGBH was also tracking missing dates for episodes that were missing. The library received bagged items. Again, this is something we can request from vendors, but we very rarely get files in bags from individual donors. We got a preservation file and a QCTools report; our own software we use for QC is called Baton, and that report was added to the ingest package. We got SRT files with closed captioning when it was available, and then checksums as well. The only manual part of this process was really some of the metadata and checking the files that failed QC. Those had to be manually checked, which really tested the limits of our automated QC, and we found that, although there were a few we had to send back to the vendor, most of them were not actually showing the errors the software was flagging. So those just had to be manually put back into the automated process. Otherwise, I really relied on staff; I was becoming more familiar with Python, but there were other staff who were even more familiar with it, so we got this automated process set up. And then there are just the last 570 tapes that we're digitizing, which go through our normal workflows as part of this project. So here's kind of my open source toolkit for this project. MediaInfo was part of our automated scripts to check the files and troubleshoot any errors that might occur; there were actually very few of them. OpenRefine to deal with the metadata. Confluence we use internally; all my documentation is stored on there, so if for some reason I were hit by a bus the next day, someone would be able to take over my project and continue the ingestion. Python was again key for our automation process, and Google Docs for keeping track of issues that occurred across the whole project. So that was kind of the NewsHour project.
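Since the NewsHour packages arrived with checksums, a fixity check along these lines could run over each delivered batch. This is a minimal sketch assuming MD5 sidecar files named <file>.md5; the actual checksum format and verification step used at the library aren't specified in the talk.

```python
# A minimal fixity-check sketch for a delivered batch, assuming MD5 sidecars.
import hashlib
from pathlib import Path

def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_batch(batch_dir: str) -> list[str]:
    """Return files whose recomputed checksum doesn't match the sidecar value."""
    failures = []
    for sidecar in Path(batch_dir).rglob("*.md5"):
        target = sidecar.with_suffix("")   # e.g. episode_001.mxf.md5 -> episode_001.mxf
        expected = sidecar.read_text().split()[0].lower()
        if not target.exists() or md5sum(target) != expected:
            failures.append(str(target))
    return failures

print(verify_batch("/data/newshour/batch_2018_11"))
```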
In contrast to that, we also get digital donations from individual stations and producers, and like I said, sometimes this is a bit more time-intensive. We can't really put any demands on what we receive. When stations contact us, or we contact them, we ask them what they have and what sort of collections are there; but again, what they have is what we get. This is just a modified screenshot, an example, of what the Trello board we use to manage these looks like, because we have several elements we need to keep track of. So like I said, for file acquisition from individual donors and stations, we do some information gathering. What do they have? Are their materials analog or digital? If they're analog, they're probably going to need to apply for a grant. What file formats or storage media do they have? What metadata or inventory do they have? We need to track all of that information, because to accept a collection we need to have it approved by both the Library of Congress and WGBH; there's a deed of gift, a legal process that's in place. If they are getting a grant, we're often helping them with their grant application, which Casey has spent much time helping stations do. And then we track when the material arrives, when it's complete, and any issues that pop up. So on this Trello board there's some color coding to track where the deed of gift is, where the grant application is, when the material is coming, and in what fiscal year we expect to receive it, because we do have a limit of 25,000 hours we can accept into the American Archive each year. Trello has been really helpful for this. I did try to do this in an Excel spreadsheet; that did not work. This is much easier to track, since you can move the boards and cards around to where they need to be. We also use Google Docs, especially when giving stations and donors metadata templates, legal information, and grant application aids; that's really been helpful. The other challenge here is that these ingests can't be easily automated, because there's often a variety of file formats, there are no checksums, the files aren't bagged (which isn't necessarily a problem), and sometimes I get unexpected file formats, unexpected storage devices, and inconsistent metadata. So there's a lot more manual tweaking that needs to happen. Unlike the NewsHour project, where what I could do was offload the files into a watch folder and the process just picked up through the different scripts to get them ingested into our system, this one needs a little more hand-holding. So SSM Paid, MediaInfo, and OpenRefine are really the things I use to manage these individual collections. And there's an example, just a graphic visual, of what I'm using. As Laura will go into a little more, analyzing what open source tools are available and which ones we can get the library's approval to use has really been important for both of us, but it's an ongoing process to make the case. So I'm going to turn it over.

She and I share the same title, but we do have different responsibilities; there's some overlap. As a quick way to show you what my responsibilities include: I do basic processing for gift and copyright collections, for born-digital collections, not digitized material. Part of the goal of my position is to eliminate the backlog, the contents of embargo, or the digital closet; I love the digital closet term. That includes creating MAVIS records, creating derivatives, and then ingesting the content. In addition, I'm responsible for minimizing the backlog going forward: it's great to take things out, but we don't want a whole bunch more to come in, so it's about keeping up with what's coming in, minimizing the backlog, and automating as much as possible. I also support the addition of content to our website, such as the National Screening Room and selections from the National Film Registry.
We also have the Geographer Zone of Film, which was a collaboration with the Geography and Map Division, and the For-It Home Movies, which was a collaboration with the Manuscript Division. For those, I compile all of our content, the moving image files, the still images, and the metadata, and send it to the team up in D.C. (we're based in Culpeper, Virginia) to add to our website. I also manage the creation of our EIDR records, we do register EIDR values, and I manage handle creation as well. So here are some of the selected collections. I don't know how many of you have seen the bin of drives; that's an example of some of the drives that I have. I came in and there were just boxes of drives, DCPs. Even the receiver for your wireless mouse: did you know they make flash drives that size? They have little flash drives that are that size, and those are received in copyright, and then we have some gift collections that are also on hard drives. Also, we've got video games, but we're not going to talk about that today; everyone loves to talk about video games, myself included. So across all of this, this chart was accurate about three weeks ago, and at that point I still had, I think, about 30 drives left to look at; since then I've gotten about 10 more. So we're talking about 117 formats. Notice it's not all moving image, because we did get accompanying documentation with some of these, and font files, that's great. If we look at just the moving image files, this is it, 8 formats. This is the big red area; I shouldn't point, you can see it. We have a preservation copy of the Vanderbilt Television News Archive, so I can blame them for that. Thank you, guys. So this is what we have, and these are the things that I'm working on. So, we talked about workflows, and one of the things that's really important, especially when you're dealing with collections that can contain one title, one digital object, or many related digital objects, is setting project goals. My colleague Keith Paramore in Recorded Sound and I set up a whole series of project documentation. We start with a project charter, then we have a project plan, and then we document as we go forward, so that we have records of what we do. This has been really helpful in terms of starting a project so everyone knows where we're going, what's happening, and what changes may come up, and it's especially good for communicating with stakeholders and supervisors. For example, this one is really simple: we have administrative information, the project background, details about the collection, and then the project requirements, which are the goals. There are some goals common to most of the projects: create a MAVIS record, create a derivative, ingest. And some of the other projects have additional components as well. Once the goals have been established and approved, then we can look at the available resources to complete the project. Those include, obviously, the skill sets that are available, because that's going to determine the tools you can use. Rachel was talking earlier about using XSLT. Well, that's not going to work for me unless I go out and learn more about XSLT, because I've done very little with it. So knowing your skill sets lets you apply the appropriate tools and minimize your learning curve, although we do always want continual growth.
So here are some of the tools that we use. Not all of them are open source. I did put a couple in, like Oxygen, but you can just substitute the XML editor of your choice, and Confluence; for the way we use Confluence, which I'll explain a little later, you could easily substitute a Google Doc. We'll talk about a lot of these during this session; we're not going to talk about all of them, but I just want to tell you that this is kind of our core toolkit right now, and we're always looking to add more. So today I'll be talking about three case studies. They'll illustrate the planning process, including the identification of tools (there are some nice charts about that), and then discuss the processes and procedures, and then the successes and challenges. And then we'll answer questions for both Rachel and me. The first one is Saturday Night Live. Saturday Night Live is a preservation project for us. Every week or so we receive files from NBC with Saturday Night Live. We're current up to at least right before Thanksgiving; I haven't looked to see when the last one we got is from. When I came, part of the digital closet was all of the Saturday Night Live episodes from 1975 up to when I started, in January of 2017. So by the time I got oriented and mapped out the tools and everything, we were up to over 800 episodes, many of which also have accompanying documentation with scripts and cue sheets. And the newer ones, I think the past three or four seasons, also included additional videos, moving image files that are not only the broadcast version: there's the repeat, the dress rehearsal, and the syndication versions. So there is a lot of content here to manage, and this was the first project that I worked on at the institution. The first step was to establish the workflow. You'll see a lot of similarities across the workflows because, again, it's basically: create a MAVIS record, create the derivative, ingest. With this one, we don't get any metadata with the materials from SNL; they basically just send us the files, and they'll have the host name in there. So for that big chunk of material from 1975 on, I went out to Wikipedia and other resources to get the episode information: the broadcast date, the guest host, the musical guest, and the cast. We also pulled EIDR values using the EIDR API. Are you all familiar with EIDR? It's a DOI-based identifier, the Entertainment Identifier Registry. So we received the content. Creating the access derivative, that was pretty easy. The metadata took a while to do. We went through various iterations of a MAVIS record; for this, I worked with my supervisor to create a template, basically, using the metadata off of the CSV that I created. In hindsight, I should have used a database, but we'll get to that; you're always learning. And then once we had the template, I could go ahead and batch create. I did not do 800 at a time. I never do 800 at a time; things happen. It could be the most perfect thing in the world and you'll still have one thing go wrong. And I don't say this from my own experience, I say this from the experience of others that I have heard about. Usually, for all of our new automated processes, we do one, then we'll do two, and then we may do five, and then we'll start going into batches of like 100; for SNL I was doing five seasons at a time.
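Here is a rough sketch of that "start with one, then scale up" batching pattern, filling a record template from the episode CSV. The field names and the template text are hypothetical stand-ins; the real MAVIS template is internal and isn't shown in the talk.

```python
# Sketch of template-based batch record creation with ramped batch sizes.
# Field names and the template are hypothetical.
import csv

TEMPLATE = (
    "Title: Saturday Night Live. {broadcast_date}\n"
    "Host: {host}\nMusical guest: {musical_guest}\n"
)

def build_records(rows):
    return [TEMPLATE.format(**row) for row in rows]

def ramped_batches(rows, sizes=(1, 2, 5, 100)):
    """Yield one batch at each pilot size, then keep going at the largest size."""
    i = 0
    for size in sizes:
        while i < len(rows):
            yield rows[i:i + size]
            i += size
            if size != sizes[-1]:
                break   # only one batch at each pilot size, then step up

with open("snl_episodes.csv", newline="") as fh:
    episodes = list(csv.DictReader(fh))

for batch in ramped_batches(episodes):
    records = build_records(batch)
    # spot-check the output here before loading the next, larger batch
    print(f"prepared {len(records)} record(s)")
```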
So here are the tools I used for this. I think everyone's familiar with just about all of these, except maybe Postman, which is an API tool where you can write your query and then submit it. And then PyCharm: it's a Python editor, and I find it a little more helpful than IDLE, especially for error resolution. Here's an example of the PyCharm screen; I put in an error, so you get the yellow boxes, and it'll tell you when you get the red with the fatal errors over there. So again, that's really helpful when you're creating new scripts. And then VLC, you'll see that a lot. I use that for troubleshooting, in case we do get some files that are truncated or have some damage, because we don't receive checksums with these files. So if they're damaged, I will go through and use VLC just to see if I can get them to play. Sometimes I'll pull one back, try to retrieve it again, and try to play it again, and then for the things that still don't play, we have to go back to our donor about it. So here's the workflow with the tools. Yeah, we use a lot of Python. This is not a fully automated process, but all of our automated pieces are done in Python. OpenRefine and Postman down here are under the episode-level metadata. OpenRefine we use when pulling back the barcode list, because it's a rich text file; so I do that part and then create a more manageable CSV using OpenRefine. And then FFmpeg and MediaInfo. So, case study number two: the US Senate floor recordings. We built off of the processes that we developed for Saturday Night Live. The key with this one is the project goals: the current backlog (I should have the dates there, and I did not; that was 2015 to January 30th of this year) and then current receipts, which is January 30th of this year to the present. And we've got content from 2007 to 2015 coming at some point. For all of these, we create documentation, and again, automate as much as possible, and we're always looking to apply processes from previous projects to new projects. The key differences between this and Saturday Night Live are: one, the Senate gives us metadata, and that has made all the difference in the world; and two, because of that, we've been able to fully automate this process. So we receive the files, they're transferred to us from the Senate Recording Studio and placed in a watch folder. A Python program picks them up and inventories the content in a MySQL database. Then derivative creation starts, and then the MAVIS record is populated and the ingest package is created. So it's all one thing, and then it just goes to ingest. The last part, the handle registration, is not automated; that's the one part of this process that is not automated, and that's due to the Library of Congress handle tool that we use. I can't automate that, but I can do batches. Generally, I just come in every day, and I have notifications when we get content from the Senate. I look and see when they're done, and once one is done, I go in and create a handle; that's just part of my daily activity.
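Below is a condensed sketch of what a Senate-style watch-folder process might look like: new MXF/XML pairs are inventoried in a database and a derivative job is started. The paths, the XML element name, the FFmpeg settings, and the use of SQLite (standing in for the MySQL database mentioned above) are all assumptions for illustration, not the library's production code.

```python
# Illustrative watch-folder automation: inventory new MXF/XML pairs, then make a derivative.
import sqlite3
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path

WATCH = Path("/data/senate/watch")
DB = sqlite3.connect("senate_inventory.db")
DB.execute("CREATE TABLE IF NOT EXISTS assets (filename TEXT PRIMARY KEY, session_date TEXT, status TEXT)")

def already_seen(name: str) -> bool:
    return DB.execute("SELECT 1 FROM assets WHERE filename = ?", (name,)).fetchone() is not None

def process_new_arrivals() -> None:
    for mxf in sorted(WATCH.glob("*.mxf")):
        sidecar = mxf.with_suffix(".xml")
        if already_seen(mxf.name) or not sidecar.exists():
            continue
        # pull a field from the Senate-supplied XML (element name is hypothetical)
        session_date = ET.parse(str(sidecar)).findtext(".//SessionDate", default="")
        DB.execute("INSERT INTO assets VALUES (?, ?, ?)", (mxf.name, session_date, "received"))
        DB.commit()
        # create an access derivative; encoding settings are placeholders
        mp4 = mxf.with_suffix(".mp4")
        subprocess.run(["ffmpeg", "-i", str(mxf), "-c:v", "libx264", "-c:a", "aac", str(mp4)], check=True)
        DB.execute("UPDATE assets SET status = 'derivative_created' WHERE filename = ?", (mxf.name,))
        DB.commit()

process_new_arrivals()   # in production this would run on a schedule or in a loop
```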
Here's a sample of the XML that we get, and specifically the fields that we use. We're only using five fields from a much larger XML document. And then here's an example of the MAVIS record. It's not robust in a typical cataloging way, but what we do is reference the Senate record, which will have more detail about what happened that day. We have the URL, because they do have a URL pattern, and then that's where I also put the handle in. The handle is created as part of that whole ingest process, once the object is in the database. Now for the Senate, it's not like we're getting one file per day. We get one moving image file per hour that the Senate is in session on the floor; the most we've gotten, I think, is 24. And then you get an XML file for each of those MXF files, so two files per hour. So that's what this process is handling, and the fact that we're able to automate it, because we do get content every day, is a really big efficiency for us. So again, going through the process here, these are the tools that we use; we've talked about all of those. OpenRefine we used for the backlog, to look at what was in there and pull that out. And then here's the chart. Even the batch document, the text file that I used for the handles, I actually used a Python program to create when I was going through that 2015 to 2018 content. For this process, automation was really a primary goal, because we knew that we were going to get this 2005 to 2015 content, and we estimate that that's going to be at least 14,000 files, evenly split between the XML and the MXF files. The third case study is the National Screening Room. We launched the National Screening Room earlier this year. Some of the processes for this were based off of the Selections from the National Film Registry, which was launched last year. Here I'm not generating the MAVIS record; this is a little different, so the workflow shifts a bit. Our curator selects the content and identifies the files that we're going to use. Some of our colleagues edit the files, do speed correction, and do whatever magic it is that they do. They drop them into a watch folder where a Python script will pick them up and add the bumpers: "From the Library of Congress in Washington, D.C." I've heard that many times. So that's where that comes in; it's done with Python and FFmpeg. For the still image files, I need to create a JPEG at the original size and then a GIF at 500 pixels wide. What we found, though, is that the quality of the GIF is not as good as the JPEG when we do it that way, so I use the "little red cap" tool that I'll show on the next slide to derive the GIF from those JPEGs.
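For the still-image derivatives, something along these lines covers the FFmpeg route described above: a full-size JPEG plus a 500-pixel-wide GIF, driven from Python. The quality flags are illustrative, and as noted, the team ultimately derived the GIFs with a different desktop tool because the FFmpeg GIF output wasn't good enough.

```python
# Sketch of still-image derivative creation with FFmpeg called from Python.
import subprocess
from pathlib import Path

def make_still_derivatives(source: str, out_dir: str) -> None:
    src = Path(source)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # JPEG at the original pixel dimensions
    subprocess.run(["ffmpeg", "-y", "-i", str(src), "-q:v", "2",
                    str(out / (src.stem + ".jpg"))], check=True)
    # GIF scaled to 500 pixels wide, height kept proportional
    subprocess.run(["ffmpeg", "-y", "-i", str(src), "-vf", "scale=500:-1",
                    str(out / (src.stem + ".gif"))], check=True)

make_still_derivatives("poster_frame.tif", "derivatives")
```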
Then the metadata: our catalogers review it. Our metadata can come from two sources, either MAVIS or ILS records. If it's MAVIS, I have to pull the MAVIS records out; MAVIS exports them as XML, and we run a Python script to create a file to be imported into the DM tool, which is the alternate way to get data into our website. Our website was built to repurpose ILS records, so since not all of our material has ILS records, we have to have this additional methodology to get our metadata and our content included on the website. Then I gather everything up and submit the files and the metadata through the DM tool. If it's an ILS record, the catalogers add or edit two fields, and then I alert our team in D.C. So here's the "little red cap" that we use for the still images. We use Trello to manage the content. We have almost 300 titles in the Screening Room right now, and we're trying to add titles every month, so there's a lot of content to manage through all of these different parts. Here's the Trello board; this is part of it. All of these have already been put online. We have every step in the Trello board, and we just move the cards across. But with Trello, we're really taking advantage of some of its features, like labels for rights information. Some things are streaming only, some are in the Screening Room, some are on the Film Registry, so we do have indications for that: green is the Screening Room, I think yellow is the Film Registry, red is streaming only, and then the dark color is replacement files. For those files that were digitized 20 years ago, when they were segmented, we're trying to get them replaced with better quality scans. So here's the workflow for this. Again, for this one Trello is more useful, because Python isn't quite as critical for this process, but this is our workflow. So, some of the challenges; we always have challenges when we do this. For Saturday Night Live, I already mentioned missing and damaged content; if there's something we don't have, we're going to go back and ask about it. For the Senate, we just started getting closed caption content along with the files for every hour, so that's an additional file, and we're having to retool the entire automated process right now. For the Screening Room, it's the process for replacing older moving image files; again, there are a lot of those segmented or early scans. It's a lot harder than I thought it would be, but I'm not as familiar with our systems, so it's been a learning experience for me. I've only been at the Library of Congress for just over two years, but we are working out processes with our colleagues in D.C. And then the repurposing of ILS metadata for the Screening Room has presented its own set of challenges. For example, we recently put up a whole bunch of newsreels from the All American News, and the way the titling was set up in the website, in the system we call P1, it did not include field 245 subfield n, which in serials will give you the date, and which would also give you the date for the newsreels. So instead of each newsreel showing a specific date, we had 23 files that said All American News, with no other indication to distinguish one file from another just by looking at the displayed title. We worked with our colleagues in D.C., and they assessed what kind of impact the change would have on all of the content in P1 across all of the divisions of the Library of Congress. They determined that it didn't seem like it was going to do too much damage, so we were able to lobby and get that change made, and it will be applied at the next update of the National Screening Room in December.
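The 245 subfield n problem can be seen in a tiny example: without the subfield, every issue of the newsreel collapses to the same display string. The sample records below are invented for illustration.

```python
# Why omitting MARC 245 $n collapses serial titles; sample data is invented.
records = [
    {"245a": "All American News.", "245n": "[1945-03-16]"},
    {"245a": "All American News.", "245n": "[1945-03-23]"},
]

def display_title(rec, include_n=True):
    parts = [rec["245a"]]
    if include_n and rec.get("245n"):
        parts.append(rec["245n"])
    return " ".join(parts)

for rec in records:
    print(display_title(rec, include_n=False))   # both print "All American News."
for rec in records:
    print(display_title(rec, include_n=True))    # the dates now distinguish the issues
```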
And then for all collections, there are technical issues resulting in the non-receipt of files. This is an IT issue: anytime we have changes in our systems, or anytime any of our partners have changes in their systems, we always have to check that we're still receiving the files we're supposed to. Firewalls can really get in the way sometimes; I know that's what they're designed to do, but sometimes you don't want them to be quite that effective. Finally, and this is the part I sometimes have a hard time with, is the project wrap-up and evaluation, because, you know, I'm like, okay, good, I have a lot on my plate; I'm sure we all do, I know you all do too. You finally get done with a project and you're like, hey, we're done, and then you move on to the next thing. But this step is probably the most important: going through and doing your final project report and doing the evaluation. What went really well? What could have gone better? If you could do it again, what would you do differently? For me, with SNL, I wouldn't have used a CSV to manage my metadata; I would have built a MySQL database. Okay, fair enough; but again, that was my first project, so a CSV was probably okay for my first one. That's fine. And then, what elements can be reused from this project for other projects? There's no use in reinventing the wheel. For every collection that we do, we are creating a derivative on almost all of them, so we just reuse that same script. We also use the same script to move our ingest packages, because we only put about three ingest packages in our watch folder with the content at one time. So we have a mover script that will make sure that number is kept at three or below, so everything keeps running and I can go home and don't have to babysit all of those ingest batches.
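A minimal sketch of what such a mover script could look like is below. The limit of three packages comes from the talk; the directory names and everything else are illustrative.

```python
# Keep at most three ingest packages in the watch folder, topping up from staging.
import shutil
from pathlib import Path

STAGING = Path("/data/staging")       # illustrative paths
WATCH = Path("/data/ingest_watch")
LIMIT = 3

def top_up_watch_folder() -> None:
    in_watch = [p for p in WATCH.iterdir() if p.is_dir()]
    waiting = sorted(p for p in STAGING.iterdir() if p.is_dir())
    for package in waiting[: max(0, LIMIT - len(in_watch))]:
        shutil.move(str(package), str(WATCH / package.name))
        print(f"moved {package.name} into the watch folder")

top_up_watch_folder()   # run on a schedule so ingest keeps going overnight
```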
And then you're finally done. So thank you very much. Are there any questions? Yes, ma'am.

I was wondering if you could elaborate a little bit more about why you didn't like using CSV.

Oh, it's not that I didn't like it; I like CSV a lot. But considering the volume of information I have for the Saturday Night Live project, with over 800 episodes to do at one time, plus cast information, my spreadsheet was huge: it went from column A through Z and then AA to something like BN. It was absolutely huge, because once you start putting in, here's my preservation file, here's my access file, here's my original file name, here are all these things that we're documenting, it just kept going and going. A database would have been a much more efficient way to do that.

A show that's been on the air that long, don't they ever have to access old episodes? How did they do that before they met you?

So SNL recently launched an app where you can search by cast member and by episode. They have an amazing wealth of metadata that hopefully, when we have our next conversations with them, they'll be able to share with us when they send over the digital files. So yeah, you should check it out; you have to sign up for it, but the granularity you can get to with it really is amazing.

Yeah, hi, thank you, that was really good, walking through the tools and everything. In my work too I'm always trying to automate as much as possible, to get more done, so I wonder if you could talk about the parts that you can't automate, or that you know you won't be able to.

So one of those is the handle tool that I already mentioned, because the way that we create handles at the Library of Congress is with a specific tool, and that's based on submitting a text file. I haven't investigated too much how much automation we can do, but considering the way it's structured, I don't think there's a way we're going to be able to do that. The other difference, between Saturday Night Live and the Senate, is the amount of metadata. If we had the granularity of metadata that's in the app, we could fully automate Saturday Night Live just like the Senate: if we had the host, the date, the musical guest, and the cast, we could automate it as well. So really, so far, what it looks like for these two collections, the two that we have with regular accruals, is that we just need those key pieces of information; the metadata has really proven to be the big difference in these two scenarios. That is the game changer. And for those who were in the session yesterday, I mentioned that well-formed, persistent, consistently well-formed metadata makes all the difference.

For my projects it's very similar. Metadata is inconsistent, so there needs to be a sort of manual analysis of what I get that can't really be automated, but we kind of build off each other's processes; I'm hoping to adapt what you've done with SNL for a project that's going to be starting soon for the American Archive. And then the other component is the QC component I have with some of my material: when something gets kicked out of the system because the QC software recognizes something's gone wrong, it's still a manual process for me to push it back in, or to go back to the vendor to say that something's gone wrong. So there's that added process.

And I should add that while I'm doing this, I'm trying to automate as much as possible even now, because I get two files at a time, maybe four, every week. So I'm starting to add those automated portions now, in expectation of moving toward that fully automated process.

Could you put a little bit of clarity on these tools? Did you teach yourself? Do you have ongoing training at LC, or did you go to school for it?

Sure, so it's kind of a mix. Some of it is self-taught. A lot of it, actually, is finding the people I work with who know more about it than I do and can help us out. I can do some basic scripting, but for some of the complicated processes we have a colleague who has worked with our internal systems, is familiar with Python, and knows the ins and outs. I've also bought the reference books, and I've taken a couple of online classes through Library Juice to learn more about XSLT, Python, and some of the more technical things that I'm not entirely familiar with. So yeah, it's sort of a mix of teaching myself, finding those who know more and reaching out to them, and taking a class. I found that in library school I didn't learn any of this, so it was really about finding resources myself.
Same for me, because they didn't offer any of this in library school; none of it existed, and I hate the fact that I can say that. But really, it's teaching yourself. I taught myself this, and I just played with it, and I had a lot of calls to IT because they wanted their passwords encrypted in a certain way and I'm like, okay, how do I do that? So I run around to people in the building who might know more, and Keith has been an amazing resource for both of us. But yeah, there's no real formal framework; you learn what you need to know on the job.

I like the project management worksheet for each project; it seems like a good idea for helping you allocate resources and figure out how many you need to tackle certain projects. Do you find that you have more projects than you can tackle, and what do you do about that?

The answer is yes, there are more projects than we can tackle, and the way we negotiate that is prioritization. I work with my supervisor and our colleagues to prioritize the projects. When I first started, we went around and assessed everything that was waiting in embargo and assigned it a high, medium, or low priority, and then we've been working through that. So I don't necessarily make the determination, and there are some things that come up where it's, oh, we need to do this now, so everything else gets put aside and we tackle that. And then with the American Archive, it's really a discussion with WGBH, and that's why we have the 25,000-hour limit, to make sure we're not taxing all of our resources. There are projects that are more intensive than others, and certainly grant application management is one of those things. But, as I mentioned, when a donor contacts us, or we contact them, we get an idea of what they have, and that helps inform our decision about whether we can accept the collection and how much work both sides need to put in to bring it into the archive. So yeah, it's about assigning priorities to things.

And one thing for me is that for some projects I need cataloging resources as well, and my supervisor also supervises the catalogers, so we can balance those projects according to my priorities and the catalogers' priorities and really balance that out.