Thank you for joining us at the CNI Fall 2021 Membership Meeting and our session, Moving Email Archives from Theory to Practice. I'm delighted to be returning to this meeting alongside my colleagues to provide updates on developments in the email archiving community. Today you will be hearing about four active email archiving programs that collectively contribute to the broader community's efforts to build flexible, efficient services to meet email archiving challenges. We will begin with the EABCC program, followed by Harvard University Library's ePADD+, the University of Chicago Library's Attachment Converter, and lastly EA-PDF.
With that being said, I am Ruby Matatina, the email archives community fellow for the Email Archives: Building Capacity and Community program, known as the EABCC for short. The EABCC program is hosted by the University of Illinois Library and funded by the Andrew W. Mellon Foundation. The four-year program provides grants of $25,000 and up to $100,000 across two grant cycles to collectively target known challenges within the email archiving community and build capacity across institutions. On the next slide, there's a snapshot of our project website, where we invite you to learn more. As part of this project, we have specific goals: reporting specific processing improvements and increased tool functionality, and developing a community of practice that establishes a baseline for email acquisition, processing, discovery, and delivery services. And finally, our third goal is highlighting specific processed email collections. These goals aim to target challenges that are being faced by the email archiving community, which I will briefly touch upon. Some known challenges in the community are privacy issues, because email implies dealing with personally identifiable information. In addition, there is a mixed use of email that presents many challenges to the appraisal and selection process.
Not to mention the basic scale problem, where there is so much email and not enough time and resources. Email archiving gets more complex when you consider attachments and linked content that can get lost in the archiving process. So how do the current formats add to these challenges? Currently, there is a mixed bag of formats and standards in use, and that affects a number of things. The quality of documentation varies across different email formats, and there are mixed reviews on how well these formats are documented. When you consider the gaps in the metadata from an archival perspective, it can be very concerning, because account-level information can get lost, as well as information about what software the email came from or how the exported file format was produced. These challenges are well known to the email archiving community, and some of them are being addressed by programs awarded funding in the first grant cycle.
The EABCC program is very exciting because it supports programs that meet our goals and move the community forward. With the first round, there was a focus on tool development, filling in some of the existing gaps and addressing some of the challenges I mentioned earlier. We have five awardees, two of which you will be hearing from in this session, so I won't go into too much detail. However, I did want to touch upon a project supported by the Council of State Archivists, also known as CoSA. CoSA is assessing and preparing state archives to pick the best email archiving solutions for their respective institutions. They recently released a report to determine specific needs and interests via a needs assessment survey, which you can find on their project website. So as part of our grant project, we have noticed that a lot of institutions need to build more accessible email archiving workflows, and this is the kind of work that the project is intended to assess. However, our work doesn't stop there.
CoSA's needs assessment allowed us to be mindful of our various conference audiences and provided an excellent foundation for our survey. Before I delve into the survey we have administered, I wanted to provide you with some background information. In previous conference presentations, we decided to assess our audience's current email archiving practices via the poll that you see before you. This has been a great way to engage with our audiences and learn more about the issues that different organizations are facing. We have done this poll about three times with varying audiences, and if you follow on the next slide, you can see some of those results.
Our first was at the Society of American Archivists' Archives Records 2021 conference in June. As you can see from our first poll, a large majority of the audience indicated that they do not currently accession email, or that they have acquired email but are not actively processing email for long-term preservation. So this poll is a great way to demonstrate the need to bring the email archiving community forward and how the EABCC projects are working together to bridge existing gaps. In addition, we did the same poll at the Council of State Archivists' work meeting, and you can see a greater distribution of responses between the four answer choices. Lastly, we did this poll at the email archiving workshop provided by the Digital Preservation Coalition. The workshop's audience was primarily internationally based. Once again, you can see the disparities in institutions that are currently preserving email.
Doing this poll encouraged our desire to understand how many institutions have an active email archiving program and some of the obstacles these institutions face. Our survey on email archiving practice was distributed regionally to archives, museums, and libraries in Illinois at the beginning of September.
We opted to start with a survey that targeted institutions in Illinois to assess any areas that need refinement before opening the survey to a larger sample. We had a total of 68 responses out of 396 across two weeks. The survey had seven questions that were broken down into three sections: demographics, current practice, and assessment of needs. The 68 responses varied from museum, college, and university archives to corporate and/or business archives. As you can see from the pie chart, a little less than half of the sample consisted of college and university archives. We did include an option for "other," where respondents could indicate institutions like private archives, regional film archives, and non-profit organizations.
The following question in our survey asked for the number of full-time equivalent employees. As you can see, more than half of our respondents indicated they have fewer than two full-time employees, something that unfortunately is familiar to many archival institutions. This number can be alarming, because trying to build the capacity to archive email, or to archive in general, requires employees to have the resources to do so. This tells us that the email archiving community has to be mindful of this when developing tools, so that they can work in a wide range of institutions, particularly those with a relatively small number of employees. I did want to note that the results appear to be somewhat skewed in the 21-plus category, because some of our respondents might have been reporting numbers for a larger parent organization.
The survey also featured our initial poll question, shown earlier in this presentation. As you can see, many responded within the first two categories. Thus, although the sample is not entirely representative of Illinois, you can infer the notable gaps that exist. In addition, respondents were also asked to list the current software, if any, they were using in their email archiving workflows.
Here is a word cloud demonstrating some of those answers. I want to note that because a large portion of our respondents indicated they do not archive email, this is a minimal selection of software that does not reflect all the tools in the community. This can indicate that institutions are not in the loop on current tool developments or do not have the appropriate resources, an area that we can target by getting the word out.
The main portion of our survey focused on the top obstacles institutions are facing. Initially, this question was framed as "list the top three obstacles that your institution faces." Unfortunately, the way this was formatted, we couldn't limit the answer choices to just three, and some respondents did indicate more than three answer choices. As you can see on the graph, the top obstacles relate to a lack of training and technology, more so than choices like scale and quantity or lack of cooperation. We did have an option for respondents to write in, and one common write-in was the lack of time and the lack of staff. But overall, we were pleased that people took the time to complete the survey and leave some in-depth comments that will help us refine our survey for our larger samples.
The needs assessment surveys allow us to assess the current state of our community and the need to build capacity. The EABCC program is just one of the ways that the email archiving community is targeting these gaps, especially with the program you will hear about next. With that, I would like to turn it over to Steven Abrams.
Thank you, Ruby. And hello, everyone. I am Steven Abrams. I am the head of the digital preservation program at the Harvard Library. And I'm delighted to be able to introduce you to our project, ePADD+: integrating preservation functionality into ePADD.
The ePADD+ program came about through the convergence of the efforts and interests of three institutions and their respective initiatives. To begin with, Stanford's ePADD system is a very well-known open source tool that is used for archival processing of email. Importantly, it provides functions for initial assessment and appraisal, processing, and final discovery and delivery. But significantly, it has not yet tried to tackle any questions of the long-term preservation of the processed email. Over in the UK, the University of Manchester, through their palladium project, has been working over the past couple of years to modify the standard ePADD system for enhanced full-text discovery and delivery. And they had also been starting to wonder whether they should be investigating digital preservation issues surrounding email.
Finally, here at Harvard, we've had a full-scale email archiving and preservation program in place starting back around 2008, so it's been around for quite some time. The infrastructure that we're using for this is a homegrown system that we call the email archiving system, or EAS. By design, it offers archival processing and especially a number of functions aimed specifically at the preservation activity. It does not have the full set of functions for archival appraisal that you would find in ePADD. And by explicit design, it was never intended, in its current form at least, to deal with the discovery and delivery aspects.
As our three institutions and teams became aware of each other's work over the past couple of years, it became pretty clear that there are great advantages in trying to collaborate rather than compete. Given the quite significant amount of functional overlap between ePADD and EAS, it just didn't seem to make sense that we were each continuing down our own path and going through a certain amount of duplicative development effort.
So what we've decided to do is to solidify around a single technical system. The system that we've chosen to use is ePADD, primarily because it has an existing open source base of community use and support. The project objectives of the ePADD+ effort are primarily to integrate the type of long-term email preservation functionality that we have long had in the Harvard EAS system into ePADD itself. At the same time, we want to expand support to new email formats. ePADD itself has traditionally been restricted to the mbox format. EAS has always used EML, so we'll definitely be adding that. And beyond those two, we have both heard from our stakeholder communities about the great desirability of offering support for Microsoft's PST, which, as you can imagine, is quite widely used out in the field. We'll also perhaps be looking at some other formats as well. Knowing that this enhanced ePADD is going to be deployed and used in a variety of institutional contexts, we also wanted to add support for simpler local customization and future extensibility of the underlying code base. Finally, we are interested in working towards ensuring the greater sustainability of the software by explicitly sharing out responsibility for maintenance and support beyond Stanford and its core development team.
So in terms of preservation function, what is it that we're actually talking about adding on here? To begin with, we will be continuing to ingest email in a variety of existing and new formats. We'll be taking in those native formats. At the same time, we're going to be offering the capacity to normalize things to various canonical and perhaps optional derivative formats. We want to make sure that people will be able to mix and match based on local policy decisions about what they feel is the most preservable form of the collections that they're bringing in.
Importantly, the enhanced ePADD will maintain the full set of internal email headers, as well as all of the multipart bodies and attachments, in their original and complete order. Because ePADD in its current state was never intended to address the preservation function, it was in certain instances not keeping track of certain headers or multipart bodies. Of course, for purposes of archival provenance, it's important that we are in fact maintaining that original order, and we will be doing so. At the same time, we are going to be generating a certain amount of additional technical and processing change history metadata, dealing with things such as canonicalization and other sorts of archival processing, such as explicitly deleting or redacting various bits of information for legitimate purposes. All of that will be enabled, but it's important, again for purposes of provenance, to be able to have full documentation of what has been done. Additionally, we will be adding the ability to create export packages that will bundle up both the native and possibly derivative forms, in both their original and processed form, into standardized packages that can be submitted to external preservation programs and repositories for long-term stewardship.
In terms of how we are progressing, we're perhaps not quite halfway through our project period. We started off with a great deal of stakeholder outreach and consultation. Perhaps some of you have previously seen some of our outreach webinars and so forth, and hopefully you've given us some feedback on needs, goals, and aspirations. We used those consultative exercises to identify a comprehensive set of use cases, which were then distilled down into functional requirements. Those requirements are available on the website at the address I'm showing here in the chat, and there'll also be a few more pointers to further information sources in a later slide.
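To make the earlier point about original order concrete, here is a minimal Python sketch of listing every header and every MIME part of a message in the order they occur. It is not ePADD+ code; it only uses the standard library `email` package, and the function name `describe_structure` is a hypothetical one chosen for illustration. Note that the default policy used here decodes header values for readability, so a real preservation workflow would also keep the raw bytes.

```python
import email
from email import policy


def describe_structure(raw: bytes):
    """Return (headers, parts) for one raw RFC 5322 message.

    headers: (name, value) pairs in original order, duplicates kept
             (for example, several Received lines).
    parts:   (content type, filename or None) pairs for the message and
             every MIME part, in depth-first document order.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    # items() keeps repeated header names and preserves their order.
    headers = [(name, str(value)) for name, value in msg.items()]
    # walk() visits the root first, then each nested part as it appears.
    parts = [(part.get_content_type(), part.get_filename()) for part in msg.walk()]
    return headers, parts
```

A tool that records this listing before and after processing has a simple way to check that no header or body part was dropped or reordered along the way.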
And we are working right now to define our metadata profile and our export package specification, which will be based on BagIt. Again, there will be some draft information about that coming up soon on the website, and we would encourage you to go and review that and provide any feedback that you can. We're very much interested in trying to be as responsive as we can to the widest possible stakeholder community. We are of course also working on the process itself. We have a whole series of prioritized development tasks and a timeline, as well as fairly detailed acceptance criteria to ensure that we're meeting all of our goals. At this point, we are working towards an initial public beta release. I would get into trouble with my colleagues if I hazarded a date, but we expect to have that quite soon now. We want to get something in front of people to start playing with and responding to.
So I wanted to touch a little bit on some of the challenges and lessons learned throughout this grant-funded activity. To begin with, these last two years have just been so incredibly challenging for us, as I know they have been for all of you. This has highlighted for us the importance of the early preparation that we did, pretty much as soon as we had assembled our project team and submitted our proposal. We were very confident we were going to get funded, and we were happy to be, but we did not actually wait. We started our planning process right at the point of submitting the proposal to Chris, Ruby, and their colleagues. We had a lot of the groundwork in place, so we were able to start up right away. And because we had a lot of that groundwork in place, we were able to pivot as necessary, as all of our work practices have changed so radically over these past months. This project is multi-partner.
It is multi-time-zone, and in fact it is multi-continent, with those of us here in the US spanning four time zones and our colleagues in the UK. I can't quite remember how many time zones away that is; I want to say five. And we're also dealing with the original ePADD development team, which has since spun off from Stanford and is now located in India. So that's been a bit of a challenge to deal with. But again, it's highlighted the importance of early discussion and consolidation around both synchronous and asynchronous communication channels. Secondly, I should mention that about half of the development work that's going on is contracted out. We brought in a software house to do some of that work for us. And there was some significant staffing turnover that came a little bit out of the blue just a couple of months ago, and we're dealing with that. That again, for us, is highlighting the importance of taking a very pragmatic approach, particularly regarding project scope, and making sure we know what is in fact high priority for an MVP product release and what can be put off to a little bit later.
Next slide please, which I think is my last one here. I want to give you some pointers to places for more information at both the Harvard and Stanford websites. If you are interested in becoming a community tester when the beta release comes out, you can register your interest on this Google form. You can also follow us on Twitter if you want to get all the latest updates. And please feel free to contact either of our wonderful project co-leads, Trisha Patterson and Jessica Smith, or you can contact me directly as well. And with that, I will end, and I believe I am turning things over to my colleague Matt at Chicago. Thank you.
Thanks very much, Steven. I'll be talking about the Attachment Converter project, which is just getting underway here at the University of Chicago.
So, the original motivation for this project is that we have archivists at the University of Chicago Library who have been accessioning materials for a long time, but as the years go on, more and more of the materials that our archivists accession are going from being physical to being increasingly digital. So whoever it is, say Milton Friedman or some fancy University of Chicago professor, donates their hard drive, or their estate donates their hard drive, to the University of Chicago, maybe after they pass. And who knows what materials of historical interest could be on that hard drive. Among the pieces of data that our archivists are accessioning are, of course, email backups, because for anybody who was actively on faculty here since the 90s, there are going to be tons and tons of emails there.
What we're focusing on is trying to make sure that the attachments in email backups of this kind, which may potentially be of historical interest, are available to future researchers, because we all know how file formats get obsolete over time. Perhaps you've met a professor who's still insisting on using WordStar format from the 80s, or whatever, for all their papers, even though it's 2021. And these formats, sooner rather than later in many cases, become kind of unreadable. So what we'd like to do is go through all the attachments in an email backup and convert them to some more archivally stable format, that is to say, some format that seems to be standard enough that, as best as we can guess, people far into the future will still be able to open and read it on their future computers. We convert those attachments to an archivally stable format, and then put the copies of the converted attachments back into the email where they came from. That's the idea.
Say I'm an archivist and I'm accessioning an email collection, one of whose mailboxes has an email with a Word document, and I'm not sure people 50 years in the future will be able to read Word documents. This software is going to find that Word document, convert it to, for example, archival PDF, and then put it back in the email next to the original Word document, in a copy of the whole email. And then it does this for all the Word documents, and likewise for any conversions you might want to perform. So that's the basic idea: we comb through an email collection and convert all the attachments we want to convert, so that researchers of the future can try to open the old formats if they want to, but if not, they have these other formats that they'll probably have better luck trying to open.
So how it works, in a little more detail, is that you give the app an email collection in mbox format. There's freely available software to convert from Outlook PST to mbox, which is what we use, because most of the collections that we get are in Outlook PST. But anyway, the software assumes a starting point of mbox, which we like because it's human readable. It's just plain text: you can open it and actually look at all the emails in the mbox yourself, no software besides a text reader needed. So you provide the email in mbox format, and then Attachment Converter doesn't actually perform the conversions itself. What it does is let you use software that you have installed on your computer to perform an individual conversion, say, converting this individual Word document to a PDF. It uses that software repeatedly throughout your whole email collection, doing those conversions and then putting the results back into the original.
So it's meant to be really flexible: depending on what utilities you have installed on your computer to perform individual conversions, it'll just do that in batch, as it were, through any collection. Once you're done, you should have copies of all the things you're interested in, next to the originals.
So, for now we are focused on creating Attachment Converter as a command-line tool. The main reason for that is that for our purposes, we need to be able to automate things. So running a program by typing in a command is really useful for the kinds of purposes that we have here at the University of Chicago. The other reason we're focused on making it a command-line tool is that at this early stage we really want to make sure it works. We're making sure that there are no weird bugs, nothing weird is happening to your emails, everything is left intact, and it's all very safe. That's where the focus is at this early stage of the project. But once everything's working really well, we are interested, down the road, in looking at more user-friendly ergonomics, so that it'll be easier for non-technical people to use.
That said, we are intending this software for archivists to actually use, and we understand that archivists don't necessarily have, whatever, a degree in computer science. So once we have a working alpha version of the application, we're going to have a web page with installers to install it on your machine, and detailed instructions and documentation explaining, including for non-technical people, what to type in where to make it convert the attachments that you want. And we'll be doing user testing once that's ready, which looks like it's probably going to be early next year. We intend to have versions for all major platforms, including Windows, Linux, and macOS.
So we're just getting started with it, but once it's working, we think it has great potential to really have a lot of impact, and interesting crossover, I think, with a lot of the other projects involved in the email archives grant first round, which might also have an interest in being able to, for example, convert the attachments to plain text so that they can read the information. Anyway, stay tuned for updates. You can find those at our website at UChicagoLibrary.github.io. Thanks.
Great. Thank you, Matt. I am Chris Prom. I am an associate dean for digital strategies here at the University of Illinois at Urbana-Champaign, and I'm also the PI for the Email Archives: Building Capacity and Community project. That said, I'd like to switch gears just a bit here, from the EABCC project to a related area of work. If you look at the Future of Email Archives report from 2019, that report called out two aspects of email archiving work as being of particular need: lower-barrier methods to preserve email, and some format-related work. What that call resulted in, from 2019, is the EA-PDF project, which I'll be describing now and which will be ongoing for the next year and a half to two years.
So EA-PDF stands for email archiving and PDF. With support from the Andrew W. Mellon Foundation, back in 2019 the University of Illinois convened an academic-industry working group at the Library of Congress, which included Steve and some other people as well. As you can see, the group included a wide range of archivists and people representing the PDF industry, most notably Duff Johnson, who is CEO of the PDF Association. This meeting at the Library of Congress explored the possibility of developing a standard for the targeted conversion of email content into a richer PDF format, preserving associated metadata, attachments, and linked content.
The discussions at the Library of Congress were very fruitful and led ultimately to the publication of a phase one report, which is referenced here. This report is essentially a specification for using PDF to package email. The report essentially does three things, which I'll review in a bit more detail. First, it defines some conceptual requirements for packaging one or more email messages into an email archiving PDF format. That's the main thing. But before getting to that, it answers this question: why would anyone want to use PDF as a target format for email preservation? In other words, the report takes some time to articulate the rationale for transforming email messages, folders, and accounts into these archive-ready PDF packages. It takes a couple of approaches in addressing that question. And we do address it in the report in a forthright way, knowing that there may be some skepticism in the community over this idea of converting email into PDF format.
So the first response to that skepticism and question is simple: there are already many software packages that provide the ability to export or print messages to PDF. You can see I've provided a screenshot here from my own email client, which is the Mac Mail app. But what that application does, and what many others do, is produce a somewhat hobbled, incomplete version of the email messages. So our hope is that by providing a richer target format, the PDF community and other developers, such as the vendors of email programs, can provide a better, preservation-ready export.
The second reason why we think PDF is a good target format is that many members of the archival community are already using it in some respect. This slide shows some additional results from the survey that Ruby mentioned earlier in this presentation. And you can see that of the 68 archives in Illinois that responded to her survey,
eight of them are already using PDF to preserve messages. It's important to note that this approach is suited to a wide range of repository types and sizes, and it's not necessarily something that replaces other email archiving pathways, but something that can be done as a complement to them. In addition to that, as we'll discuss in a minute, PDF software companies are also potentially interested in using this type of functionality and contributing to the standard.
So finally, I'd like to just note another answer to the "why PDF" question that I presented, which the report again answers. And that is essentially to say that PDF itself provides a rich range of functionality, which can be used to encode most if not all aspects of an email message in the PDF container format. I don't have time to provide a comprehensive description of that functionality, but this slide shows a few aspects of a message that are currently not converted when you print an email to PDF, but which would be converted if the spec is implemented as we develop it. More to the point here, vendors in the PDF community have already expressed an interest in this work. And as you can see at the bottom of this slide, I've supplied some links to presentations in which members of PDF software companies discuss the project in more detail from their perspective within industry.
So the major goal of the phase one project was to provide a functional description of what a fully formed EA-PDF spec would look like. This section of the report is divided into five categories, and I've listed the relevant attributes from each of those here. Without diving into the details too deeply, I'd like to just draw attention to a few core points. It's really intended to leverage the existing advantages of other open standards that are used to implement PDF, mainly the PDF 2.0 standard, which is, as you can see, an ISO standard as well. But it does not formally require that EA-PDF files be implemented using the PDF/A standard.
It is recommended to use PDF/A, but the standard will support multiple use cases. So, for example, if an organization doesn't want or need to preserve the font information, that doesn't need to be encoded in the file. The spec would also include email header metadata, potentially in the visual representation of the file, but more typically in the PDF file metadata in the XMP format, again leveraging PDF's support for open standards, but also allowing crosswalks to PREMIS or other frameworks that are of more direct interest to the preservation community. The report lays out in a pretty decent amount of detail what an EA-PDF creator piece of software should look like and how it should operate. But it also provides some indication of what EA-PDF viewer software would look like. The files would be fully renderable in existing PDF software like Adobe Acrobat or whatever is built into your browser of choice. But EA-PDF viewers could also be developed to provide something that would more closely emulate an email client experience: allow you to search across folders, provide a hierarchy of messages, and so on and so forth.
So we're just beginning a phase two project, which is supported by an IMLS National Leadership Grant that has been provided to the University of Illinois. I am PI on that project, and Steven and others are also involved. There are three main goals to the project, which are essentially an extension of work completed in the phase one project. First, we have just completed a contract with the PDF Association, which will be hosting an EA-PDF liaison working group.
This will be a community of software developers, representatives from industry, and also representatives from archives and libraries, who will essentially develop the second deliverable from the project, which is a detailed specification of the format, developed in a way that will lend itself to implementation in the library and archives community, as well as, potentially and hopefully, by PDF software developers and vendors who provide services to the general public and to industry. I don't want to speak for any individual companies, but organizations like Adobe, like Foxit Software, like some of the other major PDF vendors. And then finally, once the spec is a little bit more advanced, the University of Illinois will develop a proof-of-concept EA-PDF writer built on the DArcMail software, which is a prior software converter for email messages developed by the Smithsonian Institution Archives.
Here is a screenshot of the PDF liaison working group. If you're interested in being involved, we'd be very happy to have you join that working group. Or if you can't join the liaison working group but still want to have connections to the community, I would certainly encourage you to send me an email, and I'd be happy to add you to the email distribution list we have.
So before we close, I'd like to just thank Steve, Matt, and Ruby for their contributions. Thank you so much for your patience in getting a deeper dive into some of the Email Archives: Building Capacity and Community projects. I'd just like to note that our colleagues at the University at Albany are also presenting a recorded session for the CNI meeting on the Mailbag project, which is supported by the EABCC grants. I'd also like to note that Ruby and I are planning to attend the in-person portion of the meeting, so we'd be happy to discuss any of these projects with you there in DC or answer any questions. In the meantime, thank you for attending, and best wishes for the rest of the fall 2021 CNI meeting. Thank you.