Okay, I think it's about time to get started. Welcome, everyone. I'm Cliff Lynch, the director of CNI, and you have arrived at one of the project briefings that's part of the Spring 2020 CNI virtual member meeting. We have two speakers today, Chris Prom and Sally DeBauche, who are going to address issues around the management of email archives. This is a subject that CNI has been very interested in for a long time, and one that I know many of our members are struggling with. It is a critical part of how we're going to maintain the raw materials that are going to be essential for doing future scholarship, and I'm delighted that we have this presentation today, giving us insights into the emerging best practices.
We are going to have both presentations, and then at the end of the second presentation, Diane Goldenberg-Hart from CNI will materialize and moderate a Q&A session. There is a Q&A tool at the bottom of your screen, and you can use that to bring up a text box where you can type in questions for that question and answer session at any point during the presentation. So as you're listening and as questions occur to you, please feel free to put them in as you're thinking of them. And with that, I will just say thanks again for joining us. I will thank our presenters in advance for being willing to join us for this virtual session, and I will hand it off to Chris, who will give the first presentation.
Thank you, Cliff, and thank you, CNI, for your support of the email archiving community. As Cliff indicated, this is a challenge that many, many organizations face.
What I would like to do is offer an update to the community on some activities that are taking place generally within the email archiving community, and then turn things over to Sally for a specific project update regarding the ePADD software tool, which is a critical tool and a critical project that's providing support for institutions that are looking to add email to their research collections, as Cliff said, really constituting the raw material of history moving forward.
So this update and this briefing is really a follow-on to work that was funded by the Andrew W. Mellon Foundation and the Digital Preservation Coalition, which had sponsored a task force that I had the pleasure of co-chairing with my colleague Kate Murray from the Library of Congress. We served with a group of other people to write a really wonderful and useful report, published by CLIR, regarding the state of email archiving. The purpose of the report was really to lay out a framework for additional work that needed to be done in the community. It also provided some workflow and tool options, but the main focus really was on the recommendations and on laying out the particular challenges that organizations face. So I would just like to briefly review what those challenges are, and also some of the recommendations from the report, because I will be reporting specifically on two additional pieces of follow-on grant-making activity that have been funded by the Mellon Foundation, and there is other work that's been pushed forward since the report as well.
So the challenges should be familiar to many of us. Obviously there are many privacy and sensitivity concerns around email, which manifest themselves as difficult appraisal and selection decisions, both for archivists and for creators of material. Another challenge, of course, is the scale of email collections, which can range from very small but more often than not are quite big and quite difficult to manage in organizations.
Email also, of course, is a set of mixed content, which is something that archivists are very familiar with, in that it includes attachments and links out to other services, such as file sharing services. So taken as a whole, there is a wide range of challenges to acquiring it and managing it. One of the problems that the report identified is that some of the current formats that can be used to preserve email really illustrate a set of problems in and of themselves, and that metadata may not be handed off from tool to tool properly. There's a question as to what the best format to archive email messages in should be. Most systems will export messages in a particular format such as PST or EML or MBOX, and the fact of the matter is that in most cases you end up trying to preserve a data object that's very email specific and may or may not play well with a repository service that is expecting something other than a massive MBOX file to preserve.
The report also identified problems with current workflows requiring complex tool chaining, and what that's meant is that there's been relatively little uptake of email archiving tools in the broader community, even leaving aside some of the issues of trust that can make it difficult to acquire the collections in the first place.
The bottom line is that the report put forward a set of recommendations around two main areas. The first one was the continuation of community development and advocacy for email archiving. It laid out an agenda of lower barrier activities that could be taking place, such as training and skills development, demystifying email archiving.
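As an editorial aside on the export formats mentioned above: a minimal sketch, using only Python's standard `mailbox` and `email` modules, of what it means for a single MBOX file to bundle many messages and their attachments into one email-specific object. The sample addresses, subjects, file name, and the helper names `build_sample_mbox` and `summarize_mbox` are illustrative assumptions, not part of the report or of any tool discussed in this briefing.

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path


def build_sample_mbox(path: Path) -> None:
    """Write a tiny MBOX file: one plain message, one with an attachment."""
    box = mailbox.mbox(str(path))
    try:
        plain = EmailMessage()
        plain["From"] = "alice@example.org"
        plain["To"] = "bob@example.org"
        plain["Subject"] = "Meeting notes"
        plain.set_content("Notes are below.")
        box.add(plain)

        with_file = EmailMessage()
        with_file["From"] = "bob@example.org"
        with_file["To"] = "alice@example.org"
        with_file["Subject"] = "Budget draft"
        with_file.set_content("Draft attached.")
        # Hypothetical attachment payload, stored as generic binary data.
        with_file.add_attachment(
            b"col1,col2\n1,2\n",
            maintype="application",
            subtype="octet-stream",
            filename="budget.csv",
        )
        box.add(with_file)
        box.flush()
    finally:
        box.close()


def summarize_mbox(path: Path) -> list:
    """Return (subject, [attachment file names]) for each message in the file."""
    box = mailbox.mbox(str(path), create=False)
    summary = []
    try:
        for msg in box:
            names = [p.get_filename() for p in msg.walk() if p.get_filename()]
            summary.append((msg["Subject"], names))
    finally:
        box.close()
    return summary


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.mbox"
        build_sample_mbox(sample)
        for subject, attachments in summarize_mbox(sample):
            print(subject, attachments)
```

A repository that treats the MBOX file as an opaque blob sees none of this internal structure, which is one way to read the concern about email-specific data objects raised above.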
Some of this work has moved forward. For instance, the Society of American Archivists now sponsors a one-day email workshop, and there's a workshop for ePADD as well, which I think Sally could probably touch on. The report also identified some potentially higher impact activities, such as sustaining the email archiving community, beginning specification planning, improving options for the use of PDF, and improving standards documentation. A little bit of the hope of the report was that some of these would be funded with grant-making activities in the future, and in fact I'll be reviewing some of those in just a moment.
The second set of recommendations from the report was around tool support, testing, and development. Again, the report broke things off into a set of lower barrier activities, such as testing existing tools and improving format identification and characterization, and then higher impact activities that would require some level of external support or internal commitment, such as improving tools for sensitivity review. Another area was improving the use of machine learning in helping to classify and identify emails that are sensitive. The Mellon Foundation has funded another project in this area, which I know was also reported on at the CNI meeting. This is the RATOM project, and I believe you can find a copy of the video of the project briefing for that on the CNI website. Some of the other higher impact activities are still ones where we could see work emerge in the future, and certainly I hope to see some work emerge in the future.
What I'd like to do next is essentially review two projects that the University of Illinois is pushing forward with support from the Mellon Foundation. The first is email archiving in PDF. This is a relatively small-scale project, initially, to explore email archiving using the PDF ecosystem of standards, technologies, and vendors.
The formal project title is "Conversion Criteria and Requirements for Archiving Email into PDF Containers," and the project was really initiated because there's a strong desire and need, particularly in government records contexts as well as some other ones, to use PDF as a target preservation format, partly because it's a readily available technological solution, but also because there's a large standards and vendor ecosystem that supports PDF. The idea and thought was that with a little bit of additional development we could see much better, more authentic emails preserved using the PDF format. So you see here a list of individuals who are involved, and I'm very happy to report that we have good industry and trade association representation, with the PDF Association and some of our colleagues in industry.
So one of the questions is, why PDF? Essentially, as I've already touched on, there is an existing market, and there are large numbers of tools that already support some level of conversion of email messages to PDF format. This can be as simple as printing a message from Outlook or what have you. The issue, of course, when you do that is that a large percentage of the data in the email message, in particular the header information as well as most of the other elements that make that PDF authentic, is kind of lost in translation. So the hope is that by leveraging the vendor ecosystem that's part of PDF and the membership of the PDF Association, and by developing a framework for how better conversion of email messages to PDF can take place, we could essentially improve existing practice.
We do have a draft report, which is available for comment. I'll refer to that in just a minute, but just as a preview, the report itself recommends that a set of requirements be put into play for the archiving of email in PDF, around the ways in which open standards are used and the ways in which email messages are described, captured, and then presented.
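As an editorial aside on the point about header information being lost in translation: a minimal sketch, using Python's standard `email` package, that lists which headers of a message would not survive if only the fields typically shown in a printed view were kept. The sample message, the `PRINTED_FIELDS` set, and the function name `headers_lost_in_print` are assumptions made for illustration; they do not describe what any particular mail client or PDF converter actually does, nor the requirements in the draft report.

```python
from email import message_from_string
from email.policy import default

# Hypothetical raw message; a real one usually carries many more headers.
RAW_MESSAGE = """\
Received: from mail.example.org (mail.example.org [192.0.2.10])
\tby mx.example.net with ESMTP id abc123; Mon, 4 May 2020 10:15:02 -0400
Message-ID: <20200504101500.1234@example.org>
In-Reply-To: <20200501090000.5678@example.net>
Date: Mon, 4 May 2020 10:15:00 -0400
From: Alice Example <alice@example.org>
To: Bob Example <bob@example.net>
Subject: Re: Draft report
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Comments on the draft follow.
"""

# Assumption: a simple print-to-PDF view shows only these fields.
PRINTED_FIELDS = {"from", "to", "date", "subject"}


def headers_lost_in_print(raw: str) -> list:
    """Return header names present in the message but absent from the print view."""
    msg = message_from_string(raw, policy=default)
    return [name for name in msg.keys() if name.lower() not in PRINTED_FIELDS]


if __name__ == "__main__":
    for name in headers_lost_in_print(RAW_MESSAGE):
        print("not carried into the printed view:", name)
```

In this sample, the routing trail (`Received`), the unique identifier (`Message-ID`), and the threading link (`In-Reply-To`) are among the dropped headers, which are the kinds of elements that bear on whether the preserved copy can be treated as authentic.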
In the interest of time, I'll leave this aside for now and just simply refer you to the report. We are currently at the step in this project where we're hoping to finalize the report between now and the end of June. So if you go to the URL listed here, emailarchivestaskforce.org, you'll find a link to a Google document where you can offer comments, and I would strongly encourage people to review the document and offer some comments on it. We'll be pushing out some more information about this through listservs and other channels in the next few days, but you can see the basic report linked here.
Another project I would like to announce is that the University of Illinois is sponsoring a project, Email Archives: Building Capacity and Community. To cut to the chase, this is essentially a re-grant program. $700,000 will be disbursed, provided very graciously and generously by the Andrew W. Mellon Foundation, to essentially build up the email archiving community. So again, I will simply refer you to the project website, where you can read more about this. It's a four-year grant program funding projects of $25,000 to $100,000.
Key facts about this: we really are hoping to support projects that implement recommendations from the report, but we also realize that there are many new ideas for email archiving that are certainly very valid and should be supported as well. We would really like to focus on broadening the range of institutions that actively archive email. There is a strong awareness within the community, and certainly my sense is that we have far fewer institutions processing and preserving email than we should currently, and that this program is a way to help rectify that situation. So just very briefly, I've listed a few project ideas here.
Again, a real focus here is on implementing processing workflows and developing communities of practice around those toolkits, but again, there are other project ideas as well that are worth considering. In the interest of time, I'd like to turn things over to Sally, who can tell you a little bit more about the work that's taking place with ePADD, which is another project that the Andrew W. Mellon Foundation is currently funding.
Thank you, Chris. Thank you, everyone, for joining today, and thank you so much to the CNI organizers for making these virtual sessions possible. They've been really great so far. Okay, let me get these slides up.
So my name is Sally DeBauche. I am the project manager for the ePADD project at Stanford University, and ePADD is free and open source software. It's developed by archivists and computer scientists, using machine learning, automated metadata extraction, and natural language processing to support the review, discovery, and delivery of historic email collections. The project began in 2012, and eight years and six versions later we're here in 2020 on version 7.2. We were very pleased and honored to receive a grant from the Andrew W. Mellon Foundation to pursue our work for this year, so we started in January and will wrap up at the end of December.
We've identified a few major development goals for this period. Our first task is to redevelop the user interface for reviewing attachments, and that's something that I'll give a quick demo on in a minute. We're also working to scale ePADD's capacity to handle large collections. We are exploring ideas around developing a multi-institutional email discovery platform, and I will also get into more detail on that in this presentation. And finally, we're beginning to translate ePADD's user interface into different languages, so we're starting with the discovery module, and we're starting with French and German. We have a few major collaborators during this project as well.
We've been working closely with a team at Harvard as they are redeveloping their EAS tool, which a lot of you may be familiar with as open source software. We have been working with them to make sure that our two projects are well aligned. And we've also been working with a team from the University of Manchester. They have been really deeply interested in how researchers access email collections, and they, independently of us, actually developed a version of ePADD's delivery module that provides full text of email collections online, so we've kind of joined forces with them this year to work on some of these ideas surrounding access to email. And finally, sustainability is always a focus, especially in terms of ensuring that the software is well documented. ePADD is open source software, so theoretically anyone with knowledge of Java and email technologies should be able to work with the code, but we're working to create more robust technical documentation to facilitate that work. And then we're also exploring possible future funding options.
Okay, so the attachment review. Let me go ahead and stop my slides and pull up ePADD. This is the first version of this functionality, so we're excited to kind of debut it for you all. This is the browse screen of the appraisal module of ePADD, and since this is the first version, the old version of this feature is still here, which for the purpose of this demo is kind of a good thing, because I can show you the difference. So what I just did there is kind of the key to why we're redeveloping this. This particular feature relies on Adobe Flash, and that will be deprecated by the end of the year. So redeveloping this feature was absolutely critical. So here you can see all of the attachments to emails that are images.
I'm sorry, we aren't seeing the demo on your screen.
Oh, okay.
I wonder if, maybe, if you turn off the slides or close that window, we might be able to see it.
Okay, one second, let me try. Okay, so you can see this window?
I'm still seeing your slides in build mode.
I'm seeing your web browser.
Okay, one second. How about now?
I see an email archive of Jeb Bush.
Yep, okay, we got it. Great.
Okay, so to backtrack just a second, we're looking now at the current version of the image attachment viewer. This is the feature that relies on Adobe Flash. There's a separate viewer for all other file types, so they are visible here in this more straightforward list format.
So now let me show you the new version. In addition to adding some slightly different functionality, we've also added a new way to access these attachments. So now there is a button at the top of the message screen for an attachment view. And when that's selected, what you'll see are all the attachments to those messages that you were looking at. The main thing that you'll notice about this page is that we're not just seeing images and we're not just seeing other file types. We're seeing all file types represented together. So we've consolidated everything into one view.
If we click on a preview of one of these attachments, we'll see some metadata about that file. So we have the file name, the subject of the message that it's attached to, the sender, and the date that it was sent. We can link out to the specific message that this attachment is associated with. We can see it right here. And we can also download this specific file. We can also download the entire set of attachments for this group. And we have also retained that list view that we had previously, so you have the option to view attachments in either view.
So this is the first version. Like I said, we're hoping to release a beta version that you'll be able to test out in the next few weeks. It may look slightly different. The look and feel might change, but all of the basic features are in place now. So we're really excited to get that out to you. So let me switch back to my slides. And hopefully this works.
Can you see my slides now?
No, can't see your slides.
All right. Let me redo that. Okay. Okay, now?
Yes.
Great. Okay. Another major focus for this year is exploring ideas around a multi-institutional email discovery platform. So I'll give you a little bit of background on what I mean by that. ePADD currently has a module called the discovery module. Institutions can use this module to publish restricted metadata related to their collections online for researchers to access. So they can publish information about collections, similar to the front matter of a finding aid. And researchers can also access restricted or redacted information about the individual messages themselves. So you can see here you have information about the date and the sender, as well as all of the extracted entities of this message.
We believe that the discovery module is a really powerful tool, both for institutions and for researchers. But so far only Stanford has implemented this site. There are some technical and resource challenges in implementing it, and so, recognizing that, we've been thinking about how we can lower those barriers to implementation. The idea that we're working with this year is that we will move the data that we have displayed on our discovery site to AWS and develop an administrative interface that would allow other institutions to register, upload their collections, and publish them to a shared discovery site. I think this would provide a lot of benefit both to these institutions and to researchers, as it would allow the institutions to publish their collections without the requirement of having a web server and someone to administer it, and it would allow researchers to discover these collections in one aggregated spot and to do cross-collection and cross-institutional searching. I think that could be a very powerful tool for them.
Since we see this as a very community-focused project, we absolutely believe that the development of this interface needs to be a community-driven effort, and that ultimately it would be administered by the community and supported by the community as well. So we plan to reach out to our users over this year to get their thoughts on what their requirements might be for this type of site and to try to build that consortium. Our goal for the end of the year is to have a proof-of-concept pilot site. We're looking for a few institutions that have, or have processed, email collections with ePADD and that would be interested in publishing their metadata to a shared discovery platform. And with that, I'll just say thank you again for attending, and please reach out to us on any of these channels if you have any questions or would like to offer any ideas about some of the work that we're doing this year. So thanks, everyone.
Thank you so much, Sally. Thank you, Chris. Lots of interesting work being done in email preservation and archives, and we really appreciate you coming to CNI to share some of the takeaways from your work with us and with our community. And I want to welcome all of our attendees. Thanks for taking time out of your day to join us. You have reached CNI's spring virtual meeting, and we're so happy you could be here with us today. At this time I'd like to go ahead and open up the floor for questions. If you have any questions you'd like Sally or Chris to address, please go ahead and type them in the Q&A box and I will read them aloud.
And we do have a question now coming in from Courtney, who first off thanks you both. And for Sally, she asks: is there any text for a call to interested pilot partners? We could share something out to the Texas Digital Library community to seek partners.
Hi Courtney, thank you. That's a wonderful offer. We do not have a text yet, but I will definitely reach out to you with something like that soon.
That's great. So thank you for that question, Courtney, and thanks, Sally. And just as a reminder to our attendees, this would be a great opportunity to speak with Sally or Chris if you are interested in partnership opportunities. If you've got projects like this underway, or you're thinking about undertaking projects of this kind at your institution, this is a great opportunity to chat with folks who have a lot of experience in that area. And I invite you to type your questions into the Q&A box. We also have the ability to unmute our attendees. If you raise your virtual hand, I can turn on your microphone and you can make a comment. You can ask your question live if you want to have a conversation with our presenters. This is a great opportunity to do that.
And while we're waiting for questions, again I just want to remind you that this webinar is brought to you as part of CNI's Spring 2020 virtual membership meeting, and we will be holding this meeting through the end of May, so there's an entire month of offerings yet to come. I'm just chatting out to you there the direct link to the meeting schedule, so please take a look at that schedule and join us for more webinars. We have another one coming this afternoon, after this email webinar, from Lisa Hinchliffe and Christine Wolff-Eisenberg, on "Academic Library Response to COVID-19: Designing and Managing Real-Time Data Collection and Dissemination." That's part of the COVID-19 call for proposals that we put out while our meeting was already underway, and it has generated a lot of interest already, so join us for that. It's at 2:30 Eastern time this afternoon.
Sally, Chris, before I close the public portion of this session, do you have any final thoughts on what you've talked about today to share with our attendees?
Thanks, Diane. I just would reiterate that I'd be very happy to discuss potential project ideas with people as well.
As I indicated, we have a call for proposals which is open, and which we will be blasting out over various listservs over the next day or two. So when you get that and have a chance to look things through, I'm very keen to speak informally with people about project ideas for email archiving. Certainly cross-institutional partnerships would be great to see, as would use and extension of existing tools and new workflows. I'm really just happy to hear from people and talk with people about ideas to help this community move forward.
Thanks, Chris. And I'll just say, as you get those announcements together, we will share them out, of course, with the CNI community.
Yeah, thank you, Cliff. I really deeply appreciate that.
So keep an eye on the CNI-ANNOUNCE listserv and other listservs out there. So Sally, Chris, thank you so much for coming to CNI and sharing this with our community. Final comments?
No, I don't, but thank you again for organizing these sessions. They've been great.
Yeah, thank you so much.
Thank you. And I'm seeing lots of folks in the chat also echoing the thanks and appreciation; the talk was very informative. And I think with that I'm going to again thank our attendees and thank our presenters.