So, I'm Leslie Johnston. I am the Director of Development and Tools Management at the U.S. National Archives. That means I have responsibility for software development and application support but, more relevant to this talk, for the electronic records management systems at the National Archives and for its digital preservation infrastructure. I'm going to talk today about what we're doing to replace ERA, the Electronic Records Archives, with ERA 2.0, which I know is not the most exciting new name, but as one of our leaders put it to me, we will always have an electronic records archive, so let's just keep calling it the Electronic Records Archives.

Right now the Electronic Records Archives encompasses all the operations for scheduling, transferring, reviewing, describing, transforming, and storing the electronic records that we get from U.S. federal agencies. That's about 275 agencies that we receive permanent records from, and that's a distinction: it doesn't mean we get every record or every document produced by every agency, but we do get all of their permanent records, as they identify them and negotiate records retention schedules with us. Right now ERA is multiple systems. We have a system for unclassified federal records, a system for presidential records, a system for Title 13, or census, records, and a system for legislative records, in part because those systems operate under different regulations and have different access controls on top of them, as per those regulations. We developed the current system in the early 2000s. This was a large process, with stakeholder requirements developed all over the agency, but it was scoped with a single system in mind to deliver all aspects of functionality, from scheduling through to public access.
The system as it was actually implemented supports scheduling, submission, and transfer, and only one workflow right now: federal records that come in under schedules from other agencies. It is used by dozens of NARA staff for processing, and by hundreds of staff at federal agencies to actually bring the records over to NARA. This was a very successful project in that it leveraged a lot of community work and a lot of community tools: we integrated PRONOM, DROID, and JHOVE. So in many ways it was successful, in that at the end of the project we had a working system we could use to transfer records and manage them at the archives. But I would not say it was a 100% successful project, and we learned some lessons from it. We had over 1,400 requirements, which made it very difficult to implement the system on the schedule we had planned and to bring all of the functionality into production, and one of the areas that was not fully put into production was the digital preservation functions of the system. We had communication challenges, and we had a lot of change management challenges, because we were introducing new processes not only for NARA staff but for the staff at over 200 agencies that were transferring records to us. Anybody who is familiar with the system knows that we put our development into a holding pattern in 2012 and made plans for future work, but stopped all work. So we are now developing and refactoring the system as ERA 2.0, because we need to improve efficiency across the full life cycle of the records. We are not developing a single system; we are developing multiple modules in a highly modular way.
We are leveraging a lot of open source, and I'll get to that a little later in the presentation, and a high level of interoperability with other tools at NARA, which is something we didn't quite accomplish in the first implementation. One thing I do want to mention about how we are doing this project differently is that this is an agile project, for people familiar with the concept of agile software development. We're using a flavor of agile that isn't Scrum or kanban but is called Agile with Discipline. We did start off with a set of use cases, but we have developed user stories for all of our functionality, and we are doing four-week sprints. We started the work in fall of 2014, so we are about two-thirds of the way through the development right now. Each user story provides the implementation, the acceptance criteria, and the definition of done, so we can actually write the test cases and test it, and at the end of every four weeks we have more working code. In fact, since the end of the first four-week sprint we have had a working system that has just grown iteratively over time. The work is being implemented by IBM as our partner; IBM has responsibility for the current ERA system and its operations and maintenance, and this has been an extraordinarily successful partnership for us. We have about 40 contractors on the project, from project managers to analysts to software developers and software architects.

So I'm going to talk a little bit about what the different parts of the system are. We have a Digital Processing Environment. This is in many ways the newest and most exciting part of the program for us, because it is something we haven't had before, and I'll just say that this was a very manual, bespoke workflow for each unit.
Each unit had a set of tools it needed to work with the born-digital records, but staff would have to work on them locally. Things would come into ERA, get downloaded, get processed, and get re-uploaded. That is not a very efficient process for the processing, review, and potential redaction and creation of public use copies of the records. So we are now working in an entirely cloud-based environment: all the development is in the cloud, the pilot is in the cloud, the production system will be in the cloud, and all the processing is also in the cloud. We're working with about 15 tools right now, both open source and commercial, that are either embedded into the tool itself or available to each archivist, who has their own virtual, cloud-based workbench to do all of their work. This includes format characterization, bulk file formatting, image manipulation, common business productivity systems like Office, and PII identification as well as redaction.

The second module, one of the modules we also needed very much, is the Digital Object Repository. This is all of our holdings management and all of our preservation functions, but it is also our staff preservation and staff search functions, not the public's; the public will continue to work in our National Archives Catalog. This is where we maintain the audit trail for all of the preservation actions that are taken. This is a very transaction-heavy system: it tracks all events and provides advanced search not only over the metadata for the objects but also over the full text of the objects, if they are text objects. Not surprisingly, we have a number of different levels of classification, sensitivity, ownership, and access controls based on whether these are federal, presidential, or congressional records.
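As a purely illustrative sketch of what such record-type-based controls might reduce to (ERA 2.0 itself is a Java application; the names, tiers, and fields here are hypothetical, not NARA's actual model):

```python
from dataclasses import dataclass

@dataclass
class Record:
    title: str
    collection: str        # e.g. "congressional", "presidential", "federal"
    dark: bool = False     # under a long-term hold

def visibility(user_groups: set, record: Record) -> str:
    """Hypothetical tiered visibility check: full access for the
    custodial unit, existence-only for other staff, and nothing at
    all for dark records outside the custodial unit."""
    if record.collection in user_groups:
        return "full"            # read, preview, and act on the record
    if record.dark:
        return "hidden"          # does not even appear in search results
    return "existence-only"      # listed in results, no preview or text
```

The point of the tiers is that "denied" is not one state: some staff may know a record exists without reading it, while others must not see it at all.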
We have a lot of different groups and user types identified in the system. It may be that, say, for congressional records coming in to that group, the staff in that group can see that the records exist, can see the full text, and have full access to take whatever actions are necessary, while someone working with presidential records who should not have access might be able to see that those records exist but cannot see a preview or read the text. Some records may actually be dark, because they have, say, a 20-, 30-, 50-, or 70-year hold on them, and if you aren't in the custodial unit that has access to them, you won't even see them come up in your search results. So we have very granular controls placed into the system.

The Business Object Management module, the BOM module (and yes, there are always a lot of jokes about it being "the bomb", of which I tire very quickly, as Meg can attest), is the system through which we actually bring scheduled records into the system and through which agencies submit schedules: schedules go in, and transfers need to be applied to the schedules for us to accept them. Right now that's the only workflow we have in the current ERA system. We are opening up all workflows in the new system for federal, presidential, legislative, judicial, and donated records, but, more importantly for our preservation efforts, for the digitized copies of our analog holdings at NARA. This is something we haven't done before: we haven't had a good preservation model or, in fact, even a policy for the retention of the digitized copies. We have a policy going through the policy issuing process right now, out of our digitization governance board, which I serve on, and we will now have a formal policy we can point to that says we do retain our digitized copies on the same schedule as the analog records they represent.
They will be coming into ERA to be managed and preserved alongside everything else. I know you can't even read this slide, but this is an example of some of the preservation functions we are building into the system in this new iteration, and we describe these as user stories, in agile fashion. We validated and identified these against both ISO 14721 and ISO 16363, and we used TRAC and DRAMBORA to check ourselves as we developed them. They are user stories like: as an administrator, I want to receive regular reports on all incidents of data corruption or loss and the steps taken to repair or replace corrupted or lost data. As an archivist, I want to be able to demonstrate the provenance of digital objects through downloadable reports documenting their traceability from receipt. As an archivist, I want to ingest transformations of digital objects stored in the repository in order to store reference and preservation copies. So you get the sense that we have looked very closely at the requirements coming out of the community and mapped them into our development. In terms of implementation, I would say we're about halfway through the preservation functions, and we are two-thirds of the way through the development overall. When I get through the slides I'm going to show some screenshots, because I'm not so careless as to attempt a live demo from a hotel, over a VPN, to a cloud location; I know what would happen if I tried. But I can attest that this is a live system that about 100 NARA staff are actually testing and using right now, and I even have a witness in the audience who has seen it and can swear that it exists. Can't you, Meg? Thank you.

So, the implementation: each of the modules is an independent code base. We are doing this all in Amazon Web Services. This is not actually happening in GovCloud; this is using commodity Amazon.
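To make the data-corruption user story above concrete, a minimal fixity audit could be sketched like this (an illustrative Python sketch, not the actual implementation; the input shapes are hypothetical):

```python
import hashlib

def fixity_report(stored: dict, current_bytes: dict) -> list:
    """Periodic fixity audit sketch: recompute each object's SHA-256
    and report any mismatch as an incident. `stored` maps object id
    to the expected digest recorded at ingest; `current_bytes` maps
    object id to the bytes read back from storage."""
    incidents = []
    for obj_id, expected in stored.items():
        if obj_id not in current_bytes:
            incidents.append({"object": obj_id, "incident": "missing"})
            continue
        actual = hashlib.sha256(current_bytes[obj_id]).hexdigest()
        if actual != expected:
            incidents.append({"object": obj_id, "incident": "corrupted"})
    return incidents
```

A report like this, run on a schedule, is what lets an administrator "receive regular reports on all incidents of data corruption or loss" rather than discovering damage at access time.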
ERA 2.0 is a custom Java application, so we are doing the development ourselves, but it uses all open source in the stack. The indexing is done with Elasticsearch, and we are using open Amazon services. All of the tools we are building into this are open source, except where we need commercial tools required for the processing, such as Excel, Oracle, or SQL Server, to work with certain record types we have received. When we reissued our transfer guidance in 2014, we extended the formats we accept from agencies from a very prescribed list of about six to a much more inclusive list of about 60. We were already getting a lot of these formats, but now we formally accept them and provide guidance to the agencies on how the files should be formatted to be transferred to us. That means greatly expanding the set of tools we make available to the archivists to work with these files, and their work will be done entirely in the cloud, because, frankly, the agencies are keeping their records in the cloud. We are very much following a data-at-rest model, and we have quite a few agencies that have said: all right, we're already keeping it in the cloud, can we just transfer control of these spaces from our agency to NARA? We then transfer logical control over the files without ever having to move them from wherever they are at the agency onto some type of media, or transfer them over the network and bring them down onto a machine at NARA, then load them onto a server at NARA, and then transfer them onto a desktop. You get the idea of not only how many moves that is, but how much risk it would introduce into the preservation of the objects if we had to move them this often. So we are in pilot, we're in phase five of the pilot now, and we will be releasing this into production in early calendar 2018.
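Since the indexing is done with Elasticsearch, a staff search that combines metadata and full text would generally be expressed as a bool query. A sketch of building such a query body (the field names are hypothetical, not NARA's actual index mapping, and this is Python for illustration only):

```python
from typing import Optional

def staff_search_query(text: str, record_group: Optional[str] = None) -> dict:
    """Build an Elasticsearch query body: match the query text against
    metadata fields and extracted full text, with an optional
    record-group filter applied as a non-scoring term filter."""
    body = {
        "query": {
            "bool": {
                "must": [{
                    "multi_match": {
                        "query": text,
                        "fields": ["title", "description", "full_text"],
                    }
                }]
            }
        }
    }
    if record_group is not None:
        body["query"]["bool"]["filter"] = [
            {"term": {"record_group": record_group}}
        ]
    return body
```

Keeping the access filter in the `filter` clause rather than `must` means it restricts results without affecting relevance scoring, which fits the granular-controls model described earlier.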
The initial production release will be for the unclassified records; it's going to take us about a year and a half to do the classified instance. There are cloud implementations rated up to FISMA High for sensitive records, so we could put anything through Title 13 and census, things that have PII, in the cloud, but we haven't yet gotten access to a cloud environment that would allow Top Secret/SCI, all the compartments of Top Secret and SCI. That could change, but with the architectures and the tools we're building, even if they're Amazon-hosted tools like Elasticsearch, we're stick-building everything so we can pick up all of these services and tools and move them into an on-premise private cloud if we need to, if we can't take advantage of a commodity cloud a year from now when we start the development of the classified environment.

So now comes the fun part: actual screenshots from a real system. We have already built out the capability for all of the uploads, for agency users as well as NARA users; if NARA staff are, say, in our digitization labs and need to upload files from the labs, they can. They can associate metadata, in batch if metadata comes in from the agencies, though agencies sending us metadata is not a common enough activity; we hope for it, but we cannot actually mandate or require it. We can manage access by business unit and transfer digital materials. We have a full workflow already in place in the system that allows assignment and transfer of tasks within units and between the custodial units as needed. Staff can manage their files and the metadata paths as they exist, and they can rearrange and create new arrangements as needed. We have a shopping cart function; I will say I'm not enamored of the phrase "shopping cart", but it is a very commonly used phrase, very well understood by the archivists, the technical staff, and the leadership, so I think we're going to be sticking with it.
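The cart-plus-task-assignment workflow above could be sketched roughly like this (an illustrative Python sketch under assumed names; the real system is a Java application and its task model is certainly richer):

```python
class Cart:
    """Hypothetical 'shopping cart': an archivist collects record ids,
    then assigns the batch as a task to a unit, emptying the cart."""

    def __init__(self):
        self.items = set()

    def add(self, record_id: str) -> None:
        self.items.add(record_id)

    def assign(self, unit: str) -> dict:
        # Hand the collected records off as one task for the named unit.
        task = {"unit": unit, "records": sorted(self.items)}
        self.items = set()
        return task
```

The design point is simply that selection and action are decoupled: records are gathered across searches, then a single bulk task moves between units.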
In the Processing Environment you can search by metadata, not by full text; the full text is only available in the Object Repository. You can do browse and fielded metadata search over the file paths. We have also already built out the RESTful API for the Processing Environment, and we're right now in the process of integrating the API for DAS, our descriptive tool at NARA, which is also a web-based tool, so that we can automatically hook onto the IDs, the NAIDs, that are in our descriptive service and transfer records into it automatically. There is then an automated path for these records to go through into the National Archives Catalog in a much more efficient way than "send the files to Gary", which is the current workflow at NARA. You can use a remote workbench instance. It looks like a Windows workbench because that's what we're actually working with right now. Each archivist has their own workbench; we currently have one workbench image, but we're getting ready to release numerous workbench images, because the archivists working with textual records need different tools from the archivists working with audio, video, or GIS, so we will have personalized workbenches. They can also use the embedded tools: JHOVE, DROID, the PII tools, and also the redaction tools. We can work in a grid-based UI; metadata can be imported and exported, as can the objects. We then create a SIP that moves the records into the Repository Environment, and when I say move, the files do not move; this is entirely by logical controls. Control in the Processing Environment is handed over to the Repository Environment, and that is where the full-text indexing happens and where the preservation controls come in. We can, as I said, browse, search, and navigate through the preservation structure. We retain the original structure through which they
came in, because the arrangement of the files often has meaning: how the agencies gave them to us and how our archivists processed them. If files need to be reprocessed for any reason, such as a classification change (something might be declassified; NARA has the authority to declassify classified files), they can be moved back into the Processing Environment. Or if we need to do some sort of bulk automation or preservation action, such as changing all of the JPEG 2000s we get from our partners into JPEGs for public use, you move them back into the Processing Environment, they're versioned, and then the new versions come back in and are re-associated with the original objects in the repository. There is a full set of metadata, some metadata editing, and reporting; this is where a lot of the preservation auditing is. I often call this my personal favorite part of the system, because I am actually responsible: I am the system owner for the ERA systems. When we do the preservation auditing, and when I need to do my monthly and quarterly reports on the preservation status of NARA's holdings, I will say that currently that is too manual a process for my taste. I can now run what are, to me, magically automated reports on the preservation status, on the actions that have been taken, and on the file formats within the holdings. Next steps will be actually creating format action plans that we can act on automatically. I don't know that I will have those in at the time we go live at the beginning of 2018, but those are a very high priority for me as we restructure our digital preservation activities, units, and responsibilities at NARA, to make sure that auditing the collections is as automated a process as possible. And I have even left time for questions. Thank you for showing up today; I very much appreciate it.