Hi, I'm Seth Anderson, and I'm the Software Preservation Program Manager at Yale University Library. We're in the Digital Preservation Services department, and as Program Manager I am primarily responsible for emulation services at the university, both for the university library system and for anyone else who we think could use emulation of their software for access to their digital collections. I'm here today to talk about our recent and ongoing work on the EaaSI program of work, which is an attempt to make access to and the use of emulation easier. Apologies for bad jokes related to the name of the project.
First, some context. Why emulation? I'm sure most of you in the room have considered or heard about emulation recently, but for over 20 years emulation has been talked about as a strategy for access to digital information, and as digital information has taken a larger role in our daily lives and collections, this has come up more and more. I think even within the last two years, the PI on our project, my boss, Euan Cochrane, was at CNI to give a talk on emulation and the first baby steps that Yale took towards using emulation within the library.
We've pursued emulation because of the challenge that digital information has, which is that there is a need for some intermediary to actually access digital information. You can't just look at a file and know what it is. You have to have software and hardware to support the underlying dependencies of that content. So we require software for content integrity, so that we can show or recreate what the content of a digital object was without unnecessarily corrupting it due to changes in the underlying software or modern software. We need software for accurate reproducibility, similarly, but in more of a computational sense. Using different software or newer software can change the functionality of a digital object.
So this comes up a lot with regard to reproducible computational research. If you are not using the native or original software, you might actually change the output or operation of that information. And at a core, basic level, we need software to support obsolete file formats. Of course, as software develops, we lose the backwards compatibility that is supposedly a best practice and a nice thing to have. Over the years, we've left a lot of formats by the wayside, and accessing the information within those is tricky to near impossible without the original software and some means of accessing that software, which is where emulation comes in. We can access that software by emulating, or recreating in a virtual sense, the computing environment that would run that software.
So what is Emulation as a Service, which is what the first four letters of the EaaSI acronym stand for? Over the years, numerous efforts have focused on facilitating the use of emulation through services. Projects like KEEP, which was a European Union project, and the Olive project at Carnegie Mellon have focused on this. But another related project was started, I think around 2008 or 2009, at the University of Freiburg. It was originally GRATE, then became bwFLA, and finally settled on this Emulation as a Service moniker. The work of the team of researchers and developers at Freiburg was focused on creating a framework that enables users to remotely access emulated environments through a web browser. So instead of having to have the local infrastructure on your desktop or the computer that you use regularly, you could just go to your browser and have an interface that allows you to generate and use emulators. But also, the software abstracts the interaction with the technical elements of an emulator.
So you don't really have to have the underlying technical expertise to know how to change hardware settings in QEMU or any number of the other open source emulators that are out there. Instead, the user can focus on the decision making related to what needs to be configured within the computing environment to better support the software or digital objects that are really the end goal of generating an emulated environment.
So Emulation as a Service does a lot of things. It simplifies, as I said, access to various emulators, but really it is a gateway to using these emulators without any expertise. All of this is managed in the back end, and you don't even really need to know what emulator is being used to generate your computing session. It also allows you to interact with and manage large collections of emulation environments. You can save many, many derivatives of different emulation environments and return to them as needed. So instead of having to create your computing environment every time you access it, the Emulation as a Service software saves your environment and makes it available to you on the fly whenever you need to access it.
It also manages the generation of the underlying disk images. All of this is happening in the back end. As you're creating derivatives of your computing environment, so if I start with one and change things and make a new one and continue to do that over time, the system is actually (and apologies, it's really hard to read because of the weird coloring thing that's happening with the projector) managing all of the underlying disk images that make up your computing environment. So instead of recreating, let's say, a five gigabyte disk image every time you make a change, it's just saving the deltas between your changes. So over time, instead of having what would be, according to these examples... I mean, they're not that large. So, 15 gigabytes on that top row.
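
The delta-based storage described here can be illustrated with a minimal Python sketch. It is not how Emulation as a Service is implemented; the `Layer` class, the `derive` function, and the 4 KiB block size are hypothetical names and values chosen for illustration. Each derivative stores only the blocks that differ from its parent, and reads fall back through the chain of parents.

```python
from dataclasses import dataclass, field
from typing import Optional

BLOCK_SIZE = 4096  # bytes per block (illustrative assumption)


@dataclass
class Layer:
    """A disk image layer holding only the blocks that differ from its parent."""

    blocks: dict[int, bytes] = field(default_factory=dict)
    parent: Optional["Layer"] = None

    def read(self, index: int) -> bytes:
        """Return block `index`, searching this layer first, then its ancestors."""
        layer: Optional[Layer] = self
        while layer is not None:
            if index in layer.blocks:
                return layer.blocks[index]
            layer = layer.parent
        return bytes(BLOCK_SIZE)  # a block nobody has written reads as zeros

    def stored_bytes(self) -> int:
        """Bytes this layer alone occupies (its delta, not the whole image)."""
        return len(self.blocks) * BLOCK_SIZE


def derive(parent: Layer, changes: dict[int, bytes]) -> Layer:
    """Create a derivative environment that records only the changed blocks."""
    return Layer(blocks=dict(changes), parent=parent)


if __name__ == "__main__":
    # A hypothetical 1,000-block base image and two derivatives, each changing one block.
    base = Layer(blocks={i: b"\x01" * BLOCK_SIZE for i in range(1000)})
    second = derive(base, {3: b"\x02" * BLOCK_SIZE})
    third = derive(second, {7: b"\x03" * BLOCK_SIZE})

    full_copies = 3 * base.stored_bytes()
    layered = sum(layer.stored_bytes() for layer in (base, second, third))
    print(f"three full copies: {full_copies} bytes; base plus deltas: {layered} bytes")
```

Under these assumptions, three full copies would occupy 3,000 blocks, while the base image plus two one-block deltas occupies 1,002.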
Instead of every change resulting in a, let's say, five gigabyte environment, you actually only end up with about 5.01 to 5.02 gigabytes, because it's just changing blocks within a disk image. And then it manages the compilation of those disk images whenever you access that computing environment. So it does a lot in the back end to just make life easier for people who want to use emulation.
But despite all of the wonderful work that the team at Freiburg has done over a decade in maturing this software, there are still many, many hurdles with actually implementing it at scale. Software can provide many convenient features, but harnessing community resources and human resources, and facilitating that through software as well, is needed to actually make Emulation as a Service a viable platform for access to digital objects.
So enter the EaaSI project, thanks to funding from the Mellon and Sloan Foundations. We started this work at Yale University Library in January of 2018. The goal of the project was to design, deploy, and scale infrastructure and services for software emulation, taking what existed as Emulation as a Service and adding the necessary tools and infrastructure that would allow it to meet the needs of a broader spectrum of users and stakeholders. We're focused on this in four main points that are laid out here, but they all have underlying deliverables within them.
The first was to establish a distributed management framework for an EaaSI service. So instead of Emulation as a Service existing at a bunch of disconnected individual institutions with their own silos, we wanted to leverage the growing community around software preservation that's been made possible by the Software Preservation Network, of which we are an affiliated project.
We want to use that, along with similar distributed network framework projects, as a building block for establishing institutional partnerships for the management and creation of software collections and emulation environments. This way we can actually work together to address the challenges of software preservation and scaling emulation services.
Central to that goal as well is this concept of actually sharing resources across that network of partners within a service. So we are building what we are currently calling the EaaSI Network, which is currently a group of like-minded institutions who are using the EaaSI software to share resources with each other. Instead of, as I said, many institutions having their own siloed software collections, using the EaaSI software we can all pool our resources and exchange installation media between institutions, as well as any saved, configured emulation environments. So we all contribute to and gain from the continued use of the EaaSI service. To kickstart the usefulness of the EaaSI Network, Yale University Library is committed to contributing thousands of configured emulation environments to the institutions who are joining us in the beta or pilot period of the project. And I'll talk about that a little bit later. Oh wow, I'm already running out of time, because I wanted to have time for Q&A.
We're also focused on defining best practices for documentation of software applications and emulation environments: one, so that they are easier to discover within our system, but also so that we have a broader spectrum of information to internally automate some operations within the software.
And as a community service, we are also planning to contribute as much metadata as we can to the Wikidata body of knowledge, so that there's an open and machine-readable corpus of data about software and computer history that can be used by other services or other researchers. We're working with Kat Thornton, who has been pioneering this work for the Wikidata for Digital Preservation project, to incorporate the output of her project in what we are working on as well.
And then finally, we are also committed to prototyping or piloting services or modules that encourage end-user access for students or researchers using emulation. We have a few different products that we will be building towards the end of the project to facilitate this work: sharing CD-ROMs from general collections from institution to institution; an interface for managing access to special collections materials in virtual reading rooms; a portal for facilitating researcher use of emulators to stabilize and recreate their computational research; and then this really big undertaking to develop an API that can analyze file formats, select or recommend existing emulation environments, and run or pre-render files for users without any underlying knowledge or technical expertise about what their files are.
We have an incredible team working on this project at Yale. We have myself, as well as the Director of Digital Preservation, Euan Cochrane, who's the PI on the project, and Ethan Gates, who is in the analyst position underneath me. We are also working with a number of collaborators. Well, that's not good. So, developers at OpenSLX, which is an offshoot of the team from the University of Freiburg, who have worked on the Emulation as a Service software for a long time.
Recent partners who have joined the project: Portal Media will be helping support our UX/UI development as we pivot towards the front-end services that are really important to making the service viable for end users. Jessica Meyerson of Educopia and the Software Preservation Network is contributing as communications and outreach lead, helping us engage the growing community around software preservation and really facilitating our work with the partner institutions that we have engaged on the project. And then, as I mentioned, Kat Thornton from Data Current and the Wikidata for Digital Preservation project has been helping us with our metadata and semantic approach to describing what we generate in the system.
A quick note on the legal context, because this always comes up. We usually get asked how we can even do this, given that a lot of this is proprietary software. Recent work at the University of Virginia, which was supported by the Association of Research Libraries, focused on defining a group of best practices for fair use of preserved software, and specifically included one section on a service similar to what was being developed as EaaSI at the time. So we see this work as falling under a fair use argument, and as long as the institutions and groups that we work with share our values and are focused on using emulation as a research or teaching tool, we feel we would be within bounds as far as copyright regulations go. There has been other work, of course, to pursue exemptions from the DMCA for software preservation and reuse, which also gives us some extra leeway to continue this work on the project.
So what have we actually been up to? The project began in January of 2018 and is currently funded for two and a half years, so we'll wrap up in June of 2020. We were lucky enough to be given a six-month runway to actually plan and determine exactly what we were going to do.
The grant writing process was pretty quick. When we found out about the funding opportunity, we had about six weeks to two months to figure out what to do with the amount of money we were being offered. So we spent the first six months as a team defining our use cases, looking at what stakeholder communities we would need to reach out to, doing some initial fact finding with those groups, and seeing what the needs were out there. We did some work on initial interface mock-ups, just to think about what this looked like and what we wanted to do.
Then in July of 2018 we started our first phase of software development. The focus of this phase was really on prototyping the network functionality: developing the features necessary to synchronize the various nodes, as we call them, in the EaaSI Network. So exchanging metadata about what's been created or published for others to see, and then also the ability to actually replicate environments from other locations within the network. We did some other work to prototype an authentication service or layer in front of the software, but that is currently ongoing. We also began work with our initial institutional partners, who would be the founding members of the EaaSI Network. Those include Carnegie Mellon, Notre Dame, University of Virginia, UC San Diego, and Stanford. And we also worked on an initial metadata model for the description of software and emulation environments within the system, which will be deployed as a database as we continue to develop our software.
We switched to shorter phases earlier this year to be more flexible and respond to changes in the project over time. The last three months have been focused on testing and release of a beta version of the software. As of March 5th, the nodes in the network have been provided access to the software. It's actually public on GitLab, so if anybody wants to install it, you can.
But if you don't have access to everyone's endpoint information, you can't actually sync into the network. So whatever, details. Over the last five to six weeks, we've been working with everyone to iron out the kinks with the deployment, getting everything up and running and making sure everything works, which is more challenging than you would think.
So I'll actually show off, hopefully, what this currently looks like. This is our current demo UI, which was created by the team at Freiburg. It's mostly just here to show off the back-end functionality. We're going to be doing a complete overhaul of the front end over the next few months. But the idea being... well, so we use OAI-PMH to exchange metadata about environments right now. We can't do software yet between the nodes. And you can manually synchronize with individual nodes within the network currently. I'm not going to do that right now, because I've already synchronized this, but I have synchronized with the running node at UCSD, which means that I now know what's in their instance of EaaSI. And thanks to Ron, the admin over there who's gone through our test workflow, I can now see that they have this environment, which has a copy of Microsoft Golf installed in it. And if everything goes as planned, I can replicate his environment to my node.
Oh, well, yep, it didn't work. All right, this is what you get for demoing software that's in development. If it had succeeded, it would be in this list of public environments that are in the network, and I would be able to run it, but it didn't work. So, good to know. I think there's still some work to be done on getting everybody set up and getting some of the functionality configured. So let's just imagine that this actually did work. I could then just run the environment on my local infrastructure.
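
For readers unfamiliar with OAI-PMH, the metadata exchange between nodes can be sketched in Python. OAI-PMH is a standard harvesting protocol in which a client sends HTTP requests such as `verb=ListRecords` and receives XML records in return. The `oai_dc` metadata prefix, the example endpoint, and the function names below are assumptions for illustration; the talk does not say which metadata format or endpoint paths the EaaSI nodes use.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    """Build a ListRecords request URL, optionally limited to records changed since from_date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def parse_record_headers(xml_text: str) -> list[tuple[str, str]]:
    """Extract (identifier, datestamp) pairs from a ListRecords response body."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", default="", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", default="", namespaces=OAI_NS)
        headers.append((identifier, datestamp))
    return headers
```

A synchronizing node would fetch the URL returned by `list_records_url` for each partner endpoint it knows about and pass the response body to `parse_record_headers` to learn which environments that partner has published.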
One of the features that we're looking into is allowing users to retrieve environments on the fly. So instead of having to copy all of the data locally before you can run it, you can just click run, and using range requests, the software will start to stream the data from the other node, cache that locally, and then... yeah, yeah, yeah, okay... boot the environment without you even having to think about where it exists within the network. So yeah, this is an environment currently running on a server somewhere at Yale, and I can do whatever I want to it. I'm not going to do anything, because I don't want to mess it up. And if I had made a change, I could save it and describe the changes, and it'll keep track of that for me over time.
We're going to run out of time for questions, so I want to be really quick. I know that everybody wants to get out of here so they can go get those drinks and everything.
So what are we going to do next? We're working towards a production release sometime later this year. Our big focus is on improving the front-end functionality for discovery and documentation. We're also looking at a larger user permissions feature set, so you can control who can do what within the system. Right now you can do everything, which is not great if you want to open it up. We're also going to be prototyping a hosted version of the system, so if institutions don't want to support the underlying infrastructure, for the time being Yale will provide hosted access to a tenancy within a hosted instance of the service. There's all kinds of stuff; we still have a little over a year. We are working with Katherine Skinner at Educopia as well to do some sustainability planning, so that beyond what will hopefully be a next round of funding, we will be able to continue operations.
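
The on-the-fly retrieval mentioned earlier, where range requests stream a remote environment and cache it locally, can be sketched in Python as follows. This is only an illustration of the general technique: the `StreamingImage` class, the 1 MiB default block size, and the helper names are hypothetical, and the talk does not describe how the EaaSI software actually implements it.

```python
import urllib.request
from typing import Callable


def range_header(offset: int, length: int) -> dict[str, str]:
    """HTTP Range header asking for `length` bytes starting at `offset` (end is inclusive)."""
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def http_fetch(url: str, offset: int, length: int) -> bytes:
    """Fetch one byte range from a remote disk image; the server must support range requests."""
    request = urllib.request.Request(url, headers=range_header(offset, length))
    with urllib.request.urlopen(request) as response:
        if response.status != 206:  # 206 Partial Content signals the range was honored
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return response.read()


class StreamingImage:
    """Reads blocks of a remote image on demand and caches each block after first use."""

    def __init__(self, fetch: Callable[[int, int], bytes], block_size: int = 1 << 20):
        self._fetch = fetch
        self.block_size = block_size
        self._cache: dict[int, bytes] = {}

    def read_block(self, index: int) -> bytes:
        if index not in self._cache:
            # Only blocks the emulator actually touches are transferred.
            self._cache[index] = self._fetch(index * self.block_size, self.block_size)
        return self._cache[index]
```

To stream from a real server, one could construct `StreamingImage(lambda offset, length: http_fetch(image_url, offset, length))`, where `image_url` is a placeholder for the remote node's image location. An environment can then begin booting as soon as the first blocks arrive.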
And if you're interested, we are going to be expanding to more network partners this fall, which will hopefully coincide with the first production release, but otherwise we'll continue our beta testing with those partners.
So, a huge thanks to Mellon and Sloan. If you want to learn more, you can find me on Twitter, check out our website in the Software Preservation Network's space, or follow our hashtag. So thanks. I've used up all the time, but if anyone has any burning questions, please ask. Thank you.